For any of the sampling schemes outlined above, it should be remembered that, although the target distribution is the invariant distribution and the sequence generated by the algorithm converges in distribution to it, the rate of convergence is important in practice.
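Specifically, there are two main considerations: how long the chain must run before its samples can be treated as coming from the target distribution, and how many samples should be taken in total.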
The first question arises because the starting value of the chain is just some operator-chosen possible value for X and is unlikely to come from the target distribution. Indeed, it may be some time, call it J, before the state of the chain can be regarded as a draw from the target distribution; only after this time may the samples be used. This initial period is called burn-in, and the chain is said to have converged after time J.
While it is possible in principle to determine the burn-in exactly, at least in a limited number of cases [35], analytical methods of determining J are tedious if not wholly impractical. Even in such cases, the question arises of whether one should use the outputs of multiple chains or of a single long chain [36].
For practical applications, time series plots of the chain can give an idea of J. Reviews of a number of diagnostic tools for assessing convergence are given in [11] and [7].
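As a rough illustration of how the time series of a chain can suggest a value for J, the sketch below runs a random-walk Metropolis chain from a deliberately poor starting value and prints block averages in place of a plot. The standard normal target, the starting value, the step size, the window length, and the helper names `random_walk_metropolis` and `window_means` are all assumptions made for this example and are not taken from the schemes described above.

```python
import math
import random

def log_target(x):
    # Log-density of a standard normal target, up to an additive constant
    # (an assumed target, chosen only for illustration).
    return -0.5 * x * x

def random_walk_metropolis(x0, n_steps, step_sd, rng):
    """Run a random-walk Metropolis chain and return the visited states."""
    x, chain = x0, []
    for _ in range(n_steps):
        y = x + rng.gauss(0.0, step_sd)  # normal proposal centred on x
        # Accept with probability min(1, target(y) / target(x)).
        if rng.random() < math.exp(min(0.0, log_target(y) - log_target(x))):
            x = y
        chain.append(x)
    return chain

def window_means(chain, window):
    """Average the chain over consecutive, non-overlapping windows."""
    return [sum(chain[i:i + window]) / window
            for i in range(0, len(chain) - window + 1, window)]

rng = random.Random(0)
# Start far from the bulk of the target so that the burn-in is visible.
chain = random_walk_metropolis(x0=50.0, n_steps=5000, step_sd=1.0, rng=rng)
means = window_means(chain, window=500)
for k, m in enumerate(means):
    print(f"steps {k * 500:4d}-{(k + 1) * 500 - 1:4d}: mean = {m:6.2f}")
```

Windows whose mean differs markedly from those of later windows suggest that the chain was still in its burn-in period. This is only an informal check and does not replace the diagnostics reviewed in [11] and [7].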
The second question is how many samples, N, should be taken. This depends on what the samples are being used for, that is, on what is being estimated and how accurate the estimator needs to be. Of course, N also depends on J, since only N - J samples come from the target distribution.
Again, diagnostics exist for determining how many samples are needed. A comparison of estimates based on two different chains started at different points is one method of checking the variance of the estimators used.
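The two-chain comparison can be sketched as follows. This is a minimal example under stated assumptions: a standard normal target, a random-walk Metropolis sampler, and illustrative values for N, J, the starting points, and the seeds. The function names `run_chain` and `post_burn_in_mean` are hypothetical.

```python
import math
import random

def log_target(x):
    # Standard normal target up to a constant (assumed for illustration).
    return -0.5 * x * x

def run_chain(x0, n_steps, step_sd, seed):
    """Random-walk Metropolis chain started at x0, with its own seeded generator."""
    rng = random.Random(seed)
    x, chain = x0, []
    for _ in range(n_steps):
        y = x + rng.gauss(0.0, step_sd)
        if rng.random() < math.exp(min(0.0, log_target(y) - log_target(x))):
            x = y
        chain.append(x)
    return chain

def post_burn_in_mean(chain, burn_in):
    """Estimate the target mean from the samples kept after burn-in."""
    kept = chain[burn_in:]
    return sum(kept) / len(kept)

N, J = 20000, 2000  # total chain length and burn-in; illustrative choices only
est_a = post_burn_in_mean(run_chain(-10.0, N, 1.0, seed=1), J)
est_b = post_burn_in_mean(run_chain(10.0, N, 1.0, seed=2), J)
gap = abs(est_a - est_b)
print(f"chain A: {est_a:.3f}  chain B: {est_b:.3f}  gap: {gap:.3f}")
```

A gap that is large relative to the accuracy required of the estimator suggests that more samples, or a longer burn-in, are needed. A small gap is reassuring but is not proof of convergence.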
The choice of the proposal distribution is fundamental to the rate of convergence. Common choices for the proposal density include: a normal distribution centred on the current state of the chain, with the variance to be chosen; a uniform distribution centred on the current state; a normal distribution centred on a fixed point; and a uniform distribution centred on a fixed point. In the case of the last two of these, the proposal is independent of the current state, and hence they are known as independence samplers [54].
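The four proposals can be written down in a few lines, as in the sketch below. It assumes a standard normal target and the usual Metropolis-Hastings acceptance ratio, which is presumed here to match the scheme described earlier. All function names and parameter values are illustrative. The point of the example is that, for an independence sampler, the proposal densities do not cancel and must appear in the acceptance ratio.

```python
import math
import random

def log_target(x):
    # Standard normal target up to a constant (assumed for illustration).
    return -0.5 * x * x

# Proposals centred on the current state x (random-walk type).
def propose_rw_normal(x, sd, rng):
    return x + rng.gauss(0.0, sd)

def propose_rw_uniform(x, half_width, rng):
    return x + rng.uniform(-half_width, half_width)

# Proposals centred on a fixed point; they ignore the current state.
def propose_ind_normal(centre, sd, rng):
    return rng.gauss(centre, sd)

def propose_ind_uniform(centre, half_width, rng):
    return rng.uniform(centre - half_width, centre + half_width)

def log_normal_density(x, centre, sd):
    return -0.5 * ((x - centre) / sd) ** 2 - math.log(sd * math.sqrt(2.0 * math.pi))

def independence_sampler(n_steps, centre, sd, x0, rng):
    """Independence sampler with a fixed normal proposal."""
    x, chain = x0, []
    for _ in range(n_steps):
        y = propose_ind_normal(centre, sd, rng)
        # Hastings ratio: target(y) q(x) / (target(x) q(y)), on the log scale.
        log_alpha = (log_target(y) - log_target(x)
                     + log_normal_density(x, centre, sd)
                     - log_normal_density(y, centre, sd))
        if rng.random() < math.exp(min(0.0, log_alpha)):
            x = y
        chain.append(x)
    return chain

rng = random.Random(3)
chain = independence_sampler(20000, centre=0.0, sd=2.0, x0=0.0, rng=rng)
mean = sum(chain) / len(chain)
var = sum((v - mean) ** 2 for v in chain) / len(chain)
print(f"sample mean = {mean:.3f}, sample variance = {var:.3f}")
```

The proposal here is given a larger standard deviation than the target so that it covers the tails of the target. An independence proposal that is narrower than the target can leave the chain stuck in the tails for long periods.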
As well as the question of when the chain has converged, the rate of mixing of the chain is of interest. Mixing is the speed at which the chain explores the target distribution. If the chain mixes slowly, very many samples are required to explore the whole support of the target. In the case of the first proposal mentioned, mixing depends upon the variance. The acceptance rate is the number of times a move is made divided by the total number of steps in the chain. If the acceptance rate is too high, this indicates that the chain is typically taking very small steps and does not have the opportunity to sample from the tails of the distribution. If the acceptance rate is too low, this indicates that the chain remains at the same state for long periods and thus does not move around much. Both of these cases indicate insufficient mixing. Experience has shown that an optimum acceptance rate is between 0.25 and 0.5 for the case of normal target and proposals, with lower rates acceptable for higher dimensions [9].
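The dependence of the acceptance rate on the proposal variance can be checked empirically. The sketch below assumes a one-dimensional standard normal target and a normal random-walk proposal. The three proposal standard deviations, the chain length, and the function name `acceptance_rate` are arbitrary illustrative choices.

```python
import math
import random

def log_target(x):
    # Standard normal target up to a constant (assumed for illustration).
    return -0.5 * x * x

def acceptance_rate(step_sd, n_steps, seed):
    """Fraction of steps at which the random-walk Metropolis chain moves."""
    rng = random.Random(seed)
    x, accepted = 0.0, 0
    for _ in range(n_steps):
        y = x + rng.gauss(0.0, step_sd)
        if rng.random() < math.exp(min(0.0, log_target(y) - log_target(x))):
            x = y
            accepted += 1
    return accepted / n_steps

# A very small, a moderate, and a very large proposal standard deviation.
rates = {sd: acceptance_rate(sd, 20000, seed=4) for sd in (0.1, 2.4, 50.0)}
for sd, rate in rates.items():
    print(f"proposal sd = {sd:5.1f}: acceptance rate = {rate:.3f}")
```

With the smallest standard deviation almost every proposal is accepted but each move is tiny, and with the largest almost every proposal is rejected. Only the intermediate value should give a rate in the range quoted above.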