
Issues of Convergence

For any of the sampling schemes outlined above, the target distribution is the invariant distribution, and the sequence generated by the algorithm tends in distribution to it. In practice, however, the rate of convergence is important.

Specifically, there are two main considerations:

  1. When will the samples be independent of the initial value, $X_0$?
  2. How many samples, N, are needed?

The first question arises because the initial value, $X_0$, is just some possible value of X chosen by the operator, and is unlikely to come from the target distribution. Indeed, it may be some time (call this time J) before the current value of the chain can be regarded as a draw from the target distribution, and only samples after this time may be used. This initial period is called burn-in. The chain is said to have converged after time J.

While it is possible in principle to determine burn-in exactly, at least in a limited number of cases [35], analytical methods of determining J are tedious if not wholly impractical. Even in such cases, the question arises as to whether one should use the output of multiple chains or of a single long chain [36].

For practical applications, time series plots of the chain can give an idea of J. A number of diagnostic tools for assessing convergence are reviewed in [11] and [7].
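As a rough text-based stand-in for a time series plot, the following minimal sketch runs a random-walk Metropolis chain from a deliberately poor starting value and prints the mean of successive windows of the chain. The standard normal target, the starting value of 50, the step size, the window length and the burn-in of 500 are all illustrative assumptions, not values taken from the text.

```python
import math
import random

def log_target(x):
    # Assumed target: standard normal, up to an additive constant.
    return -0.5 * x * x

def random_walk_metropolis(x0, n_steps, step_sd, rng):
    """Return the chain X_1, ..., X_n started from the initial value x0."""
    x, chain = x0, []
    for _ in range(n_steps):
        y = rng.gauss(x, step_sd)  # normal proposal centred on the current value
        # Symmetric proposal, so only the target ratio enters the acceptance step.
        if rng.random() < math.exp(min(0.0, log_target(y) - log_target(x))):
            x = y
        chain.append(x)
    return chain

chain = random_walk_metropolis(x0=50.0, n_steps=5000, step_sd=1.0,
                               rng=random.Random(0))

# Crude trace summary: the mean of each successive window of the chain.
window = 250
window_means = [sum(chain[i:i + window]) / window
                for i in range(0, len(chain), window)]
for k, m in enumerate(window_means):
    print(f"steps {k * window:4d}-{(k + 1) * window - 1:4d}: mean {m:8.3f}")

J = 500             # burn-in chosen by eye from the summary (an assumption)
kept = chain[J:]    # only these samples would be used for estimation
print("mean after burn-in:", sum(kept) / len(kept))
```

Under these assumptions the early window means should sit well away from the later ones, which is the same signal a time series plot would give visually.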

The second question concerns how many samples should be taken. This depends on what the samples are being used for, that is, what is being estimated and how accurate the estimator needs to be. Of course, N also depends on J, since only N - J samples come from the target distribution.
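One way to relate N to the accuracy of an estimator is to estimate the Monte Carlo standard error of the sample mean from the post-burn-in samples. The sketch below uses batch means, which is a technique chosen here for illustration and is not named in the text. The correlated input is an AR(1) sequence used as a stand-in for MCMC output, and the function names, correlation and batch count are all assumptions.

```python
import math
import random

def batch_means_se(samples, n_batches=20):
    """Standard error of the sample mean, estimated from batch means."""
    batch_len = len(samples) // n_batches
    means = [sum(samples[i * batch_len:(i + 1) * batch_len]) / batch_len
             for i in range(n_batches)]
    grand = sum(means) / n_batches
    var_of_means = sum((m - grand) ** 2 for m in means) / (n_batches - 1)
    return math.sqrt(var_of_means / n_batches)

def naive_se(samples):
    """Standard error that (wrongly) treats the samples as independent."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)
    return math.sqrt(var / n)

# Stand-in for post-burn-in MCMC output: a stationary AR(1) sequence
# with unit variance and lag-one correlation rho (illustrative values).
rng = random.Random(3)
rho = 0.9
x = rng.gauss(0.0, 1.0)
samples = []
for _ in range(20000):
    x = rho * x + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
    samples.append(x)

se_batch = batch_means_se(samples)
se_naive = naive_se(samples)
print("batch-means SE:", se_batch)
print("naive SE:      ", se_naive)
```

If the batch-means standard error is larger than the accuracy required of the estimator, more samples are needed. For positively correlated output, the naive figure would be expected to understate the error.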

Again, diagnostics exist for determining how many samples are needed. A comparison of estimates based on two different chains started at different points is one method of checking the variance of the estimators used.
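The comparison of two chains can be sketched as follows. Two random-walk Metropolis chains are started at different points, the burn-in is discarded, and the resulting estimates of the mean are compared. The sketch also computes the Gelman-Rubin potential scale reduction factor, a diagnostic that is not named in the text and is included only as one common way of formalising the comparison. The target, starting points, step size and chain lengths are illustrative assumptions.

```python
import math
import random

def log_target(x):
    # Assumed target: standard normal, up to an additive constant.
    return -0.5 * x * x

def run_chain(x0, n_steps, step_sd, rng):
    x, chain = x0, []
    for _ in range(n_steps):
        y = rng.gauss(x, step_sd)
        if rng.random() < math.exp(min(0.0, log_target(y) - log_target(x))):
            x = y
        chain.append(x)
    return chain

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

burn_in, n_keep = 500, 4000
rng = random.Random(42)
# Two chains started at different points, burn-in discarded.
chains = [run_chain(x0, burn_in + n_keep, 2.4, rng)[burn_in:]
          for x0 in (-10.0, 10.0)]

chain_means = [mean(c) for c in chains]
print("estimates of the mean:", chain_means)

# Gelman-Rubin potential scale reduction factor.
W = mean([variance(c) for c in chains])   # within-chain variance
B_over_n = variance(chain_means)          # between-chain variance divided by n
var_plus = (n_keep - 1) / n_keep * W + B_over_n
r_hat = math.sqrt(var_plus / W)
print("R-hat:", r_hat)
```

Estimates that disagree by more than their Monte Carlo error, or a factor well above 1, would suggest that the chains are too short or that the burn-in has been underestimated.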

The choice of the proposal distribution is fundamental to the rate of convergence. Common choices for the proposal density include: normal, centred on the current value of the chain, with the variance to be chosen; uniform, centred on the current value; normal, centred on the initial value $X_0$; and uniform, centred on $X_0$. In the last two cases, the proposed value is independent of the current state of the chain, and hence these are known as independence samplers [54].
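The difference between the two kinds of proposal can be sketched with a generic Metropolis-Hastings step that takes the proposal as an argument. Only the two normal proposals are shown; the uniform ones would follow the same pattern. The standard normal target, the function names, the value of x0 and the proposal standard deviations are assumptions made for illustration.

```python
import math
import random

def log_target(x):
    # Assumed target: standard normal, up to an additive constant.
    return -0.5 * x * x

def metropolis_hastings(x0, n_steps, propose, log_q, rng):
    """Generic sampler; log_q(a, b) is the log density (up to a constant)
    of proposing a when the chain is currently at b."""
    x, chain = x0, []
    for _ in range(n_steps):
        y = propose(x, rng)
        log_alpha = (log_target(y) - log_target(x)
                     + log_q(x, y) - log_q(y, x))
        if rng.random() < math.exp(min(0.0, log_alpha)):
            x = y
        chain.append(x)
    return chain

x0 = 0.5        # initial value (illustrative)
rw_sd = 2.4     # standard deviation of the random-walk proposal
ind_sd = 2.0    # standard deviation of the independence proposal

# Normal proposal centred on the current value: depends on x.
def rw_propose(x, rng):
    return rng.gauss(x, rw_sd)

def rw_log_q(a, b):
    return -0.5 * ((a - b) / rw_sd) ** 2   # symmetric, cancels in log_alpha

# Normal proposal centred on the initial value: ignores the current x.
def ind_propose(x, rng):
    return rng.gauss(x0, ind_sd)

def ind_log_q(a, b):
    return -0.5 * ((a - x0) / ind_sd) ** 2  # does not cancel in log_alpha

rng = random.Random(7)
rw_chain = metropolis_hastings(x0, 20000, rw_propose, rw_log_q, rng)
ind_chain = metropolis_hastings(x0, 20000, ind_propose, ind_log_q, rng)

for name, c in (("random walk", rw_chain), ("independence", ind_chain)):
    m = sum(c) / len(c)
    v = sum((x - m) ** 2 for x in c) / (len(c) - 1)
    print(f"{name:12s} mean {m:6.3f} variance {v:6.3f}")
```

The design point is that the proposal density ratio cancels for the symmetric random-walk proposal but must be kept for the independence sampler, which is why the generic step carries `log_q` explicitly.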

Besides the question of when the chain has converged, the rate of mixing of the chain is also of interest. Mixing is the speed at which the chain explores the target distribution. If the chain mixes slowly, very many samples are required to explore the whole support of the target. For the first proposal mentioned, mixing depends on the variance. The acceptance rate is the number of times a move is made divided by the total number of steps in the chain. If the acceptance rate is too high, the proposed moves are typically small, so the chain does not have the opportunity to sample from the tails of the distribution. If the acceptance rate is too low, the chain stays at the same value for long periods and does not move around much. Both cases indicate insufficient mixing. Experience has shown that the optimum acceptance rate is between 0.25 and 0.5 when the target and proposal are both normal, with lower rates acceptable in higher dimensions [9].
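The dependence of the acceptance rate on the proposal variance can be checked empirically. The following minimal sketch counts accepted moves for a random-walk Metropolis chain with a normal proposal at three proposal standard deviations. The standard normal target and the three values (0.05, 2.4 and 50) are illustrative assumptions.

```python
import math
import random

def log_target(x):
    # Assumed target: standard normal, up to an additive constant.
    return -0.5 * x * x

def acceptance_rate(step_sd, n_steps, rng):
    """Number of moves made divided by the total number of steps."""
    x, moves = 0.0, 0
    for _ in range(n_steps):
        y = rng.gauss(x, step_sd)  # normal proposal centred on the current value
        if rng.random() < math.exp(min(0.0, log_target(y) - log_target(x))):
            x = y
            moves += 1
    return moves / n_steps

rng = random.Random(1)
rates = {sd: acceptance_rate(sd, 20000, rng) for sd in (0.05, 2.4, 50.0)}
for sd, rate in rates.items():
    print(f"proposal sd {sd:5.2f}: acceptance rate {rate:.3f}")
```

A very small proposal variance should give a rate close to 1 and a very large one a rate close to 0. In practice the variance would be tuned until the rate falls in the range quoted above.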


Cathal Walsh
Sat Jan 22 17:09:53 GMT 2000