email me

Coupling constructions for Markov chains are closely related to the study of convergence to the stationary distribution and so called mixing times. On this page, I shall discuss briefly a universal coupling method which applies to any Markov chain whose transition probabilities are known. This includes all Metropolis-Hastings samplers used in practice. An accompanying paper may be found here (ps/pdf).

Let X_t and Y_t be two Markov chains with the same transition density p(x,y). A common way of simulating such chains is by the rules X_{t+1} = F_t(X_t) and Y_{t+1} = F_t(Y_t), where F_t refers to some (non unique) computer program which implements one simulation step (iteration) at time t. It is exceedingly unlikely that, for a typical F, we can get X_t = Y_t, at any time t. Why should we look for ways of simulating chains which can couple in a finite time? Theory tells us that if we wait until two suitably chosen chains meet, then we have waited long enough for proper mixing. More precisely, the coupling inequality says that, if the Y process is stationary, then

\max_A | \text{Prob}(X_t \in A) - \pi(A) | \leq \text{Prob}(T > t)

where T is the first time that X_t = Y_t occurs, and \pi is the stationary distribution. In effect, the two chains have lost all dependence upon their initial positions, which is useful for Markov chain Monte Carlo. Moreover, knowing how to make Markov chains come together is a fundamental ingredient of perfect simulation. One easy way to construct coalescing Markov chains consists in modifying an existing transition rule F in such a way that nearby chains have a chance of coming together in one step, the transition density p(x,y) is preserved.

Such a rule is described in the accompagnying paper (ps/pdf), but can also be described briefly in the following words. At each time t, we select at random a point Z in the state space, called a catalyst. Next, rather than update X_t = F_t(X_t) (and similarly for Y), we let X_t = G_t(X_t) (and similarly Y_t = G_t(Y_t)), where G_t(x) = Z with a suitably chosen probability \alpha(x,Z) and G_t(x) = F_t(x) otherwise. The right choice for a means that we do not destroy the transition probabilities, and if both X and Y simultaneously accept the move to Z then they must meet. The coupling method outlined above does not just work for two chains, but can be applied to as many processes as desired simultaneously. Moreover, as many catalysts as desired can be introduced, since the transition density is preserved every time. The applet below illustrates this in the case when the underlying chains are Metropolized random walks with Gaussian increments (as seen in M-H algorithms) and the catalysts are chosen uniformly in the unit square. To distinguish the catalysts from the underlying chains, the former are surrounded by circles. You can choose the number of catalysts as well as the number of Markov chains to couple, together with the corresponding jump sizes denoted s. If you select zero catalysts, you see that the chains come closer together but do not meet eventually (except due to numerical errors). If you click once inside the image, the simulation is paused. Clicking again resumes it.

To view this applet, please install the Java Plugin.
Number of chains
chain \sigma
Number of catalysts

An interesting question to ponder here is the following: what is the optimal number of catalysts to ensure that all test chains coalesce in as few steps as possible? Theory tells us that it doesn't matter how many catalysts we use, since in each case we get coalescence in a finite time. However, a bit of experimentation might convince you that, firstly, two catalysts are better than one, and secondly, a great many catalysts just makes things worse. This last statement can be backed up by theory, see