Markov Chain Monte Carlo

 

Generate a sequence of random variables  where at each time  the next state  is sampled from

 

 

* is called the transition kernel of the chain. Subject to regularity condition, the chain “forgets” its starting position. That is,  will converge to a stationary distribution that does not depend on t or . Denote the stationary distribution by .

After some “burn-in”, points  will be dependent samples approximately from .

 

We can use output from the Markov Chain to estimate  where :

 

(convergence is ensured by the “ergodic theorem”)

 


Metropolis-Hastings Algorithm

 

At each time t,  is chosen as follows:

(1)   Sample a candidate point Y from a proposal distribution

(2)   Accept Y with probability:

 

 

(3)   If accepted, , else .

 

Why does this work?

 

The transition kernel for the M-H sampler is:

 


Note:

 „detailed balance”

 

So, if , then .

 

Note: q must result in an irreducible and aperiodic process

 


Canonical forms for

 

Metropolis Algorithm

 

If  then M-H becomes:

 

 

Example:

 where  is constant. Note that  needs careful choice…

 

Independence Sampler

 

If

 

 

(works best if is a good approximation to )


Single Component M-H

 

We can divide into components and update these components one by one.

 

Gibbs Sampling (special case of single component M-H)

 

 conditional distribution of the i-th component given all the others.

 

Example:

 

Consider a study of the relationship between smoking and lung cancer and suppose we want to make inference about the odds ratio:

 

 

odds ratio

 

Suppose you collect data, Y, on a large number, n, of subjects:

 

 

To make inference about ­ (and subsequently ), compute:

 

, where ­– likelihood,  – prior

 

If the data are complete, e.g.  , likelihood is:

 

          

 

Let denote the missing data. Have:

 

 

If there are  missing entries, the summation has  terms! Resort to memo.

 

We want to draw from:

Instead sample from:

By Gibbs sampling alternately from:

(I)   and   (II)

 

(I)                 is easy because conditional on  the data points are independent e.g. fill the “?” in column three by sampling from  with probability

(II)              is easy if the prior has a convenient form e.g. with a Dirichlet prior,  is also Dirichlet and easy to sample from.