BAYESIAN DATA ANALYSIS

Spring 2006


NEWS

4/27/06: Thanks for a great semester!

4/27/06: Here are some links for last night's class:

slides used:
http://www.gatsby.ucl.ac.uk/~zoubin/talks/uai05tutorial-b.pdf
http://www.cs.berkeley.edu/~jordan/nips-tutorial05.ps

Gaussian processes:
http://www.gaussianprocess.org/

Dirichlet processes:
http://aluminum.cse.buffalo.edu:8079/npbayes/nipsws05/resources

4/21/06: Here is my R code for Homework 10.

4/21/06: Here is a link to the R software

4/17/06: Reminder: No class on Tuesday April 25th.

4/13/06: The Sequential Monte Carlo homepage has lots of material that relates to tonight's class.

4/11/06: Homework 10 is below.



Homework.

Homework 1
Any two questions from Chapter 2 of Gelman et al.
Due January 26th

Homework 2
Any two questions from Chapter 3 of Gelman et al.
Due February 2nd

Homework 3
Consider a univariate normal model with mean mu and variance tau. Suppose I use a Beta(2,2) prior for mu (somehow I know mu is between zero and one) and a log-normal(1,10) prior for tau (recall that if a random variable X is log-normal(m,v) then log X is N(m,v) - the textbook has an expression for the log-normal density). I assume a priori that mu and tau are independent. Use a grid-based approximation to compute and plot the contours of the posterior density of mu and tau. Also compute the posterior probability that mu is bigger than 0.5.

Here are the data:

2.3656491  2.4952035  1.0837817  0.7586751  0.8780483  1.2765341
1.4598699  0.1801679 -1.0093589  1.4870201 -0.1193149  0.2578262

Note that this is a lot like the bioassay example we did in class.
Due February 9th
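A minimal R sketch of the grid approach, under stated assumptions: tau is treated as a variance (so the normal sd is sqrt(tau)), the 10 in log-normal(1,10) is read as the variance of log tau (so sdlog = sqrt(10)), and the grid limits are guesses chosen by eye. Names such as log_post_fn and p_mu_gt_half are illustrative only.

```r
# Homework 3 data
y <- c(2.3656491, 2.4952035, 1.0837817, 0.7586751, 0.8780483, 1.2765341,
       1.4598699, 0.1801679, -1.0093589, 1.4870201, -0.1193149, 0.2578262)

# Grid over (mu, tau); the tau upper limit of 6 is an assumption chosen by eye
mu_grid  <- seq(0.0025, 0.9975, length.out = 200)
tau_grid <- seq(0.02, 6, length.out = 300)

# Unnormalised log posterior: normal log-likelihood (sd = sqrt(tau))
# + Beta(2,2) log prior + log-normal(1, 10) log prior (sdlog = sqrt(10), assumed)
log_post_fn <- function(mu, tau) {
  sum(dnorm(y, mean = mu, sd = sqrt(tau), log = TRUE)) +
    dbeta(mu, 2, 2, log = TRUE) +
    dlnorm(tau, meanlog = 1, sdlog = sqrt(10), log = TRUE)
}
log_post <- outer(mu_grid, tau_grid, Vectorize(log_post_fn))

post <- exp(log_post - max(log_post))  # subtract the max first to avoid underflow
post <- post / sum(post)                # normalise so the grid cells sum to one

# Contours drawn at fractions of the maximum height
contour(mu_grid, tau_grid, post / max(post),
        levels = c(0.05, 0.1, 0.3, 0.5, 0.7, 0.9), xlab = "mu", ylab = "tau")

# Posterior probability that mu > 0.5: add up the grid mass with mu > 0.5
p_mu_gt_half <- sum(post[mu_grid > 0.5, ])
p_mu_gt_half
```

Because the grid is equally spaced, normalising by the sum is enough; a finer grid or wider tau range is worth trying to check the answer is stable.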

Homework 4
Devise and implement a rejection sampler for a mixture of two beta distributions (e.g., 0.3*beta(5,2)+0.7*beta(2,8)).
Due February 23rd
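One possible design, sketched in R under assumptions: propose from Uniform(0,1) as in the 2/8/06 myrbeta example, and bound the mixture by the sum of each component's weighted maximum, which is always at least the maximum of the sum. The function names dmix and rmix_reject are hypothetical.

```r
# Target: 0.3 * Beta(5,2) + 0.7 * Beta(2,8)
dmix <- function(x) 0.3 * dbeta(x, 5, 2) + 0.7 * dbeta(x, 2, 8)

# Envelope constant: each Beta(a,b) peaks at its mode (a-1)/(a+b-2), so the
# weighted sum of the two peaks bounds the mixture density everywhere on (0,1)
M <- 0.3 * dbeta(4 / 5, 5, 2) + 0.7 * dbeta(1 / 8, 2, 8)

# Rejection sampler with a Uniform(0,1) proposal; keeps going until n draws
rmix_reject <- function(n) {
  out <- numeric(0)
  while (length(out) < n) {
    g <- runif(n)                        # candidate draws from the proposal
    u <- runif(n)                        # uniforms for the accept step
    out <- c(out, g[u <= dmix(g) / M])   # accept with probability f(g) / M
  }
  out[seq_len(n)]                        # return exactly n draws
}

x <- rmix_reject(10000)
hist(x, breaks = 50, freq = FALSE)
curve(dmix(x), add = TRUE)
```

The expected acceptance rate is 1/M, so a tighter bound (for example, the maximum of dmix over a fine grid, inflated slightly) would waste fewer draws at the cost of a less obvious guarantee.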

Homework 5
First, read this influential paper by David Draper. Write a paragraph describing the main message of the paper. Second, use WinBUGS to show that accounting for model uncertainty can sometimes lead to smaller posterior variances. More specifically, consider these "data":

list(y=c(2.378904,1.501323,0.1017967,-0.5339061,1.307549,2.481263,0.1615935,-2.0631,0.7965437,0.7886678,0.5935791,-0.1616948,-0.3136181,0.2718773,-0.1892056,-1.652616,0.1197718,1.475736,1.228587,1.035448,0.6934467,0.7090875,-4.883614,-2.257672,-0.5472821,-5.732531,0.4991969,-2.191875,0.6320497,1.424922,2.493219,0.7975915,1.366736,-1.597541,-0.4619099,0.9218167,0.2856587,-2.326181,-3.417495,-0.4844028,1.020224,-2.158777,1.639708,-1.319268,-0.7553072,1.129763,1.189073,-1.819093,-0.00982173,0.1378047,1.657811,0.03610128,-0.1615246,-5.18512,0.02565921,-1.256585,2.828195,0.6442164,0.3720463,-0.8553085,1.117819,2.164714,-0.4149535,0.1431652,0.1027453,-0.1122402,-0.851703,0.3240696,-2.048862,0.8833195,-0.952759,0.5364987,-0.91266,-0.937551,0.2982001,0.9885361,3.028904,4.335712,-2.247662,0.6154127,1.217397,-2.168122,0.1326164,1.133149,0.8435582,1.630196,-1.257059,-0.3080003,0.4325004,0.04434237,1.879653,-1.456739,-0.009232163,-0.3858429,0.005481957,0.692194,0.572383,-0.5584705,-1.938509,-2.332977))
Consider a normal model for these data and use conjugate priors. Contrast this with a t-distribution model with the same priors for the mean and variance but now also placing a prior on the degrees of freedom. Which model has the larger posterior variance for the mean?
Due March 9th

Homework 6
Bayesian binary regression with a probit model using BUGS.
Q1. Finney (1947) describes a binary regression problem with two continuous valued predictors and a binary response. Here are the data in BUGS-ready format:

list(n=39,x1=c(3.7,3.5,1.25,0.75,0.8,0.7,0.6,1.1,0.9,0.9,0.8,0.55,0.6,1.4,0.75,2.3,3.2,
0.85,1.7,1.8,0.4,0.95,1.35,1.5,1.6,0.6,1.8,0.95,1.9,1.6,2.7,2.35,1.1,1.1,1.2,0.8,
0.95,0.75,1.3),x2=c(0.825,1.09,2.5,1.5,3.2,3.5,0.75,1.7,0.75,0.45,0.57,2.75,3.0,
2.33,3.75,1.64,1.6,1.415,1.06,1.8,2.0,1.36,1.35,1.36,1.78,1.5,1.5,1.9,0.95,0.4,
0.75,0.03,1.83,2.2,2.0,3.33,1.9,1.9,1.625),y=c(1,1,1,1,1,1,0,0,0,0,0,0,0,1,1,
1,1,1,0,1,0,0,0,0,1,0,1,0,1,0,1,0,0,1,1,1,0,0,1))
The objective here is to build a predictive model that predicts y using x1 and x2. One approach is the so-called probit model: Pr(y=1|x1,x2) = g(b0 + b1*x1 + b2*x2) where g is the standard normal cumulative distribution function. Use BUGS to compute posterior distributions for b0, b1, and b2 using diffuse normal priors for each. Please provide your BUGS code as well as the posterior distributions.
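The assignment asks for BUGS; as an independent cross-check, here is a minimal R sketch of the Albert and Chib (1993) data-augmentation Gibbs sampler for the same probit model. This is a different technique from whatever sampler BUGS picks internally. The N(0, 100) priors (one reading of "diffuse"), the iteration count, the burn-in, and the function name probit_gibbs are assumptions.

```r
# Finney (1947) data from Q1
x1 <- c(3.7, 3.5, 1.25, 0.75, 0.8, 0.7, 0.6, 1.1, 0.9, 0.9, 0.8, 0.55, 0.6, 1.4,
        0.75, 2.3, 3.2, 0.85, 1.7, 1.8, 0.4, 0.95, 1.35, 1.5, 1.6, 0.6, 1.8, 0.95,
        1.9, 1.6, 2.7, 2.35, 1.1, 1.1, 1.2, 0.8, 0.95, 0.75, 1.3)
x2 <- c(0.825, 1.09, 2.5, 1.5, 3.2, 3.5, 0.75, 1.7, 0.75, 0.45, 0.57, 2.75, 3.0,
        2.33, 3.75, 1.64, 1.6, 1.415, 1.06, 1.8, 2.0, 1.36, 1.35, 1.36, 1.78, 1.5,
        1.5, 1.9, 0.95, 0.4, 0.75, 0.03, 1.83, 2.2, 2.0, 3.33, 1.9, 1.9, 1.625)
y  <- c(1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0,
        1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1)
X <- cbind(b0 = 1, b1 = x1, b2 = x2)  # column names label the coefficients

probit_gibbs <- function(X, y, n_iter = 5000, prior_var = 100) {
  p <- ncol(X)
  V <- solve(crossprod(X) + diag(1 / prior_var, p))  # covariance of beta given z
  R <- chol(V)                                        # V = t(R) %*% R
  beta <- rep(0, p)
  draws <- matrix(NA_real_, n_iter, p, dimnames = list(NULL, colnames(X)))
  for (it in seq_len(n_iter)) {
    m <- drop(X %*% beta)
    # Latent z_i ~ N(m_i, 1) truncated to (0, Inf) if y_i = 1, (-Inf, 0) if y_i = 0,
    # drawn by inverting the normal CDF on the matching probability interval
    p0 <- pnorm(0, mean = m)
    z <- qnorm(runif(length(y), min = ifelse(y == 1, p0, 0),
                     max = ifelse(y == 1, 1, p0)), mean = m)
    # beta | z ~ N(V X'z, V) under independent N(0, prior_var) priors
    beta <- drop(V %*% crossprod(X, z) + t(R) %*% rnorm(p))
    draws[it, ] <- beta
  }
  draws
}

set.seed(2006)
draws <- probit_gibbs(X, y)
colMeans(draws[-(1:1000), ])  # posterior means after discarding 1000 burn-in draws
```

Comparing these posterior summaries with the BUGS output is a cheap way to catch a mistyped model or data file.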

Q2. Consider this Bayes net:

Is 6 independent of 35 given 26? Explain.


Due March 21st

Homework 7
The assignment is to build a Bayesian predictive model for these data (in WinBUGS format). The data pertain to 233 branches of a New York bank. For 33 of the branches, I have deleted the value for the variable "Number of New Accounts." Your mission is to estimate those 33 values. You may use any Bayesian statistical approach you like. I assume most of you will use WinBUGS and build a linear regression model but certainly many other possibilities exist. Pay attention to convergence issues - make sure you run the MCMC for enough iterations. Later I will reveal the actual values for the 33 branches and ask you to compute your mean squared error.
Due March 28th

Homework 8
Read this paper. Write a summary, one page or less.
Due April 4th

Homework 9
Here are the homework 6 data in BBR format. This time I want you to fit logistic regression models to these data using the BBR software. The assignment is to find a good model that includes just one predictor variable. Describe and execute a strategy that uses BBR to do this.
Due April 11th

Homework 10
Consider n observations y1, y2, ..., yn from a univariate normal model with known variance. Assume a Gaussian prior for the mean.
(a) derive an algebraic expression for the marginal likelihood of the data
(b) simulate some data and compute the marginal likelihood
(c) compute the marginal likelihood via Monte Carlo by sampling from the prior
(d) compute the marginal likelihood via Monte Carlo by sampling from the posterior (i.e., the harmonic mean estimator)
(e) for (c) and (d) do the simulations a few times and compare your results with the exact answer from (b)
Due April 18th
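A minimal R sketch of parts (b) through (d), assuming hypothetical settings (n = 20, sigma = 1, prior N(0, 2^2), true mean 1). For the exact value it uses the identity p(y) = p(y | mu) p(mu) / p(mu | y), which holds at any mu once the conjugate posterior is known; everything is done on the log scale to avoid underflow.

```r
# Hypothetical settings (not from the assignment): known sigma, N(m0, t0^2) prior
set.seed(10)
n <- 20; sigma <- 1; m0 <- 0; t0 <- 2
y <- rnorm(n, mean = 1, sd = sigma)  # (b) simulated data

# Log-likelihood of the whole sample, evaluated at each element of mu
loglik <- function(mu) sapply(mu, function(m) sum(dnorm(y, m, sigma, log = TRUE)))

# Numerically stable log(mean(exp(v)))
logmeanexp <- function(v) { v_max <- max(v); v_max + log(mean(exp(v - v_max))) }

# (a)/(b) Exact value via the conjugate posterior, evaluated at the posterior mean
post_prec <- n / sigma^2 + 1 / t0^2
post_mean <- (sum(y) / sigma^2 + m0 / t0^2) / post_prec
post_sd   <- 1 / sqrt(post_prec)
log_exact <- loglik(post_mean) + dnorm(post_mean, m0, t0, log = TRUE) -
  dnorm(post_mean, post_mean, post_sd, log = TRUE)

S <- 10000
# (c) Average the likelihood over draws from the prior
log_prior_mc <- logmeanexp(loglik(rnorm(S, m0, t0)))
# (d) Harmonic mean estimator: 1 / mean(1 / likelihood) over posterior draws
log_harmonic <- -logmeanexp(-loglik(rnorm(S, post_mean, post_sd)))

round(c(exact = log_exact, prior_mc = log_prior_mc, harmonic = log_harmonic), 3)
```

For part (e), repeat the last three assignments with different seeds. In this setup the posterior variance exceeds half of sigma^2/n, so 1/likelihood has infinite variance under the posterior and the harmonic mean estimate can be expected to jump around more than the prior-sampling one.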


Class Topics.
I will post links to materials we use in the class here. This is the 2004 schedule - I'll update it later this week.

DATE | TOPICS | LINKS
January | Introduction to the Bayesian approach | Mostly based on Sujit Ghosh's notes; Unrelated Tutorial containing some other slides I used
January | Introduction to the Bayesian approach (cont.) | PPT
February | Multiparameter models; First-cut Bayesian Computation | PPT (mostly based on Kate Cowles' notes and Francesca Dominici's notes); PPT
February | Large-Sample Bayes; Hierarchical Models; Bayes Factors | PPT; PPT; Mostly based on Sujit Ghosh's notes
February | Monte Carlo; MCMC | DOC, HTML; DOC, HTML
March | More Monte Carlo | Some Gibbs Sampling examples
March | Adaptive Rejection Sampling; Probabilistic Graphical Models | PPT; PPT
March | Probabilistic Graphical Models; WinBUGS | PPT; Software Tutorial Material
April | WinBUGS continued; MCMC Diagnostics | -; PPT
March | Case Study: Locating Users in Wireless Networks | PPT
April | Sequential Monte Carlo | PPT
April | Computing the Marginal Likelihood | PPT
April | Case Study: Text Categorization | PDF
April | Bayesian Hidden Markov Models | PPT, bugs
April | Bayesian Causal Modeling | PPT


Other Bayesian Courses


General Pointers to Bayesian and related Web pages



OLD NEWS

3/28/06: Homework 8 is below.

3/28/06: Here is a spreadsheet containing the actual values of "New Accounts" for the 33 test branches. Please enter your predicted values in the second column and then email me the mean squared error.

3/23/06: I can't find an online version of the classic Cooper and Herskovits learning paper, but David Heckerman's tutorial is excellent. This paper from the mid-1990s looks at MCMC and both directed and undirected graphical models.

3/22/06: Homework 7 is below.

3/21/06: Here's a link to the Cowles and Carlin paper I mentioned on MCMC diagnostics.

3/12/06: Here are some handwritten notes on Bayesian theory that we will go over at some point

3/9/06: I just posted Homework 6.

3/1/06: Here is the PASCAL website I mentioned in class yesterday.

2/25/06: No class this Thursday (March 2nd) or on the following Tuesday (March 7th). We'll have an extra long class on March 9th.

2/25/06: Homework 5 is posted below.

2/20/06: No class this Tuesday. We'll have an extra long class this Thursday.

2/20/06: More SAS code from Jue Wang. Here's the MCMC version of the bioassay example. Here's the two-dimensional Gaussian example.

2/18/06: Here is Jue Wang's SAS code for the simple metropolis example.

2/16/06: Here's the MCMC version of the bioassay example.

2/14/06: Here's the simple Metropolis example for R that I went over today in class. Here's a two-dimensional example based on Francesca Dominici's code.
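The linked class code isn't reproduced here, so as a stand-in, here is a minimal R sketch of a random-walk Metropolis sampler. The standard normal target, the starting value, the step size of 2.4, and the function name metropolis are all assumptions for illustration.

```r
# Random-walk Metropolis for a one-dimensional target known up to a constant
metropolis <- function(log_target, n_iter, init, step) {
  draws <- numeric(n_iter)
  current <- init
  accepted <- 0
  for (i in seq_len(n_iter)) {
    proposal <- current + rnorm(1, 0, step)  # symmetric proposal, so no Hastings term
    log_ratio <- log_target(proposal) - log_target(current)
    if (log(runif(1)) < log_ratio) {         # accept with probability min(1, ratio)
      current <- proposal
      accepted <- accepted + 1
    }
    draws[i] <- current                      # a rejected move repeats the current value
  }
  list(draws = draws, acceptance = accepted / n_iter)
}

set.seed(42)
fit <- metropolis(function(x) dnorm(x, log = TRUE), n_iter = 20000, init = 3, step = 2.4)
fit$acceptance
plot(fit$draws, type = "l")  # trace plot
```

Working with log densities avoids underflow when the target is a posterior built from many likelihood terms; swapping in a different log_target is all that is needed to reuse it.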

2/14/06: I just posted Homework 4 - it's due February 23.

2/12/06: We need to arrange some extra classes. Due to trips arranged long before the class schedule, I will be away on February 21, March 2, March 7, and April 25. Let's talk about this in class.

2/11/06: Jue Wang's SAS version of the beta rejection sampler.

2/8/06: Here's a mildly cleaned up version of last night's rejection sampling example. Note it returns a random number of draws. It would be better if it returned n draws.

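myrbeta <- function(n, a, b) {
  c <- dbeta((a - 1) / ((a - 1) + (b - 1)), a, b)  # density value at the mode (needs a > 1, b > 1)
  u <- runif(n)                                    # uniforms for the accept step
  g <- runif(n)                                    # candidate draws from Uniform(0, 1)
  acceptYN <- (u <= dbeta(g, a, b) / c)            # accept if u <= f/cg, with proposal density g(x) = 1
  return(g[acceptYN])
}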


par(mfrow=c(2,1))
plot(density(rbeta(10000,3,2)))
plot(density(myrbeta(10000,3,2)))
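One way to make the sampler return exactly n draws, sketched in R: keep generating batches until enough candidates are accepted, sizing each batch from the expected acceptance rate 1/c. The name myrbeta_n and the extra 10 candidates per batch are hypothetical choices.

```r
# Hypothetical variant of myrbeta that always returns exactly n draws
myrbeta_n <- function(n, a, b) {
  c_max <- dbeta((a - 1) / (a + b - 2), a, b)  # density at the mode (needs a > 1, b > 1)
  draws <- numeric(0)
  while (length(draws) < n) {
    # about 1 in c_max candidates is accepted, so propose enough to finish in one pass
    m <- ceiling((n - length(draws)) * c_max) + 10
    g <- runif(m)                              # candidates from Uniform(0, 1)
    u <- runif(m)                              # uniforms for the accept step
    draws <- c(draws, g[u <= dbeta(g, a, b) / c_max])
  }
  draws[seq_len(n)]                            # trim the surplus
}

length(myrbeta_n(10000, 3, 2))
```

Trimming the surplus does not bias the sample, since the accepted draws are independent and identically distributed.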

2/5/06: Here's a simplified version of the bioassay R code that omits the sampling step. Note that the contours reflect the height of the density. That is, the 90% contour (for instance) is where the density is at 90% of its maximum value - that's why you need to divide by the max value.
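As a minimal illustration of dividing by the maximum, here is an R sketch of a grid approximation for the bioassay example, using the data from Gelman et al. (four dose levels, five animals each) and a flat prior on (alpha, beta). This is not the linked class code; the grid ranges and contour levels are assumptions.

```r
# Bioassay data (Gelman et al.): log dose, animals per group, deaths
x <- c(-0.86, -0.30, -0.05, 0.73)
n <- c(5, 5, 5, 5)
y <- c(0, 1, 3, 5)

# Grid ranges are assumptions chosen to cover the bulk of the posterior
alpha_grid <- seq(-5, 10, length.out = 200)
beta_grid  <- seq(-10, 40, length.out = 200)

# Log posterior under a flat prior = binomial log-likelihood with a logit link
log_post <- outer(alpha_grid, beta_grid, Vectorize(function(a, b) {
  sum(dbinom(y, n, plogis(a + b * x), log = TRUE))
}))

# Divide by the maximum (subtract on the log scale), so the 0.9 contour
# marks where the density is 90% of its peak
rel <- exp(log_post - max(log_post))
contour(alpha_grid, beta_grid, rel, levels = seq(0.05, 0.95, by = 0.1),
        xlab = "alpha", ylab = "beta")
```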

2/5/06: Here's SAS code for the bioassay grid approximation (compliments of Jue Wang)

2/5/06: Here's an interesting article on informative priors for the binomial

2/5/06: Here's the WinBUGS hospital example that I was playing with in class.

2/2/06: I posted Homework 3 below.

2/2/06: No class on February 21 because I will be out of town (giving a talk at U. Chicago).

1/28/06: Homework 2 is any two questions from Chapter 3 of the textbook.

1/27/06: Here is the R code for last night's bioassay example. This is a modified version of Francesca Dominici's program.

1/25/06 ERROR: in class last night I said "dnorm" in R was the distribution function - not correct! It's "pnorm"; "dnorm" is the density. Thanks to Jane Z. for pointing this out.
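For reference, R names the functions for the normal (and its other built-in distributions) by prefix: d for the density, p for the distribution function, q for the quantile function, and r for random generation. A short sketch:

```r
dnorm(0)      # density of N(0, 1) at 0: 0.3989423
pnorm(0)      # distribution function, P(Z <= 0): 0.5
pnorm(1.96)   # P(Z <= 1.96): 0.9750021
qnorm(0.975)  # quantile function, the inverse of pnorm: 1.959964
rnorm(3)      # three random draws from N(0, 1)
```

The same pattern gives dbeta/pbeta/qbeta/rbeta, dbinom/pbinom, and so on.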

1/24/06 The textbook should now be on reserve in the Math library

1/20/06 Last night somebody asked if the Jeffreys' prior always leads to a proper posterior. The answer is no. Here's a paper by Berger et al. with a counterexample.

1/19/06 Please send me email so I can put together a class mailing list.

1/15/06 Just getting things started here

4/30/04 Here's a paper comparing different estimators of the marginal likelihood. The harmonic mean estimator doesn't work very well, although it's not as bad as the example in Homework 10.

4/17/04 I posted Homework 10.

4/9/04 I posted Homework 9. If you need extra time, that's fine.

4/3/04 Extra class! Thursday 4/8 at 10am we'll have an extra class. I'll just do a review session. I'll go over WinBUGS again and show an example of calling BUGS from another program. I'll also review any other material that you suggest.

4/3/04 I posted Homework 8 below - it's another BUGS exercise. Again, if you need extra time that's fine.

3/27/04 I posted Homework 7 below - its a WinBUGS exercise. I did not spend as much time as I would have liked on WinBUGS in class on Friday. If you are finding the software confusing, you can hand in the homework a few days late - I will spend quite a bit of time on WinBUGS this coming Friday.

3/4/04 Jacek Rawicki very kindly typed up the Monte Carlo and MCMC notes.

2/20/04 No class on 2/27/04. We'll arrange a makeup class later.

2/20/04: Here's the simple Metropolis example for R that I went over today in class. Here's a two-dimensional example based on Francesca Dominici's code. I'm working on scanning the lecture notes.

2/16/04: Idea: some students in the class would like to do a project. Let's have an optional project in place of *4* homework assignments. I expect there will be about 12 homework assignments. So, you can either do 12 homework assignments, or 8 homework assignments plus a project. If you intend to do a project please consult me in advance.

2/12/04: I posted homework 3 and some new lecture notes.

2/7/04: I have heard about some Bayesian internship opportunities this summer in California. Let me know if you are interested.

2/6/04: Here is the R code for today's bioassay example. This is a modified version of Francesca Dominici's program.

2/3/04: JAGS is an open source alternative to BUGS. If you are interested in using JAGS let me know - one of the students in the class has managed to get it compiled.

2/2/04: Here are the football data and the corresponding R code I used in class on Friday.

1/29/04: Here is the R code I used yesterday for showing the beta priors and posteriors.