Informative priors

 Posts: 28
 Joined: Mon Apr 02, 2012 3:26 pm
Informative priors
Hello. I would be interested to read others' thoughts on this matter. I am fitting a multilevel cross-classified model using MCMC estimation in MLwiN. I've found that the various non-informative priors I have tried do not work well, so I am leaning towards informative priors. For computational reasons I will probably only be using a random sample of my original dataset, and I wondered about the appropriateness of using the unused data to construct informative priors (ignoring the cross-classification, or just using the non-cross-classified observations). Is that a bit pointless, or could I instead construct informative priors from the results of IGLS estimation on the original sample? Ultimately I'm not that interested in increasing the precision of my estimates; I simply wish to obtain appropriate priors so that when I rerun the model to take account of the cross-classification I'm working in the correct probability space. As a non-Bayesian I'd be interested in anyone's experiences with this, thanks.
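[Editor's note: the split-sample idea above can be sketched in a few lines. This is a hypothetical illustration with simulated binary data, not MLwiN code: one random half of the sample is used to centre a Normal prior for the intercept on the logit scale, leaving the other half for the MCMC fit.]

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical stand-in for the real data: a flat vector of binary outcomes.
y = rng.binomial(1, 0.3, size=10_000)

# Split at random: one half to build the prior, the other to analyse.
idx = rng.permutation(y.size)
prior_half, analysis_half = y[idx[:5_000]], y[idx[5_000:]]

# A crude informative prior for the intercept on the logit scale:
# centre it on the empirical log-odds of the prior half, with a
# standard error from the delta method (binomial variance).
p_hat = prior_half.mean()
logit_mean = np.log(p_hat / (1 - p_hat))
logit_se = np.sqrt(1 / (prior_half.size * p_hat * (1 - p_hat)))

print(f"Normal prior for intercept: N({logit_mean:.3f}, sd={logit_se:.3f})")
```

The analysis half would then be fitted with this N(mean, sd) prior on the intercept, keeping the prior-construction data out of the likelihood so the data are not used twice.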
Re: Informative priors
Hi Rdmcdowell,
When you say they do not work well, do you mean simply that the chains mix poorly? My only experience here really is in my paper Browne et al. (2007) in Statistical Modelling: https://scholar.google.com/citations?vi ... h67rFs4hoC
This is a slightly more complex multivariate-response model, and MLwiN will use inverse Wishart priors for the variance matrices, where I force the prior estimates to reflect the variability in each response. I then did a prior sensitivity analysis by splitting the data into two parts, temporally: one was the earlier data, the other the later data. I used the estimates from the earlier data in the prior for the later data, to show that the estimates were not too sensitive to the prior.
Does that help?
Bill.
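[Editor's note: the prior-sensitivity check Bill describes (build the prior from one part of the data, fit to the rest, and compare against a diffuse prior) can be illustrated with a toy conjugate normal model. The data, split, and variances below are simulated stand-ins, not the paper's data.]

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical response measured over time (illustrative only).
y = rng.normal(loc=2.0, scale=1.0, size=2_000)
early, late = y[:1_000], y[1_000:]  # temporal split

sigma2 = 1.0  # treat the observation variance as known, for simplicity

def posterior_mean(data, prior_mean, prior_var):
    """Conjugate normal-normal update for the mean."""
    n = data.size
    post_var = 1 / (1 / prior_var + n / sigma2)
    return post_var * (prior_mean / prior_var + data.sum() / sigma2)

# Prior built from the early data vs. a diffuse prior, both fitted to the late data.
informative = posterior_mean(late, early.mean(), sigma2 / early.size)
diffuse = posterior_mean(late, 0.0, 1e6)

print(f"informative: {informative:.3f}, diffuse: {diffuse:.3f}")
```

If the two posterior means are close, the conclusion is not being driven by the prior; a large gap would flag prior sensitivity.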

Re: Informative priors
Hello Bill
Thanks for your response and for that helpful citation. What I mean is that the estimates I am obtaining using MCMC estimation in no way correspond to those obtained using IGLS. Consider a simple example of a logistic model with a random intercept only. Using IGLS estimation I may get an estimate of 1.2 for the intercept, with variance 4.5. If I use these as starting values for MCMC estimation, then regardless of the default diffuse priors used, or how long I allow the chains to run (e.g. 500,000 iterations) to satisfy the MCMC diagnostic criteria, the estimates I obtain never come remotely close, e.g. an intercept of 10.00 with variance 200. I've tried the usual strategies to help with the mixing, but these do not resolve the estimation problem. Hence my query as to whether I should really be looking at informative priors. I wondered about creating two random samples from the dataset, one of which would be used to create informative priors, which could then be used for MCMC estimation with the other half of the data. The data are longitudinal, so an earlier/later distinction wouldn't really be useful. I was also wondering about the appropriateness of using the IGLS estimates to create informative priors (such as the WinBUGS priors in Section 6.3 of the MLwiN MCMC manual), or of using the WinBUGS code generated from MLwiN to try alternative non-informative priors.
I would add that I am asking colleagues about this too; it's just good to hear the experiences of others who have encountered similar problems!
Re: Informative priors
Hi Rdmcdowell,
Thanks for the clarification. Good to see people using my MCMC book. The first thing to note is that a variance of 4.5 is crazily big in a logistic regression model (particularly if you are getting this from the default 1st order MQL estimates) and would usually indicate that most clusters are all 1s or all 0s. It's well known that most estimation methods struggle in such scenarios; I looked a long time ago at differences between estimation methods (see Browne and Draper 2000: https://scholar.google.co.uk/citations? ... CSPbOGe4C and in particular the analysis there of the Rodriguez-Goldman datasets).
There we showed that the IGLS methods give biased (low) estimates, though those examples were not as extreme as yours. I'd personally step back, dig into your data some more, and look at your clusters to see if you are really getting mostly clusters of all 1s or all 0s, in which case you might be better off collapsing to the cluster level.
Best wishes,
Bill.
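[Editor's note: the diagnosis Bill suggests, checking whether most clusters are all 0s or all 1s and, if so, collapsing to the cluster level, can be sketched as follows. The cluster structure, sizes, and the variance of 4.5 are simulated stand-ins chosen to mimic the scenario in the thread.]

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-level binary data: 200 clusters of 5 observations each,
# with a large cluster-level variance so many clusters are all-0 or all-1.
n_clusters, cluster_size = 200, 5
u = rng.normal(0.0, np.sqrt(4.5), size=n_clusters)   # random intercepts
p = 1 / (1 + np.exp(-u))                             # cluster probabilities
y = rng.binomial(1, p[:, None], size=(n_clusters, cluster_size))

# Fraction of degenerate (all-0 or all-1) clusters.
cluster_means = y.mean(axis=1)
degenerate = np.mean((cluster_means == 0) | (cluster_means == 1))
print(f"fraction of all-0 or all-1 clusters: {degenerate:.2f}")

# Collapsing to the cluster level: model binomial totals, one per cluster.
totals = y.sum(axis=1)
```

A large degenerate fraction is the warning sign: with little within-cluster variation left, the random-intercept variance is poorly identified and the cluster-level (collapsed) analysis is the safer model.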

Re: Informative priors
Hi Bill
Yes, all the materials available for MLwiN/runmlwin are excellent and extremely informative. You are absolutely correct that most clusters are either all 0s or all 1s. I was able to run the analyses on the collapsed (two-level) data without any estimation problems, thanks to the large size of the dataset, and although there is no problem getting appropriate estimates out of the three-level data using IGLS, it is running the three-level models using MCMC estimation where the difficulties arise. It appears then that this may not be surprising, and that the use of diffuse priors may not be productive given the instability associated with fitting these types of models to this type of data. Many thanks for your guidance and the helpful references.
Ron
Re: Informative priors
No worries Ron,
Glad to be of help. Basically, the larger the clustering variance, the harder it is to justify fitting a multilevel model rather than simply collapsing the data by aggregating up a level.
Best wishes,
Bill.