query on calculation of probability predictions

BernieBaffour
Posts: 2
Joined: Mon Aug 01, 2016 7:30 pm

Post by BernieBaffour »

Hi,
We are running a three-level multiple-membership (logit) model, calling MLwiN 2.32 from within Stata 14 using runmlwin.
In our panel dataset (the HILDA Survey from Australia), the hierarchy is given by person-year observations, nested within individuals, nested within interviewers.
The multiple membership applies because survey respondents can be interviewed by different interviewers over time.

In Stata syntax, the model looks something like this:
runmlwin Y cons Xs, level3(interviewer_ID: cons) level2(xwaveid: cons) level1(obs_id: ) nopause or discrete(distribution(binomial) link(logit) denom(cons))

runmlwin does a good job of estimating this, but we are having trouble retrieving predicted probabilities at different values of the X variables, which journal reviewers have requested.
This is because Stata’s predict and margins routines do not seem to work (or work well) for models estimated using runmlwin, unless we are missing something?

The questions we have are:
1. Is there a way to get Stata (or MLwiN from within Stata) to come up with predicted probabilities for the above model without having to calculate these ‘manually’?
2. When we try to estimate the predicted probabilities manually, can you advise on how best to deal with the random effects at the different levels?
3. Are we correct to assume that using only the estimates in the fixed part of the model would be equivalent to setting the random effects at the individual and interviewer levels to zero? Is that a good solution?
4. Importantly, when we try to estimate the predicted probabilities manually based on the fixed part of the model, the resulting probabilities are very small, about 10 to 20 times smaller than the unconditional means. Do you have a hunch about why this may occur?

Thanks for your help!
GeorgeLeckie
Site Admin
Posts: 432
Joined: Fri Apr 01, 2011 2:14 pm

Re: query on calculation of probability predictions

Post by GeorgeLeckie »

Hi Bernie,

The data sound cross-classified rather than multiple membership. Each response score belongs to one respondent and one interviewer rather than multiple individuals or multiple interviewers.

In terms of your runmlwin syntax, it wasn't clear to me what the xwaveid identifier refers to. You also did not post the full runmlwin syntax for the model, so I could not see what you might have specified in terms of the multiple membership identifiers and weights. However, as I said, I think you want to fit a cross-classified model to these data.

In terms of your query regarding predicted probabilities: yes, there is little automatic functionality here, and you have to create these predicted probabilities from first principles. So, in answer to your questions:

1. No

2. It depends on whether you want cluster-specific or population-averaged inference. If you want to make predictions for a specific respondent or interviewer, then you need to plug in their predicted random effect values (i.e., cluster-specific inference). If you want to calculate the prediction averaging over the respondent and interviewer effects, then you probably need to go down a simulation-based route (i.e., population-averaged inference).

3. Yes, you are correct. Including only the fixed part of the model in the prediction is equivalent to setting all the random effects to zero. Whether this is a good thing to do depends on what type of inference you want. I suspect you want population-averaged inference, unless you are trying to illustrate the heterogeneity in the predicted probability across clusters. Setting the random effects to zero will give you something similar if you have a low degree of clustering, especially if the probability is close to 0.5. Otherwise, setting the random effects to zero will typically give much more extreme predictions (further away from 0.5) than the population average.

4. If you want predictions that are comparable with the unconditional means, then you need to calculate population-averaged probabilities. See the response to 3.
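
The contrast between answers 2 to 4 can be sketched with a small simulation. This is only an illustration in Python, not runmlwin or MLwiN functionality: the linear predictor `xb` and the two variances are hypothetical values (not estimates from the model in this thread), the helper names are made up for the example, and the random effects are assumed to be independent and normally distributed. It also plugs in point values for the variances, so it ignores their estimation uncertainty.

```python
import math
import random


def inv_logit(x):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def population_averaged_probability(xb, var_person, var_interviewer,
                                    draws=20000, seed=1):
    """Average the probability over simulated person and interviewer effects."""
    rng = random.Random(seed)
    sd_person = math.sqrt(var_person)
    sd_interviewer = math.sqrt(var_interviewer)
    total = 0.0
    for _ in range(draws):
        u = rng.gauss(0.0, sd_person)        # simulated person effect
        v = rng.gauss(0.0, sd_interviewer)   # simulated interviewer effect
        total += inv_logit(xb + u + v)
    return total / draws


# Hypothetical values, NOT estimates from the model discussed in this thread.
xb = -4.0              # fixed-part linear predictor at some chosen X values
var_person = 6.0       # assumed person-level variance (logit scale)
var_interviewer = 0.5  # assumed interviewer-level variance (logit scale)

# Fixed part only: equivalent to setting both random effects to zero.
cluster_specific = inv_logit(xb)

# Population-averaged: integrate over the random effects by simulation.
population_averaged = population_averaged_probability(
    xb, var_person, var_interviewer
)

print(f"random effects set to zero:   {cluster_specific:.4f}")
print(f"averaged over random effects: {population_averaged:.4f}")
```

With a rare outcome and a large person-level variance, as assumed here, the fixed-part-only probability comes out well below the population-averaged one, which is the pattern described in question 4.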

In sum, it sounds like you want to recover population-averaged inferences from a cluster-specific model. I suggest you read "Module 7 - Multilevel models for binary responses" of our LEMMA course. The runmlwin commands to replicate the illustrations there are included at http://www.bristol.ac.uk/cmm/software/r ... /examples/. You will have to adapt these.

I hope that helps

George
BernieBaffour
Posts: 2
Joined: Mon Aug 01, 2016 7:30 pm

Re: query on calculation of probability predictions

Post by BernieBaffour »

Hi George,
Thanks for this helpful feedback.

The code for the actual cross-classified model is:


runmlwin Y cons Xs, level3(interviewer_ID: cons) level2(xwaveid: cons) level1(obs_id: ) mcmc(cc) nopause or discrete(distribution(binomial) link(logit) denom(cons))

‘xwaveid’ is the cross-wave person identifier in the panel data, i.e. the Level 2 identifier.

On cross-classification versus multiple membership: our reason for referring to this as multiple membership was that each individual could be interviewed by different interviewers on different occasions. But we are not sure whether this constitutes multiple membership or cross-classification. We would be happy to receive your guidance on this.

Cheers,
Bernie
GeorgeLeckie
Site Admin
Posts: 432
Joined: Fri Apr 01, 2011 2:14 pm

Re: query on calculation of probability predictions

Post by GeorgeLeckie »

Hi Bernie,

Thanks for providing the full runmlwin syntax. It looks correct for specifying a cross-classified model.

Each response measurement belongs to only one interviewer so do not refer to the data structure or the above model as multiple membership as this implies that each response measurement belongs to multiple interviewers. The data structure and model are cross-classified.
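
The distinction can be checked on the data themselves. The following is a minimal Python sketch with made-up records and hypothetical identifiers (it is not the HILDA data, and the helper names are invented for the example): it counts how many interviewers are attached to each person-wave observation and to each person.

```python
# Hypothetical person-year records: (person, wave, interviewer).
records = [
    ("p1", 1, "iA"),
    ("p1", 2, "iB"),  # p1 changes interviewer between waves
    ("p2", 1, "iA"),
    ("p2", 2, "iA"),
]


def interviewers_per_observation(rows):
    """Count the distinct interviewers attached to each person-wave observation."""
    seen = {}
    for person, wave, interviewer in rows:
        seen.setdefault((person, wave), set()).add(interviewer)
    return {key: len(value) for key, value in seen.items()}


def interviewers_per_person(rows):
    """Count the distinct interviewers each person meets across all waves."""
    seen = {}
    for person, _wave, interviewer in rows:
        seen.setdefault(person, set()).add(interviewer)
    return {key: len(value) for key, value in seen.items()}


per_observation = interviewers_per_observation(records)
per_person = interviewers_per_person(records)

# Every observation has exactly one interviewer, so no membership weights are
# needed; persons are not nested within interviewers, so the two
# classifications are crossed.
is_cross_classified = all(count == 1 for count in per_observation.values())

print(per_observation)
print(per_person)
print(is_cross_classified)
```

In this toy example a person can meet several interviewers over time, yet each observation still belongs to a single interviewer, which is the cross-classified structure described above. Multiple membership would only arise if a single observation were shared between interviewers.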

I hope that helps

Best wishes

George