www.cmm.bristol.ac.uk/forum

Posted: **Sun Jun 24, 2012 9:08 pm**

Hi everyone,
I am currently working on a research project for which I am running 3-level logit models (households nested within municipalities, which are nested within states) to predict out-migration in relation to an environmental variable (measured at the state level). I found a pretty interesting quadratic association. I used the following model (an abbreviated version is shown; the real model contains many more controls).

runmlwin mig cons ageh eduh env env2 , ///
  level3(state: cons) ///
  level2(muni: cons) ///
  level1(hhID: ) ///
  discrete(distribution(binomial) link(logit) denominator(cons)) batch

In this equation, env is the continuous environmental variable of interest and env2 is its squared term (env x env). Both regression coefficients are highly significant (b env = -.481, p=.002; b env2 = -.317, p<.001), suggesting a concave association. I would now like to display this association in a graph. I used the following equation to obtain predicted values (I renamed the environmental measures from env to lraind and from env2 to lraind2).

gen yhat = (_b[cons]*cons) + (_b[lraind]*lraind) + (_b[lraind2]*lraind2)

I then transformed the yhat values so that the y-axis reflects predicted probabilities instead of the meaningless log odds scale…

replace yhat=((exp(yhat))/(1+(exp(yhat))))

And finally, I have plotted this association using a simplistic scatter plot.

twoway (scatter yhat lraind)

However, this is a rather crude way of displaying the association, and I am not sure whether including the constant term (_b[cons]*cons) in my equation for yhat makes sense. I would rather use Stata's margins command to obtain predicted probabilities. However, it appears that this post-estimation command cannot be used after estimating a logit model with runmlwin. Or am I wrong? Has anyone used Stata's margins command in combination with runmlwin? Or is there another way to correctly calculate and display predicted probabilities (holding all other variables at their means)?
Thanks so much for your help!

Best,
Raphael

Posted: **Mon Jun 25, 2012 5:13 pm**

Hi Raphael,

Thanks for your post. I'm afraid that the margins postestimation command does not currently work after runmlwin. What you have done looks correct and is how I would have done this. I have given some comments below...

When you write

runmlwin mig cons ageh eduh env env2 , ///
  level3(state: cons) ///
  level2(muni: cons) ///
  level1(hhID: ) ///
  discrete(distribution(binomial) link(logit) denominator(cons)) batch

I am sure you know this, but for the benefit of other readers of this post: remember to fit any final discrete response model by PQL2 or, ideally, by MCMC. MQL1, the default estimation method for discrete response models, underestimates the model parameters, particularly the random part parameters. In data with a high degree of clustering, such as longitudinal data, these biases can be severe. In data with little clustering they can be very small, but you can only know for certain by also fitting the model by PQL2 and, ideally, by MCMC.
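A rough sketch of that sequence for the abbreviated model is given here. The pql2 suboption and the mcmc(on) and initsprevious options are assumed from the runmlwin documentation and should be checked against `help runmlwin`; the variable names are those of the abbreviated model in the post.

```stata
* Step 1: default MQL1 fit, used only to obtain starting values
runmlwin mig cons ageh eduh env env2, ///
  level3(state: cons) ///
  level2(muni: cons) ///
  level1(hhID: ) ///
  discrete(distribution(binomial) link(logit) denominator(cons)) batch

* Step 2: refit by PQL2 (assumed suboption name: pql2)
runmlwin mig cons ageh eduh env env2, ///
  level3(state: cons) ///
  level2(muni: cons) ///
  level1(hhID: ) ///
  discrete(distribution(binomial) link(logit) denominator(cons) pql2) batch

* Step 3: refit by MCMC, starting from the PQL2 estimates
* (assumed options: mcmc(on) and initsprevious)
runmlwin mig cons ageh eduh env env2, ///
  level3(state: cons) ///
  level2(muni: cons) ///
  level1(hhID: ) ///
  discrete(distribution(binomial) link(logit) denominator(cons)) ///
  mcmc(on) initsprevious batch
```

Comparing the random part estimates across the three fits gives a sense of how much the MQL1 bias matters in a given dataset.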

When you write

gen yhat = (_b[cons]*cons) + (_b[lraind]*lraind) + (_b[lraind2]*lraind2)

you are predicting the log-odds of out-migration as a function of lraind and its square, lraind2, holding all other covariates at zero. If you want to do this, it is probably best to centre all your covariates around their grand means, so that holding all other covariates at zero implies making predictions for a typical individual. (Centring the covariates changes the intercept and therefore your predictions.)
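As a minimal sketch of grand-mean centring, assuming the only controls are ageh and eduh from the abbreviated model (the real model has more, and the `_c` suffix is just a hypothetical naming choice):

```stata
* Grand-mean centre each control so that zero corresponds to the sample average
foreach v of varlist ageh eduh {
    summarize `v', meanonly
    generate `v'_c = `v' - r(mean)
}

* Refit the model with ageh_c and eduh_c in place of ageh and eduh.
* The intercept then refers to a household with average age and education,
* so the same prediction equation holds the controls at their means:
generate yhat = _b[cons]*cons + _b[lraind]*lraind + _b[lraind2]*lraind2
```

Note that the means here are taken over all observations in memory; if the estimation sample is smaller because of missing values, the means should be computed on that sample instead.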

When you write

replace yhat=((exp(yhat))/(1+(exp(yhat))))

you could have equally made use of Stata's invlogit() function.
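For illustration, the whole prediction and plot could be written as below. The variable names xb and phat are hypothetical, and a line plot is swapped in for the scatter plot.

```stata
* Linear predictor on the log-odds scale, as in the original post
generate xb = _b[cons]*cons + _b[lraind]*lraind + _b[lraind2]*lraind2

* invlogit(x) returns exp(x)/(1 + exp(x)), the same transformation as above
generate phat = invlogit(xb)

* Sort on lraind so the curve is drawn from left to right
twoway line phat lraind, sort
```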

Your use of the constant (_b[cons]*cons) term in your equation for yhat does make sense. But remember that how you centre your covariates affects the estimate of the intercept and therefore its interpretation (see above).

Best wishes

George

Posted: **Tue Jun 26, 2012 9:12 pm**

Hi George,
Thank you so much for this helpful comment! I learned a lot!
Have a nice day!

Best,
Raphael

Posted: **Wed Jun 27, 2012 8:02 am**

Hi Raphael,

Another way to do predictions is to

(1) Fit the model
(2) Add some extra observations to the end of your dataset which have the desired covariate values
(3) Use the -predict- command to make fixed part predictions for these out-of-sample observations

Instead of (2), you could simply recode the covariate values of the observations in your estimation sample and then proceed to (3). For example, you could replace the values of a covariate x by its mean by typing

runmlwin ...
sum x
replace x = r(mean)
predict yhat

This approach saves you from explicitly referencing the parameter estimates, but it only gives predictions from the fixed part of the model. You would still have to add on the random effects manually and then apply the invlogit() transformation to get predicted probabilities.
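Applied to the migration model, the recoding approach might look like the sketch below. It assumes the model has already been fitted with runmlwin, that ageh and eduh are the only controls, and that -predict- with no options returns the fixed part prediction as described above; xb and phat are hypothetical variable names.

```stata
* Run after the runmlwin model has been fitted
preserve                          // protect the original covariate values

* Hold the controls at their sample means
foreach v of varlist ageh eduh {
    summarize `v', meanonly
    replace `v' = r(mean)
}

predict xb                        // fixed part prediction, log-odds scale
generate phat = invlogit(xb)      // random effects left at zero
twoway line phat lraind, sort

restore                           // bring back the original data
```

With the random effects left at zero, the curve describes a household in a municipality and state at the centre of the random effect distributions rather than a population average.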

Hope that helps

George

Posted: **Wed Jun 27, 2012 2:37 pm**

Wow, I never thought about the possibility of obtaining predictions in this nifty way! Thanks so much for sharing these insights! Have a great day!

Best,
Raphael

Predicted probabilities after multilevel logit model