3 level multilevel data, multiple imputation, stata, runmlwi

stellaliu
Posts: 8
Joined: Thu Feb 27, 2014 3:05 am

Re: 3 level multilevel data, multiple imputation, stata, run

Post by stellaliu »

Hi Chris,

Thank you very much for the advice! The problem is solved! But when I add random predictors at level 2 (see syntax below), MLwiN gives me another error message: error while obeying batch file C:\Users\SHUANG~1\AppData\Local\Temp\ST_0000000a.tmp at line number 291: MCMC 0 500 1 5.8 50 10 1 1 1 1 1 1

Code: Select all

sort SCHID PID TID
runmlwin POST cons PRE DEXP2 DEXP3, level3(SCHID: cons) level2(PID: cons DEXP2 DEXP3) level1(TID: cons) nopause mlwinpath(C:\mlwin.exe) 
estimates store m1igls
mi estimate, cmdok: runmlwin POST cons PRE DEXP2 DEXP3, level3(SCHID: cons) level2(PID: cons DEXP2 DEXP3) level1(TID: cons) mcmc(cc) initsmodel(m1igls) nopause mlwinpath(C:\mlwin.exe)

ChrisCharlton
Posts: 1353
Joined: Mon Oct 19, 2009 10:34 am

Re: 3 level multilevel data, multiple imputation, stata, run

Post by ChrisCharlton »

Without seeing your data/model I can't tell whether this is the case, but I suspect that the problem is due to the starting values you provide. If I run the code below to fit a similar model using an MLwiN example dataset I get the same error:

Code: Select all

* Load the data
use "http://www.bristol.ac.uk/cmm/media/runmlwin/xc1.dta", clear

* Set random number seed
set seed 135123

* Make 20% of attain values missing
replace attain = . if runiform()<=0.2

* Make 20% of vrq values missing
replace vrq = . if runiform()<=0.2

* Fit model to complete cases to obtain starting values for MCMC model
quietly runmlwin attain cons vrq sex, ///
	level3(sid: cons) ///
	level2(pid: cons vrq) ///
	level1(pupil: cons) ///
	nopause

* Store model results
estimates store m2igls	
	
* Declare the way the additional imputed data will be stored
mi set mlong

* Specify variables which have missing values
mi register imputed attain vrq

* Create five imputed datasets using a single-level imputation model
mi impute mvn attain vrq = sex, add(5)

* Fit model of interest to imputed datasets and combine the model results
mi estimate, cmdok: runmlwin attain cons vrq sex, ///
	level3(sid: cons) ///
	level2(pid: cons vrq) ///
	level1(pupil: cons) ///
	mcmc(cc) initsmodel(m2igls) ///
	nopause

Looking at the starting values, the random-part estimates for the two higher classifications are zero:

Code: Select all

. estimates replay m2igls

---------------------------------------------------------------------------------------------------------------------------------------------------
Model m2igls
---------------------------------------------------------------------------------------------------------------------------------------------------
 
MLwiN 2.31 multilevel model                     Number of obs      =      2183
Normal response model
Estimation algorithm: IGLS

-----------------------------------------------------------
                |   No. of       Observations per Group
 Level Variable |   Groups    Minimum    Average    Maximum
----------------+------------------------------------------
            sid |       19         59      114.9        173
            pid |      255          1        8.6         48
-----------------------------------------------------------

Run time (seconds)   =       1.26
Number of iterations =          2
Log likelihood       = -4733.9307
Deviance             =  9467.8613
------------------------------------------------------------------------------
      attain |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        cons |  -10.46568   .3367917   -31.07   0.000    -11.12578    -9.80558
         vrq |   .1650179   .0034255    48.17   0.000     .1583039    .1717318
         sex |   .0422905   .0911367     0.46   0.643    -.1363341    .2209152
------------------------------------------------------------------------------

------------------------------------------------------------------------------
   Random-effects Parameters |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
Level 3: sid                 |
                   var(cons) |          0          0             0           0
-----------------------------+------------------------------------------------
Level 2: pid                 |
                   var(cons) |          0          0             0           0
               cov(cons,vrq) |          0          0             0           0
                    var(vrq) |          0          0             0           0
-----------------------------+------------------------------------------------
Level 1: pupil               |
                   var(cons) |    4.47815    .135546      4.212485    4.743815
------------------------------------------------------------------------------

This is probably due to the model being treated as hierarchical (strictly nested) by IGLS.

MCMC is not able to proceed with these starting values (in particular the level-2 covariance matrix is not positive-definite) and therefore halts. If I change the starting values to be more sensible then the model is able to be estimated.

Post by stellaliu »

Yes, the starting values of the random-part estimates for the second level of my model are zero as well. Do you have any suggestions on what starting values I should use? Right now I am using the estimates from a naive 3-level model.
Also, the model output includes cov(cons\DEXP2) in addition to var(cons) and var(DEXP2). How should I interpret cov(cons\DEXP2)? I don't remember seeing this when using xtmixed. Many thanks!
GeorgeLeckie
Site Admin
Posts: 432
Joined: Fri Apr 01, 2011 2:14 pm

Re: 3 level multilevel data, multiple imputation, stata, run

Post by GeorgeLeckie »

Hi,

Fit the two separate naive two-level models by IGLS and use the level-2 random-part parameter estimates from these models (or values of roughly the same order of magnitude) as starting values for the cross-classified model.

cov(cons\DEXP2) is the covariance between the random-intercept effect and the random-slope effect. Specify covariance(unstructured) when using -mixed- and you will also obtain this parameter.
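
To see this parameter in -mixed-, here is a minimal sketch on simulated data. The variable names (id, x, y, u0, u1) and all numeric values are made up for illustration and are not from the poster's dataset. By default -mixed- uses covariance(independent), which fixes the intercept-slope covariance at zero and therefore does not report it.

```stata
* Simulate a small two-level dataset (all names and values are hypothetical)
clear
set seed 12345
set obs 50                          // 50 groups
gen id = _n
gen u0 = rnormal(0, 1)              // group-level random intercepts
gen u1 = rnormal(0, 0.5)            // group-level random slopes
expand 20                           // 20 observations per group
gen x = rnormal()
gen y = 1 + 0.5*x + u0 + u1*x + rnormal()

* Random intercept and random slope on x, with their covariance estimated
mixed y x || id: x, covariance(unstructured)

* Display the estimated random-effects covariance matrix
estat recovariance
```

With covariance(unstructured) the output lists the two variances and the intercept-slope covariance, which is the counterpart of cov(cons\DEXP2) in the runmlwin output.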

Best wishes

George

Post by stellaliu »

Hi,

Sorry for sending a flurry of messages. I tried two things. The first was to fit two separate naive 2-level models and use their estimates as starting values. I still got the same error message. Below is the output from the two models.

Code: Select all

. runmlwin _1_POST cons _1_PRE _1_DEXP2 _1_DEXP3, level2(PID: cons _1_DEXP2 _1_DEXP3) level1
> (TID: cons) nopause mlwinpath(C:\mlwin.exe)

 MLwiN 2.28 multilevel model                     Number of obs      =      3340
Normal response model
Estimation algorithm: IGLS
-----------------------------------------------------------
                |   No. of       Observations per Group
 Level Variable |   Groups    Minimum    Average    Maximum
----------------+------------------------------------------
            PID |      104          2       32.1        428
-----------------------------------------------------------

Run time (seconds)   =       6.87
Number of iterations =         15
Log likelihood       = -2601.1924
Deviance             =  5202.3848
------------------------------------------------------------------------------
     _1_POST |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        cons |   .2846612   .0275742    10.32   0.000     .2306167    .3387057
      _1_PRE |   .2490859   .0109404    22.77   0.000     .2276432    .2705287
    _1_DEXP2 |   .0012628   .0241918     0.05   0.958    -.0461522    .0486778
    _1_DEXP3 |  -.0075283   .0287033    -0.26   0.793    -.0637857    .0487291
------------------------------------------------------------------------------

------------------------------------------------------------------------------
   Random-effects Parameters |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
Level 2: PID                 |
                   var(cons) |   .0260758   .0059737      .0143676    .0377841
          cov(cons,_1_DEXP2) |          0          0             0           0
               var(_1_DEXP2) |          0          0             0           0
          cov(cons,_1_DEXP3) |  -.0049737   .0044915     -.0137769    .0038296
      cov(_1_DEXP2,_1_DEXP3) |          0          0             0           0
               var(_1_DEXP3) |   .0032904   .0048498      -.006215    .0127958
-----------------------------+------------------------------------------------
Level 1: TID                 |
                   var(cons) |   .2682021   .0067021      .2550662     .281338
------------------------------------------------------------------------------

. runmlwin _1_POST cons _1_PRE _1_DEXP2 _1_DEXP3, level2(SCHID: cons)level1(TID: cons) nopau
> se mlwinpath(C:\mlwin.exe)
 
MLwiN 2.28 multilevel model                     Number of obs      =      3340
Normal response model
Estimation algorithm: IGLS
-----------------------------------------------------------
                |   No. of       Observations per Group
 Level Variable |   Groups    Minimum    Average    Maximum
----------------+------------------------------------------
          SCHID |      995          1        3.4         42
-----------------------------------------------------------

Run time (seconds)   =       4.23
Number of iterations =          4
Log likelihood       = -2621.1379
Deviance             =  5242.2759
------------------------------------------------------------------------------
 _1_POSTELNC |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        cons |   .2583588   .0216994    11.91   0.000     .2158288    .3008887
      _1_PRE |   .2659471   .0107485    24.74   0.000     .2448804    .2870137
    _1_DEXP2 |   .0146425   .0242152     0.60   0.545    -.0328185    .0621034
    _1_DEXP3 |   .0083057   .0274388     0.30   0.762    -.0454735    .0620848
------------------------------------------------------------------------------

------------------------------------------------------------------------------
   Random-effects Parameters |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
Level 2: SCHID               |
                   var(cons) |   .0227135    .004635      .0136292    .0317979
-----------------------------+------------------------------------------------
Level 1: TID                 |
                   var(cons) |    .263262   .0071263      .2492947    .2772293
------------------------------------------------------------------------------

There are still some zeros in the random part, but even if I change them to other values, I still get the error message.
The other thing I tried was to use a different predictor. The one used above is a pair of dummy variables created from a categorical variable. I tried DMATH2, which is also a dummy variable. It worked without any problem, even though the initial naive 3-level model has some zeros in the random part (see the output below). This seems to suggest that the starting values might not be the problem here. Can you think of any other explanation for the issue? Thank you!!

Code: Select all

runmlwin _1_POST cons _1_PRE DMATH2, level3(SCHID: cons) level2(PID: cons DMATH2) level1(T
> ID: cons) nopause mlwinpath(C:\mlwin.exe)
 
MLwiN 2.28 multilevel model                     Number of obs      =      3340
Normal response model
Estimation algorithm: IGLS

-----------------------------------------------------------
                |   No. of       Observations per Group
 Level Variable |   Groups    Minimum    Average    Maximum
----------------+------------------------------------------
          SCHID |      995          1        3.4         42
            PID |     1206          1        2.8         42
-----------------------------------------------------------

Run time (seconds)   =       5.01
Number of iterations =          5
Log likelihood       = -2601.0356
Deviance             =  5202.0713
------------------------------------------------------------------------------
 _1_POSTELNC |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        cons |   .2649757   .0112472    23.56   0.000     .2429317    .2870197
      _1_PRE |   .2602331   .0107373    24.24   0.000     .2391885    .2812778
      DMATH2 |   .0736995   .0453634     1.62   0.104    -.0152112    .1626102
------------------------------------------------------------------------------

------------------------------------------------------------------------------
   Random-effects Parameters |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
Level 3: SCHID               |
                   var(cons) |          0          0             0           0
-----------------------------+------------------------------------------------
Level 2: PID                 |
                   var(cons) |   .0259923   .0051492         .0159    .0360846
            cov(cons,DMATH2) |  -.0066149   .0165529      -.039058    .0258282
                 var(DMATH2) |   .1474052   .0476172      .0540772    .2407332
-----------------------------+------------------------------------------------
Level 1: TID                 |
                   var(cons) |   .2499004   .0071727      .2358422    .2639585
------------------------------------------------------------------------------

Post by ChrisCharlton »

I still suspect that the issue is due to the starting values for the random-part covariance matrix. In your two-level model the zero variance means this matrix is not positive-definite:

Code: Select all

. matrix define A = (.0260758,0,-.0049737\0,0,0\-.0049737,0,.0032904)

. matrix B = cholesky(A)
matrix not positive definite
r(506);

whereas the level-2 covariance matrix from your 3-level model is positive-definite, so cholesky() runs without error:

Code: Select all

. matrix C = (.0259923, -.0066149\-.0066149,.1474052)

. matrix D = cholesky(C)

. 

You don't say what values you tried, but you still need to end up with an invertible matrix.

If you just want to test that this works, you could try the identity matrix. However, as this initial value matrix ends up in the prior, in your final model you will want something more sensible, such as values with the right orders of magnitude.
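
As a rough sketch of one way to build such a matrix (an illustration only, not necessarily what either poster did): start from the two-level IGLS estimates quoted above, replace the zero variance for _1_DEXP2 with a small positive value of a plausible magnitude, and confirm that cholesky() now succeeds. The value .003 and the names A2, L, V, lambda, I3 and rc_chol are assumptions chosen for the example.

```stata
* Level-2 covariance matrix from the two-level IGLS fit
* (row/column order: cons, _1_DEXP2, _1_DEXP3)
matrix A = (.0260758, 0, -.0049737 \ 0, 0, 0 \ -.0049737, 0, .0032904)

* Assumed fix: give _1_DEXP2 a small positive variance (.003 is illustrative)
matrix A2 = A
matrix A2[2,2] = .003

* cholesky() returns error 506 unless the matrix is positive definite
capture matrix L = cholesky(A2)
scalar rc_chol = _rc
display "cholesky return code: " rc_chol

* Cross-check: every eigenvalue must be strictly positive
matrix symeigen V lambda = A2
matrix list lambda

* Quick test matrix mentioned above: the 3 x 3 identity
matrix I3 = I(3)
```

The covariances involving _1_DEXP2 are left at zero, which keeps the matrix symmetric; only the diagonal entry needed to change for the check to pass.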

Post by stellaliu »

Hi Chris,

You are right. The problem is solved once I use a positive-definite matrix for the random part. One question I have (hopefully my last) is whether starting values affect the estimates in any important way. I know they affect how fast the model converges. I used an arbitrary positive-definite matrix for the model and am not sure whether or how it will change the estimates and standard errors. Thank you!
Re: 3 level multilevel data, multiple imputation, stata, run

Post by GeorgeLeckie »

Hi,

It is good practice to fit the model multiple times from different sets of overdispersed (but still plausible) starting values.

Ideally the multiple chains will converge on the same posterior distributions after an initial burn-in.

If the multiple chains converge on different posterior distributions then you have a problem. The model may well be overly complex given the data and you would be best off simplifying the model.

Best wishes

George

Post by stellaliu »

That makes sense. Thank you both for all your help! You guys are awesome!

Post by stellaliu »

One more question on the model: when I used the command below to fit the model to 5 imputed datasets, the final output gives some warnings (although I still get the results): "number of groups varies among imputations" and "number of observations per group varies among imputations". I checked that the number of groups and the number of observations per group are the same across the 5 imputed datasets. Also, there is no such warning when I try: mi est: xtmixed POSTELNC PREELNCO || PID: || SCHID:, mle var. So I think this might have something to do with the MCMC estimation conducted in MLwiN. Do you have any idea why this happened? Thank you!

Code: Select all

mi estimate, cmdok: runmlwin POST cons PRE, level3(SCHID: cons) level2(PID: cons) level1(TID: cons) mcmc(cc) initsmodel(m1igls) nopause mlwinpath(C:\mlwin.exe)