runmlwin with imputed data

mlwinnewbie · Post by **mlwinnewbie** » Tue Apr 29, 2014 12:56 pm

Hi there,

I would like to use runmlwin in Stata 11 to run my analysis, where the level-2 unit is 'team' and the level-1 unit is 'participants'. I imputed my missing data using multiple imputation (mi ice) in Stata 11.

I wrote up the following code to call up runmlwin:

global MLwiN_path "C:\Program Files (x86)\MLwiN v2.29\i386\mlwin.exe"
g idcounter = _n
sort team idcounter
gen cons = 1
mi estimate, cmdok: runmlwin fQPR_tot bQPR_tot cons, nopause level2(team: cons) level1(ID: cons)

runmlwin does not seem to be supported by mi estimate, but a Stata message suggested I should specify the cmdok option to get it to run. However, when I run the script I receive the following message:

The data must be sorted according to the order of the model hierarchy: team ID.

an error occurred when mi estimate executed runmlwin on m=1

Grateful for any suggestion on how to resolve this issue - apologies if this is a very basic question but it is my first attempt at runmlwin.

cheers,
Fran

GeorgeLeckie · Post by **GeorgeLeckie** » Tue Apr 29, 2014 5:51 pm

Hi Fran,

Here is a similar two-level example which seems to work.

First I artificially make some of the data missing and then create two imputed datasets.

I then fit the model of interest to each imputed dataset and combine the results using Rubin's rules.

Best wishes

George

Syntax

Code: Select all

* Load the data
use "http://www.bristol.ac.uk/cmm/media/runmlwin/tutorial.dta", clear

* Set random number seed
set seed 135123

* Make 20% of normexam values missing
replace normexam = . if runiform()<=0.2

* Make 20% of standlrt values missing
replace standlrt = . if runiform()<=0.2

* Declare the way the additional imputed data will be stored
mi set mlong

* Specify variables which have missing values
mi register imputed normexam standlrt

* Create two imputed datasets using a naive single-level imputation model
mi impute mvn normexam standlrt = girl, add(2)

* Fit model of interest to imputed datasets and combine the model results
mi estimate , cmdok noisily: runmlwin normexam cons standlrt girl, ///
	level2(school: cons standlrt) ///
	level1(student: cons) ///
	nopause

Output

Code: Select all

. * Load the data
. use "http://www.bristol.ac.uk/cmm/media/runmlwin/tutorial.dta", clear

. 
. * Set random number seed
. set seed 135123

. 
. * Make 20% of normexam values missing
. replace normexam = . if runiform()<=0.2
(847 real changes made, 847 to missing)

. 
. * Make 20% of standlrt values missing
. replace standlrt = . if runiform()<=0.2
(803 real changes made, 803 to missing)

. 
. * Declare the way the additional imputed data will be stored
. mi set mlong

. 
. * Specify variables which have missing values
. mi register imputed normexam standlrt
(1483 m=0 obs. now marked as incomplete)

. 
. * Create two imputed datasets using a naive single-level imputation model
. mi impute mvn normexam standlrt = girl, add(2)

Performing EM optimization:
note: 167 observations omitted from EM estimation because of all imputation variables missing
  observed log likelihood =  -2654.151 at iteration 11

Performing MCMC data augmentation ... 

Multivariate imputation                     Imputations =        2
Multivariate normal regression                    added =        2
Imputed: m=1 through m=2                        updated =        0

Prior: uniform                               Iterations =      200
                                                burn-in =      100
                                                between =      100

------------------------------------------------------------------
                   |               Observations per m             
                   |----------------------------------------------
          Variable |   Complete   Incomplete   Imputed |     Total
-------------------+-----------------------------------+----------
          normexam |       3212          847       847 |      4059
          standlrt |       3256          803       803 |      4059
------------------------------------------------------------------
(complete + incomplete = total; imputed is the minimum across m
 of the number of filled-in observations.)

. 
. * Fit model of interest to imputed datasets and combine the model results
. mi estimate , cmdok noisily: runmlwin normexam cons standlrt girl, ///
>         level2(school: cons standlrt) ///
>         level1(student: cons) ///
>         nopause

(running runmlwin on m=1)
 
MLwiN 2.30 multilevel model                     Number of obs      =      4059
Normal response model
Estimation algorithm: IGLS

-----------------------------------------------------------
                |   No. of       Observations per Group
 Level Variable |   Groups    Minimum    Average    Maximum
----------------+------------------------------------------
         school |       65          2       62.4        198
-----------------------------------------------------------

Run time (seconds)   =       1.66
Number of iterations =          4
Log likelihood       =  -4769.792
Deviance             =   9539.584
------------------------------------------------------------------------------
    normexam |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        cons |  -.0920951   .0378853    -2.43   0.015    -.1663488   -.0178413
    standlrt |   .5609903   .0178983    31.34   0.000     .5259103    .5960702
        girl |   .1718996   .0327889     5.24   0.000     .1076345    .2361648
------------------------------------------------------------------------------

------------------------------------------------------------------------------
   Random-effects Parameters |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
Level 2: school              |
                   var(cons) |   .0581592   .0122476      .0341543    .0821641
          cov(cons,standlrt) |   .0090688   .0048128     -.0003642    .0185017
               var(standlrt) |   .0091999   .0035315      .0022782    .0161215
-----------------------------+------------------------------------------------
Level 1: student             |
                   var(cons) |   .5906982   .0133154      .5646005    .6167959
------------------------------------------------------------------------------

(running runmlwin on m=2)
 
MLwiN 2.30 multilevel model                     Number of obs      =      4059
Normal response model
Estimation algorithm: IGLS

-----------------------------------------------------------
                |   No. of       Observations per Group
 Level Variable |   Groups    Minimum    Average    Maximum
----------------+------------------------------------------
         school |       65          2       62.4        198
-----------------------------------------------------------

Run time (seconds)   =       1.45
Number of iterations =          4
Log likelihood       = -4822.5435
Deviance             =  9645.0869
------------------------------------------------------------------------------
    normexam |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        cons |  -.1325963   .0365449    -3.63   0.000    -.2042231   -.0609696
    standlrt |   .5401215    .018701    28.88   0.000     .5034681    .5767749
        girl |   .2258199   .0327043     6.90   0.000     .1617206    .2899192
------------------------------------------------------------------------------

------------------------------------------------------------------------------
   Random-effects Parameters |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
Level 2: school              |
                   var(cons) |   .0513985   .0111107      .0296218    .0731751
          cov(cons,standlrt) |   .0136131   .0049637      .0038844    .0233419
               var(standlrt) |   .0107139   .0038517      .0031647    .0182632
-----------------------------+------------------------------------------------
Level 1: student             |
                   var(cons) |   .6077405   .0136965      .5808959    .6345851
------------------------------------------------------------------------------

Multiple-imputation estimates                     Imputations     =          2
Normal response model                             Number of obs   =       4059
                                                  Average RVI     =     0.8001
                                                  Largest FMI     =     0.7965
DF adjustment:   Large sample                     DF:     min     =       2.23
                                                          avg     =      17.93
                                                          max     =      79.95

------------------------------------------------------------------------------------
                   |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------------+----------------------------------------------------------------
FP1                |
              cons |  -.1123457   .0511437    -2.20   0.085    -.2481237    .0234323
          standlrt |   .5505559   .0257229    21.40   0.000     .4798434    .6212683
              girl |   .1988598   .0570341     3.49   0.063    -.0241931    .4219127
-------------------+----------------------------------------------------------------
RP2                |
         var(cons) |   .0547788    .013077     4.19   0.000       .02784    .0817177
cov(cons\standlrt) |   .0113409   .0062761     1.81   0.117     -.003751    .0264329
     var(standlrt) |   .0099569   .0039209     2.54   0.013     .0021541    .0177597
-------------------+----------------------------------------------------------------
RP1                |
         var(cons) |   .5992194   .0200069    29.95   0.000     .5393834    .6590553
------------------------------------------------------------------------------------
Note: number of groups varies among imputations.
Note: number of observations per group varies among imputations.

mlwinnewbie · Post by **mlwinnewbie** » Wed Apr 30, 2014 8:55 am

hi George,

Thanks a lot for sending me the example. It is very helpful.

To get runmlwin to run on my imputed dataset, I had to change the format to wide (rather than having the imputed datasets in the long format):

global MLwiN_path "C:\Program Files (x86)\MLwiN v2.29\i386\mlwin.exe"
recast float bQPR_tot, force
recast float fQPR_tot, force
gen cons = 1
mi convert wide, clear
sort team ID
xi: mi est, cmdok : runmlwin fQPR_tot bQPR_tot i.Intervention i.wave cons, nopause level2(team: cons) level1(ID: cons)

Do you know why this may be the case?

Thanks again,
Fran

GeorgeLeckie · Post by **GeorgeLeckie** » Wed Apr 30, 2014 11:54 am

Hi Fran,

No I am not sure why you still run into problems.

I have just checked and the example code I provided works fine on Stata 11 so it is not your version number which is the problem.

All I can suggest is that you compare and contrast what you have done with the example code I provided to try to get to the bottom of why the former does not work but the latter does.

Sorry not to be of more help

George

mlwinnewbie · Post by **mlwinnewbie** » Wed Apr 30, 2014 12:52 pm

Thanks again for your reply - I will have a play with your code and apply it to my data.

Thanks again!

mlwinnewbie · Post by **mlwinnewbie** » Wed Apr 30, 2014 3:18 pm

hi George,

I had a close look at your example dataset and script and realised what is causing the problem in my dataset. In your example the student ID codes (level 1) go from 1 to n for each school (level 2) whereas my ID codes go from 1 to 500 counting the overall sample rather than for each cluster separately.

Is there a way to generate an ID code that counts observations within each cluster? I tried:

for each x of cluster {
g idcounter = _n}

but it did not work. I am sorry if this is a basic Stata question but it would be really helpful if you could help.

Cheers,
F

GeorgeLeckie · Post by **GeorgeLeckie** » Wed Apr 30, 2014 3:26 pm

Hi Fran,

To generate a nested ID....

Code: Select all

. bysort cluster: generate id = _n

Best wishes

George
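
A minimal sketch of the nested ID approach on toy data. The variable names cluster, overall_id and id, and the layout of 3 clusters with 4 observations each, are illustrative assumptions and not the data discussed in this thread. The final sort puts the data in the order of the model hierarchy, which is what the error message in the first post asks for.

```stata
* Toy data: 12 observations in 3 clusters of 4 (hypothetical layout)
clear
set obs 12
generate cluster = ceil(_n/4)

* An ID that counts across the whole sample (1 to 12)
generate overall_id = _n

* An ID that restarts at 1 within each cluster
bysort cluster (overall_id): generate id = _n

* Sort by the model hierarchy (level 2, then level 1)
sort cluster id

* Inspect the result
list cluster overall_id id, sepby(cluster)
```

Listing overall_id in parentheses after bysort fixes the order of observations within each cluster, so the new id is reproducible from run to run.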

mlwinnewbie · Post by **mlwinnewbie** » Thu May 01, 2014 7:53 am

Brilliant - thank you very much!

fonnyyyy · Post by **fonnyyyy** » Sun Oct 15, 2017 7:52 pm

Dear,
I would like to follow up on this old forum topic as I am currently struggling with some problems using the runmlwin command in the context of multiple imputation.

It concerns the following situation:
I created 5 imputed datasets using the -mi impute chained- command in Stata and registered all types of variables using the -mi register- command to perform a multiple imputation analysis. The -mi set- style is the default (flong, I think), but I tried different styles as well.

The problem arises when estimating the model (a multilevel multinomial logistic regression model) using runmlwin (-xi: mi estimate, cmdok: runmlwin move_a-). The error says that the data must be manually listwise deleted prior to calling MLwiN. This suggests that Stata still stores the original missing values in some way and that runmlwin is trying to use these data to estimate the model. Is there any suggestion to overcome this problem? The error concludes: “an error occurred when mi estimate executed runmlwin on m=1.” I am not sure whether this means the error is generated when loading the m=0 data or the m=1 data. I thought m=1 refers to the first imputed dataset, where I would expect all missing values to be imputed. Is it valid to carry out listwise deletion using the -mi xeq- command? Maybe this is less of a problem when I create a high number of imputed datasets, given that the weight of deleted cases gets smaller then…

Many thanks for sharing your thoughts on this!

ChrisCharlton · Post by **ChrisCharlton** » Mon Oct 16, 2017 10:59 am

I have tried to replicate this below (the model itself may not make sense, but the steps should be similar); however, I don't encounter this problem.

Code: Select all

. use "http://www.bristol.ac.uk/cmm/media/runmlwin/bang.dta", clear

. gen ind = runiform()

. // Make 10% of urban missing
. replace urban = . if ind < 0.1
(273 real changes made, 273 to missing)

. // Impute missing values
. mi set mlong
. mi register imputed urban
(273 m=0 obs. now marked as incomplete)
. mi impute logit urban age educ hindu, add(10)

Univariate imputation                       Imputations =       10
Logistic regression                               added =       10
Imputed: m=1 through m=10                       updated =        0

------------------------------------------------------------------
                   |               Observations per m             
                   |----------------------------------------------
          Variable |   Complete   Incomplete   Imputed |     Total
-------------------+-----------------------------------+----------
             urban |       2594          273       273 |      2867
------------------------------------------------------------------
(complete + incomplete = total; imputed is the minimum across m
 of the number of filled-in observations.)

. // Estimate models based on imputations and combine results
. mi estimate, cmdok: runmlwin use4 cons urban, ///
>         level1(woman: ) ///
>         discrete(distribution(multinomial) link(mlogit) denominator(cons) basecategory(4)) ///
>         nopause

Multiple-imputation estimates                   Imputations       =         10
Unordered multinomial logit response model      Number of obs     =      2,867
                                                Average RVI       =          .
                                                Largest FMI       =          .
DF adjustment:   Large sample                   DF:     min       =       9.28
                                                        avg       =          .
                                                        max       =          .

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
FP1          |
      cons_1 |  -1.795587   .0717415   -25.03   0.000    -1.936205   -1.654969
     urban_1 |   .2103383   .1431646     1.47   0.142    -.0704809    .4911576
-------------+----------------------------------------------------------------
FP2          |
      cons_2 |  -1.533378   .0649947   -23.59   0.000    -1.660802   -1.405954
     urban_2 |   1.151909   .1032499    11.16   0.000     .9492874    1.354531
-------------+----------------------------------------------------------------
FP3          |
      cons_3 |  -1.850656   .0732897   -25.25   0.000    -1.994305   -1.707007
     urban_3 |   .1584916   .1475056     1.07   0.283    -.1307294    .4477127
-------------+----------------------------------------------------------------
OD           |
     bcons_1 |          1          .        .       .            .           .
     bcons_2 |          1   3.91e-17  2.6e+16   0.000            1           1
------------------------------------------------------------------------------

The estimation should only be using m=1..5, so I would suggest looking at your imputed data to see whether you have perhaps left some variables out of your imputation, so that they are still missing in your imputed datasets. You could also try fitting the model to each of the imputed datasets by using the imputations() option of mi estimate. If you are still having problems and can provide a replicable example, then we can investigate further.
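
The two checks suggested above could be sketched as follows, reusing the bang.dta example from this post. This assumes runmlwin is installed and that the MLwiN_path global is already set; the seed value and the choice of imputations 1 and 2 are arbitrary illustrations, and with your own data you would substitute your own variables and model.

```stata
* Recreate the imputed data from the example above
use "http://www.bristol.ac.uk/cmm/media/runmlwin/bang.dta", clear
* Arbitrary seed, only so that the sketch is reproducible
set seed 12345
generate ind = runiform()
replace urban = . if ind < 0.1
mi set mlong
mi register imputed urban
mi impute logit urban age educ hindu, add(10)

* Check 1: count values of urban that are still missing in each
* imputed dataset (m=1 to m=10); every count should be zero
mi xeq 1/10: count if missing(urban)

* Check 2: restrict estimation to a subset of imputations (here the
* first two) and show the runmlwin output for each with noisily
mi estimate, cmdok noisily imputations(1/2): runmlwin use4 cons urban, ///
	level1(woman: ) ///
	discrete(distribution(multinomial) link(mlogit) denominator(cons) basecategory(4)) ///
	nopause
```

If the count in check 1 is non-zero for any model variable, that variable was probably not included in (or not filled in by) the imputation step, which would be consistent with the listwise deletion error reported above.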
