runmlwin with imputed data
-
- Posts: 12
- Joined: Tue Apr 29, 2014 12:41 pm
runmlwin with imputed data
Hi there,
I would like to use runmlwin in Stata 11 to run my analysis, where the level 2 unit is 'team' and the level 1 unit is 'participants'. I imputed my missing data using multiple imputation (mi ice) in Stata 11.
I wrote the following code to call runmlwin:
global MLwiN_path "C:\Program Files (x86)\MLwiN v2.29\i386\mlwin.exe"
g idcounter = _n
sort team idcounter
gen cons = 1
mi estimate, cmdok: runmlwin fQPR_tot bQPR_tot cons, nopause level2(team: cons) level1(ID: cons)
runmlwin does not seem to be supported by mi estimate, but Stata's error message suggested I should specify the cmdok option to get it to run. However, when I run the script I receive the following message:
The data must be sorted according to the order of the model hierarchy: team ID.
an error occurred when mi estimate executed runmlwin on m=1
I would be grateful for any suggestions on how to resolve this issue. Apologies if this is a very basic question, but it is my first attempt at runmlwin.
Cheers,
Fran
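The error message says the data must be sorted in the order of the model hierarchy, i.e. grouped by team and then ordered by ID within team (what Stata's `sort team ID` produces). The following minimal Python sketch only illustrates that kind of ordering check; it is not runmlwin's actual implementation, and the helper name `is_sorted_by_hierarchy` and the toy rows are hypothetical.

```python
from typing import Sequence


def is_sorted_by_hierarchy(rows: Sequence[dict], keys: Sequence[str]) -> bool:
    """Return True if rows are in non-decreasing lexicographic order of keys.

    keys run from the highest level down, e.g. ["team", "ID"].
    Hypothetical helper, for illustration only.
    """
    previous = None
    for row in rows:
        current = tuple(row[k] for k in keys)
        if previous is not None and current < previous:
            return False  # this row belongs before the previous one
        previous = current
    return True


data = [
    {"team": 1, "ID": 1},
    {"team": 1, "ID": 2},
    {"team": 2, "ID": 3},
    {"team": 1, "ID": 4},  # team 1 reappears after team 2, so the order is broken
]
print(is_sorted_by_hierarchy(data, ["team", "ID"]))  # False

# Analogous to Stata's: sort team ID
data.sort(key=lambda r: (r["team"], r["ID"]))
print(is_sorted_by_hierarchy(data, ["team", "ID"]))  # True
```

Note that the check is against the variables named in level2() and level1() (here team and ID), so sorting on some other variable does not necessarily satisfy it.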
-
- Site Admin
- Posts: 432
- Joined: Fri Apr 01, 2011 2:14 pm
Re: runmlwin with imputed data
Hi Fran,
Here is a similar two-level example, which seems to work.
First, I artificially make some of the data missing and then create two imputed datasets.
I then fit the model of interest to each imputed dataset and combine the results using Rubin's rules.
Best wishes
George
Syntax
* Load the data
use "http://www.bristol.ac.uk/cmm/media/runmlwin/tutorial.dta", clear
* Set random number seed
set seed 135123
* Make 20% of normexam values missing
replace normexam = . if runiform()<=0.2
* Make 20% of standlrt values missing
replace standlrt = . if runiform()<=0.2
* Declare the way the additional imputed data will be stored
mi set mlong
* Specify variables which have missing values
mi register imputed normexam standlrt
* Create two imputed datasets using a naive single-level imputation model
mi impute mvn normexam standlrt = girl, add(2)
* Fit model of interest to imputed datasets and combine the model results
mi estimate , cmdok noisily: runmlwin normexam cons standlrt girl, ///
level2(school: cons standlrt) ///
level1(student: cons) ///
nopause
Output
. * Load the data
. use "http://www.bristol.ac.uk/cmm/media/runmlwin/tutorial.dta", clear
.
. * Set random number seed
. set seed 135123
.
. * Make 20% of normexam values missing
. replace normexam = . if runiform()<=0.2
(847 real changes made, 847 to missing)
.
. * Make 20% of standlrt values missing
. replace standlrt = . if runiform()<=0.2
(803 real changes made, 803 to missing)
.
. * Declare the way the additional imputed data will be stored
. mi set mlong
.
. * Specify variables which have missing values
. mi register imputed normexam standlrt
(1483 m=0 obs. now marked as incomplete)
.
. * Create two imputed datasets using a naive single-level imputation model
. mi impute mvn normexam standlrt = girl, add(2)
Performing EM optimization:
note: 167 observations omitted from EM estimation because of all imputation variables missing
observed log likelihood = -2654.151 at iteration 11
Performing MCMC data augmentation ...
Multivariate imputation Imputations = 2
Multivariate normal regression added = 2
Imputed: m=1 through m=2 updated = 0
Prior: uniform Iterations = 200
burn-in = 100
between = 100
------------------------------------------------------------------
| Observations per m
|----------------------------------------------
Variable | Complete Incomplete Imputed | Total
-------------------+-----------------------------------+----------
normexam | 3212 847 847 | 4059
standlrt | 3256 803 803 | 4059
------------------------------------------------------------------
(complete + incomplete = total; imputed is the minimum across m
of the number of filled-in observations.)
.
. * Fit model of interest to imputed datasets and combine the model results
. mi estimate , cmdok noisily: runmlwin normexam cons standlrt girl, ///
> level2(school: cons standlrt) ///
> level1(student: cons) ///
> nopause
(running runmlwin on m=1)
MLwiN 2.30 multilevel model Number of obs = 4059
Normal response model
Estimation algorithm: IGLS
-----------------------------------------------------------
| No. of Observations per Group
Level Variable | Groups Minimum Average Maximum
----------------+------------------------------------------
school | 65 2 62.4 198
-----------------------------------------------------------
Run time (seconds) = 1.66
Number of iterations = 4
Log likelihood = -4769.792
Deviance = 9539.584
------------------------------------------------------------------------------
normexam | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
cons | -.0920951 .0378853 -2.43 0.015 -.1663488 -.0178413
standlrt | .5609903 .0178983 31.34 0.000 .5259103 .5960702
girl | .1718996 .0327889 5.24 0.000 .1076345 .2361648
------------------------------------------------------------------------------
------------------------------------------------------------------------------
Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
-----------------------------+------------------------------------------------
Level 2: school |
var(cons) | .0581592 .0122476 .0341543 .0821641
cov(cons,standlrt) | .0090688 .0048128 -.0003642 .0185017
var(standlrt) | .0091999 .0035315 .0022782 .0161215
-----------------------------+------------------------------------------------
Level 1: student |
var(cons) | .5906982 .0133154 .5646005 .6167959
------------------------------------------------------------------------------
(running runmlwin on m=2)
MLwiN 2.30 multilevel model Number of obs = 4059
Normal response model
Estimation algorithm: IGLS
-----------------------------------------------------------
| No. of Observations per Group
Level Variable | Groups Minimum Average Maximum
----------------+------------------------------------------
school | 65 2 62.4 198
-----------------------------------------------------------
Run time (seconds) = 1.45
Number of iterations = 4
Log likelihood = -4822.5435
Deviance = 9645.0869
------------------------------------------------------------------------------
normexam | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
cons | -.1325963 .0365449 -3.63 0.000 -.2042231 -.0609696
standlrt | .5401215 .018701 28.88 0.000 .5034681 .5767749
girl | .2258199 .0327043 6.90 0.000 .1617206 .2899192
------------------------------------------------------------------------------
------------------------------------------------------------------------------
Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
-----------------------------+------------------------------------------------
Level 2: school |
var(cons) | .0513985 .0111107 .0296218 .0731751
cov(cons,standlrt) | .0136131 .0049637 .0038844 .0233419
var(standlrt) | .0107139 .0038517 .0031647 .0182632
-----------------------------+------------------------------------------------
Level 1: student |
var(cons) | .6077405 .0136965 .5808959 .6345851
------------------------------------------------------------------------------
Multiple-imputation estimates Imputations = 2
Normal response model Number of obs = 4059
Average RVI = 0.8001
Largest FMI = 0.7965
DF adjustment: Large sample DF: min = 2.23
avg = 17.93
max = 79.95
------------------------------------------------------------------------------------
| Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------------+----------------------------------------------------------------
FP1 |
cons | -.1123457 .0511437 -2.20 0.085 -.2481237 .0234323
standlrt | .5505559 .0257229 21.40 0.000 .4798434 .6212683
girl | .1988598 .0570341 3.49 0.063 -.0241931 .4219127
-------------------+----------------------------------------------------------------
RP2 |
var(cons) | .0547788 .013077 4.19 0.000 .02784 .0817177
cov(cons\standlrt) | .0113409 .0062761 1.81 0.117 -.003751 .0264329
var(standlrt) | .0099569 .0039209 2.54 0.013 .0021541 .0177597
-------------------+----------------------------------------------------------------
RP1 |
var(cons) | .5992194 .0200069 29.95 0.000 .5393834 .6590553
------------------------------------------------------------------------------------
Note: number of groups varies among imputations.
Note: number of observations per group varies among imputations.
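The pooled "Multiple-imputation estimates" above come from Rubin's rules: the pooled estimate is the mean of the per-imputation estimates, and the total variance is the mean within-imputation variance plus (1 + 1/M) times the between-imputation variance. As a check, here is a minimal Python sketch (not Stata code, and not how mi estimate is implemented internally) that recombines the fixed-part cons coefficient from the m=1 and m=2 runs; `rubin_combine` is a hypothetical helper name.

```python
import math
from statistics import mean, variance


def rubin_combine(estimates, std_errors):
    """Pool one parameter across M imputations using Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)                     # pooled point estimate
    w = mean([se ** 2 for se in std_errors])    # within-imputation variance
    b = variance(estimates)                     # between-imputation variance (divisor M - 1)
    t = w + (1 + 1 / m) * b                     # total variance
    return q_bar, math.sqrt(t)


# cons coefficient and standard error from the m=1 and m=2 runs above
est, se = rubin_combine([-0.0920951, -0.1325963], [0.0378853, 0.0365449])
print(f"{est:.7f} {se:.7f}")
```

The result agrees, up to rounding of the printed inputs, with the pooled cons row (-.1123457, .0511437). With only two imputations the between-imputation term is large, which is why the pooled standard errors are noticeably wider than those from either single run.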
-
- Posts: 12
- Joined: Tue Apr 29, 2014 12:41 pm
Re: runmlwin with imputed data
Hi George,
Thanks a lot for sending me the example. It is very helpful.
To get runmlwin to run on my imputed dataset, I had to convert it to wide format (rather than keeping the imputed datasets in long format):
global MLwiN_path "C:\Program Files (x86)\MLwiN v2.29\i386\mlwin.exe"
recast float bQPR_tot, force
recast float fQPR_tot, force
gen cons = 1
mi convert wide, clear
sort team ID
xi: mi est, cmdok : runmlwin fQPR_tot bQPR_tot i.Intervention i.wave cons, nopause level2(team: cons) level1(ID: cons)
Do you know why this may be the case?
Thanks again,
Fran
-
- Site Admin
- Posts: 432
- Joined: Fri Apr 01, 2011 2:14 pm
Re: runmlwin with imputed data
Hi Fran,
No, I am not sure why you are still running into problems.
I have just checked, and the example code I provided works fine in Stata 11, so your Stata version is not the problem.
All I can suggest is that you compare your code with the example I provided, to work out why yours fails while mine works.
Sorry not to be of more help.
George
-
- Posts: 12
- Joined: Tue Apr 29, 2014 12:41 pm
Re: runmlwin with imputed data
Thanks for your reply. I will experiment with your code and apply it to my data.
Thanks again!
-
- Posts: 12
- Joined: Tue Apr 29, 2014 12:41 pm
Re: runmlwin with imputed data
Hi George,
I had a close look at your example dataset and script and realised what was causing the problem in my dataset. In your example the student ID codes (level 1) go from 1 to n within each school (level 2), whereas my ID codes go from 1 to 500 across the whole sample rather than restarting within each cluster.
Is there a way to generate an ID code that counts observations within each cluster? I tried:
for each x of cluster {
g idcounter = _n}
but it did not work. Sorry if this is a basic Stata question, but your help would be much appreciated.
Cheers,
F
-
- Site Admin
- Posts: 432
- Joined: Fri Apr 01, 2011 2:14 pm
Re: runmlwin with imputed data
Hi Fran,
To generate a nested ID:
Code:
. bysort cluster: generate id = _n
Best wishes
George
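`bysort cluster: generate id = _n` sorts the data by cluster and then, because `_n` is evaluated within each by-group, numbers the observations 1, 2, ... separately in every cluster. The same logic is sketched below in Python purely as an illustration; `add_nested_id`, the `pid` field and the toy rows are hypothetical, and Python's stable sort is an assumption of this sketch (it keeps the original order within a cluster).

```python
from itertools import groupby
from operator import itemgetter


def add_nested_id(rows, cluster_key, id_key="id"):
    """Return copies of rows sorted by cluster, numbered 1, 2, ... within each cluster."""
    ordered = sorted(rows, key=itemgetter(cluster_key))  # stable: original order kept within cluster
    result = []
    for _, members in groupby(ordered, key=itemgetter(cluster_key)):
        for i, row in enumerate(members, start=1):
            result.append({**row, id_key: i})  # copy the row and add the within-cluster counter
    return result


people = [
    {"team": 2, "pid": 10},
    {"team": 1, "pid": 11},
    {"team": 2, "pid": 12},
    {"team": 1, "pid": 13},
]
for row in add_nested_id(people, "team"):
    print(row)
```

Each team's counter restarts at 1, which gives the kind of within-cluster level 1 ID that Fran found in the tutorial data.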
-
- Posts: 12
- Joined: Tue Apr 29, 2014 12:41 pm
Re: runmlwin with imputed data
Brilliant - thank you very much!
Re: runmlwin with imputed data
Dear all,
I would like to follow up on this old forum topic, as I am currently struggling with some problems using the runmlwin command in the context of multiple imputation.
The situation is as follows:
I created 5 imputed datasets using the -mi impute chained- command in Stata and registered all types of variables using the -mi register- command to perform a multiple imputation analysis. The -mi set- style is the default (flong, I think), but I have tried other styles as well.
The problem arises when estimating the model (a multilevel multinomial logistic regression model) using runmlwin (-xi: mi estimate, cmdok: runmlwin move_a-). An error says that the data must be manually listwise deleted prior to calling MLwiN. This suggests that Stata keeps the original missing values in some way and that runmlwin is trying to use these data to estimate the model. Is there any way to overcome this problem?
The error ends with: “an error occurred when mi estimate executed runmlwin on m=1.” I am not sure whether this means the error occurs when loading the m=0 data or the m=1 data. I thought m=1 refers to the first imputed dataset, in which I would expect all missing values to be imputed. Is it valid to carry out listwise deletion using the -mi xeq- command? Perhaps this is less of a problem if I create a large number of imputed datasets, since the weight of the deleted cases would then be smaller…
Many thanks for sharing your thoughts on this!
-
- Posts: 1384
- Joined: Mon Oct 19, 2009 10:34 am
Re: runmlwin with imputed data
I have tried to replicate this below (the model itself may not make sense, but the steps should be similar); however, I don't encounter this problem.
The estimation should only use m=1 to m=5, so I would suggest checking your imputed data for variables that were left out of the imputation model and are therefore still missing in the imputed datasets. You could also try fitting the model to each imputed dataset separately by using the imputations() option of mi estimate. If you are still having problems and can provide a reproducible example, then we can investigate further.
Code:
. use "http://www.bristol.ac.uk/cmm/media/runmlwin/bang.dta", clear
. gen ind = runiform()
. // Make 10% of urban missing
. replace urban = . if ind < 0.1
(273 real changes made, 273 to missing)
. // Impute missing values
. mi set mlong
. mi register imputed urban
(273 m=0 obs. now marked as incomplete)
. mi impute logit urban age educ hindu, add(10)
Univariate imputation Imputations = 10
Logistic regression added = 10
Imputed: m=1 through m=10 updated = 0
------------------------------------------------------------------
| Observations per m
|----------------------------------------------
Variable | Complete Incomplete Imputed | Total
-------------------+-----------------------------------+----------
urban | 2594 273 273 | 2867
------------------------------------------------------------------
(complete + incomplete = total; imputed is the minimum across m
of the number of filled-in observations.)
. // Estimate models based on imputations and combine results
. mi estimate, cmdok: runmlwin use4 cons urban, ///
> level1(woman: ) ///
> discrete(distribution(multinomial) link(mlogit) denominator(cons) basecategory(4)) ///
> nopause
Multiple-imputation estimates Imputations = 10
Unordered multinomial logit response model Number of obs = 2,867
Average RVI = .
Largest FMI = .
DF adjustment: Large sample DF: min = 9.28
avg = .
max = .
------------------------------------------------------------------------------
| Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
FP1 |
cons_1 | -1.795587 .0717415 -25.03 0.000 -1.936205 -1.654969
urban_1 | .2103383 .1431646 1.47 0.142 -.0704809 .4911576
-------------+----------------------------------------------------------------
FP2 |
cons_2 | -1.533378 .0649947 -23.59 0.000 -1.660802 -1.405954
urban_2 | 1.151909 .1032499 11.16 0.000 .9492874 1.354531
-------------+----------------------------------------------------------------
FP3 |
cons_3 | -1.850656 .0732897 -25.25 0.000 -1.994305 -1.707007
urban_3 | .1584916 .1475056 1.07 0.283 -.1307294 .4477127
-------------+----------------------------------------------------------------
OD |
bcons_1 | 1 . . . . .
bcons_2 | 1 3.91e-17 2.6e+16 0.000 1 1
------------------------------------------------------------------------------