Problem when imputing a second-level variable
Posted: Tue Jan 29, 2013 9:22 pm
Hello all,
I'm trying to impute missing values in both level 1 and level 2 variables (individual and school, respectively, in this case), some of which are continuous and others categorical (ordered and unordered). The model also includes a few explanatory variables, from both levels of analysis, that have no missing values.
The Stata command I used was:
realcomImpute X1 o.X2 m.W1 W2 X3 X4 W3 W4 W5 using MI3.dat, replace numresponses(4) level2id(school) cons(cons)
where the X variables are level 1 and the W variables are level 2.
The problem is that the imputed values of the continuous level 2 variable are very different from the original data. For example, the SD of the variable in the imputed data sets is about 6 times larger than in the original data set. The range is also much wider in the new data sets (the max and min values are much farther apart). As a result, the coefficient of this variable in the analytical model drops dramatically and its SE becomes very large.
Is there any way to use predictive mean matching (PMM) in REALCOM, or to set max or min bounds when imputing this variable? If not, is there another way to deal with this problem?
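A quick way to quantify the inflation described above is to compare the SD and range of the observed values with those of one completed data set. The sketch below is a generic illustration in Python, not part of realcomImpute; the function name `compare_spread` and its inputs are hypothetical. For a level 2 variable it is assumed you pass one value per school, so that schools with many pupils do not dominate the summary.

```python
import statistics

def compare_spread(observed, imputed):
    """Compare the spread of observed values with one completed (imputed) data set.

    Hypothetical helper: both arguments are sequences of numbers for a
    level 2 variable, assumed to contain one value per school.
    """
    obs_sd = statistics.stdev(observed)
    imp_sd = statistics.stdev(imputed)
    return {
        "sd_ratio": imp_sd / obs_sd,  # about 6 in the situation described above
        "observed_range": (min(observed), max(observed)),
        "imputed_range": (min(imputed), max(imputed)),
    }

# Illustrative numbers only, not real data.
observed = [10.0, 12.0, 11.0, 13.0, 9.0]
completed = [10.0, 12.0, 11.0, 13.0, 9.0, 40.0, -20.0]
print(compare_spread(observed, completed))
```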
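For reference, this is the general idea behind PMM, sketched in Python with a single predictor. It is a simplified, hypothetical illustration and not a REALCOM or realcomImpute option: it uses point estimates of the regression rather than drawing the parameters from their posterior, as full PMM does to propagate uncertainty. Because every imputed value is copied from an observed donor, imputations cannot fall outside the observed min and max.

```python
import random

def fit_ols(x, y):
    """Least-squares fit of y = a + b*x with one predictor."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def pmm_impute(x, y, k=3, rng=None):
    """Fill None entries of y by predictive mean matching on predictor x.

    Simplified sketch: for each missing case, find the k observed cases
    whose predicted means are closest, then copy the observed value of
    one randomly chosen donor.
    """
    rng = rng or random.Random()
    obs = [i for i, yi in enumerate(y) if yi is not None]
    a, b = fit_ols([x[i] for i in obs], [y[i] for i in obs])
    pred = [a + b * xi for xi in x]
    out = list(y)
    for i, yi in enumerate(y):
        if yi is None:
            donors = sorted(obs, key=lambda j: abs(pred[j] - pred[i]))[:k]
            out[i] = y[rng.choice(donors)]
    return out

# Illustrative numbers only.
print(pmm_impute([1, 2, 3, 4, 5, 6], [2.0, 4.1, None, 8.2, None, 12.0], k=2, rng=random.Random(0)))
```

In a two-level setting the predictor(s) for a school-level variable would normally be school-level summaries, with one row per school.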
Any help will be much appreciated.
Kind regards,
Amit