
Problem in imputation of ordinal variable

Posted: Wed Apr 16, 2014 11:01 am
by timothymak
Hi,

I'm using RealcomImpute to impute missing values in my dataset which contains a mixture of ordinal, categorical, and continuous variables. The total number of variables requiring imputation is around 6. One of my ordinal variables has 30 categories. This variable has around 20% zeros (0). The rest of the values are roughly normally distributed from 1 to 29. It has around 7% missing.

On inspecting the imputed values of this variable, I found that they often don't make a lot of sense. For some individuals, the same value is imputed every time, e.g. 16 for all 16 imputations. For other individuals, some imputed values are 22 or 21, while others are 0 and 1. In general, very few imputed values fall in the middle range. Most are either 0, 1, 2 or 16+.

If I understand correctly, the model for ordinal regression is based on an underlying latent variable that is either Normally or Logistically distributed. It seems therefore that different individuals have different variances for the latent variable, yet the same thresholds. Large variances would tend to lead to imputed values at the extremes, while small variances would lead to picking the same value every time.
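To make that guess concrete, here is a minimal Python sketch of a latent-variable ordinal draw. It is only an illustration of the hypothesis, not Realcom-Impute's actual algorithm: the function name `draw_ordinal`, the equally spaced thresholds, the latent mean of 16 and the two standard deviations are all made-up assumptions.

```python
import random
from bisect import bisect_right


def draw_ordinal(mean, sd, thresholds, n_draws, rng):
    """Cut Normal latent draws at fixed thresholds to get ordinal categories."""
    draws = []
    for _ in range(n_draws):
        latent = rng.gauss(mean, sd)
        # The category is the number of thresholds at or below the latent value.
        draws.append(bisect_right(thresholds, latent))
    return draws


# Hypothetical setup: 30 categories (0..29) need 29 thresholds.
# Equal spacing is an assumption made purely for illustration.
thresholds = [k + 0.5 for k in range(29)]  # 0.5, 1.5, ..., 28.5
rng = random.Random(1)

# Same latent mean and same thresholds, very different latent variances.
small_sd = draw_ordinal(mean=16.0, sd=0.05, thresholds=thresholds, n_draws=16, rng=rng)
large_sd = draw_ordinal(mean=16.0, sd=30.0, thresholds=thresholds, n_draws=16, rng=rng)

print("small sd:", small_sd)
print("large sd:", large_sd)
```

With the small standard deviation, the latent draws stay inside one threshold interval, so the same category comes out every time. With the large one, much of the probability mass lies beyond the outermost thresholds, so draws tend to pile up in the end categories.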

I don't know if this is what's happening with RealcomImpute. If so, is there a way to change this behaviour? For example, can we fix the variance so that the imputation model only models the location but not the variance?

Thanks in advance for your help.

Tim

Re: Problem in imputation of ordinal variable

Posted: Thu Apr 24, 2014 12:48 pm
by ChrisCharlton
It's hard to say what is going on here without seeing your data. You might want to try fitting the same model using Stat-JR and seeing whether you get the same behaviour.

Re: Problem in imputation of ordinal variable

Posted: Fri Apr 25, 2014 2:54 am
by timothymak
Dear Chris,

Thanks for your reply. I don't have MLwiN so am not able to run Stat-JR. However, do you think you can tell me what formula they use for ordinal regression in Realcom, and in particular whether they allow for variation in the latent variable variance across observations?

I'm not sure whether I can give you the data at the moment, but in case it is possible (perhaps after some scrambling), do you think you can have a look?

Thanks again for your help.

Yours,

Tim

Re: Problem in imputation of ordinal variable

Posted: Fri Apr 25, 2014 1:19 pm
by ChrisCharlton
You can find the details of the algorithms used in Realcom-Impute in the following papers:

Carpenter, James R., Harvey Goldstein, and Michael G. Kenward. "REALCOM-IMPUTE software for multilevel multiple imputation with mixed response types." Journal of Statistical Software 45.5 (2011): 1-14. http://www.jstatsoft.org/v45/i05/paper

Goldstein, Harvey, et al. "Multilevel models with multivariate mixed response types." Statistical Modelling 9.3 (2009): 173-197. http://smj.sagepub.com/content/9/3/173.abstract

If you can provide data that causes the same issue that you are seeing (along with the model details) I could check this against Stat-JR for you.

Re: Problem in imputation of ordinal variable

Posted: Fri May 02, 2014 3:42 am
by timothymak
Dear Chris,

Thanks again for your reply. From reading the papers, it doesn't seem that Realcom is doing what I thought it was doing. Anyway, I'll try to put together a reproducible example of what is happening and send it to you.

Yours,
Tim