Problem in imputation of ordinal variable
Posted: Wed Apr 16, 2014 11:01 am
Hi,
I'm using RealcomImpute to impute missing values in my dataset which contains a mixture of ordinal, categorical, and continuous variables. The total number of variables requiring imputation is around 6. One of my ordinal variables has 30 categories. This variable has around 20% zeros (0). The rest of the values are roughly normally distributed from 1 to 29. It has around 7% missing.
On inspecting the imputed values of this variable, I found that they often don't make much sense. For some individuals, the same value is imputed every time, e.g. 16 in all 16 imputations. For other individuals, some imputed values are 21 or 22, while others are 0 or 1. In general, very few imputed values fall in the middle range; most are either 0, 1, or 2, or 16 and above.
If I understand correctly, the ordinal regression model is based on an underlying latent variable that is either Normally or Logistically distributed. It therefore seems that different individuals have different variances for the latent variable but share the same thresholds. A large variance would tend to produce imputed values at the extremes, while a small variance would produce the same value every time.
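To make the mechanism I have in mind concrete, here is a minimal Python sketch. It is not RealcomImpute's actual algorithm, just a toy latent-Normal model with fixed thresholds. The equally spaced cut points, the latent mean, the two standard deviations, and the function name impute_ordinal are all my own assumptions for illustration.

```python
import numpy as np

def impute_ordinal(mu, sigma, thresholds, n, rng):
    """Draw n ordinal values by cutting a latent Normal(mu, sigma) at fixed thresholds."""
    z = rng.normal(loc=mu, scale=sigma, size=n)
    # Category = number of thresholds lying below each latent draw (0..len(thresholds)).
    return np.searchsorted(thresholds, z)

# Assumed for illustration: 29 equally spaced cut points, giving 30 categories (0..29).
thresholds = np.linspace(-2.0, 2.0, 29)

# Assumed latent mean, placed in the middle of one bin.
mu = (thresholds[16] + thresholds[17]) / 2

rng = np.random.default_rng(0)

# Small latent variance: every draw lands in the same bin.
print("small variance:", impute_ordinal(mu, 0.01, thresholds, 16, rng))

# Large latent variance: draws pile up in the outermost categories.
print("large variance:", impute_ordinal(mu, 5.0, thresholds, 16, rng))
```

With the same thresholds and the same location, only the latent standard deviation differs between the two calls, which is the pattern I seem to be seeing across individuals.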
I don't know if this is what's happening with RealcomImpute. If so, is there a way to change this behaviour? For example, can we fix the variance so that the imputation model only models the location but not the variance?
Thanks in advance for your help.
Tim