longitudinal dataset
Posted: Fri Jul 19, 2013 10:55 pm
Hi REALCOM users,
I’m new to multiple imputation and this software, and wanted to run my approach by you to see if anyone thinks I am doing something wrong.
I have a longitudinal dataset with repeated measures clustered within patients. The number of time points and their spacing vary across patients. There are a total of 168,708 observations and 14,429 unique patients. My analysis model is a 2-level random intercept logistic regression model (random intercept for the patient), with the unit of analysis being the patient-day. The analysis model includes independent variables measured at the day level and the year level, time-invariant variables, and physician-level demographic variables.
I am using the realcomImpute command in Stata and have included as explanatory variables in the imputation model all the non-missing variables included in the analysis model, the dependent variable, and some additional variables that might predict the probability of the variable being missing. Here is my code (there are too many explanatory variables to list, so I have put explanatory_variables in their place):
realcomImpute m.comorbiditystatus_year_grp m.insurance_rev explanatory_variables using dxhyp.dat, replace numresponses(2) level2id(Patient_Key) cons(cons)
In REALCOM, I left the model specification, the MCMC estimation settings, and the values in the Impute procedure at their defaults.
A few specific questions:
• Does this approach seem OK in general for the type of data I have? Or are there modifications I should make, for example to some of the default values?
• In an imputation model for longitudinal data, does one typically include the value of the variable with missing values at the prior time point as an explanatory variable in the model?
• I’m not sure I understand how REALCOM comes up with the default model specification. Are there situations where that should be changed?
Any comments would be much appreciated!!
Thanks,
Caroline