Hi REALCOM users,
I’m new to multiple imputation and to this software, and wanted to run my approach by you to see if anyone thinks I am doing something wrong.
I have a longitudinal dataset with repeated measures clustered within patients. The number of time points and their spacing vary across patients. There are a total of 168,708 observations and 14,429 unique patients. My analysis model is a 2-level random-intercept logistic regression model (random intercept for the patient), with the unit of analysis being the patient-day. The analysis model includes independent variables measured at the day level and the year level, time-invariant variables, and physician-level demographic variables.
I am using the realcomImpute command in Stata. As explanatory variables in the imputation model I have included all the non-missing variables from the analysis model, the dependent variable, and some additional variables that might predict the probability of a value being missing. Here is my code (there are too many explanatory variables to list, so I have put explanatory_variables in their place):
realcomImpute m.comorbiditystatus_year_grp m.insurance_rev explanatory_variables using dxhyp.dat, replace numresponses(2) level2id(Patient_Key) cons(cons)
In REALCOM, I left the model specification, the MCMC estimation settings, and the values in the Impute procedure at their defaults.
A few specific questions:
• Does this approach seem OK in general for the type of data I have? Or are there modifications I should make, for example to what some of the default values are?
• In an imputation model for longitudinal data, does one typically include the value of the variable with missing values at the prior time point as an explanatory variable in the model?
• I’m not sure I understand how REALCOM comes up with the default model specification. Are there situations where that should be changed?
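To make the second question concrete, here is a minimal sketch of how a prior-time-point version of one of the incomplete variables might be built in Stata. It is an illustration under assumptions, not a recommendation: visit_date is a hypothetical name for whatever variable orders each patient's records, while Patient_Key and insurance_rev are the names used in the command above.

```stata
* Sketch only: visit_date is a hypothetical ordering variable.
* Within each patient, sorted by date, copy the previous record's value.
bysort Patient_Key (visit_date): generate insurance_rev_lag = insurance_rev[_n-1]

* The lag is missing on each patient's first record and wherever the
* previous record's value was itself missing.
count if missing(insurance_rev_lag)
```

Because the lagged variable inherits the missing values of the original, it would not be a fully observed predictor as it stands.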
Any comments would be much appreciated!!
Thanks,
Caroline
longitudinal dataset

Re: longitudinal dataset
As your questions are fairly general, rather than being specifically related to Realcom, you might also want to ask them on the missing data discussion group (https://groups.google.com/forum/#!forum/missingdata) if you don't get any answers here.
Re: longitudinal dataset
Thanks for the advice, I will try that.

Re: longitudinal dataset
Hi
Can we extend Realcom Impute (through the Stata realcomImpute commands) to a three-level data structure? For example, in a one-stage meta-analysis of longitudinal studies, repeated measurements (level 1) are nested within individuals (level 2), which are further nested within studies (level 3).
Regards
Sandhu
Re: longitudinal dataset
drsatpalsandhu wrote:
Hi
Can we extend Realcom Impute (through stata realcomImpute commands) to three level data structure. For example in case of one-stage meta analysis of longitudinal studies where repeated measurements (level 1) nested within individual (level 2), which are further nested in studies (level 3).
Regards
Sandhu

Hi Sandhu,
Harvey Goldstein writes (http://www.bristol.ac.uk/medialibrary/ ... tation.pdf, p.1) that:
"Currently, only 2-level hierarchical data can be handled, although in some cases it will be possible to substitute fixed for random effects. Thus, in a 3-level structure a fixed (dummy) variable for each level 3 unit could be used and a similar procedure for a cross classified model."
Assuming this is still the only way to specify a third level (which I think it is), then it seems like it would be possible. Chris/Harvey, please correct me if I'm wrong, but I think in practical terms you do this by adding n dummy variables, one for each study. This seems like it would be fine if you have a limited number of studies.
So that's my attempt at an answer to your question. I have a linked question to ask:
In my case, I have around 400 unique level 3 units. Is it possible that a Realcom model could handle this number of additional dummy variables?
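As a rough illustration of the dummy-variable workaround discussed above, the indicators could be created in Stata along these lines. This is a sketch under assumptions: study_id is a hypothetical name for the level 3 identifier, and the l3dum_ prefix is arbitrary. It only builds the indicators; how they are then listed in realcomImpute follows the same pattern as the other explanatory variables in the first post.

```stata
* Sketch only: study_id is a hypothetical level-3 identifier.

* One 0/1 indicator per level-3 unit: l3dum_1, l3dum_2, ...
tabulate study_id, generate(l3dum_)

* Drop the first indicator so that unit acts as the reference category
* (assumes the imputation model already includes a constant).
drop l3dum_1

* Expand the remaining indicators into an explicit list of names, which
* could then be appended to the explanatory variables as `l3dummies'.
unab l3dummies : l3dum_*
display "`l3dummies'"
```

With around 400 level 3 units this produces 399 indicators, so the variable list becomes long and the memory and run-time concerns grow accordingly.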

Re: longitudinal dataset
There are no hard-coded limits on the number of variables that you can specify in Realcom; however, depending on the size of your data, you may run into memory limits or other computational problems, and the models may run very slowly. I would suggest that you just try the model and see how far you get.