How to run multilevel models with missing data using Stat-JR and MLwiN?

Posted: Wed Dec 27, 2023 6:53 am
by gromatics
Hi, I'm a Stat-JR user who wants to run multilevel models with missing data using MLwiN. I have a three-level dataset with students nested within schools nested within countries, and I have some missing values in the outcome and predictor variables. I want to use multiple imputation to handle the missing data and then fit a random intercept model with MLwiN. I have read the Stat-JR documentation and the multiple imputation template, but I'm still confused about how to do this. Can anyone help me with the following questions?

• How do I specify the imputation model and the analysis model in Stat-JR? Do I need to use the same variables and levels for both models?

• How do I choose the number of imputations and iterations for the imputation process? What are the criteria or rules of thumb for this?

• How do I export the imputed datasets from Stat-JR to MLwiN? Do I need to use the realcomimpute command or the mi: prefix in MLwiN?

• How do I combine the results from the imputed datasets using Rubin's rules? Can Stat-JR or MLwiN do this automatically or do I need to do it manually?

I would appreciate any guidance or advice on how to run multilevel models with missing data using Stat-JR and MLwiN. Thank you in advance.

Re: How to run multilevel models with missing data using Stat-JR and MLwiN?

Posted: Wed Jan 03, 2024 4:11 pm
by richardparker
Hi - you may find the document "Imputation for Multilevel Models with Missing Data Using Stat-JR" helpful: https://www.bristol.ac.uk/cmm/media/sof ... statjr.pdf

For a 3-level model, you will need to use the Stat-JR template NLevelImpute. This has not been as widely tested as the 2-level-only version (2LevelImpute), but the inputs are very similar to those for 2LevelImpute (which are in turn described in more detail in the document linked to above). You may also be interested to know that Blimp (https://www.appliedmissingdata.com/blimp) offers imputation for 3-level models, but using a fully conditional specification (as opposed to joint modelling, as in Stat-JR).

"How do I specify the imputation model and the analysis model in Stat-JR? Do I need to use the same variables and levels for both models?"
As the document linked above indicates, the template questions ask you to specify both your analysis model (model of interest; MOI) and your imputation model. Since the assumptions of the two models must not conflict, it is strongly advisable to include the same levels in both. For the same reason, the imputation model should contain all the variables in the analysis model, but it can also contain additional (auxiliary) variables, e.g. ones that predict missingness or the underlying missing values.

"How do I choose the number of imputations and iterations for the imputation process? What are the criteria or rules of thumb for this?"
With regard to choosing the number of iterations, note that Stat-JR generates imputed datasets using Markov chain Monte Carlo (MCMC) methods, which sample from the posterior distribution. MCMC chains typically don't sample from the posterior distribution immediately, however, so the initial section of the chain is often discarded as a 'warm-up' period (the user determines how long to make the warm-up, e.g. by inspecting relevant diagnostics). Once the number of iterations prior to the first imputation has been determined, there is also the question of how many chain iterations to leave between subsequent imputed datasets. This depends on how autocorrelated the chains are: the higher the autocorrelation, the more iterations need to pass before an imputation drawn from the chain is effectively independent of the previous one. Stat-JR provides a number of diagnostic plots to help with both decisions (e.g. those outputs with an .svg extension, as described in "What is returned in the results pane?" in the document linked to above). With regard to the number of imputations, this is a more general multiple imputation question, for which there is quite a lot of advice published elsewhere.
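
To illustrate the point about autocorrelation and the gap between imputations, here is a minimal Python sketch. It is not part of Stat-JR and does not reproduce Stat-JR's own diagnostics; the function names, the 0.1 threshold and the maximum lag are all assumptions chosen for illustration. It assumes you have exported the post-warm-up chain for one parameter as a plain list of numbers, and it reports the smallest lag at which the sample autocorrelation drops below the threshold.

```python
def autocorrelation(chain, lag):
    """Sample autocorrelation of an MCMC chain at the given lag."""
    n = len(chain)
    mu = sum(chain) / n
    denom = sum((x - mu) ** 2 for x in chain)
    if denom == 0:
        raise ValueError("chain is constant; autocorrelation is undefined")
    num = sum((chain[t] - mu) * (chain[t + lag] - mu) for t in range(n - lag))
    return num / denom


def suggest_thinning(chain, threshold=0.1, max_lag=200):
    """Smallest lag with |autocorrelation| below threshold, or None.

    threshold and max_lag are illustrative defaults, not recommendations.
    """
    for lag in range(1, min(max_lag, len(chain) - 1) + 1):
        if abs(autocorrelation(chain, lag)) < threshold:
            return lag
    return None  # still autocorrelated at every lag checked
```

A result of None would suggest the chain is still strongly autocorrelated at every lag checked, in which case a longer run (or a larger max_lag) would be worth trying before settling on the spacing between imputations.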

"How do I export the imputed datasets from Stat-JR to MLwiN? Do I need to use the realcomimpute command or the mi: prefix in MLwiN?"
In terms of getting the imputed data out of Stat-JR: the imputed datasets should be included in the big zip file you get if you click the "Download" button after running NLevelImpute. Alternatively, they'll all end up in the general dataset list, so you can switch to each dataset via the Dataset > Choose menu and then download it with Dataset > Download or, if you want to look at it first, Dataset > View > Download.

"How do I combine the results from the imputed datasets using Rubin's rules? Can Stat-JR or MLwiN do this automatically or do I need to do it manually?"
Stat-JR does combine the results from the imputed datasets using Rubin's rules (see the document linked to above), or you can apply Rubin's rules yourself (via your software of choice) if you export the imputed datasets. NB if you want to analyse the models in MLwiN after imputation, you would have to run a model on each imputed dataset individually in MLwiN and then apply Rubin's rules yourself, as the multiple imputation functionality in MLwiN only supports imputations in the format created by Realcom-Impute.
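
If you do end up pooling by hand, Rubin's rules for a single parameter are short enough to sketch in Python. This is an illustration only: the function name is hypothetical, and it assumes you have already copied the estimate and standard error for the same parameter from each of the m imputed-data model fits. It returns the pooled estimate, the pooled standard error and the classic (large-sample) degrees of freedom.

```python
from math import inf, sqrt
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one parameter across m imputed-data fits using Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need one estimate and one SE per imputation, m >= 2")

    q_bar = mean(estimates)                       # pooled point estimate
    within = mean([se ** 2 for se in std_errors])  # average within-imputation variance
    between = variance(estimates)                 # between-imputation variance (m - 1 divisor)
    total = within + (1 + 1 / m) * between        # total variance

    # Classic Rubin (1987) degrees of freedom; infinite if the estimates are identical
    if between == 0:
        df = inf
    else:
        df = (m - 1) * (1 + within / ((1 + 1 / m) * between)) ** 2

    return q_bar, sqrt(total), df
```

For example, pool_rubin([0.50, 0.54, 0.46, 0.52, 0.48], [0.10] * 5) pools five made-up estimates that each have a standard error of 0.10. You would repeat the call for each fixed effect and variance parameter of interest.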

Re: How to run multilevel models with missing data using Stat-JR and MLwiN?

Posted: Thu Jan 04, 2024 2:07 am
by gromatics

Re: How to run multilevel models with missing data using Stat-JR and MLwiN?

Posted: Wed Feb 28, 2024 2:48 am
by tonyadams
To run multilevel models with missing data in Stat-JR and MLwiN, you can use Stat-JR for imputation, specifying relevant predictors in the imputation model and defining your multilevel analysis model separately.

Re: How to run multilevel models with missing data using Stat-JR and MLwiN?

Posted: Wed Mar 27, 2024 8:38 am
by Manisoa12
You can run multilevel models with missing data by first creating your multilevel analysis model and then using Stat-JR for imputation to identify important predictors.