Random intercept vs. random slopes in 2LevelImpute

shakespeare · Post by **shakespeare** » Thu Oct 02, 2014 4:40 pm

I'm testing the following MOI:

Mortality (died) predicted by sex, temperature (tcodec), and multiple births (Y/N) (bncat) (plus a constant). My clustering variable is hospital and I want a random intercept model.

For the imputation model I want to use sex (no missing values) (plus a constant) to predict temperature and multiple births (both of these variables have missing data).

Here's the input string I generated:

Current input string: {'want2': 'No', 'bin1_2': 'Binary', 'bin1_1': 'Normal', 'link': 'logit', 'iterations': '100', 'MOIlevel': '2 levels', 'numlevs': 'Yes', 'MOIslope': 'No', 'condmarg': 'Yes', 'L2ID': 'site2', 'burnin': '100', 'yimp1': 'tcodec,bncat', 'numimpute': '2', 'imputeevery': '100', 'ximp1_1': 'sexrec:cat,cons', 'ximp1_0': 'sexrec:cat,cons', 'MOIdist': 'Binary', 'y': 'died', 'x': 'sexrec:cat,tcodec,bncat:cat,cons'}

I'd like to verify that this is in fact fitting a random intercept model.

Next, I'd like to set the slope for temperature to random. The documentation has a couple of cryptic comments about how to do so. One says to include responses (not sure what responses are being referred) in the MOI and variables whose coefficients that are to vary at level 2 as responses in the imputation model. It also seems to suggest that level 2 responses should not have explanatory variables specified at level 1. If I, say, I want to fit random slopes with temperatures varying, and reset the temperature variable as a level 2 response in the imputation model, I receive the following error message:

ERROR: tcodec is not a level 2 variable. Please remove and select and appropriate level 2 variable.

So, I'm doing something wrong. I'd appreciate guidance on how to set up the random slope model.

shakespeare · Post by **shakespeare** » Thu Oct 02, 2014 6:50 pm

I should have posted my random slope attempt:

Current input string: {'want2': 'Yes', 'iterations': '100', 'bin1_1': 'Binary', 'link': 'logit', 'ximp2_0': 'cons', 'x': 'sexrec:cat,tcodec,bncat:cat,cons', 'x2': 'tcodec', 'yimp2': 'tcodec', 'numlevs': 'Yes', 'MOIslope': 'Yes', 'condmarg': 'Yes', 'L2ID': 'site2', 'burnin': '100', 'yimp1': 'bncat', 'priors': 'Uniform', 'numimpute': '2', 'imputeevery': '100', 'ximp1_0': 'sexrec:cat,cons', 'MOIdist': 'Binary', 'y': 'died', 'MOIlevel': '2 levels', 'bin2_1': 'Normal'}

This produces the message ERROR: tcodec is not a level 2 variable. Please remove and select and appropriate level 2 variable even though I've selected tcodec as a response at level2 in the imputation model.

richardparker · Post by **richardparker** » Fri Oct 03, 2014 4:41 pm

Hi - your input string for your random intercepts model seems fine.

With regard to your specification for a random slopes model, it is returning an error because your 'tcodec' variable varies within-hospital (i.e. a given hospital does not have the same value for 'tcodec' for each row of the dataset). It's only appropriate to answer 'Yes' to the question 'Are there any responses at level 2 for imputation model' when you have a level 2 variable you wish to include as a response in the imputation model (i.e. a variable which doesn't vary within-hospital).

The documentation in the eBook advises that if your MOI has a random slope, then include the response in the MOI (in this example 'died') as a response in the imputation model, and also the variable whose coefficient you wish to randomly-vary at level 2 in the MOI (in this case 'tcodec') as a response in the imputation model as well (you're doing the latter anyway - although currently specifying it at the wrong level, but not the former).

Also, just looking at your input string for the random slopes model, you need to specify 'cons', as well as 'tcodec', in response to the question 'In your MOI, which explanatory variables are random at level 2?' if you wish to allow the intercept, as well as the coefficient for 'tcodec', to randomly-vary across level 2. As discussed in Module 5 of the free LEMMA online course (e.g. see p22), it's difficult to think of an application in which one would wish just the slope to randomly-vary.

Hope that helps?

Best wishes,

Richard

shakespeare · Post by **shakespeare** » Tue Oct 07, 2014 10:07 pm

Ok. I suspected I was missing a constant, I just didn't know where. This is quite helpful.

I thought I had a better grasp on imputation theory, but it's a bit more complicated in the multilevel case, so a review of the LEMMA course is probably in order. I think writing out the equations is helpful. I'm attaching a document that shows a setup for my model with the random intercept and the random slope for tcodec. I show variables died and bncat as they are, but these are actually binary, so it's the probabilities, and not the variables themselves that are being modeled. I'd appreciate feedback on this iteration.

TREE setup for random slopes model.doc: (33 KiB) Downloaded 2341 times

shakespeare · Post by **shakespeare** » Wed Oct 08, 2014 12:15 am

My imputation equations were wrong. I think this version is correct:

TREE setup for random slopes model.doc: (33.5 KiB) Downloaded 2500 times

richardparker · Post by **richardparker** » Fri Oct 17, 2014 4:00 pm

Hi - seems fine, although you may want to investigate different estimation settings, such as longer intervals between iterations, etc. (e.g. section 3 of Carpenter et al's (2011) REALCOM article may offer some useful guidance http://www.jstatsoft.org/v45/i05).

shakespeare · Post by **shakespeare** » Fri Oct 24, 2014 3:31 pm

Thanks for the feedback. I'm familiar with the article. I went ahead and set up my entire data set with a complex model utilizing random slopes for temperature and birth weight based on that template. I had pretty good results with a random intercept model using 1000 iterations between imputations in REALCOM. I'll look at my results when things finish and make adjustments. It's been running about 10 days on an extremely fast computer, so I'm thinking that practicality might call for a more simple model. Anyway, will let you know if I have questions. Thanks again for your help.

shakespeare · Post by **shakespeare** » Thu Oct 30, 2014 5:57 pm

This finally finished after 11 days. This is the model I ran:

Current input string: {'want2': 'No', 'bin1_5': 'Unordered', 'bin1_3': 'Normal', 'bin1_2': 'Normal', 'bin1_1': 'Normal', 'bin1_6': 'Binary', 'iterations': '2000', 'bin1_4': 'Binary', 'numlevs': 'Yes', 'MOIslope': 'Yes', 'condmarg': 'Yes', 'ximp1_4_2': 'sexrec:cat,quarterrec:cat,gestagecat:cat,steroids:cat,dischyearrec:cat,cons', 'ximp1_4_0': 'sexrec:cat,quarterrec:cat,gestagecat:cat,steroids:cat,dischyearrec:cat,cons', 'ximp1_4_1': 'sexrec:cat,quarterrec:cat,gestagecat:cat,steroids:cat,dischyearrec:cat,cons', 'burnin': '2000', 'priors': 'Uniform', 'ximp1_1': 'sexrec:cat,quarterrec:cat,gestagecat:cat,steroids:cat,dischyearrec:cat,cons', 'ximp1_0': 'sexrec:cat,quarterrec:cat,gestagecat:cat,steroids:cat,dischyearrec:cat,cons', 'ximp1_3': 'sexrec:cat,quarterrec:cat,gestagecat:cat,steroids:cat,dischyearrec:cat,cons', 'ximp1_2': 'sexrec:cat,quarterrec:cat,gestagecat:cat,steroids:cat,dischyearrec:cat,cons', 'ximp1_5': 'sexrec:cat,quarterrec:cat,gestagecat:cat,steroids:cat,dischyearrec:cat,cons', 'ncatsracerec': '4', 'link': 'logit', 'MOIlevel': '2 levels', 'x2': 'tcodec,tcode2c,bwc,cons', 'L2ID': 'site2', 'yimp1': 'tcodec,tcode2c,bwc,bncat,racerec,died', 'numimpute': '10', 'imputeevery': '1000', 'MOIdist': 'Binary', 'y': 'died', 'x': 'sexrec:cat,racerec:cat,tcodec,tcode2c,bwc:cat,bncat:cat,quarterrec:cat,gestagecat:cat,steroids:cat,dischyearrec:cat,cons'}

Site2 is my clustering variable and tcodec, tcode2c and bwc are random slopes. Not saying this is a valid MOI or imputation model, it's just something I'm experimenting with. In fact, it took so long as to be impractical, so I'll likely simplify it in the next round. Nonetheless, it generated a lot output that I'm having trouble deciphering-the documentation was a little sparse on the topic. I've attached a screen cap of the files created. It's a lot more than REALCOM. There are several more files after the bottom file, impute_datafile_chain1_iter0.dta ending with impute_datafile_chain1_iter9.dta. I presume there's some way of obtaining the necessary diagnostic plots for the chains. I'd be interested in knowing about that as well as what is contained in each file. Thx.

richardparker · Post by **richardparker** » Mon Nov 03, 2014 5:31 pm

Hi,

Sorry that the supporting documentation is a little scant on detail concerning outputs. Stat-JR doesn't currently hide any of the outputted files, so they're all available to the user to inspect - good in terms of transparency, but does mean that not all outputted files will be relevant to each user. Hope the following (pertaining to the outputs returned in the output pane / drop-down list in TREE) helps. (Also, just wondering how you downloaded the datasets which appear in your screenshot? Did you download them individually?)

script.py
- this is the internal script, written in Python, which runs the execution you have requested

inputs
- a list of inputs for internal purposes.

Imputation_Model_equation.tex
- currently not implemented for this template, but in some other templates this returns a LaTeX rendering of the model equation.

Imputation_Model_algorithm.tex
- since this template executes via custom C code, this isn't terribly informative here, but in some other templates it returns the algebra for the conditional posterior distributions.

*.cpp
- C++ code fragments used to fit the model

Imputation_Model_impute_datafile_chainA_iterB
- imputed level 1 datasets. As discussed in the eBook, the nomenclature here (i.e. for 'A' and 'B') will depend on both the inputs you have chosen, and the number of processors on your machine.

Imputation_Model_impute__L2Data_chainA_iterB
- as above, but at level 2.

Imputation_Model_Chains
- an object used for internal purposes, doesn't actually render anything if selected in the output pane

Imputation_Model_out
- chain dataset for the imputation model, length of which will depend on the estimation options chosen.

Imputation_Model_ModelParameters
Imputation_Model_ModelResults
- for this template these outputs return the same information (because there's no DIC)

Model1_impute_datafile_chain3_iter1000
Model2_impute_datafile_chain3_iter1000
Model3_impute_datafile_chain3_iter1000, etc.
- these are erroneous outputs; the bug producing them will be fixed in the next release of 2LevelImpute.

CombinedResults
- chains for each imputation model, we 'pretend' that each MOI from each imputed dataset is a chain from a multiple chain model, which allows us to combine them to derive diagnostic graphs.

ResultsTable:Imputation
- estimates generated using Rubin's rules

Everything with prefix CompleteCasesModel refers to the complete cases model which is simply ran for comparison. Most of these files won't be of much interest, although estimates from the complete cases model can be found in CompleteCasesModel_ModelParameters, CompleteCasesModel_ModelFit and CompleteCasesModel_ModelResults.

*.svg - MCMC diagnostic plots. The top-left graph shows the values plotted against iteration number, and is useful to confirm that the chain is mixing well, meaning that it visits most of the posterior distribution in few iterations. The top-right graph contains a kernel density plot representing the posterior distribution for this parameter. The two graphs in the middle row are time series plots known as the autocorrelation (ACF) and partial autocorrelation (PACF) functions. The ACF indicates the level of correlation within the chain; this is calculated by moving the chain by a number of iterations (called the lag) and looking at the correlation between this shifted chain and the original. The PACF picks up the degree of auto-regression in the chain. By definition a Markov chain should act like an autoregressive process of order 1, as the Markov definition is that the future state of the chain is independent of all the past states of the chain given the current value. If, for example, in reality the chain had additional dependence on the past 2 values, then we would see a significant PACF at lag 2. The bottom-left plot is the estimated Monte Carlo standard error (MCSE) plot for the posterior estimate of the mean. As MCMC is a simulation-based approach this induces (Monte Carlo) uncertainty due to the random numbers it uses. This uncertainty reduces with more iterations, and is measured by the MCSE, and so this graph details how long the chain needs to be run to achieve a specific MCSE. The sixth (bottom-right) plot is a multiple chains diagnostic: a Brooks-Gelman-Rubin diagnostic plot (BGRD; Brooks and Gelman, 1998). This plot looks at mixing across the chains: the green and blue lines measure variability between and within the chains, and the red is their ratio. For good convergence the red line should be close to 1.0.

Best wishes,

Richard

shakespeare · Post by **shakespeare** » Wed Nov 05, 2014 1:48 pm

Thanks for your response. To obtain the files I followed the template instructions by going to Dataset>Choose, selecting the file, and downloading via Dataset>Download. These files appeared in a list that included the example files that shipped with StatJR. I downloaded everything that looked like it was generated by the template, i.e., not an example file. The thing that's troubling is that many of the files that you list in your response did not seem to be present, e.g., the level 2 imputations, .svg files. Plus there were some extra files that I obtained that I'm still not clear what imp_1_0-imp_10_0 are.

Since I'm not getting everything I should, what do you think should be my next step to diagnose the problem? I tried to simplify this and run a random intercept model:

Current input string: {'want2': 'No', 'bin1_5': 'Binary', 'mv': 'Yes', 'ximp1_0_2': 'sexrec:cat,quarterrec:cat,gestagecat:cat,steroids:cat,dischyearrec:cat,cons', 'ximp1_0_0': 'sexrec:cat,quarterrec:cat,gestagecat:cat,steroids:cat,dischyearrec:cat,cons', 'ximp1_0_1': 'sexrec:cat,quarterrec:cat,gestagecat:cat,steroids:cat,dischyearrec:cat,cons', 'bin1_3': 'Normal', 'bin1_2': 'Normal', 'bin1_1': 'Unordered', 'iterations': '5000', 'bin1_4': 'Normal', 'ncatsracerec': '4', 'numlevs': 'Yes', 'condmarg': 'Yes', 'L2ID': 'site2', 'x1': 'sexrec:cat,racerec:cat,tcodec,tcode2c,bwc,bncat:cat,quarterrec:cat,gestagecat:cat,steroids:cat,dischyearrec:cat,cons', 'burnin': '5000', 'yimp1': 'racerec,tcodec,tcode2c,bwc,bncat', 'numimpute': '10', 'imputeevery': '5000', 'ximp1_1': 'sexrec:cat,quarterrec:cat,gestagecat:cat,steroids:cat,dischyearrec:cat,cons', 'ximp1_3': 'sexrec:cat,quarterrec:cat,gestagecat:cat,steroids:cat,dischyearrec:cat,cons', 'ximp1_2': 'sexrec:cat,quarterrec:cat,gestagecat:cat,steroids:cat,dischyearrec:cat,cons', 'ximp1_4': 'sexrec:cat,quarterrec:cat,gestagecat:cat,steroids:cat,dischyearrec:cat,cons', 'MOIdist': 'Multivariate Normal', 'y': 'ivh', 'MOIlevel': '2 levels'}

It ran about 20 hours and crashed:

Traceback (most recent call last):
File "webtest.py", line 433, in go
File "C:\StatJR\packages\Python_script.py", line 71, in run
self.eng.run('script.py')
File "Q:\edcmjc\repo\StatJR\estat\trunk\src\lib\EStat\engines\Engine.py", line 44, in decorated
File "Q:\edcmjc\repo\StatJR\estat\trunk\src\lib\EStat\engines\Python.py", line 16, in run
File "", line 248, in
File "Q:\edcmjc\repo\StatJR\estat\trunk\src\apps\cmdtest\cmdtest.py", line 127, in RunStatJR
File "C:\StatJR\packages\eStat.py", line 889, in init
elif self.eng.outputs[p + '.xml'].deterministic:
File "Q:\edcmjc\repo\StatJR\estat\trunk\src\lib\EStat\engines\Engine.py", line 23, in __getitem__
KeyError: 'tau_u.xml'

Not sure if I made an error in my setup, or if there's a problem with the program here.

One thing I'm wondering is if it's possible to obtain single files containing the imputed values as in Realcom, or if would one have to combine the output to obtain these. I'm thinking it would be nice to have the actual data in order to compare the distributions of the data before and after imputation.

Anyway, I'd be interested in knowing what I could do next to provide you information to diagnose the problems I'm having.

Thanks.

www.cmm.bristol.ac.uk/forum

Random intercept vs. random slopes in 2LevelImpute

Random intercept vs. random slopes in 2LevelImpute

Re: Random intercept vs. random slopes in 2LevelImpute

Re: Random intercept vs. random slopes in 2LevelImpute

Re: Random intercept vs. random slopes in 2LevelImpute

Re: Random intercept vs. random slopes in 2LevelImpute

Re: Random intercept vs. random slopes in 2LevelImpute

Re: Random intercept vs. random slopes in 2LevelImpute

Re: Random intercept vs. random slopes in 2LevelImpute

Re: Random intercept vs. random slopes in 2LevelImpute

Re: Random intercept vs. random slopes in 2LevelImpute