Slow with imputation model

Post by **shanekav** » Sun Nov 23, 2014 3:57 am

Hi,

I have set up a model in the 2LevelImpute template, but it is really slow. Just to get a test response, I set it up with 2 imputed datasets, 10 iterations between imputations, and a burn-in of 10 and 20 iterations for the model of interest (MOI), and it is still going after 27 hours. It seems that this approach may be unworkable if I use the suggested MCMC settings. Do you have any suggestions?

I am running a 2-level logistic model with 111,000 individuals nested in 50 states. I have 2 continuous and 4 categorical independent variables at level 1 and 3 continuous variables at level 2. One of the level 1 continuous variables has 10% missing data. I want to show that this has minimal effect on the outcome so that I can justify a full case analysis.

Thanks in advance

Shane

Post by **richardparker** » Mon Dec 01, 2014 4:12 pm

Hi Shane,

Just seen your message; did the model run finish?

Best wishes,

Richard

Post by **shanekav** » Thu Dec 04, 2014 1:25 pm

Hi Richard,

Yes, it did, at about the 27 hour mark. I think part of the issue may be that I am running Stat-JR via VirtualBox on a MacBook Air. It seems that it may not be able to access multiple processors with this setup. I'm not sure what to do. I can try to get access to a faster PC, but I'm not sure how much this will help given that I have a complex model.

Do you have any suggestions? Is there a way of simplifying the model without compromising accuracy too much? I just want to show that the missing data is likely to be inconsequential so that I can justify a full case approach.

On another issue, when the model finished I think it allows you to save as an ebook. A circulating dial opened up, but it just kept going and going. Does it usually take a long time for this part of the process?

many thanks for your time

Shane

Post by **richardparker** » Fri Dec 05, 2014 5:38 pm

Hi - thanks for letting us know.

Your dataset is quite large (111,000 individuals nested within 50 states), and if you're running it virtually that may indeed add time. One can install Windows on Macs via Boot Camp these days, but obviously that requires the relevant licences. However, the Settings in VirtualBox should allow you to select the number of processors you would like to use (from those available), which might help?
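
As a quick check on the processor point, here is a minimal sketch that reports how many processors the guest operating system can actually see. It assumes a Python interpreter is available inside the virtual machine, and the function name `visible_processors` is just an illustrative choice. If it reports 1, the VirtualBox processor setting is probably worth raising before rerunning.

```python
import multiprocessing


def visible_processors():
    """Return the number of processors visible to the operating system."""
    return multiprocessing.cpu_count()


if __name__ == "__main__":
    print("Processors visible to the guest OS:", visible_processors())
```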

Note that, although you have set a low burn-in and number of iterations for the MOI, these are a relatively trivial computational overhead anyway, as the MOI is fitted at the end of the whole process, on the imputed datasets. Fitting the imputation model, prior to this, takes the majority of the computational time, and the MCMC settings for it are determined automatically, bar the number of iterations between imputations. The setting you've chosen for that is very low, so imputations only 10 iterations apart are more likely to be auto-correlated than if the interval were considerably larger. Note also that adapting for 5000 iterations and a burn-in of 1000 precede the imputations, so increasing the number of iterations between imputations won't increase the overall computation time as much as one might think.

If you have unordered categorical variables as responses in your imputation model, then the template will fit (more-or-less) one latent normal response for each category, which again can take time if you have a large number of categories. So one speed-up would be to amalgamate categories, but obviously only if it made sense to do so.

If you save it as an eBook, and then open it up in Stat-JR's eBook-reading interface (DEEP), it will start running the execution (i.e. all 27 hours of it!) again, which I'm guessing is when you saw the circulating dial (in DEEP?). In a future release we plan to ensure eBooks can be uploaded with 'pre-run' executions (e.g. with the outputted objects from those executions embedded within the eBook).
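
To put rough numbers on the point about iterations between imputations, here is a small sketch. It assumes the imputation model runs one adaptation phase (5000 iterations), one burn-in (1000), and then a fixed interval of iterations before each imputed dataset is drawn; the template's exact schedule may differ, and both function names are hypothetical. The second function treats the chain as a simple AR(1) process with coefficient 0.9, purely as an illustration of how correlation between draws decays with the interval; it is not a property of this particular model.

```python
def imputation_model_iterations(n_imputations, interval, adapt=5000, burn_in=1000):
    """Rough total of MCMC iterations spent on the imputation model.

    Assumption: adaptation and burn-in happen once, followed by `interval`
    iterations before each of the `n_imputations` imputed datasets.
    """
    return adapt + burn_in + n_imputations * interval


def ar1_correlation(rho, lag):
    """Lag-`lag` autocorrelation of an AR(1) chain with coefficient `rho`."""
    return rho ** lag


if __name__ == "__main__":
    for interval in (10, 100, 1000):
        total = imputation_model_iterations(n_imputations=2, interval=interval)
        corr = ar1_correlation(0.9, interval)
        print("interval=%d total_iterations=%d ar1_corr=%.4f" % (interval, total, corr))
```

Under these assumptions, with 2 imputed datasets, raising the interval from 10 to 1000 takes the total from 6020 to 8000 iterations: roughly a third more work rather than a hundredfold increase.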

Best wishes,

Richard

Post by **shanekav** » Sat Dec 06, 2014 12:31 am

Hi Richard,

Many thanks for your detailed response to my questions. I will have a look at my options, both computationally and in how I specify the model, and let you know how I go.

Shane

Post by **shanekav** » Wed Dec 17, 2014 9:34 am

Hi Richard,
So after 5 days it crashed with the following error:

File "webtest.py", line 433, in go
File "C:\StatJR\packages\Python_script.py", line 71, in run
self.eng.run('script.py')
File "Q:\edcmjc\repo\StatJR\estat\trunk\src\lib\EStat\engines\Engine.py", line 44, in decorated
File "Q:\edcmjc\repo\StatJR\estat\trunk\src\lib\EStat\engines\Python.py", line 16, in run
File "", line 259, in
File "c:\Python27\lib\multiprocessing\pool.py", line 558, in get
WindowsError: [Error 2] The system cannot find the file specified

Could you please let me know your thoughts?

thanks

Shane

Post by **richardparker** » Wed Dec 17, 2014 2:31 pm

Hi Shane - sorry it crashed. The issue may be occurring when it's fitting the MOI and there's a clash when it's accessing multiple processors. It's a problem another user has experienced before: an error which is inconsistent in its frequency, but one which is obviously more commonplace than we thought. In this instance your imputed datasets may have been returned prior to it crashing (so you may be able to access them), but we'll release a new version of the template with a fix for the bug in the next few days - we'll let you know.

Post by **richardparker** » Fri Dec 19, 2014 2:35 pm

Hi Shane - I've attached a new version of the 2LevelImpute template with a fix for the bug which likely caused your crash (we'll replace the copy on the CMM website with this version too). It's zipped up, so you just need to extract it into your StatJR/templates folder.
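
For anyone who prefers to script the extraction step, here is a minimal sketch using Python's standard `zipfile` module. The zip file name and the destination path in the example are hypothetical placeholders (the attachment's actual name and your install location may differ), and `extract_template` is just an illustrative helper name.

```python
import os
import zipfile


def extract_template(zip_path, templates_dir):
    """Extract a zipped template into templates_dir; return the archive's file names."""
    if not os.path.isdir(templates_dir):
        os.makedirs(templates_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(templates_dir)
        return archive.namelist()


if __name__ == "__main__":
    # Hypothetical placeholders: adjust to your download and install locations.
    zip_path = "2LevelImpute.zip"
    templates_dir = os.path.join("C:\\", "StatJR", "templates")
    if os.path.exists(zip_path):
        print("Extracted:", extract_template(zip_path, templates_dir))
    else:
        print("Zip file not found:", zip_path)
```

Printing the returned names makes it easy to confirm which files landed in the folder.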

Post by **sabusera** » Tue Jun 06, 2023 1:16 am

Hello,
I unzipped it, but when I go to the StatJR/templates folder, it's empty. I'm using Win 10. Is there any problem? Thank you!

Post by **ChrisCharlton** » Tue Jun 27, 2023 9:23 am

There are a couple of possible StatJR template folders:

In the StatJR installation directory (for templates included with the software): C:\Program Files\StatJR\templates
In the user directory (for additional user-specific templates): C:\Users\<your username>\.statjr\templates

The second of these will start off as empty as this is where user templates are expected to be placed. Can you confirm which you are looking at?
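
A quick way to see which of the two folders you are looking at, and whether anything has been extracted there, is sketched below. It assumes the default locations given above (your installation may use a different drive or directory), and `describe_template_dirs` is a hypothetical helper name.

```python
import os


def describe_template_dirs(candidates):
    """Return (path, exists, n_entries) for each candidate folder."""
    report = []
    for path in candidates:
        exists = os.path.isdir(path)
        n_entries = len(os.listdir(path)) if exists else 0
        report.append((path, exists, n_entries))
    return report


if __name__ == "__main__":
    # Assumed default locations; adjust if StatJR is installed elsewhere.
    candidates = [
        r"C:\Program Files\StatJR\templates",
        os.path.join(os.path.expanduser("~"), ".statjr", "templates"),
    ]
    for path, exists, n_entries in describe_template_dirs(candidates):
        status = "exists" if exists else "missing"
        print("%s: %s, %d entries" % (path, status, n_entries))
```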

www.cmm.bristol.ac.uk/forum
