MCMC is not taking the same starting values but for different datasets

adeldaoud
Posts: 63
Joined: Sat Aug 15, 2015 4:00 pm

MCMC is not taking the same starting values but for different datasets

Post by adeldaoud »

** I started a new thread instead **

I am following up on this thread, but the question is only loosely related.

I am using starting values and passing both FP.b and RP.b to the model, as you suggested. R2MLwiN manages to run this model if I use a small random sample of the original data (~1,000 cases), but not when I try to estimate the model on the full sample (~1.9 million cases).


I am getting this error code:

m1test2 <- runMLwiN(logit(AbsolutDep, cons) ~ 1 + (1 | country) + (1 | CountryClusterHouse),
                    D = "Binomial",
                    estoptions = list(EstM = 1, resi.store = FALSE,
                                      debugmode = FALSE, optimat = TRUE,
                                      mcmcMeth = list(iterations = 10, burnin = 10),
                                      mcmcOptions = list(hcen = 3),
                                      startval = list(FP.b = PLOSONEestimations20000IGLS[[1]]@FP,
                                                      RP.b = PLOSONEestimations20000IGLS[[1]]@RP)),
                    data = test4a, workdir = tempdir(),
                    MLwiNPath = "C:/Program Files (x86)/MLwiN v2.35/")


MLwiN is running, please wait......
/nogui option ignored
ECHO 0


Echoing is ON
STAR
iteration 0

Convergence not achieved
JOIN -0.491145551204681 '_FP_b'
JOIN 0 '_FP_v'
JOIN 1.10714721679688 1.04919278621674 1 '_RP_b'
JOIN 0 0 0 0 0 0 '_RP_v'
ECHO 0

Echoing is ON
MCMC 0 10 1 5.8 50 10 G30[1] G30[2] 2 2 2 1 1 2

error while obeying batch file C:/Users/adel/AppData/Local/Temp/Rtmp4GO7Gq/macrofile_1d002647c62.txt at line number 139:
MCMC 0 10 1 5.8 50 10 C2498] C2499] 2 2 2 1 1 2

wrong length random constraint matrix


error while obeying batch file C:/Users/adel/AppData/Local/Temp/Rtmp4GO7Gq/macrofile_1d002647c62.txt at line number 139:
MCMC 0 10 1 5.8 50 10 C2498] C2499] 2 2 2 1 1 2


wrong length random constraint matrix
.
Execution completed

Error in read.dta(MCMCfile) :
unable to open file: 'No such file or directory'

[Attachment: process.png]
Do you have any idea what is going on? I have tried passing only the relevant variables, in case this is a RAM problem (my PC has 64 GB of RAM), but that does not work either. I am not sure what to try next, so any ideas would be welcome.


Many thanks in advance


PS: I am not sure whether it is related, but I am also seeing RAM consumption swing up and down while this model runs (please see the picture). I have not seen anything like it before. Can these two events be related?




AN UPDATE:

1. I ran a new IGLS model on the dataset to obtain a new set of starting values and then used those to initiate an MCMC model, but I am still getting the same error, namely:

MLwiN is running, please wait......
/nogui option ignored
ECHO 0


Echoing is ON
STAR
iteration 0

Convergence not achieved
JOIN -0.49 '_FP_b'
JOIN 0 '_FP_v'
JOIN 1.11 1.05 1 '_RP_b'
JOIN 0 0 0 0 0 0 '_RP_v'
ECHO 0

Echoing is ON
MCMC 0 10 1 5.8 50 10 G30[1] G30[2] 2 2 2 1 1 2

error while obeying batch file C:/Users/adel/AppData/Local/Temp/Rtmp4GO7Gq/macrofile_1d00c492552.txt at line number 139:
MCMC 0 10 1 5.8 50 10 C2498] C2499] 2 2 2 1 1 2

wrong length random constraint matrix


error while obeying batch file C:/Users/adel/AppData/Local/Temp/Rtmp4GO7Gq/macrofile_1d00c492552.txt at line number 139:
MCMC 0 10 1 5.8 50 10 C2498] C2499] 2 2 2 1 1 2


wrong length random constraint matrix
.
Execution completed

Error in read.dta(MCMCfile) :
unable to open file: 'No such file or directory'


2. I rounded the starting values before passing them to runMLwiN, in case that was the issue, but the problem persists.
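
For reference, the starting-value step described in this post can be sketched as below. This is only an illustration: `make_startval` and the `MockFit` class are hypothetical stand-ins (not part of R2MLwiN) so that the snippet runs without MLwiN, and it assumes, as in the call above, that the fitted IGLS object exposes its estimates in `FP` and `RP` slots.

```r
library(methods)

# Hypothetical helper (not an R2MLwiN function): build the list passed as
# `startval`, optionally rounding the estimates first.
make_startval <- function(fit, digits = NULL) {
  fp <- slot(fit, "FP")  # fixed part estimates
  rp <- slot(fit, "RP")  # random part estimates
  if (!is.null(digits)) {
    fp <- round(fp, digits)
    rp <- round(rp, digits)
  }
  list(FP.b = fp, RP.b = rp)
}

# Stand-in for a fitted IGLS model, using the values shown in the log above.
setClass("MockFit", representation(FP = "numeric", RP = "numeric"))
mock <- new("MockFit",
            FP = -0.491145551204681,
            RP = c(1.10714721679688, 1.04919278621674, 1))

sv <- make_startval(mock, digits = 2)
print(sv)
```

With a real fit, the resulting list would be passed as `startval = sv` inside `estoptions`.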
ChrisCharlton
Posts: 1390
Joined: Mon Oct 19, 2009 10:34 am

Re: MCMC is not taking the same starting values but for different datasets

Post by ChrisCharlton »

Could you try turning on debugmode before running the model and then, once you get to the error, open a command and output window (Data Manipulation -> Command Interface, then click "Output")? Once these have opened, issue the following command:

SETT
which should give you output similar to:

->SETT
EXPLanatory variables in       bcons.1  cons     age      
FPARameters                             cons     age      
RESPonse variable in           use      
FSDErrors : uncorrected                 RSDErrors : uncorrected
MAXIterations  20   TOLErance     2     METHod is IGLS    BATCh is OFF
RCONstraints in c1494                   
IDENtifying codes : 1-woman, 2-district

LEVEL 2 RPM
         cons     cons     1        
LEVEL 1 RPM(RESETTING OFF)
         bcons.1  bcons.1  1        
After doing so, could you look in the column referred to by "RCONstraints in ..." and let me know what its contents look like?
adeldaoud
Posts: 63
Joined: Sat Aug 15, 2015 4:00 pm

Re: MCMC is not taking the same starting values but for different datasets

Post by adeldaoud »

Chris,

Thanks for the support. Here come some screenshots:

1. One curious thing here: why is the method set to IGLS when I requested an MCMC model? Also, the output already showed "Convergence not achieved" before I clicked "Resume Macro"; the text was there when I opened the Output window.
[Attachment: Skärmklipp 2015-10-15 19.48.42.png]
2. C2499 looks OK, but _Stats has missing values.
[Attachment: Skärmklipp 2015-10-15 19.53.15.png]
3. I am not sure why c2498 and c2499 have different lengths (567411 vs 56745). I guess they refer to my second (household) level? I am running a three-level model: kids, in households, in countries.
[Attachment: Skärmklipp 2015-10-15 19.56.25.png]
Please let me know if you need more information.
adeldaoud
Posts: 63
Joined: Sat Aug 15, 2015 4:00 pm

Re: MCMC is not taking the same starting values but for different datasets

Post by adeldaoud »

Out of curiosity, I re-ran the model in debugmode with the data that is supposed to work. It seems to fail in debugmode but not in non-debugmode.

This is the model output in non-debugmode:

Dbar      D(thetabar)  pD      DIC
1289.819  1236.107     53.713  1343.532
---------------------------------------------------------------------------------------------------
The model formula:
logit(AbsolutDep, cons) ~ 1 + (1 | country) + (1 | CountryClusterHouse)
Level 3: country     Level 2: CountryClusterHouse     Level 1: l1id
---------------------------------------------------------------------------------------------------
The fixed part estimates:
               Coef.     Std. Err.  z       Pr(>|z|)        [95% Cred. Interval]  ESS
Intercept      -0.34126  0.01404    -23.05  1.354e-117 ***  -0.36532  -0.32483    10
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
---------------------------------------------------------------------------------------------------
The random part estimates at the country level:
               Coef.     Std. Err.  [95% Cred. Interval]  ESS
var_Intercept  0.16383   0.00799    0.15207   0.17628     10
---------------------------------------------------------------------------------------------------
The random part estimates at the CountryClusterHouse level:
               Coef.     Std. Err.  [95% Cred. Interval]  ESS
var_Intercept  0.22550   0.01436    0.20378   0.24898     3
---------------------------------------------------------------------------------------------------
The random part estimates at the l1id level:
               Coef.     Std. Err.  [95% Cred. Interval]  ESS
var_bcons_1    1.00000   0.00000    0.99845   1.00000     10
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-

Please see the attached document for the relevant screenshots.
[Attachment: This happens in debugmode.docx]
ChrisCharlton
Posts: 1390
Joined: Mon Oct 19, 2009 10:34 am

Re: MCMC is not taking the same starting values but for different datasets

Post by ChrisCharlton »

This looks like a different error from the one you reported before. Could you check that the 'mcmchains' column contains the same number of rows as the 'parnum' and 'itnum' columns? Could you also run the command:

PRINT b1000
from within MLwiN when you get the error and let me know what output it gives?
adeldaoud
Posts: 63
Joined: Sat Aug 15, 2015 4:00 pm

Re: MCMC is not taking the same starting values but for different datasets

Post by adeldaoud »

ChrisCharlton wrote: This looks like a different error to that you reported before. Could you check that the 'mcmchains' column contains the same number of rows as the 'parnum' and 'itnum' columns?

I assume that you are referring to the second issue. mcmchains has 440 rows, whereas parnum and itnum have only 40 each.

ChrisCharlton wrote: Could you also run the command:

This is the output:
->PRINT b1000


B1000
4.0000


Could this second error be caused by my requesting only 10 burn-in and 10 main iterations? Just a thought.
ChrisCharlton
Posts: 1390
Joined: Mon Oct 19, 2009 10:34 am

Re: MCMC is not taking the same starting values but for different datasets

Post by ChrisCharlton »

Thanks for looking at this. The mcmcchains column should only be 40 rows as well (as it's the stacked chains of 10 iterations for 4 parameters). My guess would be that this difference is due to the refresh MCMC option, which is set to 50 iterations by default. This is only used when debugmode is turned on, and it updates the interface every "refresh" iterations. Since in your case this number is higher than the total number of iterations requested, it is likely that the calculation of how to split them up is going wrong and that it is performing more iterations than expected. Could you try setting refresh to ten and seeing whether you still get the same behaviour?
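
The row counts discussed here can be checked with a few lines of arithmetic. This is a minimal sketch using only the numbers quoted in the thread; the variable names are illustrative, and the implied iteration count assumes the extra rows are spread evenly across the 4 parameters.

```r
iterations <- 10   # requested via mcmcMeth
n_params   <- 4    # value printed by PRINT b1000

# Stacked chains: one row per iteration per parameter.
expected_rows <- iterations * n_params

# Length reported for the stacked chain column in debugmode.
observed_rows <- 440

# Iterations per parameter implied by the observed length (assumes an even split).
implied_iterations <- observed_rows / n_params

cat("expected rows:", expected_rows, "\n")
cat("implied iterations per parameter:", implied_iterations, "\n")
```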
adeldaoud
Posts: 63
Joined: Sat Aug 15, 2015 4:00 pm

Re: MCMC is not taking the same starting values but for different datasets

Post by adeldaoud »

Thanks, Chris. I will check the refresh option as soon as possible for the second issue (the one where runMLwiN manages to estimate in non-debugmode but not in debugmode).

Do you have any input on the first issue, which is the more pressing one? I would be happy to share the data with you if that would make troubleshooting easier. Please let me know and I can email a Dropbox link.

Cheers
Adel
ChrisCharlton
Posts: 1390
Joined: Mon Oct 19, 2009 10:34 am

Re: MCMC is not taking the same starting values but for different datasets

Post by ChrisCharlton »

Regarding the first problem: it looks as if the column in question (c2499) has been allocated both for the starting residuals (as it appears in the MCMC 0 command) and as the IGLS random constraint column. This has resulted in the two sets of values being appended (the first 4 values are the constraints, and the following rows are the residuals). I remember fixing a similar issue recently; are you using the development version of R2MLwiN? You might see the method set to IGLS initially when you request an MCMC model because the model is first set up and run for some iterations with IGLS, and the starting residuals are generated using the IGLS model where appropriate.
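
To illustrate the column clash described above, here is a toy sketch in plain R. The values and the `has_expected_length` check are made up for illustration (they are not MLwiN or R2MLwiN functions), and the real columns in the thread hold far more rows.

```r
# Toy stand-ins: 4 constraint entries and a handful of starting residuals.
constraints <- c(1, 0, 0, 0)
residuals   <- seq(-0.5, 0.5, length.out = 6)

# If both are written to the same worksheet column, the values end up appended.
shared_column <- c(constraints, residuals)

# Hypothetical length check of the kind that would reject the shared column.
has_expected_length <- function(column, expected_n) {
  length(column) == expected_n
}

ok_alone  <- has_expected_length(constraints, 4)    # constraints on their own
ok_shared <- has_expected_length(shared_column, 4)  # constraints plus residuals

cat("constraints alone pass:", ok_alone, "\n")
cat("shared column passes:", ok_shared, "\n")
```

In this toy case the shared column holds 10 entries instead of 4, which is the kind of mismatch that a "wrong length" error would point to.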
adeldaoud
Posts: 63
Joined: Sat Aug 15, 2015 4:00 pm

Re: MCMC is not taking the same starting values but for different datasets

Post by adeldaoud »

ChrisCharlton wrote: Thanks for looking at this. The mcmcchains column should only be 40 rows as well (as it's the stacked chains of 10 iterations for 4 parameters). My guess would be that this difference is due to the refresh MCMC option, which is set to 50 iterations by default. This is only used when debugmode is turned on, and updates the interface every refresh iterations. As in your case this number of iterations is higher than the total number requested it is likely that the calculation of how to split them up is going wrong and it is performing more iterations than expected. Could you try setting refresh to ten and seeing whether you still get the same behaviour?

1. I changed the number of iterations and burn-in to 50 and the problem disappeared.

2. Changing the refresh option to 10 also works, like this:

runMLwiN(logit(AbsolutDep, cons) ~ 1 + (1 | country) + (1 | CountryClusterHouse),
         D = "Binomial",
         estoptions = list(EstM = 1, resi.store = FALSE,
                           debugmode = TRUE, optimat = TRUE,
                           mcmcMeth = list(iterations = 10, burnin = 10, refresh = 10),
                           mcmcOptions = list(hcen = 3),
                           startval = list(FP.b = round(m1test3IGLS@FP, 2),
                                           RP.b = round(m1test3IGLS@RP, 2))),
         data = dfsm, workdir = tempdir(),
         MLwiNPath = "C:/Program Files (x86)/MLwiN v2.35/")
modelsstart <- PLOSONEestim


3. Changing the refresh option only within MLwiN does not help.
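
The workaround in points 1 and 2 can be sketched as a small guard that keeps refresh from exceeding the requested run length. `mcmc_settings` is a hypothetical helper, not an R2MLwiN function; capping refresh at the smaller of iterations and burnin is an assumption drawn from the two configurations reported to work here, and the default of 50 is the one mentioned earlier in the thread.

```r
# Hypothetical helper: build an mcmcMeth list whose refresh never exceeds
# the requested number of iterations or burn-in iterations.
mcmc_settings <- function(iterations, burnin, refresh = 50) {
  list(iterations = iterations,
       burnin     = burnin,
       refresh    = min(refresh, iterations, burnin))
}

short_run <- mcmc_settings(iterations = 10, burnin = 10)  # refresh capped at 10
long_run  <- mcmc_settings(iterations = 50, burnin = 50)  # refresh stays at 50

str(short_run)
str(long_run)
```

The resulting list would be passed as `mcmcMeth = short_run` inside `estoptions`.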

ChrisCharlton wrote: Regarding the first problem - it looks as if the column in question (c2499) has been allocated for both the starting residuals (as it appears in the MCMC 0 command) and as the IGLS random constraint column. This has resulted in the two sets of values being appended (the first 4 values are the constraints, and the following rows are the residuals). I remember fixing an issue similar to this recently, are you using the development version of R2MLwiN? You might see the method set to IGLS initially when you request an MCMC model as the model is set up and run for some iterations with IGLS, and the starting residuals are generated using the IGLS model where appropriate.

1) I am currently re-running the model in the development version. I will come back as soon as I have some new results.

2) For my own information, do you have an explanation for why the model takes starting values from some datasets but not from others? This seems to be purely data driven and may depend on the size of the data.