Errors fitting cross-classified logistic model by MCMC

Welcome to the forum for runmlwin users. Feel free to post your question about runmlwin here. The Centre for Multilevel Modelling take no responsibility for the accuracy of these posts, we are unable to monitor them closely. Do go ahead and post your question and thank you in advance if you find the time to post any answers!

Go to runmlwin: Running MLwiN from within Stata >> http://www.bristol.ac.uk/cmm/software/runmlwin/
Post Reply
rebvas86
Posts: 11
Joined: Tue Feb 07, 2012 10:32 am

Errors fitting cross-classified logistic model by MCMC

Post by rebvas86 »

Have been getting some errors when running MLwiN from STATA...One is the following
'->OBEY "C:\Users\rv1g09\AppData\Local\Temp\3\ST_04000007.tmp"
error while obeying batch file C:\Program Files (x86)\MLwiN v2.25\discrete\pre
at line number 51:
calc 'F~(H)' = expo('H')*( 1+expo('H') )^(-2)
1248 numeric errors in calculate command, first 20 affected entries listed.
Affected entries replaced with system missing.
WARNING: macro file has aborted, your data may be in a transformed state'
Not sure what this means.

The other is something to do with having a small work matrix. Tried changing the worksheet size through by simply adding mlwin_setting(size(10000)) at the end of my runmlwin command

Code: Select all

quietly runmlwin v1 cons, ///
    level3 (areaid:cons) ///
    level2 (intid:cons) ///
    level1 (serialno:) ///
    discrete(distribution(binomial) link(logit) denominator(cons)) ///
    mcmc(cc burnin(5000) chain(100000)) ////
    mlwin_setting(size(10000))
but I got the error `option mlwin_settings() not allowed` in STATA.

Any idea what I can do to get around these?
Thanks in advance for your help.
Rebecca
GeorgeLeckie
Site Admin
Posts: 432
Joined: Fri Apr 01, 2011 2:14 pm

Re: Errors fitting cross-classified logistic model by MCMC

Post by GeorgeLeckie »

Hi Rebecca,

In terms of your first query, the first error is an MLwiN error and probably relates to fitting your model by IGLS (as opposed to by MCMC). However, you must use MCMC to fit your model as it is a cross-classified logistic model. When you fit this model in IGLS, for example to get starting values for fitting the model subsequently in MCMC, MLwiN will naively fit a three-level logistic model as MLwiN can't fit cross-classified logistic models using IGLS. So the error is probably because you have attempted to force a three-level model on your data when it is not three-level.

A better way to obtain starting values is to fit a single-level logistic model using IGLS and save the fixed part parameter estimates in a matrix. Then tag two extra elements on to this matrix, one for each of the variance components you have in the cross-classified model. You will need to manually specify starting values for each of these variances. Then fit the cross-classified logistic model using MCMC, but use the initsb() option to refer to your matrix of starting values.

Here is a simpler example where I have fit a two-level logistic model using MCMC, but where I have manually supplied the starting values by setting the fixed part parameters to be equal to those obtained from a single-level model and where I have arbitrarily started the level-2 variance at a value of 2.

Code: Select all

* Load the data
use http://www.bristol.ac.uk/cmm/media/runmlwin/bang, clear

* Generate dummy variables for number of living children at home
generate lc1 = (lc==1)
generate lc2 = (lc==2)
generate lc3plus = (lc>=3)

* Fit single-level logistic model by IGLS (MQL1)
runmlwin use cons lc1 lc2 lc3plus age, ///
	level1(woman) ///
	discrete(distribution(binomial) link(logit) denominator(cons)) ///
	nopause

* Save parameter estimates as the vector a1
matrix a1 = e(b)

* List vector a1
matrix list a1

* Construct a vector of starting values a2 where we set the starting value for the level-2 variance to be 2
matrix a2 = (a1[1,1..5],2,a1[1,6])

* List vector a2
matrix list a2

* Fit the two-level logistic model by MCMC where we use as starting values the values in vector a2
runmlwin use cons lc1 lc2 lc3plus age, ///
	level2(district: cons) ///
	level1(woman) ///
	discrete(distribution(binomial) link(logit) denominator(cons)) ///
	mcmc(on) initsb(a2) ///
	nopause

The output associated with these commands is

Code: Select all

. * Load the data
. use http://www.bristol.ac.uk/cmm/media/runmlwin/bang, clear

. 
. * Generate dummy variables for number of living children at home
. generate lc1 = (lc==1)

. generate lc2 = (lc==2)

. generate lc3plus = (lc>=3)

. 
. * Fit single-level logistic model by IGLS (MQL1)
. runmlwin use cons lc1 lc2 lc3plus age, ///
>         level1(woman) ///
>         discrete(distribution(binomial) link(logit) denominator(cons)) ///
>         nopause
 
MLwiN 2.25 multilevel model                     Number of obs      =      2867
Binomial logit response model
Estimation algorithm: IGLS, MQL1

Run time (seconds)   =       2.06
Number of iterations =          4
------------------------------------------------------------------------------
         use |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        cons |  -1.255983   .0977567   -12.85   0.000    -1.447583   -1.064384
         lc1 |   .9913077   .1237629     8.01   0.000     .7487369    1.233879
         lc2 |   1.223558   .1348033     9.08   0.000     .9593484    1.487768
     lc3plus |   1.116551   .1382418     8.08   0.000      .845602      1.3875
         age |  -.0162877   .0060931    -2.67   0.008    -.0282299   -.0043456
------------------------------------------------------------------------------


. 
. * Save parameter estimates as the vector a1
. matrix a1 = e(b)

. 
. * List vector a1
. matrix list a1

a1[1,6]
             FP1:          FP1:          FP1:          FP1:          FP1:          RP1:
            cons           lc1           lc2       lc3plus           age  var(bcons_1)
y1    -1.2559832     .99130774     1.2235581     1.1165509    -.01628775             1

. 
. * Construct a vector of starting values a2 where we set the starting value for the level-2 variance to be 2
. matrix a2 = (a1[1,1..5],2,a1[1,6])

. 
. * List vector a2
. matrix list a2

a2[1,7]
           FP1:        FP1:        FP1:        FP1:        FP1:                        
          cons         lc1         lc2     lc3plus         age          c6          c7
y1  -1.2559832   .99130774   1.2235581   1.1165509  -.01628775           2           1

. 
. * Fit the two-level logistic model by MCMC where we use as starting values the values in vector a2
. runmlwin use cons lc1 lc2 lc3plus age, ///
>         level2(district: cons) ///
>         level1(woman) ///
>         discrete(distribution(binomial) link(logit) denominator(cons)) ///
>         mcmc(on) initsb(a2) ///
>         nopause
 
MLwiN 2.25 multilevel model                     Number of obs      =      2867
Binomial logit response model
Estimation algorithm: MCMC

-----------------------------------------------------------
                |   No. of       Observations per Group
 Level Variable |   Groups    Minimum    Average    Maximum
----------------+------------------------------------------
       district |       60          3       47.8        173
-----------------------------------------------------------

Burnin                     =        500
Chain                      =       5000
Thinning                   =          1
Run time (seconds)         =       26.4
Deviance (dbar)            =    3543.96
Deviance (thetabar)        =    3497.09
Effective no. of pars (pd) =      46.87
Bayesian DIC               =    3590.83
------------------------------------------------------------------------------
         use |      Mean    Std. Dev.     ESS     P       [95% Cred. Interval]
-------------+----------------------------------------------------------------
        cons |  -1.456018     .13134       75   0.000    -1.714841    -1.20631
         lc1 |   1.052494   .1274839      185   0.000     .7980338    1.296817
         lc2 |   1.356008   .1391182      121   0.000      1.08062    1.611022
     lc3plus |   1.281909   .1464471       83   0.000     1.002622    1.579236
         age |  -.0192826   .0065537      155   0.001    -.0326349   -.0064667
------------------------------------------------------------------------------

------------------------------------------------------------------------------
   Random-effects Parameters |     Mean   Std. Dev.   ESS     [95% Cred. Int]
-----------------------------+------------------------------------------------
Level 2: district            |
                   var(cons) |  .3294986   .092852    767   .1862269  .5398784
------------------------------------------------------------------------------



In terms of your second query about "option mlwin_settings()", the option is actually mlwinsettings, so you want to add something like the following to your command

Code: Select all

mlwinsettings(size(10000))
Best wishes

George
rebvas86
Posts: 11
Joined: Tue Feb 07, 2012 10:32 am

Re: Errors fitting cross-classified logistic model by MCMC

Post by rebvas86 »

Hi George,

Thanks for your detailed reply.

Re the second error...have tried using mlwinsettings(size10000) and am still getting an error (though a different one). In MLwiN I am getting an error box reading 'error while obeying batch file C:Users\rv1g09\AppData\Local\Temp\3\ST_03000007.tmp at line number 51: NEXT work matrix too small(v_b_l.build()).

Any suggestions?

Thanks in advance,
Rebecca
GeorgeLeckie
Site Admin
Posts: 432
Joined: Fri Apr 01, 2011 2:14 pm

Re: Errors fitting cross-classified logistic model by MCMC

Post by GeorgeLeckie »

Does increasing mlwinsettings(size(10000)) to some higher worksheet size not solve this problem?

For example you could try increasing 10000 to 100000 or higher

If this doesn't solve the problem then we may need to take a closer look

George
rebvas86
Posts: 11
Joined: Tue Feb 07, 2012 10:32 am

Re: Errors fitting cross-classified logistic model by MCMC

Post by rebvas86 »

Just tried running it with a size of 500,000 and I still get the same error...
GeorgeLeckie
Site Admin
Posts: 432
Joined: Fri Apr 01, 2011 2:14 pm

Re: Errors fitting cross-classified logistic model by MCMC

Post by GeorgeLeckie »

Hi Rebecca,

If you email me the dofile and dataset to replicate the error message I will take a look

George
GeorgeLeckie
Site Admin
Posts: 432
Joined: Fri Apr 01, 2011 2:14 pm

Re: Errors fitting cross-classified logistic model by MCMC

Post by GeorgeLeckie »

Hi Rebecca,

You are trying fit 100 two-way cross-classified variance components logistic models to 100 different binary responses variables using the following code

Code: Select all

forvalues i = 1/100 {

	runmlwin v`i' cons, ///
		level3(areaid: cons) ///
		level2(intid: cons) ///
		level1(serialno:) ///
		discrete(distribution(binomial) link(logit) denominator(cons) pql2) ///
		nopause mlwinsettings(size(100000)) 

	runmlwin v`i' cons, ///
		level3(areaid: cons) ///
		level2(intid: cons) ///
		level1(serialno:) ///
		discrete(distribution(binomial) link(logit) denominator(cons)) ///
		mcmc(cc burnin(5000) chain(100000)) initsprevious ///
		nopause mlwinsettings(size(100000)) 

	estimates store model`i'

}
These models are sufficiently simple that I would actually just provide the starting values for the MCMC chains manually, rather than first fitting the model by PQL2.

Code: Select all

matrix a = (0,1,1,1)
More importantly, the model you fit by MCMC is a two-way cross-classified model while the model you try to fit by PQL2 is a three-level hierarchical model. The three-level model is an incorrect model for these data and will often give poor starting values for the two-way cross-classified model. Indeed, when you try to run your code you get the following error message

Code: Select all

WARNING: IGLS algorithm failed to converge. Increase the number of iterations. See the maxiterations() option.
Which tells you that MLwiN has had convergence problems when it tries to fit the three-level hierarchical model. This is not surprising given that the data are cross-classified, not hierarchical.

So specify your starting values manually.

Furthermore, these models are sufficiently simple that a burnin of 5000 followed by a chain of 1000000 seems excessive, especially if this is exploratory work.

The long length of the chain might be one reason why you were running into errors (though it did not give rise to errors for me). This is a lot of data to bring back from MLwiN to Stata. If you do run chains as long as 1000000 I strongly suggest that you use a sensible amount of thinning.

Code: Select all

thinning(1000)
I suggest you therefore use something like the following code

Code: Select all

matrix a = (0,1,1,1)

matrix list a

forvalues i = 1/100 {

	runmlwin v`i' cons, ///
		level3(areaid: cons) ///
		level2(intid: cons) ///
		level1(serialno:) ///
		discrete(distribution(binomial) link(logit) denominator(cons)) ///
		mcmc(cc burnin(1000) chain(5000) thinning(5)) initsb(a) ///
		nopause

	estimates store model`i'

}
Best wishes

George
Post Reply