offset for negative binomial models

Welcome to the forum for runmlwin users. Feel free to post your question about runmlwin here. The Centre for Multilevel Modelling takes no responsibility for the accuracy of these posts; we are unable to monitor them closely. Do go ahead and post your question, and thank you in advance if you find the time to post any answers!

Raphael
Posts: 19
Joined: Wed Oct 12, 2011 2:52 am

offset for negative binomial models

Post by Raphael

Hi all,
In my research I have run into a problem using an offset.
Research context: I am modeling household-level temporary outmigration. The outcome variable is a count of the number of temporary outmigrants (variable tmig) per household in a particular year (range 0 to 11). Households are nested within villages, and some village-level characteristics are the predictors of main interest; thus, I am using a two-level structure. Previously I was using Poisson models as follows:

Code:

runmlwin tmig cons ///
    predictor1 predictor2 predictor3 , ///
    level2(Village: cons) ///
    level1(Household_Anon: ) ///
    discrete(distribution(poisson) offset(lgsize)) nopause
With Poisson models, I was able to use the log-transformed household size (variable lgsize) as an offset. The problem is that I am now modeling permanent outmigration (variable pmig), which is a less frequent event, and this variable shows a high level of overdispersion. I therefore tried switching to negative binomial models to account for the overdispersion.
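Written out, the Poisson command above fits the following two-level model. This is a sketch in our own notation: y_ij is tmig for household i in village j, n_ij is household size (so lgsize = log n_ij), and x_1ij to x_3ij stand for predictor1 to predictor3. Because log n_ij enters with its coefficient fixed at 1, the fixed effects describe the log rate of temporary outmigration per household member:

```latex
\begin{aligned}
y_{ij} &\sim \operatorname{Poisson}(\mu_{ij}) \\
\log \mu_{ij} &= \log n_{ij} + \beta_0 + \beta_1 x_{1ij} + \beta_2 x_{2ij} + \beta_3 x_{3ij} + u_j,
  \qquad u_j \sim N(0, \sigma_u^2) \\
\log\left(\mu_{ij} / n_{ij}\right) &= \beta_0 + \beta_1 x_{1ij} + \beta_2 x_{2ij} + \beta_3 x_{3ij} + u_j
\end{aligned}
```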

Code:

runmlwin pmig cons ///
    predictor1 predictor2 predictor3 , ///
    level2(Village: cons) ///
    level1(Household_Anon: ) ///
    discrete(distribution(nbinomial) offset(lgsize)) nopause
With this setup, however, Stata returns the following error message: “Remove offset(). offset() is only valid for the Poisson distribution.” Yet the runmlwin help entry describes offset(varname) as: “include varname in the model with coefficient constrained to 1 (Poisson or negative binomial distributions only)”.
This help entry therefore seems to indicate that an offset can also be specified for negative binomial models. My understanding is that negative binomial models are essentially Poisson models with an added dispersion parameter, so they should be able to incorporate an offset. But maybe I misunderstand the difference between the two model types…
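In the same notation, and assuming the commonly used (NB2) parameterisation of the negative binomial with dispersion parameter ν (our notation; MLwiN's own parameterisation may differ in detail), the offset enters the linear predictor exactly as in the Poisson model, and only the variance function changes:

```latex
\begin{aligned}
\log \mu_{ij} &= \log n_{ij} + \beta_0 + \beta_1 x_{1ij} + \beta_2 x_{2ij} + \beta_3 x_{3ij} + u_j \\
\operatorname{Var}(y_{ij} \mid u_j) &= \mu_{ij} + \frac{\mu_{ij}^2}{\nu}
\end{aligned}
```

As ν grows large the extra term vanishes and the Poisson variance Var(y_ij | u_j) = μ_ij is recovered.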

My question is this: if I can’t use an offset in the negative binomial models, does anyone have advice on how to deal with this issue in another way? Is it statistically sound to include the logged household size as a covariate in the negative binomial models instead? I would greatly appreciate any help with this issue. Thanks!
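On the covariate question, one way to see the relationship: an offset is the special case γ = 1 of a model that enters log household size as an ordinary covariate with coefficient γ (notation ours; x_ij'β collects the intercept and predictors). The covariate version is therefore the less restrictive of the two. An estimate of γ close to 1 would support using the offset, while an estimate far from 1 would suggest that counts do not scale proportionally with household size.

```latex
\begin{aligned}
\text{offset:} \quad \log \mu_{ij} &= 1 \cdot \log n_{ij} + \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_j \\
\text{covariate:} \quad \log \mu_{ij} &= \gamma \log n_{ij} + \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_j
\end{aligned}
```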

Best,
Raphael
ChrisCharlton
Posts: 1354
Joined: Mon Oct 19, 2009 10:34 am

Re: offset for negative binomial models

Post by ChrisCharlton

To answer the question about the inconsistency between the help file and the runmlwin error message: it is the error message that is wrong, as MLwiN does allow offsets in negative binomial models. We will fix this message in the next release, but if you are happy editing runmlwin.ado, the fix is to change the code:

Code:

	* Check that the offset function is correctly specified.
	if ("`discrete'"~="") {
		if "`offset'" ~= "" {
			local tmppois "poisson"
			//if ~inlist("`distribution'","poisson") {
			if ~`:list tmppois in distribution' {
				display as error "Remove offset(). offset() is only valid for the Poisson distribution." _n
				exit 198
			}
		}
	}
to:

Code:

	* Check that the offset function is correctly specified.
	if ("`discrete'"~="") {
		if "`offset'" ~= "" {
			local tmppois "poisson nbinomial"
			if "`:list tmppois & distribution'" == "" {
				display as error "Remove offset(). offset() is only valid for the Poisson or negative binomial distribution." _n
				exit 198
			}
		}
	}
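The reason the fix switches from `in` to `&`, rather than simply adding nbinomial to tmppois, is worth spelling out. Stata's `:list A in B` is true only when every word of A appears in B, so with tmppois set to "poisson nbinomial" a plain Poisson model would fail the check. `:list A & B` instead returns the words that A and B share, and the error fires only when that intersection is empty. The Python sketch below mirrors the three variants; the function names are hypothetical and nothing here is part of runmlwin.

```python
# Hypothetical Python re-expression of the runmlwin offset() check.
# These functions are illustrations only; they are not part of runmlwin.


def words(s: str) -> list:
    """Split a Stata-style macro string into its space-separated words."""
    return s.split()


def old_check_passes(distribution: str) -> bool:
    # Original: tmppois = "poisson"; `:list tmppois in distribution` is true
    # only when every word of tmppois also appears in distribution.
    return all(w in words(distribution) for w in words("poisson"))


def naive_fix_passes(distribution: str) -> bool:
    # Adding nbinomial to tmppois but keeping `in` would require BOTH words
    # to appear, so a plain Poisson model would now be rejected.
    return all(w in words(distribution) for w in words("poisson nbinomial"))


def new_check_passes(distribution: str) -> bool:
    # Fixed: `:list tmppois & distribution` gives the shared words;
    # the error fires only when that intersection is empty.
    shared = [w for w in words("poisson nbinomial") if w in words(distribution)]
    return shared != []


if __name__ == "__main__":
    for dist in ("poisson", "nbinomial", "binomial"):
        print(dist, old_check_passes(dist), naive_fix_passes(dist), new_check_passes(dist))
```

Because the intersection test works word by word, it would also accept a distribution list with more than one entry, as long as one of them is poisson or nbinomial.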
Raphael
Posts: 19
Joined: Wed Oct 12, 2011 2:52 am

Re: offset for negative binomial models

Post by Raphael

Hi Chris,
Great suggestion! There is only one problem: I am just a regular PC user and have never tried editing .ado files. I tried to find the runmlwin.ado file but was not successful. There is a folder “C:\Program Files\Stata11\ado” that contains a huge number of .ado files in subfolders named by letter, but I could not find runmlwin.ado in the “r” folder. Any suggestion where I can find the runmlwin.ado file? Thanks a lot!

Best,
Raphael
GeorgeLeckie
Site Admin
Posts: 432
Joined: Fri Apr 01, 2011 2:14 pm

Re: offset for negative binomial models

Post by GeorgeLeckie

Hi Raphael,

To find out where runmlwin.ado is on your computer, type the following:

Code:

. which runmlwin
For example, if we type this on our computer, we see the following:

Code:

. which runmlwin
Q:\C-Modelling\runmlwin\development version\runmlwin\runmlwin.ado
*! runmlwin.ado, George Leckie and Chris Charlton, 31Mar2012
and on our computer, runmlwin is stored at

Q:\C-Modelling\runmlwin\development version\runmlwin\runmlwin.ado

Best wishes

George
GeorgeLeckie
Site Admin
Posts: 432
Joined: Fri Apr 01, 2011 2:14 pm

Re: offset for negative binomial models

Post by GeorgeLeckie

Hi Raphael,

There is an alternative approach to accounting for overdispersion that you may want to consider: add a new level to your Poisson model at which each observation forms its own unit, so that the variance at that level absorbs the overdispersion.
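Applied to your household model, and in our own notation, the extra level adds a normally distributed random effect e_ij for each household i in village j, alongside the village effect u_j, on the log scale. Because each household is the only unit in its own group at the new level, e_ij behaves like an observation-level error term:

```latex
\begin{aligned}
y_{ij} \mid u_j, e_{ij} &\sim \operatorname{Poisson}(\mu_{ij}) \\
\log \mu_{ij} &= \log n_{ij} + \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_j + e_{ij} \\
u_j &\sim N(0, \sigma_u^2), \qquad e_{ij} \sim N(0, \sigma_e^2)
\end{aligned}
```

With σ_e² = 0 this reduces to the original two-level Poisson model with offset log n_ij.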

Consider the three-level random-intercepts Poisson model example in the MLwiN User Manual:

Code:

. use http://www.bristol.ac.uk/cmm/media/runmlwin/mmmec, clear
. generate lnexpected = ln(exp)
. runmlwin obs cons uvbi, ///
    level3(nation: cons) ///
    level2(region: cons) ///
    level1(county) discrete(distribution(poisson) offset(lnexpected)) ///
    rigls nopause

MLwiN 2.25 multilevel model                     Number of obs      =       354
Poisson response model
Estimation algorithm: RIGLS, MQL1

-----------------------------------------------------------
                |   No. of       Observations per Group
 Level Variable |   Groups    Minimum    Average    Maximum
----------------+------------------------------------------
         nation |        9          3       39.3         95
         region |       78          1        4.5         13
-----------------------------------------------------------

Run time (seconds)   =       1.56
Number of iterations =          7
------------------------------------------------------------------------------
         obs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        cons |   .0157395   .1435605     0.11   0.913    -.2656338    .2971129
        uvbi |  -.0338052   .0106901    -3.16   0.002    -.0547574    -.012853
------------------------------------------------------------------------------

------------------------------------------------------------------------------
   Random-effects Parameters |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
Level 3: nation              |
                   var(cons) |   .1639265   .0846003     -.0018871      .32974
-----------------------------+------------------------------------------------
Level 2: region              |
                   var(cons) |   .0441752   .0096214      .0253176    .0630328
------------------------------------------------------------------------------
To account for potential overdispersion in this model, we can add an extra county level, at which each county (that is, each observation) is its own unit, as follows:

Code:

. runmlwin obs cons uvbi, ///
    level4(nation: cons) ///
    level3(region: cons) ///
    level2(county: cons) ///
    level1(county:) ///
    discrete(distribution(poisson) offset(lnexpected)) ///
    rigls nopause
 
MLwiN 2.25 multilevel model                     Number of obs      =       354
Poisson response model
Estimation algorithm: RIGLS, MQL1

-----------------------------------------------------------
                |   No. of       Observations per Group
 Level Variable |   Groups    Minimum    Average    Maximum
----------------+------------------------------------------
         nation |        9          3       39.3         95
         region |       78          1        4.5         13
         county |      354          1        1.0          1
-----------------------------------------------------------

Run time (seconds)   =       1.67
Number of iterations =          8
------------------------------------------------------------------------------
         obs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        cons |  -.0020954   .1393393    -0.02   0.988    -.2751955    .2710047
        uvbi |  -.0382796   .0107662    -3.56   0.000     -.059381   -.0171782
------------------------------------------------------------------------------

------------------------------------------------------------------------------
   Random-effects Parameters |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
Level 4: nation              |
                   var(cons) |   .1534841   .0794733     -.0022807    .3092489
-----------------------------+------------------------------------------------
Level 3: region              |
                   var(cons) |   .0391589   .0093747      .0207847     .057533
-----------------------------+------------------------------------------------
Level 2: county              |
                   var(cons) |   .0097199   .0038033      .0022656    .0171743
------------------------------------------------------------------------------
We can then switch to MCMC estimation (negative binomial models cannot be fitted by MCMC):

Code:

. runmlwin obs cons uvbi, ///
    level4(nation: cons) ///
    level3(region: cons) ///
    level2(county: cons) ///
    level1(county:) ///
    discrete(distribution(poisson) offset(lnexpected)) ///
    mcmc(on) initsprevious nopause

MLwiN 2.25 multilevel model                     Number of obs      =       354
Poisson response model
Estimation algorithm: MCMC

-----------------------------------------------------------
                |   No. of       Observations per Group
 Level Variable |   Groups    Minimum    Average    Maximum
----------------+------------------------------------------
         nation |        9          3       39.3         95
         region |       78          1        4.5         13
         county |      354          1        1.0          1
-----------------------------------------------------------

Burnin                     =        500
Chain                      =       5000
Thinning                   =          1
Run time (seconds)         =       11.1
Deviance (dbar)            =    1957.25
Deviance (thetabar)        =    1839.29
Effective no. of pars (pd) =     117.96
Bayesian DIC               =    2075.21
------------------------------------------------------------------------------
         obs |      Mean    Std. Dev.     ESS     P       [95% Cred. Interval]
-------------+----------------------------------------------------------------
        cons |  -.0194028   .1246272        6   0.369    -.2116869    .2098003
        uvbi |  -.0340979   .0105646       28   0.000    -.0544113    -.015185
------------------------------------------------------------------------------

------------------------------------------------------------------------------
   Random-effects Parameters |     Mean   Std. Dev.   ESS     [95% Cred. Int]
-----------------------------+------------------------------------------------
Level 4: nation              |
                   var(cons) |  .1938669  .1348879    740   .0575914  .5506623
-----------------------------+------------------------------------------------
Level 3: region              |
                   var(cons) |  .0430351  .0111135    393   .0251727  .0688957
-----------------------------+------------------------------------------------
Level 2: county              |
                   var(cons) |  .0141038  .0039471     72   .0073781  .0229078
------------------------------------------------------------------------------

You can see that the county-level variance is very small relative to the region- and nation-level variance components, suggesting that overdispersion is not particularly problematic in this model.
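As a rough check on "very small" (our own arithmetic on the MCMC posterior means reported above), the county-level variance is about 6% of the total random-effect variance on the log scale. For a Poisson model with a normal observation-level random effect of variance σ_e², the variance of the count is μ + μ²(e^{σ_e²} − 1), where μ is the mean averaged over that random effect, so σ_e² ≈ 0.0141 adds only about 0.0142 μ² to the Poisson variance:

```latex
\begin{aligned}
\frac{\hat\sigma^2_{\text{county}}}{\hat\sigma^2_{\text{nation}} + \hat\sigma^2_{\text{region}} + \hat\sigma^2_{\text{county}}}
  &= \frac{0.0141}{0.1939 + 0.0430 + 0.0141} = \frac{0.0141}{0.2510} \approx 0.056 \\
\operatorname{Var}(\text{obs} \mid \text{nation}, \text{region})
  &\approx \mu + \mu^2\left(e^{0.0141} - 1\right) \approx \mu + 0.0142\,\mu^2
\end{aligned}
```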

I hope this helps

Best wishes

George
Raphael
Posts: 19
Joined: Wed Oct 12, 2011 2:52 am

Re: offset for negative binomial models

Post by Raphael

Hi George,

As always, thanks so much for the comprehensive response to my question! It is truly amazing how much you know about the statistical mechanics behind the commands!
I was able to locate the runmlwin.ado file and, following Chris’ suggestion, I changed the code; the negative binomial models now work perfectly. I also tried adding an extra level to the model, and this also works great. In my models a lot of variation is picked up by the added variance component. Interestingly, the results of the added-level model are very similar to those of the negative binomial model.
However, I am not quite sure I understand what is going on here. Normally, in discrete response models the level-1 variance is a function of the mean, which depends on the values of the explanatory variables in the model. Thus, the level-1 variance is usually fixed rather than estimated as a separate parameter. Somehow we are overriding this assumption, and I am not sure how this is done. Could you provide a brief explanation (if you have the time)? Has this method been used in published work (it would be helpful to have a citation if I use this method in my own research and publications)? Thanks so much!

Best,
Raphael
GeorgeLeckie
Site Admin
Posts: 432
Joined: Fri Apr 01, 2011 2:14 pm

Re: offset for negative binomial models

Post by GeorgeLeckie

In the single-level Poisson model, the variance of the count is assumed equal to its mean.

In a multilevel Poisson model, the inclusion of random effects makes the (marginal) variance greater than the mean.

So all multilevel Poisson models implicitly allow for some overdispersion.
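A short derivation of that point (a standard result, written in our own notation): suppose the log mean contains a normal random effect w ~ N(0, σ_w²), which could be a higher-level effect such as a village effect or a response-level effect, and let λ be the exponentiated fixed part, including any offset. Using the law of total variance and the lognormal moments of e^w:

```latex
\begin{aligned}
y \mid w &\sim \operatorname{Poisson}(\lambda e^{w}), \qquad w \sim N(0, \sigma_w^2) \\
\mu = E(y) &= \lambda\, E(e^{w}) = \lambda e^{\sigma_w^2/2} \\
\operatorname{Var}(y) &= E\{\operatorname{Var}(y \mid w)\} + \operatorname{Var}\{E(y \mid w)\}
  = \mu + \lambda^2 e^{\sigma_w^2}\left(e^{\sigma_w^2} - 1\right)
  = \mu + \mu^2\left(e^{\sigma_w^2} - 1\right)
\end{aligned}
```

So the variance exceeds the mean whenever σ_w² > 0. The commonly used negative binomial variance μ + μ²/ν, with dispersion parameter ν, has the same quadratic form, with 1/ν playing the role of e^{σ_w²} − 1, which may help explain why the added-level model and the negative binomial model gave similar results.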

Your problem was that, even after allowing for the higher-level random effects, you still had overdispersion.

Putting in a random effect at the response level helps mop up this remaining overdispersion.
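A small simulation can make this concrete. The stdlib-only Python sketch below uses hypothetical settings that do not come from the thread (20,000 observations, a log mean of 1.0, a response-level random-effect variance of 0.5, and a fixed seed). It draws counts with and without a normal random effect on the log scale for every observation. Without it, the sample variance is close to the mean, as in a single-level Poisson model; with it, the variance is inflated to roughly μ + μ²(e^{σ²} − 1), which is the overdispersion the extra level is meant to absorb.

```python
import math
import random


def poisson_draw(rng, lam):
    """One Poisson(lam) draw via Knuth's multiplication method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def mean_var(xs):
    """Sample mean and (n - 1)-denominator sample variance."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, v


def simulate(n=20000, log_mu=1.0, sigma2_e=0.0, seed=1):
    """Counts with log(lambda_i) = log_mu + e_i, e_i ~ N(0, sigma2_e): one random effect per observation."""
    rng = random.Random(seed)
    sd = math.sqrt(sigma2_e)
    counts = [poisson_draw(rng, math.exp(log_mu + rng.gauss(0.0, sd))) for _ in range(n)]
    return mean_var(counts)


if __name__ == "__main__":
    for s2 in (0.0, 0.5):
        m, v = simulate(sigma2_e=s2)
        theory = m + m * m * (math.exp(s2) - 1.0)  # mu + mu^2 (e^{sigma^2} - 1)
        print(f"sigma2_e={s2}: mean={m:.2f} var={v:.2f} theory~{theory:.2f}")
```

Running the script prints the sample mean and variance for each setting next to the approximate theoretical variance; the exact numbers depend on the seed.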

If you google overdispersion in multilevel Poisson models, you should find references and a mathematical description of this.

George
Raphael
Posts: 19
Joined: Wed Oct 12, 2011 2:52 am

Re: offset for negative binomial models

Post by Raphael

Hi George,
Thanks so much for this helpful explanation! I have spent some hours doing a thorough literature search on overdispersed multilevel models. I found a few articles that use random effects to account for overdispersion (e.g., Link and Sauer 2002, Liu and Dey 2006, Huang and Abdel-Aty 2010). However, the most useful explanation of how to model overdispersion directly by including a variance component at the response level, together with its mathematical description, is given by Gelman and Hill (2007) in chapter 15, “Multilevel generalized linear models.”
Maybe these citations are of help to other MLwiN users running into the same issues as I did.
Have a nice day!

Best,
Raphael

References:

Gelman, Andrew, and Jennifer Hill. 2007. Data Analysis Using Regression and Multilevel/Hierarchical Models. New York: Cambridge University Press.

Huang, H., and M. Abdel-Aty. 2010. "Multilevel data and Bayesian analysis in traffic safety." Accident Analysis and Prevention 42:1556-1565.

Link, W. A., and J. R. Sauer. 2002. "A hierarchical analysis of population change with application to Cerulean Warblers." Ecology 83:2832-2840.

Liu, Junfeng, and Dipak K. Dey. 2006. "Hierarchical overdispersed Poisson model with macrolevel autocorrelation." Statistical Methodology 4:354-370.
GeorgeLeckie
Site Admin
Posts: 432
Joined: Fri Apr 01, 2011 2:14 pm

Re: offset for negative binomial models

Post by GeorgeLeckie

Hi Raphael,

Thank you very much for listing these references. They will be very helpful for people wanting to use this approach to model overdispersion in count models.

Best wishes

George
rdmcdowell
Posts: 31
Joined: Mon Apr 02, 2012 3:26 pm

Re: offset for negative binomial models

Post by rdmcdowell

In these examples, if I wanted to allow a random slope for uvbi at the region level (for whatever reason), would I just amend the level3(region: cons) syntax, or does the addition of the lower-level random effect for overdispersion affect how this would be specified in MLwiN? Thank you.