Mean centred variables

Welcome to the forum for MLwiN users. Feel free to post your question about MLwiN software here. The Centre for Multilevel Modelling takes no responsibility for the accuracy of these posts, as we are unable to monitor them closely. Do go ahead and post your question, and thank you in advance if you find the time to post any answers!

Remember to check out our extensive software FAQs which may answer your question: http://www.bristol.ac.uk/cmm/software/s ... port-faqs/
Mark.McCann
Posts: 9
Joined: Sat Oct 24, 2009 9:56 pm

Mean centred variables

Post by Mark.McCann »

Hello all,

I was wondering about grand mean centring variables for multilevel models. Specifically, I'd be interested in answers to the following questions:


What are the advantages of using mean centred variables in models (or what are the main advantages)?

Do the same advantages apply when using dummy variables for binary or ordinal/nominal variables?

Are there different considerations when comparing MCMC to numerical estimation?


Many thanks,
Mark
Lydia
Posts: 26
Joined: Tue Oct 13, 2009 2:55 pm

Re: Mean centred variables

Post by Lydia »

As far as I'm aware, there are two advantages to grand mean centring:

1) improved interpretability
2) easier estimation

If you have a simple model (e.g. probably most continuous response random intercept models) then 2) is not really going to apply. You're more likely to benefit from centring in terms of estimation when you have a discrete response model, where it can make the difference between being able to run the model and not. (There may well be other kinds of models where it helps too; my experience of this is just with discrete response models.) It needn't necessarily be grand mean centred: for example, I've run models where I had year as an explanatory variable and centred on the first year; a better choice would probably have been one of the middle years, but the important thing was that my explanatory variable didn't take the very large values it would have done without centring.
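To see why very large values like raw years can hurt estimation, here's a minimal pure-Python sketch (with hypothetical years, not from any real analysis) that computes the condition number of X'X for an intercept-plus-one-predictor design. A huge condition number means the numerical problem is badly scaled; centring fixes it.

```python
import math

def cond_xtx(x):
    """Condition number of X'X for the design X = [1, x]
    (intercept column plus one predictor), via the 2x2 eigenvalues."""
    n = len(x)
    s = sum(x)
    s2 = sum(xi * xi for xi in x)
    tr = n + s2                        # trace of X'X
    det = n * s2 - s * s               # determinant of X'X
    disc = math.sqrt(tr * tr - 4 * det)
    lam_max = (tr + disc) / 2
    lam_min = (tr - disc) / 2
    return lam_max / lam_min

years = list(range(2000, 2011))        # hypothetical: years 2000-2010
centred = [y - 2005 for y in years]    # centred on a middle year

print(cond_xtx(years) > 1e10)          # raw years: terribly conditioned
print(cond_xtx(centred) == 10.0)       # centred: well conditioned
```

The determinant is identical in both cases; it's the enormous trace from the raw years that blows the condition number up, which is exactly what centring removes.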

So what about 1)? Why does centring improve the interpretability of the model? Well, the intercept is the predicted value of the response when all the explanatory variables take the value 0. If you grand mean centre all your (continuous) explanatory variables, then someone who has the value 0 for all these variables is average with respect to all these qualities, and that's a nice interpretation for the intercept. If you don't centre, then the intercept might be harder to interpret. For example, if you don't centre age in a model predicting voting patterns, your intercept will correspond to the predicted probability of voting Conservative for a newborn infant. This is both less useful to know than the predicted probability for someone of the average age of your respondents, and rather questionable, as it's doubtful that your model can be extrapolated to such young ages. So if you centre you will have an intercept you can give a meaningful interpretation to; if you don't centre, it's not wrong, but you just can't attach a meaningful interpretation to your intercept (perhaps you might not mind this).

The benefits perhaps become more apparent when we consider interactions. The centring of the variables multiplied together to form the interaction should be the same as the centring for each variable as a main effect (and MLwiN automatically uses the centred forms of the variables to form the interaction, provided of course you add the interaction term after adding the main effects). Suppose we have an interaction between variables A and B:

y = beta0 + beta1 A + beta2 B + beta3 AB + e

The term AB is 0 when A is 0 or B is 0. If both variables have been grand mean centred, then this is when A takes its mean value or B takes its mean value. When both A and B take their mean value, then beta1 A, beta2 B and beta3 AB are all 0 so we are just left with the intercept: this is the predicted value for a person of average A and average B. When A (only) takes its mean value, we have y = beta0 + beta2 B + e. So beta2 is the effect of B on the response for a person of average A. Similarly beta1 is the effect of A on the response for a person of average B. If we didn't centre, we would instead have beta2 is the effect of B on the response for a person with A = 0, whatever that might be- again beta2 would be the effect of B for a newborn if A was age.
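This interpretation is easy to check numerically. Here's a small sketch with made-up coefficient values (purely hypothetical, not from any fitted model): with centred A and B, the intercept is the prediction at average A and average B, and beta2 is exactly the effect of one unit of B at the mean of A (centred A = 0), while away from the mean the effect of B picks up a beta3 contribution.

```python
def predict(a, b, b0, b1, b2, b3):
    """Fitted value from y = b0 + b1*A + b2*B + b3*A*B (A, B centred)."""
    return b0 + b1 * a + b2 * b + b3 * a * b

b0, b1, b2, b3 = 1.0, 0.5, 0.3, 0.2   # hypothetical fitted coefficients

# at the means of A and B (both 0 after centring), only beta0 is left:
print(predict(0, 0, b0, b1, b2, b3) == b0)

# the effect of one unit of B at the mean of A is beta2 alone...
eff_mean_a = predict(0, 1, b0, b1, b2, b3) - predict(0, 0, b0, b1, b2, b3)
print(abs(eff_mean_a - b2) < 1e-12)

# ...but two units above the mean of A it is beta2 + 2*beta3:
eff_a2 = predict(2, 1, b0, b1, b2, b3) - predict(2, 0, b0, b1, b2, b3)
print(abs(eff_a2 - (b2 + 2 * b3)) < 1e-12)
```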

There are also advantages when you have a random slope model: the random intercepts are the deviations of the group lines from the overall regression line when all the explanatory variables are 0, and their variance is the level 2 variance when all the explanatory variables are 0. Again making sure that 0 corresponds to the mean of each variable makes this easier to interpret; you could end up with a giant sigma^2_u0 if 0 is well outside the range of one or more explanatory variables, and the lines happen to be fanning out as you approach 0 from the range of the data (or a tiny sigma^2_u0 if the lines are fanning the other way), but in spite of looking impressive it wouldn't actually mean anything.
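The interpretability point for a continuous predictor is also easy to verify directly. Here's a minimal pure-Python sketch (with hypothetical ages and responses) showing that grand mean centring leaves the slope untouched and moves the intercept to the predicted value at the mean of the predictor:

```python
def ols(x, y):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
          / sum((xi - mx) ** 2 for xi in x))
    return my - b1 * mx, b1

age = [25, 32, 41, 47, 53, 60]          # hypothetical ages
resp = [2.1, 2.9, 3.8, 4.1, 4.9, 5.6]   # hypothetical response

b0, b1 = ols(age, resp)
mean_age = sum(age) / len(age)
b0c, b1c = ols([a - mean_age for a in age], resp)

print(b1 == b1c)                               # slope unchanged
print(abs(b0c - (b0 + b1 * mean_age)) < 1e-9)  # new intercept = prediction at mean age
```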

You wouldn't want to centre dummy variables. This would make them harder to interpret, not easier- imagine you have a gender dummy, coded 1 for female, and you take the average- say this is 0.501. Grand mean centre and the variable becomes -0.501 for men (0 - 0.501) and 0.499 for women (1 - 0.501). So now your predicted value for men is beta0 - 0.501 beta1 and your predicted value for women is beta0 + 0.499 beta1. Whereas if you leave your dummy variable alone then beta0 is the predicted value for men and beta0 + beta1 is your predicted value for women, and beta1 is the difference between men and women- much easier to understand! Similarly for explanatory variables with more than 2 categories. I don't believe that centring the dummies would improve estimation either; 0 is already within the range of the data. (However, changing the reference category can certainly sometimes help).
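A quick check of the arithmetic above, again with hypothetical coefficient values: centring the gender dummy changes the intercept and the coding, but every fitted value is identical, so all you lose is the easy reading of beta0 and beta1.

```python
b0, b1 = 2.0, 0.4      # hypothetical intercept and gender coefficient
p_female = 0.501       # proportion coded 1 (female), as in the example

# uncentred: 0 = male, 1 = female
pred_m = b0            # beta0 is the male prediction
pred_f = b0 + b1       # beta0 + beta1 is the female prediction

# centred: the equivalent model has intercept b0 + b1*p_female and the
# dummy takes the values -0.501 (men) and 0.499 (women)
b0c = b0 + b1 * p_female
pred_m_c = b0c + b1 * (0 - p_female)
pred_f_c = b0c + b1 * (1 - p_female)

# same predictions either way - only the coefficient reading changes
print(abs(pred_m - pred_m_c) < 1e-12 and abs(pred_f - pred_f_c) < 1e-12)
```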

I don't believe there are any different considerations for MCMC, but I'm not an expert so I wouldn't like to say for sure!
Mark.McCann
Posts: 9
Joined: Sat Oct 24, 2009 9:56 pm

Re: Mean centred variables

Post by Mark.McCann »

Thanks very much for this Lydia, it is very helpful.

I'm usually not too worried about aiding interpretation of intercepts. I work mostly with discrete data and look for relative differences, controlling for whatever variables; the probabilities from the models tend not to interest me in themselves. I have, however, been having serious trouble with convergence and time. My computer's been running 2-level logistic and 2-level cross-classified logistic models non-stop since Friday, with many, many non-starters thrown into the mix!

These models have used dummies exclusively, with the continuous variables broken into quantiles. Would the order of the categories make a difference to estimation, and which ordering is most helpful?

A second question is whether using continuous variables would be easier on the processor than quantile dummies.


Thanks,

Mark
Lydia
Posts: 26
Joined: Tue Oct 13, 2009 2:55 pm

Re: Mean centred variables

Post by Lydia »

I'm not quite sure what you mean by the 'order of the categories'. Do you mean the order they appear in the Equations window? As far as I'm aware that should make no difference at all: whatever order they're in, it's the same model. Or do you mean the underlying coding, i.e. in the original categorical variable, before the dummies are created, what the meaning of the numbers is (e.g. 1 for 'Agree', 2 for 'Neutral' and 3 for 'Disagree' vs 1 for 'Disagree', 2 for 'Neutral' and 3 for 'Agree')? Again, by the time the dummies are created the model is identical, so there should be no difference.

I'm not too sure about the continuous variables- I think it depends on what the actual relation is with the response (or, to be completely accurate, the logit of the probability of the response). If the relationship is (close to) linear, then I would expect the model would be easier to estimate by entering the variable as continuous, because that would mean fewer parameters to estimate. On the other hand, if the relationship is extremely non-linear, then I would expect the model might be easier to estimate by entering the variable as quantiles, because the continuous variable model will be a very bad fit. Somewhere in between those extremes, a polynomial of the variable could be the best option- if it is of a degree that is smaller than the number of quantiles you would alternatively enter- because it would again have fewer parameters to estimate than the quantile version, and would be a better fit than the linear continuous version. The number of quantiles could make a difference too- again, the too-few-and-the-fit's-bad vs too-many-and-there-are-loads-of-parameters balance. I'm afraid the definitive answer to this question can only come from playing around with the model; to make life easier for yourself you could experiment with the continuous variables one by one, building the model up gradually and not adding another term until you've found the best way to put in the previous one- this should reduce the number of combinations to try!
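A toy illustration (with made-up data) of the extremely non-linear case: with a perfectly symmetric 'n'-shaped relationship, the least-squares linear slope is exactly zero, so a single continuous term would report no association at all, while quantile dummies or a quadratic term would capture the shape.

```python
def slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

x = [-3, -2, -1, 0, 1, 2, 3]
y = [9 - xi * xi for xi in x]   # 'n' shape, peaking at x = 0

print(slope(x, y))   # 0.0: the up- and down-slopes cancel exactly
```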

Are you using MCMC? This is in any case recommended for discrete response models (see http://www.cmm.bristol.ac.uk/MLwiN/tech ... entresults). MQL and particularly PQL can struggle to fit some models; while it's good to get the best possible starting values for MCMC (implying you should use PQL2), it's not essential- you can start with bad values and (hopefully) with a long enough burn-in it will move away from these and towards the right ones, so if you're having problems getting PQL to run, you can just use MQL then switch to MCMC. If even MQL is having problems converging, you can try stopping it before it's finished (maybe even after just 1 iteration), and then going to MCMC- you might need to edit some of the values before starting MCMC though (see http://www.cmm.bristol.ac.uk/MLwiN/tech ... html#c1096).

If you're having convergence problems with MCMC, you might want to check out some of the new features, which are shown in the last 5 chapters of the MCMC manual. The first few are only for Normal response models, but hierarchical centring in particular is useful and can be used with discrete response models. I've often found it can dramatically improve the trajectories and effective sample sizes for my parameters.
Mark.McCann
Posts: 9
Joined: Sat Oct 24, 2009 9:56 pm

Re: Mean centred variables

Post by Mark.McCann »

Hi,

By ordering the categories, I mean changing the reference category, which you mentioned might help. I usually plump for the largest nominal category or the bottom of an ordinal; might the midpoint of an ordinal make a difference to estimation?

I tried quantiles, and then after reading here I tried continuous variables, but I wasn't finding an association with the continuous versions. Actually, as I've started typing I've realised that without a squared term I could easily get no association, as I did see some 'n' shapes. I'll give the polynomials a go.

I'm not using MCMC; I'm using Stata, which doesn't offer MCMC. This is the main reason I'm so interested in the substantive effect on estimation of tweaking the variables: I've had a lot of backed-up and non-converging estimates.

Stata also tells me that I've had problems with Hessians, although I have no idea what they are!

The cross-classified models section of the MLwiN documentation makes me think it might take a bit of time for me to get my data (and my brain) into shape to run these in MLwiN, so I'm seeing if there are alternative solutions first.


Thanks again,
Mark
Lydia
Posts: 26
Joined: Tue Oct 13, 2009 2:55 pm

Re: Mean centred variables

Post by Lydia »

Ah, ok. I too would naturally go for the largest category of an unordered categorical variable to be the reference category; I think this is a sensible choice. I'd actually probably go with the largest category for an ordered categorical variable too, though sometimes for reasons of interpretation I'd choose one of the end categories. But even though the largest category is a 'sensible choice' in that it can most often be expected to give you a model that will run, sometimes you can find for particular models it works better choosing a different reference category- just the quirks of that data. Again, the way to find out is to experiment!

Cross-classified models are quite a lot to get your head round, but can I just ask whether you've looked at the MCMC manual chapter on cross-classified models, or only the User's Guide chapter? Although the User's Guide chapter doesn't point to the MCMC manual, cross-classified models are a whole lot easier (from the user's point of view) in MCMC: you don't have to do all that business with making cons an extra level and sticking loads of dummies in; you just tick a box in an options window and the model looks the same as the non-cross-classified version. So there's a bit less to take in conceptually there!
Mark.McCann
Posts: 9
Joined: Sat Oct 24, 2009 9:56 pm

Re: Mean centred variables

Post by Mark.McCann »

Thanks very much for that Lydia. I suppose by the time I get the models built up I don't need the most easily interpretable reference category, and as the largest category will shrink the SEs I assume that makes quadrature easier... I have a vague notion that's true from something I was told long ago, but I haven't got a mathematician to tell me why!


As I'm still in Stata I never thought to look at the MCMC manual; I'll give it a go and see how things work.


Thanks very much for your help,

Mark