I am running a two-level model with several hundred thousand level-1 units but only about 70 level-2 units. I have already identified a set of level-1 predictors for inclusion, and am considering which predictors, if any, to add at level 2.
I can identify about 9 L2 predictors whose individual inclusion results in a significant reduction in LRS, compared with the LRS of a model with my L1 predictors only. But I'm hesitant to include these L2 predictors, and I don't really know why: I think I might be overfitting my model, given the number of candidate L2 variables relative to the number of L2 units. My intuition is that a variable fitted to just 70 points should not be allowed to influence a model to the same extent as an L1 variable fitted to 500,000 units, but I'm prepared for this intuition to be completely wrong.
Although the deltaLRS statistics for each of the L2 variables (comparing a model with L1 variables only against a model with the L1 set plus a single L2 variable) indicate significance, they are of the order of, say, 4 or 5, whereas the deltaLRS statistics for most L1 variables are several thousand. Again, I'm not sure whether this reflects the relative merit of the L1 and L2 variables.
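To put the deltaLRS values of 4 or 5 in context, here is a minimal sketch of the p-values they imply. It assumes that LRS is the likelihood ratio (deviance) statistic and that each L2 variable adds a single parameter, so that deltaLRS is referred to a chi-square distribution with 1 degree of freedom. The function name `lrt_p_value_1df` is made up for illustration.

```python
import math

def lrt_p_value_1df(delta_lrs):
    """Upper-tail probability of a chi-square(1) variable at delta_lrs.

    For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(delta_lrs / 2.0))

# Assumed: one extra parameter per added L2 variable.
for stat in (3.841, 4.0, 5.0):
    print(f"deltaLRS = {stat:5.3f}  p = {lrt_p_value_1df(stat):.4f}")
```

Under these assumptions, values of 4 or 5 sit only just beyond the conventional 5% threshold of about 3.84.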
Any insights gratefully received.
Thanks
John
Higher level predictors: are they worth it?

Re: Higher level predictors: are they worth it?
Dear John,
I would say that it is important to include the level 2 predictors. To some extent you are doing the multilevel modelling to account for variation in the response variable that arises from clustering into groups. Putting level 2 predictors into the model can explain why this variation exists, and may even remove the need for multilevel modelling altogether: one could generate a dataset in which the only clustering is due to a level 2 predictor, so that adding this variable to the model allows you to collapse from a 2 level model to a 1 level model.
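The generated-dataset scenario can be sketched with a small simulation. This is only an illustration under assumed values (70 groups of 50 units, a slope of 2 on a single level 2 predictor `z`, unit residual variance); it uses a simple one-way ANOVA estimate of the intraclass correlation rather than a fitted multilevel model, and the helper `anova_icc` is a made-up name.

```python
import random
import statistics

def anova_icc(values_by_group):
    """One-way ANOVA estimate of the intraclass correlation (balanced groups)."""
    n_groups = len(values_by_group)
    n_per_group = len(values_by_group[0])
    group_means = [statistics.fmean(g) for g in values_by_group]
    grand_mean = statistics.fmean(group_means)
    # Mean squares between and within groups.
    msb = n_per_group * sum((m - grand_mean) ** 2 for m in group_means) / (n_groups - 1)
    msw = sum((y - m) ** 2
              for g, m in zip(values_by_group, group_means)
              for y in g) / (n_groups * (n_per_group - 1))
    # Truncate a negative between-group variance estimate at zero.
    var_between = max((msb - msw) / n_per_group, 0.0)
    return var_between / (var_between + msw)

rng = random.Random(1)
n_groups, n_per_group = 70, 50  # assumed sizes, for illustration only

# All clustering comes from the level 2 predictor z (assumed slope of 2).
z = [rng.gauss(0, 1) for _ in range(n_groups)]
y = [[2.0 * zj + rng.gauss(0, 1) for _ in range(n_per_group)] for zj in z]

# Single-level least squares of y on z. With balanced groups this equals
# the regression of the group means on z.
z_mean = statistics.fmean(z)
y_means = [statistics.fmean(g) for g in y]
y_mean = statistics.fmean(y_means)
slope = (sum((zj - z_mean) * (m - y_mean) for zj, m in zip(z, y_means))
         / sum((zj - z_mean) ** 2 for zj in z))
intercept = y_mean - slope * z_mean
resid = [[yij - (intercept + slope * zj) for yij in g] for g, zj in zip(y, z)]

icc_before = anova_icc(y)      # clustering in the raw response
icc_after = anova_icc(resid)   # clustering left after adjusting for z
print(f"ICC before: {icc_before:.3f}, after adjusting for z: {icc_after:.3f}")
```

In this constructed case the response is strongly clustered before adjustment, and almost no between-group variation remains once `z` is in the model.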
Hope that helps,
Bill.