Higher level predictors: are they worth it?
Posted: Tue Jul 09, 2019 3:48 pm
I am running a 2-level model with several hundred thousand level-1 units but only about 70 level-2 units. I have already identified a set of level-1 predictors for inclusion, and am now considering which predictors, if any, to add at level 2.
I can identify about 9 L-2 predictors, each of which, when added on its own, gives a significant reduction in LRS compared with the LRS of a model with my L-1 predictors only. But I'm hesitant to include these L-2 predictors, and I don't really know why: I think I might be overfitting my model, given the number of candidate L-2 variables relative to the number of L-2 units. My intuition is that a variable fitted to just 70 points should not be allowed to influence a model to the same extent as an L-1 variable fitted to 500,000 units, but I'm prepared for this intuition to be completely wrong.
Although the delta-LRS stats for each of the L-2 variables (comparing a model with L-1 variables only against a model with the L-1 set plus a single L-2 variable) indicate significance, they are of the order of, say, 4 or 5, whereas the delta-LRS stats for most L-1 variables are several thousand. Again, I'm not sure whether this reflects the relative merit of the L-1 and L-2 variables or not.
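To put rough numbers on "of the order of 4 or 5", here is a minimal sketch of the arithmetic. It rests on assumptions not stated above: that LRS is the likelihood ratio statistic (-2 log-likelihood), that each L-2 predictor adds exactly one fixed-effect parameter so that delta-LRS is referred to a chi-square distribution with 1 degree of freedom, and that a Bonferroni adjustment over the 9 candidates is a reasonable (if conservative) way to allow for having screened several variables. The function name `lrt_p_value_1df` is hypothetical, and Bonferroni is only one illustrative choice, not a recommendation.

```python
import math


def lrt_p_value_1df(delta_lrs: float) -> float:
    """Upper-tail p-value for a chi-square variate with 1 degree of freedom.

    Uses the identity P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)),
    so only the standard library is needed.
    """
    if delta_lrs < 0:
        raise ValueError("delta_lrs must be non-negative")
    return math.erfc(math.sqrt(delta_lrs / 2.0))


# Assumed for illustration: 9 candidate L-2 predictors screened one at a time.
n_candidates = 9
alpha = 0.05
bonferroni_alpha = alpha / n_candidates  # per-test threshold after adjustment

for delta in (4.0, 5.0):
    p = lrt_p_value_1df(delta)
    print(
        f"delta-LRS = {delta:.1f}: p = {p:.4f}, "
        f"below 0.05: {p < alpha}, "
        f"below Bonferroni threshold {bonferroni_alpha:.4f}: {p < bonferroni_alpha}"
    )
```

Under these assumptions, a delta-LRS of 4 or 5 clears the unadjusted 5% threshold but not the Bonferroni-adjusted one, which is one way of framing the overfitting worry. It does not by itself say anything about the relative merit of L-1 and L-2 variables, since the L-1 statistics are driven by a far larger number of units.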
Any insights gratefully received.
Thanks
John