Nested covariates and implication for ML model structure

MossyMcCollie
Posts: 17
Joined: Tue Oct 16, 2018 1:24 pm

Nested covariates and implication for ML model structure

Post by MossyMcCollie »

I have been asked to run a model of student exam data, with multiple entries per student. The repeated measures will form the lower level, and the students the higher level, of a 2-level hierarchy.

I have various student-level variables that I need to include: gender, ethnicity etc. I also need to include department and faculty information for each student (at this institution, there are 6 faculties, and each is split into between 2 and 4 departments).

My understanding is that it is not valid to specify department and faculty as levels 3 and 4 of my hierarchy, as both are fixed classifications. (We are not currently proposing to generalise findings beyond this institution.) I can add them as additional student-level terms in the same way that I add gender, ethnicity etc. (i.e. with one reference category specified and using indicator variables)... but the department/faculty data is nested. Faculty A contains only departments 1, 2 and 3; faculty B contains only departments 4 and 5, and so on.

I wondered what (if any) the implications of this nesting are - can the indicator variables modelling department and faculty be added to the model in the same way that other student-level variables such as gender are added in MLwiN? Or is there a better way?

It also occurs to me that if we decide to generalise our findings beyond our own institution, then both department and faculty become random classifications - and in this situation, might be considered to be levels 3 and 4 of the model, dispensing with the need to model them as fixed effects using indicator variables?

A possible further complication: currently I have data supplied to me from only 1 of the 6 faculties. It's a safe bet that I will get data from 3 or 4 in total, but there is a good chance that 1 or 2 faculties will not supply data at all. In which case, I am back to thinking that both faculty and department are random classifications... except that the population of units (6) will not be very much larger than the number of units sampled (say 4).

So in summary, the questions I am asking myself are:

1. Does the nesting of certain student-level factors treated as fixed classifications (department within faculty) matter?
2. If I decide to generalise my own institution's data beyond its boundaries, can I consider department and faculty to be levels, rather than fixed classification variables, in the model?
3. If I do not receive data from all the faculties in my institution, even if I do not generalise beyond the institution, again, can I consider department and faculty to be levels, rather than fixed classification variables, in the model?

Any insights on any of this would be much appreciated, and apologies if I have missed something obvious.

Many thanks
John
billb
Posts: 157
Joined: Fri May 21, 2010 1:21 pm

Re: Nested covariates and implication for ML model structure

Post by billb »

Hi John,
With only 6 faculties you would be hard pressed to estimate random effects, although I guess you may get enough departments to treat those as random. If you decide to fit them as fixed effects, then simply putting in department effects will saturate the model at that level, and you will not be able to add in faculty effects on top. Basically you have, say, 25 department means, and so, as in an ANOVA, you put in 24 dummies with 1 department as base, which uses up all the information at department level. The only route to comparing faculties would be to put in more base categories, i.e. make 1 department within each faculty a base category; then you could add in, say, 5 dummies for faculty (with the 6th as base).
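The saturation point above can be checked numerically. The following is a minimal sketch, assuming a made-up layout of 3 faculties and 7 departments (not the real institution's structure) and using NumPy rather than MLwiN. It compares the rank of three fixed-effect design matrices: department dummies only, department dummies plus faculty dummies, and faculty dummies plus department dummies with one base department per faculty. All names in it are hypothetical.

```python
import numpy as np

# Hypothetical nesting: faculty A holds departments 1-3, B holds 4-5, C holds 6-7.
faculty_of_dept = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "C", 7: "C"}
depts = sorted(faculty_of_dept)
faculties = sorted(set(faculty_of_dept.values()))

# One row per department is enough to study which means are identifiable.
dept_col = depts
fac_col = [faculty_of_dept[d] for d in depts]


def indicator(values, level):
    """Return a 0/1 dummy column marking rows equal to `level`."""
    return np.array([1.0 if v == level else 0.0 for v in values])


intercept = np.ones(len(depts))
dept_dummies = [indicator(dept_col, d) for d in depts[1:]]    # department 1 is base
fac_dummies = [indicator(fac_col, f) for f in faculties[1:]]  # faculty A is base

# (a) ANOVA-style: intercept + (departments - 1) dummies.
X_dept = np.column_stack([intercept] + dept_dummies)

# (b) Same as (a) with faculty dummies added on top.
X_both = np.column_stack([intercept] + dept_dummies + fac_dummies)

# (c) One base department per faculty, then faculty dummies.
base_depts = {min(d for d in depts if faculty_of_dept[d] == f) for f in faculties}
within_dummies = [indicator(dept_col, d) for d in depts if d not in base_depts]
X_nested = np.column_stack([intercept] + fac_dummies + within_dummies)

rank = np.linalg.matrix_rank
for name, X in [("dept only", X_dept), ("dept + faculty", X_both), ("nested bases", X_nested)]:
    print(f"{name}: {X.shape[1]} columns, rank {rank(X)}")
```

In this sketch, design (b) has more columns than its rank, so the extra faculty dummies are not separately estimable, while designs (a) and (c) both use exactly as many columns as there are departments. Note that (a) and (c) fit the same department means; they differ only in how the coefficients are read, with each faculty dummy in (c) comparing that faculty's base department to the overall base department.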

So in answer to 2 and 3, you would be making some exchangeability assumptions about faculties and schools, which will make sense in some scenarios. The challenge is the fitting, which is hard when you have fewer than about 10-15 units, i.e. it is hard to check the normality assumption and indeed to estimate the variance.
Hope that helps,
Bill.
MossyMcCollie
Posts: 17
Joined: Tue Oct 16, 2018 1:24 pm

Re: Nested covariates and implication for ML model structure

Post by MossyMcCollie »

Many thanks Bill - very helpful as always. I think that defining one faculty, and one department per faculty, as bases and using dummies for the rest sounds like the best way forward.
There are reasons to think that there is very little variation between faculties, so I will also try the 24-dummies ANOVA-style approach, i.e. disregarding faculty. If I have interpreted your response correctly, if I subsequently decide to generalise beyond my own institution, I can - possibly - re-structure the model with department as a 3rd level (above student on L-2 and repeated measures on L-1), on the grounds that about 25 departments may be a sufficient number of L-3 units to do this.
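For reference, one way the 3-level structure described above might be written is as a random-intercept model. This is only a sketch under assumptions not stated in the thread: random intercepts only (no random slopes), normally distributed effects, and notation chosen here for illustration, with i indexing repeated measures (L-1), j students (L-2) and k departments (L-3).

```latex
\begin{aligned}
y_{ijk} &= \beta_0 + \mathbf{x}_{ijk}^{\top}\boldsymbol{\beta} + v_k + u_{jk} + e_{ijk}, \\
v_k &\sim N(0, \sigma_v^2), \qquad
u_{jk} \sim N(0, \sigma_u^2), \qquad
e_{ijk} \sim N(0, \sigma_e^2).
\end{aligned}
```

Here the vector x would hold the student-level covariates such as gender and ethnicity, and the department dummies are replaced by the single variance parameter for v. Faculty could still enter as fixed dummies in x, since 6 units is too few to treat as a further random level.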
Thanks again for your advice.
John