Limitations in level-3 units

Raphael · Post by **Raphael** » Mon Dec 05, 2011 5:33 am

Hi everyone,
I am currently working on a project where I have to fit a three-level model. Individuals are nested within households, and households are nested within geographical clusters (445 clusters, 11260 households, 19033 individuals). The problem is that even the null model does not converge, and I get the error message “SSP matrix for fixed part has gone negative definite – a reconstruction to the nearest non-negative form has been used.” It then asks: “Continue estimation?” But even after clicking “Yes” and running numerous additional iterations, the models do not converge. After a couple of hours of exploring the data, I found that when I collapse the level-3 units into only 6 groups, the models run fine. However, such a collapse is impossible to justify theoretically. Has anyone else run into this problem?
Also, the models run fine if I specify a three-level structure but do not let the constant vary at either level 2 or level 3. For example…
runmlwin wealth cons if rural==1, ///
level3(clust: ) ///
level2(houseid: cons) ///
level1(indid: cons) //nopause

However, I am wondering whether the above specification accounts for the clustering at level 3 when I include level-3 predictors. Does anyone know whether this is the case?

Any help with this issue would be highly appreciated!
Thanks so much!

Best,
Raphael

GeorgeLeckie · Post by **GeorgeLeckie** » Mon Dec 05, 2011 5:43 pm

Hi Raphael,

Your sample has 19033 individuals (level-1) nested within 11260 households (level-2) nested within 445 clusters (level-3).

Thus, many of your households are singletons (just one individual per household).

This makes it very hard to separate the individual-level (level-1) variance component from the household-level (level-2) variance component.

Indeed, if every household had one individual it would be impossible.
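One way to see how common singleton households are is to count individuals per household. The following is only a minimal sketch: it builds a small toy data set with made-up values and reuses the variable names from this thread (clust, houseid, indid); the names hhsize, hhtag, singleton and share are introduced here for illustration.

```stata
* Toy data with made-up values, reusing the thread's variable names
clear
input clust houseid indid
1 1 1
1 1 2
1 2 3
2 3 4
2 4 5
2 4 6
2 5 7
end

* Number of individuals in each household
bysort houseid: generate hhsize = _N

* Flag one record per household so each household is counted once
egen hhtag = tag(houseid)

* Distribution of household sizes across households
tabulate hhsize if hhtag

* Share of households that contain a single individual
generate byte singleton = (hhsize == 1)
summarize singleton if hhtag
scalar share = r(mean)
display "Share of singleton households: " share
```

On real data, drop the toy `input` block and run the remaining commands on the data set in memory (after applying any sample restriction such as rural==1).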

My advice would be to build the model up gradually, adding one parameter at a time.

When fitting each model use the parameter estimates from the previous model as starting values.

For example...

runmlwin wealth cons if rural==1, level1(indid: cons)

runmlwin wealth cons if rural==1, level2(houseid: cons) level1(indid: cons) initsprevious

runmlwin wealth cons if rural==1, level3(clust: cons) level2(houseid: cons) level1(indid: cons) initsprevious

On a separate point, you say "I found that when I collapse the level-3 units into only 6 groups the models run fine." and you are right to say that this is not an ideal thing to do.

However, it might suggest that the problem you have lies with the clusters (level-3) rather than the low number of individuals per household.

Ideally, the number of households per cluster should be fairly even, so it would be worth checking this.

If you also have many clusters with only one or very few households, this will make the estimation problem even more difficult.
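A check along these lines could be sketched as follows. This is an illustration on made-up toy data that reuses the thread's variable names; hhtag, nhh, cltag and n_small are hypothetical names, and the cut-off of two households is an arbitrary choice for the example.

```stata
* Toy data with made-up values, reusing the thread's variable names
clear
input clust houseid indid
1 1 1
1 1 2
1 2 3
2 3 4
2 4 5
2 4 6
2 5 7
3 6 8
end

* Flag one record per household within its cluster
egen hhtag = tag(clust houseid)

* Number of households in each cluster
bysort clust: egen nhh = total(hhtag)

* Flag one record per cluster and summarise households per cluster
egen cltag = tag(clust)
summarize nhh if cltag, detail

* Number of clusters with two or fewer households (arbitrary cut-off)
count if cltag & nhh <= 2
scalar n_small = r(N)
```

A very skewed distribution in the `summarize` output, or a large count of small clusters, would point towards the cluster level as a source of the estimation difficulty.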

In terms of the runmlwin part of your query:

No, the following code does not take into account clustering at level-3 in your data:

runmlwin wealth cons if rural==1, ///
level3(clust: ) ///
level2(houseid: cons) ///
level1(indid: cons) //nopause

It is equivalent to simply writing:

runmlwin wealth cons if rural==1, ///
level2(houseid: cons) ///
level1(indid: cons) //nopause

Let us know how you get on.

Best wishes

George

Raphael · Post by **Raphael** » Fri Dec 09, 2011 5:26 pm

Hi George,

Thank you so much for your fast response to my questions! This was really helpful and gave me additional insight into how to use runmlwin. I had no idea that initsprevious can also be used in regular estimation procedures, since I had only used it to obtain starting values for MCMC. Unfortunately, building the model up one level at a time did not get the models to run properly. But you are completely right: the issue is the large percentage of singletons (52% in my sample).
I did some reading on the issue of sparse data. An excellent article by Clarke and Wheaton (2007) uses a simulation approach to discuss in detail the impact of the prevalence of various percentages of singletons on the reliability and precision of fixed and random effect coefficients. A common theme in the literature seems to be that results are unbiased if the group size is at least 5 cases and the group number is 50 groups or larger (though some mention that 30 groups are sufficient) (Maas and Hox 2005).
The predominant approach to reducing observation sparseness and the percentage of singletons is to use cluster analysis to combine level-2 groups into larger aggregates based on sociodemographic similarity and/or geographic proximity (Beland et al. 2002, Cutrona et al. 2000, Buka et al. 2003). In this way, clustering reduces the number of level-2 groups and increases the average group size. I followed this approach conceptually, but rather than creating my own artificial clusters, I simply use the cluster level as my second level. This results in 445 cluster-level groups with an average group size of 43 individuals, which represents ideal conditions for a multilevel analysis (Maas and Hox 2005, Clarke and Wheaton 2007).
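The group sizes behind this choice can be worked out from the sample sizes quoted in the first post. The sketch below is only arithmetic; the scalar names are made up for illustration, and the commented-out model line is an assumed two-level specification that mirrors the syntax used earlier in this thread (it needs the data in memory and MLwiN installed).

```stata
* Sample sizes quoted in the first post of this thread
scalar n_ind   = 19033
scalar n_hh    = 11260
scalar n_clust = 445

* Average number of individuals per household and per cluster
scalar ind_per_hh    = n_ind / n_hh
scalar ind_per_clust = n_ind / n_clust
display "Individuals per household: " %5.2f ind_per_hh
display "Individuals per cluster:   " %5.1f ind_per_clust

* Assumed two-level model with clusters at level 2 (not run here):
* runmlwin wealth cons if rural==1, level2(clust: cons) level1(indid: cons) nopause
```

The household-level average is about 1.69 individuals, well below the group sizes discussed above, while the cluster-level average is about 42.8, which rounds to the 43 mentioned in the post.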
That is my approach to the issue of singletons so far…
Thanks again for your help in getting me to think about the source of the issue!

Best,
Raphael

Bibliography (articles that deal with group number and group size in multilevel modeling):

Beland, F., Birch, S., & Stoddart, G. (2002). Unemployment and health: contextual-level influences on the production of health in populations. Social Science & Medicine, 55(11), 2033-2052.

Buka, S. L., Brennan, R. T., Rich-Edwards, J. W., Raudenbush, S. W., & Earls, F. (2003). Neighborhood support and the birth weight of urban infants. American Journal of Epidemiology, 157(1), 1-8.

Clarke, P. (2008). When can group level clustering be ignored? Multilevel models versus single-level models with sparse data. Journal of Epidemiology and Community Health, 62(8), 752-758.

Clarke, P., & Wheaton, B. (2007). Addressing data sparseness in contextual population research: Using cluster analysis to create synthetic neighborhoods. Sociological Methods & Research, 35(3), 311-351.

Cutrona, C. E., Russell, D. W., Hessling, R. M., Brown, P. A., & Murry, V. (2000). Direct and moderating effects of community context on the psychological well-being of African American women. Journal of Personality and Social Psychology, 79(6), 1088-1101.

Maas, C. J. M., & Hox, J. J. (2005). Sufficient sample sizes for multilevel modeling. Methodology, 1(3), 86-92.

GeorgeLeckie · Post by **GeorgeLeckie** » Mon Dec 12, 2011 10:46 am

Hi Raphael,

Thank you very much for letting us know how you got on.

In particular, thank you for the useful guidance and references you have given on sparse data.

I'm sure other runmlwin users will find this information useful.

Best wishes

George

www.cmm.bristol.ac.uk/forum
