Removing Multivariate Outliers

Jillianeh
Posts: 4
Joined: Wed Mar 22, 2017 9:01 pm

Post by Jillianeh »

Hello,

I have a data set of 20,000 students, nested in 1000 classrooms, nested in 300 schools.

I have saved my standardized residuals for level 1, level 2, and level 3. The only thing random in my model is the intercept.

For a sensitivity analysis, I want to remove the observations whose standardized residuals are >= 2 or <= -2 at the first, second, or third level.

I can store these residuals. However, the second- and third-level residuals give me only one residual per higher-level unit (i.e. 1000 at the classroom level and 300 at the school level), rather than a higher-level residual attached to every level-1 observation (i.e. every student). When I export my dataset to SPSS, the second- and third-level residuals do not line up with my level 2 and level 3 IDs, so I cannot aggregate the values.

The only guidance I can find in the MLwiN manual on sensitivity analyses or removing outliers suggests manually pointing and clicking on every outlying observation in the residual plot and "removing from analysis" by hand. This is not feasible given the size of my dataset.

Does anyone know how to remove multivariate outliers in MLwiN?

Thanks in advance,
Jillian
ChrisCharlton
Posts: 1351
Joined: Mon Oct 19, 2009 10:34 am

Re: Removing Multivariate Outliers

Post by ChrisCharlton »

If you just want the residuals expanded to the same length as the level-1 data, then probably the easiest way to do this is via the predictions window. If you select only the residual term of interest (e.g. u0j) there, without any fixed effects, the prediction will contain just the values for this term, with one value for each level-1 unit.

An alternative is to use the CALC command to build the expression that determines which data to exclude. If you do this, there is a lev1 operator that automatically expands the column it refers to up to level-1 length. For more information see the help topic for this command.
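
To illustrate what expanding a higher-level residual to level-1 length means, here is a rough sketch in Python rather than in MLwiN's own command language. It is not MLwiN syntax and does not show how the predictions window or the lev1 operator work internally. The toy data, the field names (class_id, school_id, e_std) and the helper functions are all hypothetical. It assumes you have one standardized residual per classroom and per school, each keyed by that unit's ID.

```python
# Hypothetical toy data: one row per student (level 1), carrying the
# classroom (level 2) and school (level 3) IDs plus the student's own
# standardized level-1 residual.
students = [
    {"student": 1, "class_id": 10, "school_id": 100, "e_std": 0.3},
    {"student": 2, "class_id": 10, "school_id": 100, "e_std": -2.4},
    {"student": 3, "class_id": 11, "school_id": 100, "e_std": 1.1},
    {"student": 4, "class_id": 12, "school_id": 101, "e_std": 0.5},
    {"student": 5, "class_id": 12, "school_id": 101, "e_std": -0.7},
    {"student": 6, "class_id": 13, "school_id": 101, "e_std": 0.2},
    {"student": 7, "class_id": 14, "school_id": 102, "e_std": 0.0},
]

# Hypothetical higher-level residuals: one value per unit, keyed by ID.
class_resid = {10: 0.4, 11: 2.1, 12: -0.9, 13: 1.5, 14: 0.1}
school_resid = {100: 0.8, 101: -1.2, 102: -2.0}

CUTOFF = 2.0  # flag standardized residuals >= 2 or <= -2


def expand_to_level1(rows, id_field, resid_by_id):
    """Return one higher-level residual per level-1 row, looked up by ID."""
    return [resid_by_id[row[id_field]] for row in rows]


def is_outlier(value, cutoff=CUTOFF):
    """True when the standardized residual is at or beyond the cutoff."""
    return abs(value) >= cutoff


# Expand the short level-2 and level-3 columns to level-1 length.
class_long = expand_to_level1(students, "class_id", class_resid)
school_long = expand_to_level1(students, "school_id", school_resid)

# Keep a student only if no residual at any of the three levels is flagged.
kept = [
    row
    for row, v, f in zip(students, class_long, school_long)
    if not (is_outlier(row["e_std"]) or is_outlier(v) or is_outlier(f))
]
kept_ids = [row["student"] for row in kept]
print(kept_ids)
```

Looking the residuals up by ID, rather than relying on row order, is what avoids the mismatch described in the question: a column with 1000 or 300 entries cannot simply be pasted alongside 20,000 student rows.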