Identifying residuals
Identifying residuals
Hello,
I have been fitting a straightforward variance component twolevel model, with did and lev2ID thelevel one and level 2 ids , and would like to label my level 2 residuals in a caterpillar plot. However, I am not able to match mymodel2e@residual$lev_2_resi_est_Intercept to mymodel2e@data$lev2ID, as their size are different. How can I identify my level 2 observations?
Thank you
I have been fitting a straightforward variance component twolevel model, with did and lev2ID thelevel one and level 2 ids , and would like to label my level 2 residuals in a caterpillar plot. However, I am not able to match mymodel2e@residual$lev_2_resi_est_Intercept to mymodel2e@data$lev2ID, as their size are different. How can I identify my level 2 observations?
Thank you

 Posts: 1067
 Joined: Mon Oct 19, 2009 10:34 am
Re: Identifying residuals
The residuals should be in the same order as the units in the original data, so if your ID variables are sorted, are unique across level2 units and you had a twolevel model then you could calculate the IDs with a line such as:
Unfortunately the current caterpillar() function provided with R2MLwiN does not provide a way to add your own labels to the plot, so you would need to adapt it or make the plot manually. The following is an example of how you might do this for the first example in the example at http://www.bristol.ac.uk/cmm/media/r2ml ... ide03.Rout:
Code: Select all
residualsID < unique(mymodel2e@data$lev2ID)
Code: Select all
caterpillarlab < function(y, x, ranks, qtlow, qtup, xlab = "", ylab = "", xlim = NULL, ylim = NULL, main = "") {
# This function draws a caterpillar plot
if (is.null(ylim)){
ylim = c(min(qtlow), max(qtup))
}
plot(x, y[rankno], xlim = xlim, ylim = ylim, pch = 15, xlab = xlab, ylab = ylab, main = main, xaxt="n")
points(x, qtlow[rankno], pch = 24, bg = "grey")
points(x, qtup[rankno], pch = 25, bg = "grey")
for (i in 1:length(x)) {
lines(rep(x[i], 2), c(qtlow[rankno][i], qtup[rankno][i]))
}
if (!is.null(names(x))) {
axis(side=1, at=x, labels = names(x)[rankno], cex.axis = 0.5)
} else {
if (is.factor(x)) {
axis(side=1, at=x, labels = levels(x)[rankno], cex.axis = 0.5)
} else {
axis(side=1, at=x, labels = x[rankno])
}
}
}
library(R2MLwiN)
data(tutorial, package = "R2MLwiN")
(mymodel1 < runMLwiN(normexam ~ 1 + (1  school) + (1  student), data = tutorial, estoptions = list(resi.store = TRUE)))
residuals < mymodel1@residual$lev_2_resi_est_Intercept
residualsCI < 1.96 * sqrt(mymodel1@residual$lev_2_resi_var_Intercept)
residualsRank < rank(residuals)
rankno < order(residualsRank)
residualsID < unique(mymodel1@data$school)
xaxis < 1:65
names(xaxis) < residualsID
# Note, xaxis could have been a factor instead, as below:
# xaxis < factor(1:65, labels=unique(mymodel1@data$school))
caterpillarlab(y = residuals, x = xaxis, ranks=rankno, qtlow = (residuals  residualsCI), qtup = (residuals + residualsCI), xlab = 'Rank', ylab = 'Intercept')
Re: Identifying residuals
Thank you for the reply as well as for the graphing tips.
It looks like there is something wrong with the model specification then, as
provides the expected answer (25), but
gives 943. This is indeed a two level variance component model, in its simplest form:
PID (level 1) uniquely identifies the observations and lev2ID is distributed as follows:
Am I missing something obvious? Both PID and lev2ID are numerical and reg.dat is data frame.
It looks like there is something wrong with the model specification then, as
Code: Select all
length(unique(mymodel0e@data$lev2ID))
provides the expected answer (25), but
Code: Select all
length(unique(mymodel2e@residual$lev_2_resi_est_Intercept))
gives 943. This is indeed a two level variance component model, in its simplest form:
Code: Select all
mymodel2e < runMLwiN(mwept.nenj ~ 1 +(1  lev2ID)+(1  pid),
data = reg.dat ,estoptions = list(resi.store = T))
PID (level 1) uniquely identifies the observations and lev2ID is distributed as follows:
Code: Select all
table(reg.dat$lev2ID)
11 12 21 22 23 24 31 32 33 34 35 41 42 51 52 53 54 61 62 71 72 81 82 91 92
230 73 76 26 144 80 25 99 37 64 129 156 41 39 61 53 40 169 42 111 25 63 84 58 127
Re: Identifying residuals
In case this is helpful , here is the R2MLwiN output:
Code: Select all
*************************************************
MLwiN (version: 3) multilevel model (Normal)
N min mean max N_complete min_complete mean_complete max_complete
lev2ID 25 25 82.08 230 25 25 82.08 230
Estimation algorithm: IGLS Elapsed time : 0.3s
Number of obs: 2052 (from total 2052) The model converged after 3 iterations.
Log likelihood: 3389.5
Deviance statistic: 6779

The model formula:
mwept.nenj ~ 1 + (1  lev2ID) + (1  pid)
Level 2: lev2ID Level 1: pid

The fixed part estimates:
Coef. Std. Err. z Pr(>z) [95% Conf. Interval]
Intercept 4.70063 0.03188 147.46 0 *** 4.63815 4.76311
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The random part estimates at the lev2ID level:
Coef. Std. Err.
var_Intercept 0.86884 0.06859

The random part estimates at the pid level:
Coef. Std. Err.
var_Intercept 0.85364 0.05156
*************************************************

 Posts: 1067
 Joined: Mon Oct 19, 2009 10:34 am
Re: Identifying residuals
That does look like something is wrong, as I would expect mymodel2e@residual$lev_2_resi_est_Intercept to have a length of 25. Could you confirm that your data are sorted by the level2 identifier? If so, could you add the debugmode=TRUE option into the estoptions part of your specification and then check the Model>Hierarchy viewer within MLwiN once it opens to check that it matches your expectations. Finally I would suggest looking at the contents of mymodel2e@residual$lev_2_resi_est_Intercept to see whether it looks sensible or if it has been padded with NA values.
Re: Identifying residuals
Oops. The level 2 variable was not sorted. I had completely forgotten this was a requirement.
Sorry for wasting your time.
Sorry for wasting your time.