Identifying residuals

pwalthery · Post by **pwalthery** » Wed Jul 04, 2018 4:03 pm

Hello,

I have been fitting a straightforward variance component two-level model, with did and lev2ID thelevel one and level 2 ids , and would like to label my level 2 residuals in a caterpillar plot. However, I am not able to match mymodel2e@residual$lev_2_resi_est_Intercept to mymodel2e@data$lev2ID, as their size are different. How can I identify my level 2 observations?

Thank you

ChrisCharlton · Post by **ChrisCharlton** » Thu Jul 05, 2018 9:42 am

The residuals should be in the same order as the units in the original data, so if your ID variables are sorted, are unique across level-2 units and you had a two-level model then you could calculate the IDs with a line such as:

Code: Select all

residualsID <- unique(mymodel2e@data$lev2ID)

Unfortunately the current caterpillar() function provided with R2MLwiN does not provide a way to add your own labels to the plot, so you would need to adapt it or make the plot manually. The following is an example of how you might do this for the first example in the example at http://www.bristol.ac.uk/cmm/media/r2ml ... ide03.Rout:

Code: Select all

caterpillarlab <- function(y, x, ranks, qtlow, qtup, xlab = "", ylab = "", xlim = NULL, ylim = NULL, main = "") {
  # This function draws a caterpillar plot
  if (is.null(ylim)){
    ylim = c(min(qtlow), max(qtup))
  }
  plot(x, y[rankno], xlim = xlim, ylim = ylim, pch = 15, xlab = xlab, ylab = ylab, main = main, xaxt="n")
  points(x, qtlow[rankno], pch = 24, bg = "grey")
  points(x, qtup[rankno], pch = 25, bg = "grey")
  for (i in 1:length(x)) {
    lines(rep(x[i], 2), c(qtlow[rankno][i], qtup[rankno][i]))
  }
  if (!is.null(names(x))) {
    axis(side=1, at=x, labels = names(x)[rankno], cex.axis = 0.5)
  } else {
    if (is.factor(x)) {
      axis(side=1, at=x, labels = levels(x)[rankno], cex.axis = 0.5)
    } else {
      axis(side=1, at=x, labels = x[rankno])
    }
  }
}


library(R2MLwiN)
data(tutorial, package = "R2MLwiN")
(mymodel1 <- runMLwiN(normexam ~ 1 + (1 | school) + (1 | student), data = tutorial, estoptions = list(resi.store = TRUE)))

residuals <- mymodel1@residual$lev_2_resi_est_Intercept
residualsCI <- 1.96 * sqrt(mymodel1@residual$lev_2_resi_var_Intercept)
residualsRank <- rank(residuals)
rankno <- order(residualsRank)
residualsID <- unique(mymodel1@data$school)
xaxis <- 1:65
names(xaxis) <- residualsID
# Note, xaxis could have been a factor instead, as below:
# xaxis <- factor(1:65, labels=unique(mymodel1@data$school))
caterpillarlab(y = residuals, x = xaxis, ranks=rankno, qtlow = (residuals - residualsCI), qtup = (residuals + residualsCI), xlab = 'Rank', ylab = 'Intercept')

pwalthery · Post by **pwalthery** » Thu Jul 05, 2018 1:03 pm

Thank you for the reply as well as for the graphing tips.
It looks like there is something wrong with the model specification then, as

Code: Select all

length(unique(mymodel0e@data$lev2ID))

provides the expected answer (25), but

Code: Select all

length(unique(mymodel2e@residual$lev_2_resi_est_Intercept))

gives 943. This is indeed a two level variance component model, in its simplest form:

Code: Select all

mymodel2e <- runMLwiN(mwept.nenj ~ 1 +(1 | lev2ID)+(1 | pid), 
data = reg.dat ,estoptions = list(resi.store = T))

PID (level 1) uniquely identifies the observations and lev2ID is distributed as follows:

Code: Select all

table(reg.dat$lev2ID)

 11  12  21  22  23  24  31  32  33  34  35  41  42  51  52  53  54  61  62  71  72  81  82  91  92 
230  73  76  26 144  80  25  99  37  64 129 156  41  39  61  53  40 169  42 111  25  63  84  58 127

Am I missing something obvious? Both PID and lev2ID are numerical and reg.dat is data frame.

pwalthery · Post by **pwalthery** » Thu Jul 05, 2018 1:11 pm

In case this is helpful , here is the R2MLwiN output:

Code: Select all

-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- 
MLwiN (version: 3)  multilevel model (Normal) 
         N min  mean max N_complete min_complete mean_complete max_complete
lev2ID 25  25 82.08 230         25           25         82.08          230
Estimation algorithm:  IGLS        Elapsed time : 0.3s 
Number of obs:  2052 (from total 2052)        The model converged after 3 iterations.
Log likelihood:      -3389.5 
Deviance statistic:  6779 
--------------------------------------------------------------------------------------------------- 
The model formula:
mwept.nenj ~ 1 + (1 | lev2ID) + (1 | pid)
Level 2: lev2ID     Level 1: pid      
--------------------------------------------------------------------------------------------------- 
The fixed part estimates:  
              Coef.   Std. Err.        z   Pr(>|z|)         [95% Conf.   Interval] 
Intercept   4.70063     0.03188   147.46          0   ***      4.63815     4.76311 
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1  
--------------------------------------------------------------------------------------------------- 
The random part estimates at the lev2ID level: 
                  Coef.   Std. Err. 
var_Intercept   0.86884     0.06859 
--------------------------------------------------------------------------------------------------- 
The random part estimates at the pid level: 
                  Coef.   Std. Err. 
var_Intercept   0.85364     0.05156 
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-

ChrisCharlton · Post by **ChrisCharlton** » Thu Jul 05, 2018 1:20 pm

That does look like something is wrong, as I would expect mymodel2e@residual$lev_2_resi_est_Intercept to have a length of 25. Could you confirm that your data are sorted by the level-2 identifier? If so, could you add the debugmode=TRUE option into the estoptions part of your specification and then check the Model->Hierarchy viewer within MLwiN once it opens to check that it matches your expectations. Finally I would suggest looking at the contents of mymodel2e@residual$lev_2_resi_est_Intercept to see whether it looks sensible or if it has been padded with NA values.

pwalthery · Post by **pwalthery** » Fri Jul 13, 2018 1:42 pm

Oops. The level 2 variable was not sorted. I had completely forgotten this was a requirement.
Sorry for wasting your time.

www.cmm.bristol.ac.uk/forum

Identifying residuals

Identifying residuals

Re: Identifying residuals

Re: Identifying residuals

Re: Identifying residuals

Re: Identifying residuals

Re: Identifying residuals