check the homoscedasticity assumption in logistic models
check the homoscedasticity assumption in logistic models
Hi
I would like to check the homoscedasticity assumption in my logistic multilevel model. According to this post viewtopic.php?f=1&t=2006&e=1&view=unread#p4567, “You can assess the homoskedasticity assumption by plotting the predicted random effects against the covariates.”.
In the MLwiN envirioment, I assume this is achieved by plotting “the standardised residual x fixed part prediction” for the second level. I wonder how I can achieve this in R2mlwin? I cannot find anything on this in the manual.
Thanks in advance
I would like to check the homoscedasticity assumption in my logistic multilevel model. According to this post viewtopic.php?f=1&t=2006&e=1&view=unread#p4567, “You can assess the homoskedasticity assumption by plotting the predicted random effects against the covariates.”.
In the MLwiN envirioment, I assume this is achieved by plotting “the standardised residual x fixed part prediction” for the second level. I wonder how I can achieve this in R2mlwin? I cannot find anything on this in the manual.
Thanks in advance

 Posts: 1099
 Joined: Mon Oct 19, 2009 10:34 am
Re: check the homoscedasticity assumption in logistic models
The following is an example of how to replicate this residuals plot within R:
Code: Select all
library(R2MLwiN)
library(doBy)
data(tutorial)
(mymodel < runMLwiN(normexam ~ 1 + standlrt + (1  school) + (1  student), estoptions = list(resi.store = TRUE, resioptions = "standardised"), data = tutorial))
pred = data.frame(school=tutorial$school, predexam=predict(mymodel))
predexam.mean < summaryBy(predexam~school, data=pred, FUN=mean)$predexam.mean
plot(predexam.mean, mymodel@residual$lev_2_std_resi_est_Intercept, xlab="pred. val.", ylab="std(Intercept)")

 Posts: 1099
 Joined: Mon Oct 19, 2009 10:34 am
Re: check the homoscedasticity assumption in logistic models
If I am interpreting what George said correctly he is suggesting plotting the residuals vs each of the predictors in the model (I assume averaged within level2 units). The R code for this would be the same as above, except that the prediction would be replaced by the variable being checked.
Re: check the homoscedasticity assumption in logistic models
Thanks for the clarifications, Chris, and some additional questions:
1)
Does the predict(mymodel) function takes the mean of the covariates or the actual data values for each observation (and plug them into the model)? [it looks like it is taking the actual values…]
What if I want to set some covariates to their mean, and but usual actual values for one?
2)
library(R2MLwiN)
library(doBy)
data(tutorial)
(mymodel < runMLwiN(normexam ~ 1 + standlrt + sex + (1  school) + (1  student), estoptions = list(resi.store = TRUE, resioptions = "standardised"), data = tutorial))
# predicting on standlrt
pred = data.frame(school=tutorial$school, predexam=predict(mymodel, terms = "standlrt"))
predexam.mean < summaryBy(predexam~school, data=pred, FUN=mean)$predexam.mean
plot(predexam.mean, mymodel@residual$lev_2_std_resi_est_Intercept, xlab="pred. val.", ylab="std(Intercept)")
# predicting on sex
pred = data.frame(school=tutorial$school, predexam=predict(mymodel, terms = "sex"))
predexam.mean < summaryBy(predexam~school, data=pred, FUN=mean)$predexam.mean
plot(predexam.mean, mymodel@residual$lev_2_std_resi_est_Intercept, xlab="pred. val.", ylab="std(Intercept)")
1)
Does the predict(mymodel) function takes the mean of the covariates or the actual data values for each observation (and plug them into the model)? [it looks like it is taking the actual values…]
What if I want to set some covariates to their mean, and but usual actual values for one?
2)
I am refitting the model with two variables, to get predictions on each variable. However, I am getting the exact same predictions back (same plots). Can you see what I am doing wrong?If I am interpreting what George said correctly he is suggesting plotting the residuals vs each of the predictors in the model (I assume averaged within level2 units). The R code for this would be the same as above, except that the prediction would be replaced by the variable being checked.
library(R2MLwiN)
library(doBy)
data(tutorial)
(mymodel < runMLwiN(normexam ~ 1 + standlrt + sex + (1  school) + (1  student), estoptions = list(resi.store = TRUE, resioptions = "standardised"), data = tutorial))
# predicting on standlrt
pred = data.frame(school=tutorial$school, predexam=predict(mymodel, terms = "standlrt"))
predexam.mean < summaryBy(predexam~school, data=pred, FUN=mean)$predexam.mean
plot(predexam.mean, mymodel@residual$lev_2_std_resi_est_Intercept, xlab="pred. val.", ylab="std(Intercept)")
# predicting on sex
pred = data.frame(school=tutorial$school, predexam=predict(mymodel, terms = "sex"))
predexam.mean < summaryBy(predexam~school, data=pred, FUN=mean)$predexam.mean
plot(predexam.mean, mymodel@residual$lev_2_std_resi_est_Intercept, xlab="pred. val.", ylab="std(Intercept)")

 Posts: 1099
 Joined: Mon Oct 19, 2009 10:34 am
Re: check the homoscedasticity assumption in logistic models
1) The predict function by default will give you the sum of the covariates multiplied by their parameter estimates. It does however have an option to supply you own data, so you could give it a data frame where you have replaced the values with their means.
2) If you wanted to replicate the behaviour of the MLwiN residuals window where it plots residuals by an average for variable instead of a fixedpart prediction the code would just be something like the following:
2) If you wanted to replicate the behaviour of the MLwiN residuals window where it plots residuals by an average for variable instead of a fixedpart prediction the code would just be something like the following:
Code: Select all
library(R2MLwiN)
library(doBy)
data(tutorial)
(mymodel < runMLwiN(normexam ~ 1 + standlrt + (1  school) + (1  student), estoptions = list(resi.store = TRUE, resioptions = "standardised"), data = tutorial))
standlrt.mean < summaryBy(standlrt~school, data=tutorial, FUN=mean)$standlrt.mean
plot(standlrt.mean, mymodel@residual$lev_2_std_resi_est_Intercept, xlab="standlrt", ylab="std(Intercept)")
Re: check the homoscedasticity assumption in logistic models
Thanks, again, Chris.
Regarding the behavior of the predict() function.
1) does it take the sample data stored in the mymodel object?
2) not getting what the difference between the terms= argument and the params= argument (from the manual and my own experimentation)? Could you please help me understand this?
3) Beyond testing homoscedasticity, I would like to produce a couple of prediction holding a set of covariates at their mean, and only vary one variable (the range specified by me, and not the sample). [is this where you propose I feed the predict() with new data?]
Many thanks
Regarding the behavior of the predict() function.
1) does it take the sample data stored in the mymodel object?
2) not getting what the difference between the terms= argument and the params= argument (from the manual and my own experimentation)? Could you please help me understand this?
3) Beyond testing homoscedasticity, I would like to produce a couple of prediction holding a set of covariates at their mean, and only vary one variable (the range specified by me, and not the sample). [is this where you propose I feed the predict() with new data?]
Many thanks

 Posts: 1099
 Joined: Mon Oct 19, 2009 10:34 am
Re: check the homoscedasticity assumption in logistic models
1) Unless you specify an alternative data frame with the newdata= argument in the function call, then yes this is the data that it uses.
2) The terms= argument allows you to choose by a subset of data via the variable names, params= allows to to choose via a subset of data via the parameter names (FP_...). The terms= parameter is only used if you have also specified type="terms", however in this case it takes precedence over the params= argument. The params= parameter applies to all prediction types.
3) Yes, my suggestion here was to create a new data frame which has the same variable names as the original data, but has the variables not being investigated held at their means. You would then pass this to the predict function via the newdata= argument.
2) The terms= argument allows you to choose by a subset of data via the variable names, params= allows to to choose via a subset of data via the parameter names (FP_...). The terms= parameter is only used if you have also specified type="terms", however in this case it takes precedence over the params= argument. The params= parameter applies to all prediction types.
3) Yes, my suggestion here was to create a new data frame which has the same variable names as the original data, but has the variables not being investigated held at their means. You would then pass this to the predict function via the newdata= argument.