Using predict() to calculate predicted value

Welcome to the forum for R2MLwiN users. Feel free to post your question about R2MLwiN here. The Centre for Multilevel Modelling take no responsibility for the accuracy of these posts, we are unable to monitor them closely. Do go ahead and post your question and thank you in advance if you find the time to post any answers!

Go to R2MLwiN: Running MLwiN from within R >> http://www.bris.ac.uk/cmm/software/r2mlwin/
Post Reply
vivian1234
Posts: 30
Joined: Tue Apr 12, 2016 10:54 am

Using predict() to calculate predicted value

Post by vivian1234 »

Hi,

I'm following the suggestion from this post https://www.cmm.bristol.ac.uk/forum/vie ... a976#p4572 to replicate the standardised residual vs predicted value within R.

I am two models. Model 1 is a 2-level random slope model with continuous outcome and IGLS is used to estimate the model. Model 2 is a 2-level random coefficient cumulative logit model with a ordinal outcome with 3 categories and MCMC estimation is used.

I have no problem to calculate the predicted value for Model 1 using the following code:

Code: Select all

pred <- data.frame(level2 = model1@data$lv2, pred.outcome = predict(model1))
pred.outcome.mean <- summaryBy(pred.outcome ~ level2, data = pred, FUN = mean)$pred.outcome.mean
However, when I used the code for Model 2:

Code: Select all

pred <- data.frame(level2 = model2@data$lv2, pred.outcome = predict(model2))
I received this error:
Error in `[.data.frame`(indata, x.names) : undefined columns selected
I've also tried to add type = "response" in the predict(), the same error appeared.

Anyone has any idea?

Thanks a lot.

Vivian
ChrisCharlton
Posts: 1351
Joined: Mon Oct 19, 2009 10:34 am

Re: Using predict() to calculate predicted value

Post by ChrisCharlton »

It sounds like there is probably a mismatch between the variable names referenced in the parameter names and those in the data associated with the model. If you look in the predict method in https://github.com/rforge/r2mlwin/blob/ ... nfitMCMC.R you will see that x.names is created as follows:

Code: Select all

  if (is.null(params)) {
    fp.names <- names(FP <- object@FP)
  } else {
    fp.names <- params
  }
  x.names <- sub("FP_", "", fp.names)
i.e. it takes the names for the fixed part parameters and removes the "FP_" from the beginning to find their associated data.

If you compare these to what is in the data (i.e. sub("FP_", "", names(model2@FP)) with colnames(model2@data)) I suspect that you will find a mismatch. Without more details of how they differ I can't suggest a fix.
vivian1234
Posts: 30
Joined: Tue Apr 12, 2016 10:54 am

Re: Using predict() to calculate predicted value

Post by vivian1234 »

Hi Chris,

Thanks for your quick reply.
I replicate the example from http://www.bristol.ac.uk/cmm/media/r2ml ... CGuide13.R.

Code: Select all

data(alevchem, package = "R2MLwiN")
alevchem$gcseav <- double2singlePrecision(alevchem$gcse_tot/alevchem$gcse_no - 6)


model1 <- runMLwiN(a_point ~ 1 + gcseav + I(gcseav^2) + I(gcseav^3) + gender + (1 | pupil), 
                    estoptions = list(EstM = 1), data = alevchem)

model2 <- runMLwiN(logit(a_point, cons, 6) ~ 1 + gcseav[1:5] + I(gcseav^2)[1:5] + 
                     I(gcseav^3)[1:5] + gender[1:5],
                    D = "Ordered Multinomial", estoptions = list(EstM = 1), data = alevchem)

pred1 <- predict(model1)
pred2 <- predict(model2)
and this is the result:
> pred1 <- predict(model1)
> pred2 <- predict(model2)
Error in `[.data.frame`(indata, x.names) : undefined columns selected
I hope this example makes sense to you.

Thanks.

Vivian
ChrisCharlton
Posts: 1351
Joined: Mon Oct 19, 2009 10:34 am

Re: Using predict() to calculate predicted value

Post by ChrisCharlton »

Yes that does make sense, thanks. If you look at the variables associated with the model parameters:

Code: Select all

> sub("FP_", "", names(model2@FP))
[1] "Intercept_F"        "Intercept_E"        "Intercept_D"        "Intercept_C"       
[5] "Intercept_B"        "gcseav_12345"       "I(gcseav^2)_12345"  "I(gcseav^3)_12345" 
[9] "genderfemale_12345"
and the variables in the data attached to the model objects:

Code: Select all

> colnames(model2@data)
[1] "a_point"      "l1id"         "Intercept"    "gcseav"       "I(gcseav^2)"  "I(gcseav^3)" 
[7] "genderfemale" "cons"  
You will see that there is a mismatch.

I believe that this is because the extra variables are created internally within MLwiN, rather than generated on the R side, so R2MLwiN is not aware of them. I will think about whether there is a way to resolve this, however in the mean time you would need to manually add the necessary variables to the data frame associated with the model in order to make the predict() function work.
vivian1234
Posts: 30
Joined: Tue Apr 12, 2016 10:54 am

Re: Using predict() to calculate predicted value

Post by vivian1234 »

Do you mean in this case I need to manually add 4 columns of Intercept into the data frame?

Thanks,
Vivian
ChrisCharlton
Posts: 1351
Joined: Mon Oct 19, 2009 10:34 am

Re: Using predict() to calculate predicted value

Post by ChrisCharlton »

The data stored with the model object is the version sent to MLwiN, rather than the version used to fit the model. When setting up the multinomial model MlwiN will expand the data so that each possible response value get its own row. This is referred to on page 168 of the MLwiN user's guide (http://www.bristol.ac.uk/cmm/media/soft ... al-web.pdf). In order to recreate the prediction that you would get via the MLwiN interface you would need to generate this expanded dataset and then use this instead of the data attached to the model object. One way to do this might be to add the debugmode=TRUE option to your runMLwiN function call, save the expanded data from MLwiN and then load this into R and use it for the prediction.
vivian1234
Posts: 30
Joined: Tue Apr 12, 2016 10:54 am

Re: Using predict() to calculate predicted value

Post by vivian1234 »

I get what you mean. Thanks!!
Post Reply