Page 1 of 1

Using predict() to calculate predicted value

Posted: Mon Apr 16, 2018 2:54 pm
by vivian1234
Hi,

I'm following the suggestion from this post https://www.cmm.bristol.ac.uk/forum/vie ... a976#p4572 to replicate the standardised residual vs predicted value within R.

I am two models. Model 1 is a 2-level random slope model with continuous outcome and IGLS is used to estimate the model. Model 2 is a 2-level random coefficient cumulative logit model with a ordinal outcome with 3 categories and MCMC estimation is used.

I have no problem to calculate the predicted value for Model 1 using the following code:

Code: Select all

pred <- data.frame(level2 = model1@data$lv2, pred.outcome = predict(model1))
pred.outcome.mean <- summaryBy(pred.outcome ~ level2, data = pred, FUN = mean)$pred.outcome.mean
However, when I used the code for Model 2:

Code: Select all

pred <- data.frame(level2 = model2@data$lv2, pred.outcome = predict(model2))
I received this error:
Error in `[.data.frame`(indata, x.names) : undefined columns selected
I've also tried to add type = "response" in the predict(), the same error appeared.

Anyone has any idea?

Thanks a lot.

Vivian

Re: Using predict() to calculate predicted value

Posted: Mon Apr 16, 2018 4:02 pm
by ChrisCharlton
It sounds like there is probably a mismatch between the variable names referenced in the parameter names and those in the data associated with the model. If you look in the predict method in https://github.com/rforge/r2mlwin/blob/ ... nfitMCMC.R you will see that x.names is created as follows:

Code: Select all

  if (is.null(params)) {
    fp.names <- names(FP <- object@FP)
  } else {
    fp.names <- params
  }
  x.names <- sub("FP_", "", fp.names)
i.e. it takes the names for the fixed part parameters and removes the "FP_" from the beginning to find their associated data.

If you compare these to what is in the data (i.e. sub("FP_", "", names(model2@FP)) with colnames(model2@data)) I suspect that you will find a mismatch. Without more details of how they differ I can't suggest a fix.

Re: Using predict() to calculate predicted value

Posted: Tue Apr 17, 2018 9:25 am
by vivian1234
Hi Chris,

Thanks for your quick reply.
I replicate the example from http://www.bristol.ac.uk/cmm/media/r2ml ... CGuide13.R.

Code: Select all

data(alevchem, package = "R2MLwiN")
alevchem$gcseav <- double2singlePrecision(alevchem$gcse_tot/alevchem$gcse_no - 6)


model1 <- runMLwiN(a_point ~ 1 + gcseav + I(gcseav^2) + I(gcseav^3) + gender + (1 | pupil), 
                    estoptions = list(EstM = 1), data = alevchem)

model2 <- runMLwiN(logit(a_point, cons, 6) ~ 1 + gcseav[1:5] + I(gcseav^2)[1:5] + 
                     I(gcseav^3)[1:5] + gender[1:5],
                    D = "Ordered Multinomial", estoptions = list(EstM = 1), data = alevchem)

pred1 <- predict(model1)
pred2 <- predict(model2)
and this is the result:
> pred1 <- predict(model1)
> pred2 <- predict(model2)
Error in `[.data.frame`(indata, x.names) : undefined columns selected
I hope this example makes sense to you.

Thanks.

Vivian

Re: Using predict() to calculate predicted value

Posted: Tue Apr 17, 2018 10:05 am
by ChrisCharlton
Yes that does make sense, thanks. If you look at the variables associated with the model parameters:

Code: Select all

> sub("FP_", "", names(model2@FP))
[1] "Intercept_F"        "Intercept_E"        "Intercept_D"        "Intercept_C"       
[5] "Intercept_B"        "gcseav_12345"       "I(gcseav^2)_12345"  "I(gcseav^3)_12345" 
[9] "genderfemale_12345"
and the variables in the data attached to the model objects:

Code: Select all

> colnames(model2@data)
[1] "a_point"      "l1id"         "Intercept"    "gcseav"       "I(gcseav^2)"  "I(gcseav^3)" 
[7] "genderfemale" "cons"  
You will see that there is a mismatch.

I believe that this is because the extra variables are created internally within MLwiN, rather than generated on the R side, so R2MLwiN is not aware of them. I will think about whether there is a way to resolve this, however in the mean time you would need to manually add the necessary variables to the data frame associated with the model in order to make the predict() function work.

Re: Using predict() to calculate predicted value

Posted: Tue Apr 17, 2018 10:44 am
by vivian1234
Do you mean in this case I need to manually add 4 columns of Intercept into the data frame?

Thanks,
Vivian

Re: Using predict() to calculate predicted value

Posted: Tue Apr 17, 2018 10:58 am
by ChrisCharlton
The data stored with the model object is the version sent to MLwiN, rather than the version used to fit the model. When setting up the multinomial model MlwiN will expand the data so that each possible response value get its own row. This is referred to on page 168 of the MLwiN user's guide (http://www.bristol.ac.uk/cmm/media/soft ... al-web.pdf). In order to recreate the prediction that you would get via the MLwiN interface you would need to generate this expanded dataset and then use this instead of the data attached to the model object. One way to do this might be to add the debugmode=TRUE option to your runMLwiN function call, save the expanded data from MLwiN and then load this into R and use it for the prediction.

Re: Using predict() to calculate predicted value

Posted: Tue Apr 17, 2018 11:10 am
by vivian1234
I get what you mean. Thanks!!