
### Using predict() to calculate predicted value

Posted: **Mon Apr 16, 2018 2:54 pm**

by **vivian1234**

Hi,

I'm following the suggestion in this post https://www.cmm.bristol.ac.uk/forum/vie ... a976#p4572 to reproduce the standardised residuals vs predicted values in R.

I have two models. Model 1 is a 2-level random slope model with a continuous outcome, estimated using IGLS. Model 2 is a 2-level random coefficient cumulative logit model with an ordinal outcome with 3 categories, estimated using MCMC.

I have no problem calculating the predicted values for Model 1 using the following code:


```
library(doBy)  # provides summaryBy()
pred <- data.frame(level2 = model1@data$lv2, pred.outcome = predict(model1))
# Mean predicted outcome for each level-2 unit
pred.outcome.mean <- summaryBy(pred.outcome ~ level2, data = pred, FUN = mean)$pred.outcome.mean
```

However, when I used the code for Model 2:


`pred <- data.frame(level2 = model2@data$lv2, pred.outcome = predict(model2))`

I received this error:

Error in `[.data.frame`(indata, x.names) : undefined columns selected

I've also tried adding type = "response" to predict(), but the same error appeared.

Does anyone have any idea?

Thanks a lot.

Vivian

### Re: Using predict() to calculate predicted value

Posted: **Mon Apr 16, 2018 4:02 pm**

by **ChrisCharlton**

It sounds like there is probably a mismatch between the variable names referenced in the parameter names and those in the data associated with the model. If you look at the **predict** method in https://github.com/rforge/r2mlwin/blob/ ... nfitMCMC.R, you will see that **x.names** is created as follows:


```
if (is.null(params)) {
fp.names <- names(FP <- object@FP)
} else {
fp.names <- params
}
x.names <- sub("FP_", "", fp.names)
```

i.e. it takes the names for the fixed part parameters and removes the "FP_" from the beginning to find their associated data.

If you compare these with what is in the data (i.e. compare **sub("FP_", "", names(model2@FP))** with **colnames(model2@data)**), I suspect that you will find a mismatch. Without more details of how they differ I can't suggest a fix.
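The comparison described above can be wrapped in a small base R helper that lists the names predict() would look for but cannot find. The function name `missing_fp_columns()` is hypothetical, and the example vectors are illustrative stand-ins; with a fitted model you would pass `names(model2@FP)` and `colnames(model2@data)` instead.

```r
# Hypothetical helper: report which fixed-part variables predict() would look
# for (parameter names with "FP_" removed) that are absent from the data
# stored with the model.
missing_fp_columns <- function(fp_names, data_cols) {
  x_names <- sub("FP_", "", fp_names)  # same transformation as in predict()
  setdiff(x_names, data_cols)          # names with no matching column
}

# Illustrative inputs; with a fitted model use names(model2@FP) and
# colnames(model2@data) instead.
fp_names  <- c("FP_Intercept", "FP_gcseav", "FP_genderfemale_12345")
data_cols <- c("a_point", "Intercept", "gcseav", "genderfemale")

missing_fp_columns(fp_names, data_cols)
#> [1] "genderfemale_12345"
```

An empty result (`character(0)`) would mean every fixed-part variable has a matching column.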

### Re: Using predict() to calculate predicted value

Posted: **Tue Apr 17, 2018 9:25 am**

by **vivian1234**

Hi Chris,

Thanks for your quick reply.

I replicated the example from http://www.bristol.ac.uk/cmm/media/r2ml ... CGuide13.R.


```
library(R2MLwiN)  # provides runMLwiN() and double2singlePrecision()
data(alevchem, package = "R2MLwiN")
alevchem$gcseav <- double2singlePrecision(alevchem$gcse_tot/alevchem$gcse_no - 6)
model1 <- runMLwiN(a_point ~ 1 + gcseav + I(gcseav^2) + I(gcseav^3) + gender + (1 | pupil),
  estoptions = list(EstM = 1), data = alevchem)
model2 <- runMLwiN(logit(a_point, cons, 6) ~ 1 + gcseav[1:5] + I(gcseav^2)[1:5] +
  I(gcseav^3)[1:5] + gender[1:5],
  D = "Ordered Multinomial", estoptions = list(EstM = 1), data = alevchem)
pred1 <- predict(model1)  # works
pred2 <- predict(model2)  # fails with "undefined columns selected"
```

and this is the result:

> pred1 <- predict(model1)

> pred2 <- predict(model2)

Error in `[.data.frame`(indata, x.names) : undefined columns selected

I hope this example makes sense to you.

Thanks.

Vivian

### Re: Using predict() to calculate predicted value

Posted: **Tue Apr 17, 2018 10:05 am**

by **ChrisCharlton**

Yes, that does make sense, thanks. If you look at the variables associated with the model parameters:


```
> sub("FP_", "", names(model2@FP))
[1] "Intercept_F" "Intercept_E" "Intercept_D" "Intercept_C"
[5] "Intercept_B" "gcseav_12345" "I(gcseav^2)_12345" "I(gcseav^3)_12345"
[9] "genderfemale_12345"
```

and the variables in the data attached to the model objects:


```
> colnames(model2@data)
[1] "a_point" "l1id" "Intercept" "gcseav" "I(gcseav^2)" "I(gcseav^3)"
[7] "genderfemale" "cons"
```

You will see that there is a mismatch.

I believe that this is because the extra variables are created internally within MLwiN, rather than generated on the R side, so R2MLwiN is not aware of them. I will think about whether there is a way to resolve this; in the meantime, however, you would need to manually add the necessary variables to the data frame associated with the model in order to make the **predict()** function work.
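To see why the names have to match exactly, here is a rough base R sketch of a fixed-part prediction on the link scale: the data columns named after the parameters are multiplied by the estimates. This is an assumption about what predict() does with its default settings, inferred from the x.names code quoted earlier rather than copied from the R2MLwiN source, and the estimates and data are made up.

```r
# Hypothetical fixed-part estimates, named as in a model's FP slot
FP <- c(FP_Intercept = 1.5, FP_gcseav = 2.0, FP_genderfemale = -0.3)

# Made-up data with one column per fixed-part variable
indata <- data.frame(Intercept = 1,
                     gcseav = c(-1, 0, 1),
                     genderfemale = c(0, 1, 1))

# Strip "FP_" to get column names, then select those columns; any name with
# no matching column stops here with "undefined columns selected"
x_names <- sub("FP_", "", names(FP))
X <- as.matrix(indata[x_names])

# Linear predictor: each row of X multiplied by the coefficient vector
eta <- drop(X %*% FP)
eta
#> [1] -0.5  1.2  3.2
```

Adding a column with the missing name (for example `indata$gcseav_12345 <- indata$gcseav`) is what makes the selection step succeed.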

### Re: Using predict() to calculate predicted value

Posted: **Tue Apr 17, 2018 10:44 am**

by **vivian1234**

Do you mean that in this case I need to manually add four Intercept columns to the data frame?

Thanks,

Vivian

### Re: Using predict() to calculate predicted value

Posted: **Tue Apr 17, 2018 10:58 am**

by **ChrisCharlton**

The data stored with the model object is the version sent to MLwiN, rather than the version used to fit the model. When setting up the multinomial model, MLwiN will expand the data so that each possible response value gets its own row. This is referred to on page 168 of the MLwiN user's guide (http://www.bristol.ac.uk/cmm/media/soft ... al-web.pdf). In order to recreate the prediction that you would get via the MLwiN interface, you would need to generate this expanded dataset and then use it instead of the data attached to the model object. One way to do this might be to add the **debugmode=TRUE** option to your runMLwiN function call, save the expanded data from MLwiN, and then load it into R and use it for the prediction.
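For illustration only, the sketch below shows the general idea of such an expansion on a toy data set: each pupil is repeated once per non-reference category, with a 0/1 intercept column per category and suffixed copies of the covariates. The naming (`Intercept_<category>`, `<variable>_12345`) imitates the parameter names shown earlier in this thread, but the exact layout MLwiN produces (row order, response coding, which categories appear) is an assumption here, so the data saved via debugmode remains the authoritative version. The function `expand_ordinal()` is hypothetical.

```r
# Hypothetical helper: one row per pupil per non-reference category.
# Layout and naming are assumptions, not MLwiN's actual format.
expand_ordinal <- function(dat, categories, covariates, suffix = "_12345") {
  n <- nrow(dat)
  k <- length(categories)
  # Repeat every pupil k times and record which category each row is for
  long <- dat[rep(seq_len(n), each = k), , drop = FALSE]
  long$category <- rep(categories, times = n)
  # One 0/1 intercept dummy per category
  for (lev in categories) {
    long[[paste0("Intercept_", lev)]] <- as.numeric(long$category == lev)
  }
  # Covariates sharing one coefficient across categories, renamed to match
  for (v in covariates) {
    long[[paste0(v, suffix)]] <- long[[v]]
  }
  rownames(long) <- NULL
  long
}

toy <- data.frame(pupil = 1:2, gcseav = c(-0.5, 1.0), genderfemale = c(1, 0))
long <- expand_ordinal(toy, categories = c("B", "C"),
                       covariates = c("gcseav", "genderfemale"))
long[, c("pupil", "category", "Intercept_B", "Intercept_C", "gcseav_12345")]
```

Whichever way the expanded data is obtained, its column names should line up with `sub("FP_", "", names(model2@FP))`, since that is what predict() selects on.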

### Re: Using predict() to calculate predicted value

Posted: **Tue Apr 17, 2018 11:10 am**

by **vivian1234**

I get what you mean. Thanks!!