Hi
This relates to the question of R generic methods. I am replicating "7.4 Predicted probabilities from a multilevel model" in LEMMA, and translating it to R2mlwin objects. See here,https://www.cmm.bris.ac.uk/lemma/mod/le ... pageid=970
I am having an issue with translating this part of the code to R2mlwin:
X < model.matrix(terms(fit), mydatapred)
terms() does not take R2mlwin objects, and wonder if anyone has a suggestion for an elegant workaround? I know that I could drop each and every notused covariate, but this becomes laborious for large models and models with many dummies.
cheers
model.matrix() and terms() for R2mlwin objects
Re: model.matrix() and terms() for R2mlwin objects
Chris, do you happen to have any input on this question?
Many thanks in advance
Adel
Many thanks in advance
Adel

 Posts: 1250
 Joined: Mon Oct 19, 2009 10:34 am
Re: model.matrix() and terms() for R2mlwin objects
If I understand the module correctly then it is just creating a prediction of the fixed part on a slightly modified version of the data.
The following code that uses the generic predict function should I believe do the same:
You could instead go straight to calculating mediapredprob, avoiding using the VGAM package by substituting the following predict call:
The following code that uses the generic predict function should I believe do the same:
Code: Select all
> mydata < read.table("7.4.txt", header = TRUE, sep = ",")
> require(R2MLwiN)
Loading required package: R2MLwiN
Loading required package: stats4
Loading required package: lattice
Loading required package: memisc
Loading required package: MASS
Attaching package: ‘memisc’
The following objects are masked from ‘package:stats’:
contr.sum, contr.treatment, contrasts
The following object is masked from ‘package:base’:
as.array
Loading required package: coda
The MLwiN_path option is currently set to C:/Program Files (x86)/MLwiN v2.36/
To change this use: options(MLwiN_path="<path to MLwiN>")
> (fit < runMLwiN(logit(antemed) ~ 1 + magec + magecsq + meduc2 + meduc3 + wealthc + (1 comm), D = "Binomial", estoptions = list(nonlinear = c(N = 1, M = 2)), data = mydata))
/nogui option ignored
ECHO 0
Some input variables held data in more precision than MLwiN supports, these have been rounded
Echoing is ON
BATC 1
Batch mode is ON
MAXI 2
STAR
iteration 0
iteration 1
Convergence not achieved
TOLE 2
MAXI 20
NEXT
iteration 2
iteration 3
iteration 4
Convergence achieved
ECHO 0
Execution completed
*************************************************
MLwiN (version: 2.36) multilevel model (Binomial)
N min mean max N_complete min_complete mean_complete max_complete
comm 361 3 14.86427 25 361 3 14.86427 25
Estimation algorithm: IGLS PQL2 Elapsed time : 0.67s
Number of obs: 5366 (from total 5366) The model converged after 5 iterations.
Log likelihood: NA
Deviance statistic: NA

The model formula:
logit(antemed) ~ 1 + magec + magecsq + meduc2 + meduc3 + wealthc +
(1  comm)
Level 2: comm Level 1: l1id

The fixed part estimates:
Coef. Std. Err. z Pr(>z) [95% Conf. Interval]
Intercept 0.45468 0.08273 5.50 3.883e08 *** 0.61683 0.29254
magec 0.00039 0.00655 0.06 0.9528 0.01323 0.01246
magecsq 0.00101 0.00068 1.47 0.1403 0.00235 0.00033
meduc2 0.54790 0.08460 6.48 9.386e11 *** 0.38209 0.71371
meduc3 1.30924 0.09778 13.39 6.996e41 *** 1.11759 1.50089
wealthc 0.39775 0.02959 13.44 3.344e41 *** 0.33976 0.45574
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The random part estimates at the comm level:
Coef. Std. Err.
var_Intercept 0.85385 0.09358

The random part estimates at the l1id level:
Coef. Std. Err.
var_bcons_1 1.00000 0.00000
*************************************************
> mydatapred < mydata
> mydatapred$magec < 0
> mydatapred$magecsq < 0
>
> medianpredproblogit < predict(fit, newdata = mydatapred)
>
> require(VGAM)
Loading required package: VGAM
Loading required package: splines
Attaching package: ‘VGAM’
The following object is masked from ‘package:coda’:
nvar
>
> medianpredprob < logit(medianpredproblogit, inverse = TRUE)
>
> mydatapred2 < unique(data.frame(cbind("medianpredprob" = medianpredprob, "medianpredproblogit" = medianpredproblogit, wealth = mydatapred$wealth, meduc = mydata$meduc)))
>
> mydatapred2 < mydatapred2[order(mydatapred2$wealth, mydatapred2$meduc), ]
>
> colnames(mydatapred2)[1:2] < c("medianpredprob","medianpredproblogit")
>
> mydatapred2
medianpredprob medianpredproblogit wealth meduc
13 0.2221039 1.25344762 1 1
24 0.3305831 0.70554891 1 2
97 0.5139447 0.05579344 1 3
4 0.2982393 0.85569629 2 1
3 0.4236524 0.30779758 2 2
7 0.6114817 0.45354477 2 3
18 0.3874735 0.45794487 3 1
1 0.5224733 0.08995384 3 2
8 0.7008390 0.85129618 3 3
55 0.4849562 0.06019354 4 1
5 0.6195657 0.48770517 4 2
2 0.7771349 1.24904751 4 3
478 0.5835971 0.33755771 5 1
66 0.7079516 0.88545642 5 2
36 0.8384579 1.64679876 5 3
Code: Select all
medianpredprob < predict(fit, type="response", newdata = mydatapred)
Re: model.matrix() and terms() for R2mlwin objects
(A)
Thanks, Chris. Good idea, re bypassing the VGAM package entirely. It does not seem that predict() is sensitive about having variables in the data passed to it, that are not used in the model. Right?
(B)
Another issue. I replicated the module in both lme4 and R2mlwin, but I am getting quite different figures in terms of predicted probabilities. Any ideas way? Both methods use clusterspecific predictions right? and both are setting the covariates to their mean. This should produce the exact same output
(C)
On a related note. I am setting se.fit=T but no standard errors are produced. Any ideas way?
Thanks, Chris. Good idea, re bypassing the VGAM package entirely. It does not seem that predict() is sensitive about having variables in the data passed to it, that are not used in the model. Right?
(B)
Another issue. I replicated the module in both lme4 and R2mlwin, but I am getting quite different figures in terms of predicted probabilities. Any ideas way? Both methods use clusterspecific predictions right? and both are setting the covariates to their mean. This should produce the exact same output
Code: Select all
mydata < read.table("7.4.txt", header = TRUE, sep = ",")
# run with lme4
library(lme4)
(fit < glmer(antemed ~ magec + magecsq + meduc2 + meduc3 + wealthc + (1  comm), data = mydata, family = binomial("logit")))
summary(fit)
mydatapred < mydata
# set to sample mean
mydatapred$magec < 0
# set to sample mean
mydatapred$magecsq < 0
X < model.matrix(terms(fit), mydatapred)
b < fixef(fit)
medianpredproblogit < X %*% b
library(VGAM)
medianpredprob < logit(medianpredproblogit, inverse = TRUE)
mydatapred2 < unique(data.frame(cbind("medianpredprob" = medianpredprob,
"medianpredproblogit" = medianpredproblogit, wealth = mydatapred$wealth, meduc = mydata$meduc)))
mydatapred2 < mydatapred2[order(mydatapred2$wealth, mydatapred2$meduc), ]
colnames(mydatapred2)[1:2] < c("medianpredprob","medianpredproblogit")
mydatapred2
# run with R2mlwin
library(R2MLwiN)
mydata$cons < 1
(fit.1 < runMLwiN(logit(antemed, cons) ~ 1 + magec + magecsq + meduc2 + meduc3 + wealthc + (1  comm), data = mydata, D = "Binomial", estoptions = list(EstM = 0)))
medianpredprob2 < predict(fit.1, type="response", newdata = mydatapred) # set at mean
mydatapred22 < unique(data.frame(cbind("medianpredprob" = medianpredprob2,
wealth = mydatapred$wealth, meduc =
mydata$meduc)))
mydatapred22 < mydatapred22[order(mydatapred22$wealth, mydatapred22$meduc), ]
colnames(mydatapred22)[1:2] < c("medianpredprob")
mydatapred22
(C)
On a related note. I am setting se.fit=T but no standard errors are produced. Any ideas way?
Code: Select all
> medianpredprob2 < predict(fit.1, type="response", newdata = mydatapred, se.fit=T) #
> str(medianpredprob2)
Named num [1:5366] 0.519 0.748 0.43 0.323 0.606 ...
 attr(*, "names")= chr [1:5366] "1" "2" "3" "4" ...

 Posts: 1250
 Joined: Mon Oct 19, 2009 10:34 am
Re: model.matrix() and terms() for R2mlwin objects
(A) Yes, this is right. The R2MLwiN predict command extracts the data from the passedin data frame using the model formula in the same way as when estimating the model (see setMethod("predict"... in https://github.com/rforge/r2mlwin/blob/ ... nfitIGLS.R).
(B) This might be due to the method of quasiliklihood being used in MLwiN. When testing the solution for my previous post I found that I had to use PQL2 before I got results that closely much matched those from R.
(C) Yes this is true, the se.fit option only does something if type is equal to "terms" or "link". I think that I was following the example of other implementations of the predict() function in R when implementing this behaviour, however I might be able to revisit this in the future if this isn't the case.
(B) This might be due to the method of quasiliklihood being used in MLwiN. When testing the solution for my previous post I found that I had to use PQL2 before I got results that closely much matched those from R.
(C) Yes this is true, the se.fit option only does something if type is equal to "terms" or "link". I think that I was following the example of other implementations of the predict() function in R when implementing this behaviour, however I might be able to revisit this in the future if this isn't the case.