I have fitted a 2-level binomial model using mcmc. I wish to use the estimated probabilities in a matching algorithm (propensity score matching).
How can I extract the fitted propensity scores (the estimated prob of Y for all obs) from a mlwinfitMCMC object into a numeric object?
In base R the command is simply: p <- model$fitted.values
storing fitted values (propensity scores) from mlwinfitMCMC object
-
- Posts: 1384
- Joined: Mon Oct 19, 2009 10:34 am
Re: storing fitted values (propensity scores) from mlwinfitMCMC object
You should be able to use the standard R functions predict or fitted.values to generate these (see line 815 in https://github.com/rforge/r2mlwin/blob/ ... nfitMCMC.R) for the implementation. Note however that this is not currently available for all model types supported by MLwiN. As you are using MCMC you may wish to create a chain of these predictions (to preserve the uncertainty information) in which case you could adapt the code in https://github.com/rforge/r2mlwin/blob/ ... redLines.r and https://github.com/rforge/r2mlwin/blob/ ... edCurves.r which calculates these predictions and then subsequently plots them.
Re: storing fitted values (propensity scores) from mlwinfitMCMC object
Thanks for getting back, Chris. I worked around the immediate need by just calculating predictions in MLwiN UI then exporting worksheet back into R but this is obviously far from ideal. I probably should have given more info in my first post but I've tried both predict, fitted and augment and received error messages for all.
For predict and fitted, " Error in as.matrix(indata[x.names]) %*% as.matrix(object@FP[fp.names]): requires numeric/complex matrix/vector arguments". Seems to be the data stored in mlwinfitMCMC objects isn't the right class for these functions. Is this because it's 2-lev mcmc binomial and it is just not supported?
For augment, when I just try the standard augment I receive "augment method not yet implemented for mlwinfitMCMC objects". When I use augment.mlwinfitMCMC I receive "could not find function "augment.mlwinfitMCMC" and when I write the function for augment.mlwinfitMCMC following code in the github I receive "augment method not yet implemented for mlwinfitMCMC objects" again.
Your latter solution is an interesting one which nods to a problem highlighted in the literature on Bayes propensity score matching about how to incorporate uncertainty into propensity scores (every obs having a fixed value isn't very Bayesian!) for example An (2010) Kaplan & Chen (2013) and Zigler (2016, 2020). Following your suggestion would then rely on matching on a line rather than a point estimate (is that right?) and so I am unsure how it could be incorporated into common matching algorithms (I'm using Matching https://cran.r-project.org/web/packages ... index.html) which rely on a numeric vector for matching - a little beyond my level maybe.
For predict and fitted, " Error in as.matrix(indata[x.names]) %*% as.matrix(object@FP[fp.names]): requires numeric/complex matrix/vector arguments". Seems to be the data stored in mlwinfitMCMC objects isn't the right class for these functions. Is this because it's 2-lev mcmc binomial and it is just not supported?
For augment, when I just try the standard augment I receive "augment method not yet implemented for mlwinfitMCMC objects". When I use augment.mlwinfitMCMC I receive "could not find function "augment.mlwinfitMCMC" and when I write the function for augment.mlwinfitMCMC following code in the github I receive "augment method not yet implemented for mlwinfitMCMC objects" again.
Your latter solution is an interesting one which nods to a problem highlighted in the literature on Bayes propensity score matching about how to incorporate uncertainty into propensity scores (every obs having a fixed value isn't very Bayesian!) for example An (2010) Kaplan & Chen (2013) and Zigler (2016, 2020). Following your suggestion would then rely on matching on a line rather than a point estimate (is that right?) and so I am unsure how it could be incorporated into common matching algorithms (I'm using Matching https://cran.r-project.org/web/packages ... index.html) which rely on a numeric vector for matching - a little beyond my level maybe.
-
- Posts: 1384
- Joined: Mon Oct 19, 2009 10:34 am
Re: storing fitted values (propensity scores) from mlwinfitMCMC object
The predict and fitted functions not working sounds like a bug, as the part where you are getting the error message should just be taking the vector of fixed part parameter coefficients and multiplying them by the corresponding data columns. Binomial MCMC is supposed to work (see line 815 in https://github.com/rforge/r2mlwin/blob/ ... nfitMCMC.R). Are you able to provide any further information regarding your model specification?
Augment is not yet implemented within R2MLwiN (see line 1143 in https://github.com/rforge/r2mlwin/blob/ ... nfitMCMC.R). I am not sure why the version that you wrote isn't being picked up, but you could try renaming your function to have a unique name to see if it works then. If not then it may still be attempting to call the unimplemented R2MLwiN version.
The details of how to handle this in a Bayesian way is a bit beyond me too. I suspect that this would be a question for Bill.
Augment is not yet implemented within R2MLwiN (see line 1143 in https://github.com/rforge/r2mlwin/blob/ ... nfitMCMC.R). I am not sure why the version that you wrote isn't being picked up, but you could try renaming your function to have a unique name to see if it works then. If not then it may still be attempting to call the unimplemented R2MLwiN version.
The details of how to handle this in a Bayesian way is a bit beyond me too. I suspect that this would be a question for Bill.
Re: storing fitted values (propensity scores) from mlwinfitMCMC object
Here is model spec, basically just default with a bigger chain. Full formula does include a lot of variables and data is relatively large (>15k rows).
I used the code from line 815 to write a separate predict function but it results in the same error message. Also tried your suggestion with augment and same error before.
Code: Select all
f <- logit(treatment, cons) ~ 1 + level1_var + level2_var + (1 | lev2)
mcmc1 <- runMLwiN(Formula = f, D = "Binomial", data = dat,
estoptions = list(EstM = 1, mcmcMeth = list(burnin = 500, iterations = 20000)))
-
- Posts: 1384
- Joined: Mon Oct 19, 2009 10:34 am
Re: storing fitted values (propensity scores) from mlwinfitMCMC object
I tested this with some of the MLwiN example models and it seemed to work for me (see below). I can't see anything obviously different about the way that you have set up your model to these. Can you check that these examples work for you?
Code: Select all
> library(R2MLwiN)
>
> ## Read bang1 data
> data(bang1, package = "R2MLwiN")
>
> (mymodel1 <- runMLwiN(logit(use) ~ 1 + age, D = "Binomial", estoptions = list(EstM = 1), data = bang1))
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
MLwiN (version: 3.05) multilevel model (Binomial)
Estimation algorithm: MCMC Elapsed time : 2.75s
Number of obs: 1934 (from total 1934) Number of iter.: 5000 Chains: 1 Burn-in: 500
Bayesian Deviance Information Criterion (DIC)
Dbar D(thetabar) pD DIC
2591.294 2589.293 2.001 2593.294
---------------------------------------------------------------------------------------------------
The model formula:
logit(use) ~ 1 + age
Level 1: l1id
---------------------------------------------------------------------------------------------------
The fixed part estimates:
Coef. Std. Err. z Pr(>|z|) [95% Cred. Interval] ESS
Intercept -0.43858 0.04731 -9.27 1.863e-20 *** -0.53074 -0.34677 1100
age 0.00681 0.00509 1.34 0.1803 -0.00333 0.01649 1145
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
---------------------------------------------------------------------------------------------------
The random part estimates at the l1id level:
Coef. Std. Err. [95% Cred. Interval] ESS
var_bcons_1 1.00000 1e-05 1.00000 1.00000 5000
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
> fit1 <- predict(mymodel1)
> head(fit1)
[1] -0.3129285 -0.4764604 -0.4287636 -0.3810668 -0.5309711 -0.5173434
>
> (mymodel4 <- runMLwiN(logit(use) ~ 1 + age + lc + (1 | district), D = "Binomial", estoptions = list(EstM = 1), data = bang1))
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
MLwiN (version: 3.05) multilevel model (Binomial)
N min mean max N_complete min_complete mean_complete max_complete
district 60 2 32.23333 118 60 2 32.23333 118
Estimation algorithm: MCMC Elapsed time : 6.28s
Number of obs: 1934 (from total 1934) Number of iter.: 5000 Chains: 1 Burn-in: 500
Bayesian Deviance Information Criterion (DIC)
Dbar D(thetabar) pD DIC
2396.324 2355.120 41.204 2437.528
---------------------------------------------------------------------------------------------------
The model formula:
logit(use) ~ 1 + age + lc + (1 | district)
Level 2: district Level 1: l1id
---------------------------------------------------------------------------------------------------
The fixed part estimates:
Coef. Std. Err. z Pr(>|z|) [95% Cred. Interval] ESS
Intercept -1.48934 0.13771 -10.82 2.919e-27 *** -1.76712 -1.22862 94
age -0.02604 0.00773 -3.37 0.0007579 *** -0.04114 -0.01136 287
lcOne_child 1.11009 0.15247 7.28 3.317e-13 *** 0.81361 1.41775 273
lcTwo_children 1.32959 0.16892 7.87 3.511e-15 *** 1.00613 1.66664 164
lcThree_plus 1.30533 0.16875 7.74 1.03e-14 *** 0.98436 1.62393 127
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
---------------------------------------------------------------------------------------------------
The random part estimates at the district level:
Coef. Std. Err. [95% Cred. Interval] ESS
var_Intercept 0.30225 0.09800 0.15032 0.53346 456
---------------------------------------------------------------------------------------------------
The random part estimates at the l1id level:
Coef. Std. Err. [95% Cred. Interval] ESS
var_bcons_1 1.00000 1e-05 1.00000 1.00000 5000
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
> fit4 <- predict(mymodel4)
> head(fit4)
[1] -0.6641047 -1.3445875 -0.1972416 -0.4037507 -1.1363043 -1.1883751
>
> (mymodel6 <- runMLwiN(logit(use) ~ 1 + age + lc + urban + (1 + urban | district), D = "Binomial", estoptions = list(EstM = 1), data = bang1))
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
MLwiN (version: 3.05) multilevel model (Binomial)
N min mean max N_complete min_complete mean_complete max_complete
district 60 2 32.23333 118 60 2 32.23333 118
Estimation algorithm: MCMC Elapsed time : 9s
Number of obs: 1934 (from total 1934) Number of iter.: 5000 Chains: 1 Burn-in: 500
Bayesian Deviance Information Criterion (DIC)
Dbar D(thetabar) pD DIC
2327.272 2269.602 57.670 2384.941
---------------------------------------------------------------------------------------------------
The model formula:
logit(use) ~ 1 + age + lc + urban + (1 + urban | district)
Level 2: district Level 1: l1id
---------------------------------------------------------------------------------------------------
The fixed part estimates:
Coef. Std. Err. z Pr(>|z|) [95% Cred. Interval] ESS
Intercept -1.72364 0.16457 -10.47 1.142e-25 *** -2.08125 -1.39809 64
age -0.02594 0.00774 -3.35 0.0008015 *** -0.04135 -0.01102 292
lcOne_child 1.12629 0.15253 7.38 1.535e-13 *** 0.83731 1.43950 279
lcTwo_children 1.34653 0.16855 7.99 1.363e-15 *** 1.01308 1.67241 182
lcThree_plus 1.34301 0.17061 7.87 3.501e-15 *** 1.00944 1.65942 96
urbanUrban 0.83932 0.19878 4.22 2.418e-05 *** 0.46094 1.25496 93
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
---------------------------------------------------------------------------------------------------
The random part estimates at the district level:
Coef. Std. Err. [95% Cred. Interval] ESS
var_Intercept 0.43687 0.14548 0.22124 0.78523 186
cov_Intercept_urbanUrban -0.46159 0.19214 -0.92123 -0.15848 118
var_urbanUrban 0.78897 0.33255 0.29580 1.58722 96
---------------------------------------------------------------------------------------------------
The random part estimates at the l1id level:
Coef. Std. Err. [95% Cred. Interval] ESS
var_bcons_1 1.00000 1e-05 1.00000 1.00000 5000
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
> fit6 <- predict(mymodel6)
> head(fit6)
[1] -0.01958227 -0.74011857 0.42486231 0.23978036 -0.53262847 -0.58450099
Re: storing fitted values (propensity scores) from mlwinfitMCMC object
I ran these examples and they all worked no problem so after a bit of trial and error I discovered the only reason it wasn't working is because one of the model variables was a character vector. Transforming it to factor and then running the full model allowed predict() to work.
What a load of work for something so simple - as ever!
Thanks a lot for working through this with me, Chris.
What a load of work for something so simple - as ever!

-
- Posts: 1384
- Joined: Mon Oct 19, 2009 10:34 am
Re: storing fitted values (propensity scores) from mlwinfitMCMC object
Thanks for letting us know what the cause turned out to be. I suspect that this may become more common in future as there was a change in the default stringsAsFactors setting in R version 4 (see https://developer.r-project.org/Blog/pu ... asfactors/). I will think about whether there is any way that R2MLwiN can provide a better warning/error message when this happens.