storing fitted values (propensity scores) from mlwinfitMCMC object

wfleming · Post by **wfleming** » Thu Aug 13, 2020 9:25 pm

I have fitted a 2-level binomial model using mcmc. I wish to use the estimated probabilities in a matching algorithm (propensity score matching).
How can I extract the fitted propensity scores (the estimated prob of Y for all obs) from a mlwinfitMCMC object into a numeric object?

In base R the command is simply: p <- model$fitted.values

ChrisCharlton · Post by **ChrisCharlton** » Tue Aug 25, 2020 12:20 pm

You should be able to use the standard R functions predict or fitted.values to generate these (see line 815 in https://github.com/rforge/r2mlwin/blob/ ... nfitMCMC.R) for the implementation. Note however that this is not currently available for all model types supported by MLwiN. As you are using MCMC you may wish to create a chain of these predictions (to preserve the uncertainty information) in which case you could adapt the code in https://github.com/rforge/r2mlwin/blob/ ... redLines.r and https://github.com/rforge/r2mlwin/blob/ ... edCurves.r which calculates these predictions and then subsequently plots them.

wfleming · Post by **wfleming** » Thu Aug 27, 2020 2:54 pm

Thanks for getting back, Chris. I worked around the immediate need by just calculating predictions in MLwiN UI then exporting worksheet back into R but this is obviously far from ideal. I probably should have given more info in my first post but I've tried both predict, fitted and augment and received error messages for all.

For predict and fitted, " Error in as.matrix(indata[x.names]) %*% as.matrix(object@FP[fp.names]): requires numeric/complex matrix/vector arguments". Seems to be the data stored in mlwinfitMCMC objects isn't the right class for these functions. Is this because it's 2-lev mcmc binomial and it is just not supported?

For augment, when I just try the standard augment I receive "augment method not yet implemented for mlwinfitMCMC objects". When I use augment.mlwinfitMCMC I receive "could not find function "augment.mlwinfitMCMC" and when I write the function for augment.mlwinfitMCMC following code in the github I receive "augment method not yet implemented for mlwinfitMCMC objects" again.

Your latter solution is an interesting one which nods to a problem highlighted in the literature on Bayes propensity score matching about how to incorporate uncertainty into propensity scores (every obs having a fixed value isn't very Bayesian!) for example An (2010) Kaplan & Chen (2013) and Zigler (2016, 2020). Following your suggestion would then rely on matching on a line rather than a point estimate (is that right?) and so I am unsure how it could be incorporated into common matching algorithms (I'm using Matching https://cran.r-project.org/web/packages ... index.html) which rely on a numeric vector for matching - a little beyond my level maybe.

ChrisCharlton · Post by **ChrisCharlton** » Thu Aug 27, 2020 3:26 pm

The predict and fitted functions not working sounds like a bug, as the part where you are getting the error message should just be taking the vector of fixed part parameter coefficients and multiplying them by the corresponding data columns. Binomial MCMC is supposed to work (see line 815 in https://github.com/rforge/r2mlwin/blob/ ... nfitMCMC.R). Are you able to provide any further information regarding your model specification?

Augment is not yet implemented within R2MLwiN (see line 1143 in https://github.com/rforge/r2mlwin/blob/ ... nfitMCMC.R). I am not sure why the version that you wrote isn't being picked up, but you could try renaming your function to have a unique name to see if it works then. If not then it may still be attempting to call the unimplemented R2MLwiN version.

The details of how to handle this in a Bayesian way is a bit beyond me too. I suspect that this would be a question for Bill.

wfleming · Post by **wfleming** » Thu Aug 27, 2020 8:12 pm

Here is model spec, basically just default with a bigger chain. Full formula does include a lot of variables and data is relatively large (>15k rows).

Code: Select all

f <- logit(treatment, cons) ~ 1 + level1_var + level2_var + (1 | lev2)
mcmc1 <- runMLwiN(Formula = f, D = "Binomial", data = dat, 
	estoptions = list(EstM = 1, mcmcMeth = list(burnin = 500, iterations = 20000)))

I used the code from line 815 to write a separate predict function but it results in the same error message. Also tried your suggestion with augment and same error before.

ChrisCharlton · Post by **ChrisCharlton** » Fri Aug 28, 2020 1:47 pm

I tested this with some of the MLwiN example models and it seemed to work for me (see below). I can't see anything obviously different about the way that you have set up your model to these. Can you check that these examples work for you?

Code: Select all

> library(R2MLwiN)
> 
> ## Read bang1 data
> data(bang1, package = "R2MLwiN")
> 
> (mymodel1 <- runMLwiN(logit(use) ~ 1 + age, D = "Binomial", estoptions = list(EstM = 1), data = bang1))

-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- 
MLwiN (version: 3.05)  multilevel model (Binomial) 
Estimation algorithm:  MCMC      Elapsed time : 2.75s 
Number of obs:  1934 (from total 1934)          Number of iter.: 5000  Chains: 1  Burn-in: 500 
Bayesian Deviance Information Criterion (DIC)
Dbar      D(thetabar)    pD      DIC
2591.294   2589.293   2.001      2593.294   
--------------------------------------------------------------------------------------------------- 
The model formula:
logit(use) ~ 1 + age
Level 1: l1id      
--------------------------------------------------------------------------------------------------- 
The fixed part estimates:  
               Coef.   Std. Err.      z    Pr(>|z|)       [95% Cred.   Interval]    ESS 
Intercept   -0.43858     0.04731  -9.27   1.863e-20  ***    -0.53074    -0.34677   1100 
age          0.00681     0.00509   1.34      0.1803         -0.00333     0.01649   1145 
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1  
--------------------------------------------------------------------------------------------------- 
The random part estimates at the l1id level: 
                Coef.   Std. Err.   [95% Cred.   Interval]    ESS 
var_bcons_1   1.00000       1e-05      1.00000     1.00000   5000 
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- 
> fit1 <- predict(mymodel1)
> head(fit1)
[1] -0.3129285 -0.4764604 -0.4287636 -0.3810668 -0.5309711 -0.5173434
> 
> (mymodel4 <- runMLwiN(logit(use) ~ 1 + age + lc + (1 | district), D = "Binomial", estoptions = list(EstM = 1), data = bang1))

-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- 
MLwiN (version: 3.05)  multilevel model (Binomial) 
          N min     mean max N_complete min_complete mean_complete max_complete
district 60   2 32.23333 118         60            2      32.23333          118
Estimation algorithm:  MCMC      Elapsed time : 6.28s 
Number of obs:  1934 (from total 1934)          Number of iter.: 5000  Chains: 1  Burn-in: 500 
Bayesian Deviance Information Criterion (DIC)
Dbar      D(thetabar)    pD      DIC
2396.324   2355.120   41.204     2437.528   
--------------------------------------------------------------------------------------------------- 
The model formula:
logit(use) ~ 1 + age + lc + (1 | district)
Level 2: district     Level 1: l1id      
--------------------------------------------------------------------------------------------------- 
The fixed part estimates:  
                    Coef.   Std. Err.       z    Pr(>|z|)       [95% Cred.   Interval]   ESS 
Intercept        -1.48934     0.13771  -10.82   2.919e-27  ***    -1.76712    -1.22862    94 
age              -0.02604     0.00773   -3.37   0.0007579  ***    -0.04114    -0.01136   287 
lcOne_child       1.11009     0.15247    7.28   3.317e-13  ***     0.81361     1.41775   273 
lcTwo_children    1.32959     0.16892    7.87   3.511e-15  ***     1.00613     1.66664   164 
lcThree_plus      1.30533     0.16875    7.74    1.03e-14  ***     0.98436     1.62393   127 
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1  
--------------------------------------------------------------------------------------------------- 
The random part estimates at the district level: 
                  Coef.   Std. Err.   [95% Cred.   Interval]   ESS 
var_Intercept   0.30225     0.09800      0.15032     0.53346   456 
--------------------------------------------------------------------------------------------------- 
The random part estimates at the l1id level: 
                Coef.   Std. Err.   [95% Cred.   Interval]    ESS 
var_bcons_1   1.00000       1e-05      1.00000     1.00000   5000 
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- 
> fit4 <- predict(mymodel4)
> head(fit4)
[1] -0.6641047 -1.3445875 -0.1972416 -0.4037507 -1.1363043 -1.1883751
> 
> (mymodel6 <- runMLwiN(logit(use) ~ 1 + age + lc + urban + (1 + urban | district), D = "Binomial", estoptions = list(EstM = 1), data = bang1))

-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- 
MLwiN (version: 3.05)  multilevel model (Binomial) 
          N min     mean max N_complete min_complete mean_complete max_complete
district 60   2 32.23333 118         60            2      32.23333          118
Estimation algorithm:  MCMC      Elapsed time : 9s 
Number of obs:  1934 (from total 1934)          Number of iter.: 5000  Chains: 1  Burn-in: 500 
Bayesian Deviance Information Criterion (DIC)
Dbar      D(thetabar)    pD      DIC
2327.272   2269.602   57.670     2384.941   
--------------------------------------------------------------------------------------------------- 
The model formula:
logit(use) ~ 1 + age + lc + urban + (1 + urban | district)
Level 2: district     Level 1: l1id      
--------------------------------------------------------------------------------------------------- 
The fixed part estimates:  
                    Coef.   Std. Err.       z    Pr(>|z|)       [95% Cred.   Interval]   ESS 
Intercept        -1.72364     0.16457  -10.47   1.142e-25  ***    -2.08125    -1.39809    64 
age              -0.02594     0.00774   -3.35   0.0008015  ***    -0.04135    -0.01102   292 
lcOne_child       1.12629     0.15253    7.38   1.535e-13  ***     0.83731     1.43950   279 
lcTwo_children    1.34653     0.16855    7.99   1.363e-15  ***     1.01308     1.67241   182 
lcThree_plus      1.34301     0.17061    7.87   3.501e-15  ***     1.00944     1.65942    96 
urbanUrban        0.83932     0.19878    4.22   2.418e-05  ***     0.46094     1.25496    93 
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1  
--------------------------------------------------------------------------------------------------- 
The random part estimates at the district level: 
                              Coef.   Std. Err.   [95% Cred.   Interval]   ESS 
var_Intercept               0.43687     0.14548      0.22124     0.78523   186 
cov_Intercept_urbanUrban   -0.46159     0.19214     -0.92123    -0.15848   118 
var_urbanUrban              0.78897     0.33255      0.29580     1.58722    96 
--------------------------------------------------------------------------------------------------- 
The random part estimates at the l1id level: 
                Coef.   Std. Err.   [95% Cred.   Interval]    ESS 
var_bcons_1   1.00000       1e-05      1.00000     1.00000   5000 
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- 
> fit6 <- predict(mymodel6)
> head(fit6)
[1] -0.01958227 -0.74011857  0.42486231  0.23978036 -0.53262847 -0.58450099

wfleming · Post by **wfleming** » Mon Aug 31, 2020 11:44 am

I ran these examples and they all worked no problem so after a bit of trial and error I discovered the only reason it wasn't working is because one of the model variables was a character vector. Transforming it to factor and then running the full model allowed predict() to work.

What a load of work for something so simple - as ever!

Thanks a lot for working through this with me, Chris.

ChrisCharlton · Post by **ChrisCharlton** » Mon Aug 31, 2020 6:57 pm

Thanks for letting us know what the cause turned out to be. I suspect that this may become more common in future as there was a change in the default stringsAsFactors setting in R version 4 (see https://developer.r-project.org/Blog/pu ... asfactors/). I will think about whether there is any way that R2MLwiN can provide a better warning/error message when this happens.

www.cmm.bristol.ac.uk/forum

storing fitted values (propensity scores) from mlwinfitMCMC object

storing fitted values (propensity scores) from mlwinfitMCMC object

Re: storing fitted values (propensity scores) from mlwinfitMCMC object

Re: storing fitted values (propensity scores) from mlwinfitMCMC object

Re: storing fitted values (propensity scores) from mlwinfitMCMC object

Re: storing fitted values (propensity scores) from mlwinfitMCMC object

Re: storing fitted values (propensity scores) from mlwinfitMCMC object

Re: storing fitted values (propensity scores) from mlwinfitMCMC object

Re: storing fitted values (propensity scores) from mlwinfitMCMC object