### How to impute a two-level model with repeated measures

Posted: **Fri Feb 17, 2017 3:51 am**

Dear all,

My question is how to organize the data in order to impute a two-level model with 5 repeated measures.

To sum up: We have been using Stata and RealComImpute. Level 1 = patient id, Level 2 = month.

The outcome is measured repeatedly on the same patient at months 0 (baseline), 1, ..., 4.

Letting i index the month (i = 0, ..., 4) and j index the participant (j = 1, ..., n), our model is:

y_ij = β_0 + β_1 month_ij + β_2 gender_j + ... + u_0j + u_1j month_ij + e_ij
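As a rough sketch of the analysis model above (not of the imputation step), something like the following could be fitted in Stata. It assumes long-format data with one row per patient-month and the variable names id, month, y and gender used in this post. Because the random effects u_0j and u_1j are indexed by participant j, id is the grouping variable in this sketch. The unstructured covariance is an assumption, not something we have settled on.

```stata
* Sketch only: assumes long-format data (one row per id-month)
* with variables id, month, y and gender.
* Random intercept (u_0j) and random slope on month (u_1j) vary by patient.
* covariance(unstructured) is an assumed choice; it lets u_0j and u_1j correlate.
mixed y month gender || id: month, covariance(unstructured)
```

Further covariates (the "..." in the model) would go in the fixed part, before the `||`.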

We have missing data for i = 3 and i = 4 only (about 10% and 90%, respectively). Data for the remaining months are complete.

y at month 2 is highly correlated with y at month 3 (r > 0.90) and, apparently, with y at month 4 (r > 0.70).

So we would like to use y from month 2 as one of the predictors for y at months 3 and 4.
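For anyone who wants to reproduce these checks, the missingness pattern and the correlations can be inspected along the following lines. This is only a sketch, and it assumes the data are still in wide format with the outcomes stored as y0 to y4.

```stata
* Sketch only: assumes wide-format data (one row per patient)
* with outcome variables y0 y1 y2 y3 y4.

* Number of missing values per outcome variable
misstable summarize y0 y1 y2 y3 y4

* Pairwise correlations, with the number of observations used for each pair
pwcorr y2 y3 y4, obs sig
```

With about 90% of month 4 missing, the correlation between y2 and y4 rests on few observations, which is why the `obs` option is worth including.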

To use y from month 2 as a predictor, I created (in Stata) an additional column called pred_2, which holds each subject's month-2 value of y and is repeated on every row for that subject.
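If the data are already in long format, the same column could be built as in the sketch below. It assumes variables id, month and y; tmp_y2 is just a hypothetical helper name.

```stata
* Sketch only: assumes long-format data (one row per id-month)
* with variables id, month and y. tmp_y2 is a hypothetical helper.

* Keep y only on the month-2 row of each patient
generate tmp_y2 = y if month == 2

* Spread that single value to all rows of the same patient
bysort id: egen pred_2 = max(tmp_y2)

drop tmp_y2
```

Since each patient has one month-2 row, taking the maximum within id simply copies that value to the other months.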

In Stata, with simulated data for illustration, the final dataset would be built like this:

```stata
* ------------------- start -----------------------------
clear
set seed 12345
set obs 1000

* One row per patient, with time-invariant covariates
generate id = _n
generate age = round(runiform()*30)+20
generate gender = round(runiform())
generate covariate1 = runiform()
generate covariate2 = runiform()

* Outcome at months 0 to 4 (wide format)
forvalues i = 0/4 {
    generate y`i' = round(rnormal(100,20))
}

* Set about 10% of month 3 and 90% of month 4 to missing
replace y3 = . if runiform()<0.10
replace y4 = . if runiform()<0.90

* Copy month-2 outcome; it stays constant within id after the reshape
generate pred_2 = y2

* Wide to long: one row per patient-month
reshape long y, i(id) j(month)
generate cons = 1
sort month id
order id month y pred_2

realcomImpute y age gender covariate1 covariate2 pred_2 using mydata , numresponses(1) cons(cons) level2id(month)
* ------------------- end -----------------------------
```
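To make the resulting layout easier to judge, the long dataset can be inspected with a few standard commands. This is only a sketch, to be run after the reshape above, and it uses the variable names from this post.

```stata
* Sketch only: run after the reshape; uses id, month, y and pred_2 from above.

* Confirm that id and month together identify each row
isid id month

* Show the rows of the first two patients, grouped by patient
sort id month
list id month y pred_2 if id <= 2, sepby(id) noobs

* Count the rows that need imputing and see which months they fall in
count if missing(y)
tabulate month if missing(y)
```

Each patient should show five rows (months 0 to 4), with pred_2 repeated on every row and missing y only at months 3 and 4.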

Is this the correct setup for the two-level imputation?

I look forward to hearing from you.

Tiago
