Advice on cross-classified models with large data sets
Posted: Sun Aug 06, 2017 9:53 pm
Hello. I was wondering if anyone could share their experience of fitting cross-classified models to large data sets? For example, I'm examining health outcomes in individuals (level 2, n=500,000) over time (level 1), but patients may be seen by a different doctor on each occasion, so the data are cross-classified. There are over 2,500 doctors, and although the vast majority of patients see only one doctor, around 15% see two or more different doctors.
When I try to identify unique clusters of "patient/doctor" I come up with around 3,000 clusters, too many for Stata to cope with (the number of constraints has to be less than 2,000). Even when I take a subset of the data with fewer constraints and try fitting an appropriate model in runmlwin, I get the error "too many macros". Could MLwiN cope with fitting this directly?
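To make the cluster count concrete, here is a minimal Python sketch of one way to count such clusters outside Stata. It assumes that a "cluster" means a connected group of patients and doctors (two patients fall in the same cluster if they are linked through a shared doctor, directly or via a chain); that reading is my assumption, and the function names and toy data are purely illustrative, not anything from runmlwin or MLwiN.

```python
def find(parent, node):
    """Return the representative of node's group, compressing the path as we go."""
    while parent[node] != node:
        parent[node] = parent[parent[node]]  # path halving
        node = parent[node]
    return node


def count_clusters(pairs):
    """Count connected patient/doctor groups from (patient_id, doctor_id) pairs.

    Assumption: each pair is one occasion on which a patient saw a doctor.
    """
    parent = {}
    for patient, doctor in pairs:
        # Tag the IDs so a patient and a doctor with the same ID never collide.
        p_node = ("patient", patient)
        d_node = ("doctor", doctor)
        parent.setdefault(p_node, p_node)
        parent.setdefault(d_node, d_node)
        root_p = find(parent, p_node)
        root_d = find(parent, d_node)
        if root_p != root_d:
            parent[root_p] = root_d  # merge the two groups
    return len({find(parent, node) for node in parent})


# Hypothetical toy data: patient 2 sees doctors A and B, patient 4 sees C and D.
pairs = [(1, "A"), (2, "A"), (2, "B"), (3, "C"), (4, "D"), (4, "C")]
n_clusters = count_clusters(pairs)
print(n_clusters)
```

In the toy data, patients 1 and 2 are linked through doctor A, and patients 3 and 4 through doctor C, so there are two clusters. With real data the pairs would come from the long-format file, one row per occasion.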
My initial thought was to fit the model using IGLS estimation and then rerun it using MCMC to take account of the cross-classification. Unsurprisingly, even on a very small subsample of the data there are very high levels of autocorrelation, and even with the techniques described at the end of the MLwiN manuals for improving mixing, it could take many days to converge on my trusty desktop PC, if at all! I think I should really start by reducing the sample size and focusing on the subsample of patients that is most important, but even so, there are going to be thousands of doctors involved and a certain degree of cross-classification. I wondered if anyone had any advice they could share from their own experience? Also, if I try the multiple membership approach (wide format data, where there is only one doctor per time point and all weights = 1), would that fit a cross-classified model, or would it give the same estimate as the usual IGLS estimation where the cross-classification is not taken into account? Thanks in advance!