Out-of-sample predictions: Help & guidance
Posted: Mon Oct 05, 2015 10:29 am
I've been attempting to generate out-of-sample predictions for (logged) broadband speeds. I have repeated measures broadband speed data for four years (level 1) nested within 174 local authority areas (level 2), nested within 10 UK regions (level 3).
The (logged and grand mean centred) predictors are (see attached photo for example):
- year number
- population density
- median income
- number of firms
- percentage of service sector employment
I have a range of demographic and economic projections across high, medium and low growth scenarios. I have been using the customised predictions window to forecast the effect of these scenarios on broadband speed. I have read the relevant section in the supplementary MLwiN manual (Rasbash et al. 2014) but still need more help, so I would appreciate some advice on using it.
Question 1. Can I feed the model values for specific level 1 units? For example, can I estimate how changes in population density for London, Leeds, Newcastle etc. lead to changes in predicted values of y (broadband speed)? Ideally I want to feed the forecasted level 1 predictor data to each local authority and get an expected predicted value of y, so that I can develop an understanding of how broadband supply might change over time under different scenarios. I'm not sure whether I can do this in MLwiN. (If I can't do local authority predictions, I'd be happy with forecasting the regional trajectory, and perhaps someone could advise on how to do this, maybe by including categorical predictors as groupings?)
Question 2. Say I have data for 2020 for each of the predictors (year number, population density, income etc.). How do I estimate the overall predicted change in y (broadband speed) across all these variables at once? The output provided by the customised predictions grid was not really what I was expecting, as it breaks the data down by each individual incremental change in value for each predictor. I'm not sure how to reconcile this into a single statement that this is the predicted value of y for this set of forecasted predictor values.
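To make clear what I am after in Question 2, here is a minimal Python sketch of the calculation done by hand, outside MLwiN. It assumes a fixed-part-only prediction (no local authority or region residuals), and every number in it (coefficients, grand means, 2020 values) is a made-up placeholder, not output from my model. The function and variable names are my own.

```python
import math

def predict_log_speed(coefs, grand_means, raw_values):
    """Fixed-part prediction of logged speed for one set of predictor values.

    Each raw predictor is logged and then centred on the grand mean of its
    logged values, mirroring how the predictors enter the model.
    """
    total = coefs["cons"]  # intercept
    for name, raw in raw_values.items():
        total += coefs[name] * (math.log(raw) - grand_means[name])
    return total

# Placeholder estimates and grand means, for illustration only.
coefs = {"cons": 2.5, "year": 0.6, "pop_density": 0.1,
         "income": 0.3, "firms": 0.05, "service_pct": 0.2}
grand_means = {"year": 0.8, "pop_density": 6.5,
               "income": 10.0, "firms": 8.5, "service_pct": 4.3}

# One hypothetical scenario: a single forecast value per predictor for 2020.
scenario_2020 = {"year": 9, "pop_density": 800.0,
                 "income": 28000.0, "firms": 6000.0, "service_pct": 80.0}

log_speed = predict_log_speed(coefs, grand_means, scenario_2020)
print(log_speed, math.exp(log_speed))  # logged scale, then back-transformed
```

This gives one predicted value per scenario rather than a grid of values over every increment of every predictor, which is the kind of output I was hoping to obtain.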
Question 3. I had used a polynomial on the year predictor to capture the growth curve dynamic, but this produced implausible out-of-sample predictions of y. I then went back to a linear approach with complex level 1 variance on the year predictor, which produced more sensible predictions, but these still went exponential by 2025. I suspect broadband speeds follow something closer to a logistic function (S-curve). Are there any modelling tricks that would make the out-of-sample predictions ten years into the future level off to a plateau rather than increase to infinity?
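For reference, the shape I have in mind is sketched below in Python. This is only an illustration of a logistic curve on the original speed scale, not something I have fitted or know how to fit in MLwiN; the ceiling, rate and midpoint values are hypothetical.

```python
import math

def logistic_speed(year, ceiling, rate, midpoint):
    """Logistic (S-shaped) curve that levels off at `ceiling`.

    `midpoint` is the year at which the curve reaches half the ceiling and
    `rate` controls how quickly it rises around that year.
    """
    return ceiling / (1.0 + math.exp(-rate * (year - midpoint)))

# Hypothetical parameters: growth slows and flattens instead of exploding.
for year in range(2012, 2031, 3):
    print(year, round(logistic_speed(year, ceiling=100.0, rate=0.5, midpoint=2018), 1))
```

Unlike the polynomial or log-linear trend, the yearly increases here shrink after the midpoint, so predictions ten years ahead approach the ceiling rather than growing without bound.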
Any help would be much appreciated.
Edward