Welcome to the forum for runmlwin users. Feel free to post your question about runmlwin here. The Centre for Multilevel Modelling takes no responsibility for the accuracy of these posts, as we are unable to monitor them closely. Do go ahead and post your question, and thank you in advance if you find the time to post any answers!

I'm trying to run a model with lots of dummy variables for categorical covariates and their interactions, but I think runmlwin is having trouble with the number of variables: it doesn't run and gives me the error:

"too many macros"

Is there any way of getting around this or is there a limit to how many variables can be included in a model?

Thanks for spotting this. Yes, runmlwin is limited in how many explanatory variables you can put into the model. You can see this by running the following code:

clear
set obs 1000
gen id = _n
gen y = rnormal()
gen cons = 1
forvalues i = 1/200 {
gen x`i' = rnormal()
}
quietly runmlwin y x1-x192, level1(id: cons) nopause
quietly runmlwin y x1-x193, level1(id: cons) nopause

You will receive the following output:

. clear
. set obs 1000
obs was 0, now 1000
. gen id = _n
. gen y = rnormal()
. gen cons = 1
. forvalues i = 1/200 {
2. gen x`i' = rnormal()
3. }
. quietly runmlwin y x1-x192, level1(id: cons) nopause
. quietly runmlwin y x1-x193, level1(id: cons) nopause
too many macros
r(920);

You can see that the first model runs fine, while the second does not and returns the same error message that you experienced with your own data. The problem relates not to the number of explanatory variables, but to the maximum number of characters that can be used to write out the full list of explanatory variables. This is a Stata limit. We will look into how we can extend this maximum number of characters and will post a reply once we have fixed this bug.
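As a rough way to see how long the written-out variable list is, the sketch below expands the varlist and counts its words and characters. It only measures the list itself; the strings that runmlwin builds internally may well be longer, so treat the numbers as indicative. The local macro names (xvars, nvars, nchars) are made up for this illustration.

```stata
* Recreate the simulated data from the example above
clear
set obs 1000
gen id = _n
gen y = rnormal()
gen cons = 1
forvalues i = 1/200 {
    gen x`i' = rnormal()
}

* Expand the abbreviated varlist into the full list of names
unab xvars : x1-x193

* Count the variables and the characters needed to write them out
local nvars : word count `xvars'
local nchars : length local xvars
display "variables: `nvars'   characters: `nchars'"

* Stata's current maximum macro length, for comparison
display "macro length limit: " c(macrolen)
```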

It appears that you have hit Stata's limit on the number of characters that can be stored in a macro. As this limit is derived from the maxvar setting, you can raise it by increasing maxvar (up to a maximum of 32767) with a command like the following:

set maxvar 32767

Once you have done this you should find that your model will run up to the point of starting MLwiN. When you get to this point you may find that you reach MLwiN's maximum for the number of explanatory variables in a model, which is 150 by default. To get past this limit you will need to open MLwiN and select:

Options->Worksheet

and then select a higher number. For this change to still be in effect when runmlwin runs you will then need to click the "use as defaults" button.

Please note that we haven't yet checked runmlwin with this many variables so check any results carefully.
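To check what the maxvar change described above actually did on your installation, something like the following sketch may help. It assumes a flavour of Stata that allows maxvar to be changed, and it clears the data in memory first because set maxvar may refuse to run otherwise, so save your data beforehand.

```stata
* Current limits (read-only system values)
display "maxvar:   " c(maxvar)
display "macrolen: " c(macrolen)

* WARNING: clear drops the data in memory; save first if needed
clear
set maxvar 32767

* The macro length limit should now be larger than before
display "maxvar:   " c(maxvar)
display "macrolen: " c(macrolen)
```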

Hello. I have also been working through these issues while attempting to fit a cross-classified model using the "reducing storage overhead by grouping" options in runmlwin. I've set maxvar to 32767 and matsize to 11000; however, when attempting to fit a model with a large number of superclusters (over 200) at level 3, I get a message saying that the matrix size is too small. Is there a way of estimating how many superclusters Stata should be able to process in this way? If I were to run the model directly in MLwiN and set up the matrices accordingly, would it be able to process this? Thank you.

As a follow-up question, I've been successful using the optimal settings in Stata to fit a 3-level model in runmlwin, which has 136 superclusters at the highest level. I've tried incorporating an additional random effect at the higher level, using the approaches laid out in the manual (Section 18.5, Other aspects of the SETX command). Unfortunately Stata objects that "macro substitution results in line that is too long: The line resulting from substituting macros would be longer than allowed. The maximum allowed length is 645,216 characters, which is calculated on the basis of set maxvar". As I have set maxvar and matsize to their upper limits, could anyone advise whether there are other macro-related settings in Stata that I could tweak?

prior to running the problematic models, and then reporting the output above the error message? I may then be able to determine where in the -runmlwin- command the error is occurring and whether there is a workaround.

Thanks, I'll try that later and see what happens. Meanwhile I've been trying to replicate the runmlwin analyses for cross-classified models using MCMC (section 15.4 of the MCMC manual). When I run the first model for the Fife schools (ignoring the cross-classification) I get the results as tabulated in the example log file. However, when I try rerunning using the commands in the log file, turning the mcmc(cc on) option on and using the results from the previous model as the starting values, I get messages about overwriting protected columns, and ultimately the message "model specified does not match last model run". I've just copied and pasted the commands as written in the log file. Maybe I'm doing something silly? I've had the same issue when trying to run my own models.

For the past few versions MLwiN has protected the columns used to store model results from accidental overwriting. To turn this off for the purposes of changing them from macros the command

needs to be issued before these changes are made. -runmlwin- has been updated to add this command; however, it may be that your installed version predates this, in which case I would recommend updating to the most recent version on SSC. If you still have the problem after updating then let me know and I will investigate further.
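For anyone unsure which version they have, a minimal sketch of checking and updating from SSC is below. It assumes runmlwin was originally installed from SSC and that Stata has internet access.

```stata
* Show where runmlwin is installed and its version header lines
which runmlwin

* Replace the installed copy with the current version on SSC
ssc install runmlwin, replace

* Confirm the version header has changed
which runmlwin
```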

Thanks, I've been able to update the software and attend to these issues.

On a related note, I was looking again at section 18.5 in the runmlwin examples, fitting a cross-classified model with a random intercept and random slope. In the example given the covariance between these was constrained to zero, but this may not always be appropriate, and I was looking to see if partitioned matrices could be used to set up the appropriate variance matrix in this instance. As it stands, the variance-covariance matrix is 18x18. The upper left 9x9 submatrix would be a diagonal matrix with the diagonal elements constrained to be equal (var(s1-s19)), the lower right 9x9 submatrix would be a diagonal matrix with the diagonal elements constrained to be equal (var(s1xvrq-s19xvrq)), and the off-diagonal 9x9 submatrix would correspond to the covariance between the random slope and intercept, with all elements equal (if I have got this right!). I could set up an 18x18 matrix A, with terms that are not to be estimated set to zero and those that are to be estimated set to one, and incorporate this into the runmlwin command using the elements(A) option. Then constraints would be used for the other terms as appropriate. Of course using MCMC estimation would be much simpler, at least in terms of syntax, but I was wondering if there were more efficient ways of setting up the syntax to estimate the covariance between a random intercept and random slope using IGLS as per the example above?

The way you describe is the correct way. This allows full flexibility. You can switch whichever elements you like on and off via this approach and you can specify whatever linear constraints you like. The downside is that this entails quite a lot of syntax, and I appreciate this is frustrating when all you want is a covariance between the random intercept and random slope.
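To make the zero/one pattern concrete, here is a small sketch that only builds and lists such an indicator matrix in Stata, using the 18x18 dimensions from the question. It assumes one reading of the off-diagonal block, namely that only the covariance between each dummy's intercept and its own slope is switched on; change the loop if a different pattern is intended. The names k, j and A are illustrative, the sketch does not call runmlwin, and the exact form that elements() expects, along with the equality constraints, should be checked against the runmlwin help file.

```stata
* Number of dummy variables per block (9 gives the 18x18 case above)
local k = 9
local n = 2 * `k'

* Start with every element switched off (0 = not estimated)
matrix A = J(`n', `n', 0)

forvalues i = 1/`k' {
    local j = `k' + `i'
    * Variance of the i-th random intercept term
    matrix A[`i', `i'] = 1
    * Variance of the i-th random slope term
    matrix A[`j', `j'] = 1
    * Intercept-slope covariance for the same dummy (assumed pattern)
    matrix A[`j', `i'] = 1
    matrix A[`i', `j'] = 1
}

* Inspect the pattern before using it anywhere
matrix list A
```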

The only workaround would be if you only had a random intercept for the other classification, in which case you could make this classification the highest level within a single supercluster and put the classification of interest, with the random intercept and slope, nested within this supercluster at level 2. At level 2 you would then specify the random intercept and slope in the conventional way. This approach will only work if your other classification does not have too many units.