If you already own a copy of MLwiN and have not upgraded to the latest version, please upgrade MLwiN for free, as this may solve some problems. If you have purchased MLwiN, you can also submit our technical enquiry form if you are unable to find help here.
Tip: Use your browser's 'Find' facility for this page to search for a word or phrase (Key Ctrl+F)
| Enquiry | Answer(s) |
| How can I get help with using MLwiN? |
As well as this FAQ section, we now have our own MLwiN User Forum where we welcome all users to post enquiries. For further information about how to use the forum go to the Forum FAQ page. If you cannot find an answer on the forum or the FAQ page and you have bought MLwiN, you may be entitled to email support (see FAQ below: Am I entitled to email technical support?) |
| Am I entitled to email technical support? |
You are entitled to free email technical support if you have bought any version of MLwiN or MLn in the past and can supply details of your purchase which we can trace on our database. If you are a UK academic and have downloaded the free version you are not entitled to free support; if email support is required, please order MLwiN in the usual way. |
| MLwiN 2.10, 2.10 Beta and 2.1 … |
| Why can't I get 2.10? I thought it was the latest version. Which one works with the training materials? |
Our latest releases begin with 2.1… and are the same as 2.10 except that bugs have now been fixed. If your version is not the latest one you can upgrade in the usual way, and you can also check which bugs have been fixed. The training materials work with all MLwiN 2.1 versions. |
| Should I uninstall the MLwiN 2.10 Beta version I have before installing a later version? |
Yes, to ensure the correct operation of MLwiN you must remove any Beta version before installation. |
| I already have MLwiN 2.10 Beta but the upgrade to the latest version will not work |
You must also have version 2.02 installed if you wish to upgrade from previous 2.10 beta versions to subsequent 2.10 versions. If you do not own a previous version you will need to purchase the program. Note: if you have purchased, or been eligible for, MLwiN 2.10 (any beta) but do not have version 2.02 installed, and are therefore unable to upgrade for free, please email us with proof of purchase (e.g. your purchase number) or re-apply for the free download if you are eligible. |
| Trial versions |
| Can I open SPSS, Stata or Minitab worksheets using the training version of MLwiN? |
- Can I use the tutorial/Lemma version of MLwiN to practise loading my own data (in SPSS format) into MLwiN, or do I need the full version for this?
- Data input and output for the teaching version of MLwiN is limited to specially marked MLwiN worksheets, so you will not be able to load your SPSS files using this version.
|
| What is the difference between the training, the trial version and the full version? |
The training version of MLwiN has identical functionality to the full MLwiN except that it can only read specially marked worksheets. The trial version is exactly the same as the full version except that it expires after 30 days. |
| Realcom |
| Does multiple imputation in Realcom work with the free trial version of MLwiN? |
There are two different trial versions of MLwiN (see the FAQ What is the difference between the training, the trial version and the full version?). Multiple imputation will not work with the 30-day free trial version, since this is based on version 2.02, but will work with the free training version, since this is based on version 2.10. |
| I'm experiencing problems with Realcom or Realcom-Impute |
Go to the Realcom discussion board (you will need to register or log in) |
| Operating Systems |
| Can I install MLwiN onto a Mac or Apple computer? |
If you want to run MLwiN you will have to run it under Windows, either through virtualisation software which is available commercially (for example www.vmware.com/products/fusion/ or www.parallels.com) or for free as open source software (for example Sun's VirtualBox), or by dual booting using Apple's Boot Camp. All of these require a legal copy of Windows to use on your Mac. |
| Is MLwiN compatible with Windows Vista? |
Yes, MLwiN 2.1 is compatible with Vista. Further information - MLwiN system requirements |
| Is MLwiN compatible with Windows Vista, Server, x64 Edition, MCE and Multi-core/Multi-processor environments? |
See system requirements |
| Is it possible to run MLwiN on a Citrix-desktop? |
- We are currently redesigning our network, which means that our MLwiN users will have to access the data MLwiN uses through a Citrix "window", and they will not be allowed to put the data onto their local hard disk. This means that MLwiN cannot run on their computers but must reside in the secure network, running either on the Citrix server/Citrix desktop or on a dedicated application server.
- We do not have a Citrix system here to test MLwiN on, so we cannot give a definitive answer; however, we have had reports of people using MLwiN successfully on this and similar systems. MLwiN 2.10 should have more success than previous versions, as it no longer attempts to write files temporarily to its program directory.
One thing to note is that MLwiN loads the whole data set being analysed into memory, so if your users are running models on large data sets on the same machine you may run into memory issues.
Running more complex models can also be quite CPU intensive. Whether this is an issue for you will depend on the number of processors in your server machine and the number of simultaneous MLwiN users. Each instance of MLwiN can only make use of a single processor, so as long as the number of copies running a model is less than or equal to the number of processors in the machine there should be no slowdown compared with running it on the user's machine.
|
| Getting data in and out |
| How do I get data into MLwiN? |
See our Getting data into MLwiN page |
| How do I get data out of MLwiN? |
See our Getting data out of MLwiN page |
| Are there any issues I should be aware of when opening my Stata/SPSS/Minitab files with MLwiN v2.10? |
If you have any categorical variables for which there are no observations in some of the categories, see the FAQ How do I get rid of an unneeded dummy variable? |
| Opening a .dta worksheet, I get the error message 'Duplicate name(RSTA)' |
- I'm attempting to open a new worksheet directly from .dta (STATA) format and I get the error message "Duplicate name(RSTA)."
- This error message will be produced if there are two variables with the same name in the Stata file you are trying to open. Note that MLwiN does not distinguish between small and capital letters, so for example variable1, Variable1 and VARIABLE1 will all be considered the same name. Inside Stata, change the names of any variables that MLwiN would consider to be the same, save your worksheet and try importing into MLwiN again.
- Is it possible that MLwiN has problems reading underscores ("__")? For example, does it read ___ABC as the same as ABC?
- No, MLwiN should count '__ABC' and 'ABC' as different. To test whether MLwiN considers 2 names to be the same, open MLwiN without opening any worksheet (you can do this via the Programs menu, via a shortcut, or by clicking on MLwiN in its folder in your Program files, or alternatively you can double click on an MLwiN worksheet to bring MLwiN up and then select New worksheet from the File menu to close the worksheet). From the Data manipulation menu select Names. In the window that appears, highlight any row and click the Edit name button. Type in one of the names you want to test. Highlight a different row, press the Edit name button again and type in the other name. If MLwiN considers these 2 names the same you will get an error message saying 'duplicate name(NAME)' (confusingly when you press OK it appears that the variable has been given the name you chose in spite of the error message, but if you press the refresh button in the top left corner of the Names window you will see that in fact the variable has reverted to its previous name). If MLwiN considers the names to be different you will get no error message. You can delete the variables (highlight the rows and press the Delete button at the top of the Names window) and test another pair of names, or you can leave the variables and name more variables one at a time to test whether a name is the same as any that you have already used.
|
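Because MLwiN does not distinguish between small and capital letters in names, it can be quicker to screen a long Stata variable list before importing than to test names one pair at a time in the Names window. The Python sketch below is a hypothetical helper, not part of MLwiN or Stata: it groups names that differ only in letter case. It assumes case is the only difference MLwiN ignores, so it will not detect any other kind of clash.

```python
from collections import defaultdict


def clashing_names(names):
    """Group variable names that are identical apart from letter case.

    Assumption: only case is ignored; every other character, including
    underscores, is treated as significant.
    """
    groups = defaultdict(list)
    for name in names:
        groups[name.lower()].append(name)
    # Keep only the groups where two or more original names collide.
    return [group for group in groups.values() if len(group) > 1]


# Hypothetical variable list taken from a Stata file.
stata_vars = ["variable1", "Variable1", "VARIABLE1", "__ABC", "ABC", "school"]
print(clashing_names(stata_vars))
```

Any group reported here needs all but one of its members renamed inside Stata before the file is imported into MLwiN.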
| I'm trying to export a number of variables from MLwiN, but only the first 5 appear in the text file |
- I'm trying to export data sets from MLwiN into files that have an ASCII file format (or .csv or SPSS data files). Selecting the ASCII text file output command from the File menu allows me to specify a column range c1-c10 and an output file name (and destination). But the resultant text file contains only the first 5 variables, and I'm not sure what happened to the remaining 5. Even entering the variables separately (c1 c2 c3 c4 c5 c6 …) did not yield any better results. I have tried this process across several data files and on numerous occasions.
- Although it appears that only 5 variables are being exported when you use the ASCII text file output window from the File menu, in fact all 10 are exported. However you only see 5 columns of data because when you export more than 5 columns MLwiN wraps the data (see section 2 of Getting data out of MLwiN). So if you export the 10 variables in the tutorial dataset supplied with MLwiN, for example, then in the exported data the first number in your first column of data is the first entry of the variable school but the second number in your first column of data is the first entry of the variable girl. The third number in your first column is the second entry of school and the fourth number is the second entry of girl and so on. This wrapping is less obvious in the case when the number of variables you export is a multiple of 5 than it is in the example given in the Getting data out of MLwiN page because when the number of variables is a multiple of 5, there will be no gaps in the five columns of output data.
There are two solutions to this problem. The simplest is to upgrade from MLwiN 2.02 to MLwiN 2.1 (the upgrade is free to anyone who already has a copy of MLwiN 2.02). You may find when using this version that you get 10 columns of data when using the ASCII text file output window to output 10 variables, i.e. MLwiN no longer wraps output data when outputting more than 5 variables. In any case, there are easier ways to output data from this version, which are described in the section Getting data out of MLwiN 2.1. Alternatively, if you do not wish to upgrade, follow the advice in section 2 of Getting data out of MLwiN: export five variables at a time into Excel, where you can put them together again into a worksheet.
|
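If you already have a wrapped file, the rows can also be rebuilt outside MLwiN. The Python sketch below assumes, as described above, that the values are written record by record in column order (c1, c2, ...), five to a line, separated by whitespace, and that no value is blank. `unwrap_records` is a hypothetical helper and the sample data are invented.

```python
def unwrap_records(text, n_vars):
    """Rebuild one row per record from ASCII output wrapped after 5 values.

    Reading the values in order and cutting them into groups of n_vars
    works whether or not n_vars is a multiple of 5.
    """
    values = text.split()
    if len(values) % n_vars != 0:
        raise ValueError("number of values is not a multiple of n_vars")
    return [values[i:i + n_vars] for i in range(0, len(values), n_vars)]


# Hypothetical wrapped export of 7 variables for 2 records (5 values, then 2).
wrapped = """1 2 3 4 5
6 7
11 12 13 14 15
16 17
"""
for row in unwrap_records(wrapped, 7):
    print(",".join(row))  # comma-separated, ready to save as .csv
```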
| Large datasets |
| How do I know MLwiN can handle the model based on my own data? |
The capacity of MLwiN is determined entirely by the memory of your PC. You can estimate what your data needs as follows:
- No of figures = No of records × No of 'variables'
- Memory required to store the data (bytes) = No of figures × 4
- Memory in kilobytes = Memory in bytes / 1024
- Memory in megabytes = Memory in kilobytes / 1024
Worksheet size used by the data:
- No of K cells in MLwiN = No of figures / 1000 (a K cell is a unit of approximately 1 kilobyte)
Worksheet space required for data manipulation:
- 2 × No of K cells |
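The arithmetic above can be sketched in Python as follows. The function name and the example sizes are illustrative only, and the results are only as accurate as the rules of thumb they encode.

```python
def worksheet_requirements(n_records, n_variables):
    """Apply the rule-of-thumb sizing formulas for an MLwiN worksheet."""
    figures = n_records * n_variables          # No of figures
    memory_bytes = figures * 4                 # 4 bytes per figure
    memory_kb = memory_bytes / 1024
    memory_mb = memory_kb / 1024
    k_cells = figures / 1000                   # worksheet size used by the data
    k_cells_for_manipulation = 2 * k_cells     # space for data manipulation
    return {
        "figures": figures,
        "memory_mb": memory_mb,
        "k_cells": k_cells,
        "k_cells_for_manipulation": k_cells_for_manipulation,
    }


# Hypothetical example: 250,000 records and 10 variables.
req = worksheet_requirements(250_000, 10)
print(f"{req['memory_mb']:.1f} MB of data; worksheet of about "
      f"{req['k_cells_for_manipulation']:.0f} K cells")
```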
| I’m having problems importing a large dataset from SAS |
- I'm trying to get a dataset of 2.5 million records from SAS into MLwiN. I've tried using a SAS macro and also simply exporting my data to a text file. When I use the macro to try to get my data into MLwiN, it says that it's "scanning data"; it slows down at about 600,000 and stops at approximately 740,000. It doesn't stop at the same place every time (for example, once it stopped at 733,800 and once at 759,700). I tried to increase the worksheet size to 50,000 and when I did, it stopped at 642,600. An error message pops up in a window: "EXE file has encountered a problem and needs to close. We are sorry for the inconvenience." When I just try to import the text file, the same error message pops up.
Wondering if you have any thoughts?!
- I would suggest importing a subset of your data, e.g. a random sub-sample of your higher-level units.
Even if you manage to get all the data in, all but the simplest models would take a very long time to converge.
A side benefit of using a sub-sample of the data is that you can check that you haven't overfitted your model to that sub-sample, by testing the final model on alternative sub-samples of the data.
When model building you will want a considerably smaller sub-sample of your data, say 25,000 records; if the model is not too complex, you can then estimate your final model on, say, 250,000.
|
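One way to draw the sub-samples suggested above is to sample whole higher-level units, so that every level 1 record of each selected unit is kept together. A minimal Python sketch follows, assuming the data are available as a list of records that each carry a cluster identifier; the field name `school` and the data are hypothetical. It splits the units at random into disjoint sub-samples: one can be used for model building and the others for checking the final model.

```python
import random
from collections import defaultdict


def cluster_subsamples(rows, cluster_key, n_parts, seed=None):
    """Split higher-level units at random into n_parts disjoint sub-samples.

    Every level 1 row of a unit goes into the same sub-sample, and the rows
    of each unit stay on adjacent rows.
    """
    by_cluster = defaultdict(list)
    for row in rows:
        by_cluster[row[cluster_key]].append(row)
    cluster_ids = sorted(by_cluster)
    random.Random(seed).shuffle(cluster_ids)
    parts = [cluster_ids[i::n_parts] for i in range(n_parts)]
    return [[row for cid in part for row in by_cluster[cid]] for part in parts]


# Hypothetical data: 700 clusters of 5 records each.
data = [{"school": s, "y": random.random()} for s in range(700) for _ in range(5)]
build, *checks = cluster_subsamples(data, "school", n_parts=10, seed=2024)
print(len(build), [len(part) for part in checks])
```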
| I’m having problems working with a large dataset in MLwiN. Does MLwiN have a limit to the size of dataset that it can handle? |
- I am doing 2-level logistic modeling using MLwiN. My data has more than 250,000 individuals within more than 700 clusters, and I have fewer than 10 covariates.
- Try using a random sub-sample of your data. With only 2 levels, 700 level 2 units and 10 covariates you do not need nearly as many as 250,000 observations to get precise estimates. You will be able to work a lot more quickly with a smaller sample and therefore be able to explore more avenues, potential models, etc.
- Basically, it doesn't work for the whole dataset in MLwiN (not even a single-level logistic model), although it works for a smaller subset (about 5,000). I haven't noticed any problems importing the entire dataset.
- You can check whether the entire dataset has been imported properly by comparing the summary statistics: are they the same in MLwiN as in your other stats package?
- Errors: there are three types of error. The first is 'out of memory', the second is 'no sufficient worksheet size', and the third is that MLwiN just crashes and closes. I have two questions. 1) Can MLwiN handle such a huge dataset?
- Yes, but of course this depends on the specification of your computer (see the FAQ above).
- 2) How do I reconfigure/specify the worksheet size from the Options menu?
- Go to Options > Worksheet, then increase the worksheet size to a larger amount (you may need to experiment).
|
| Worksheets |
| How do I increase the worksheet size? |
- I got an error message that said 'Insufficient space to run model. Increase worksheet size to at least 10000k (offs)'. How do I increase the worksheet size?
- Select Worksheet... from the Options menu then type a larger number in the box next to worksheet size(k cells) and click Done. You may need to experiment to find a size that is large enough for what you are trying to do. (Note that there is a limit to the size of worksheet that MLwiN can allocate; this depends on your computer but is usually around 20,000 k cells. Also note that MLwiN can encounter problems if you make the worksheet size too much bigger than necessary).
Note that resizing the worksheet will close the worksheet, so you should make sure that you have saved it; you will have to re-open it after resizing.
|
| Does each row correspond to a single record right across the MLwiN worksheet? |
No, MLwiN worksheets do not follow the principle that one row of the worksheet should correspond to one record right across all columns. Instead, it is up to the user to be aware of how the data is arranged in each column. Thus, a user might start with an imported dataset in which each row corresponds to one level 1 unit. They might then go on to create some variables in which each row still corresponds to one level 1 unit, for example by adding two of their original variables together, recoding a variable, or centring a variable around some value. They might then create some further variables in which each row corresponds to a level 2 unit, for example by taking the mean of some variable for each level 2 unit and then only keeping one record per level 2 unit. Or they might create a variable which had one row per level 1 unit but only contained observations on girls, the boys' observations being deleted from the variable. MLwiN will not complain if the columns in a worksheet are not the same length, and it will not assume that the rows of the worksheet correspond to the same records right across the columns. The user must be aware of what is in each column and use it appropriately, which means for example that when entering variables into a model or using them to plot a graph the user should be sure that each row corresponds to the same observation across all the variables. (Note that if you try to enter two variables of different lengths into a model or plot them against each other in a graph, then MLwiN will give an error message, but if you do this with two variables of the same length for which the rows do not correspond to the same observations, MLwiN will not know that there is a problem. 
This could happen if you have not included one of the variables when sorting your dataset or if you have two groups with the same number of individuals and have created two sets of variables, each set containing observations for just one group, and are now using one variable from each set). |
| Does it matter what order the data appears in in the worksheet? |
For some purposes the order of the rows of data in the worksheet is not important, but if you want to run a model it is very important that the data is arranged so that all the cases belonging to the same highest-level unit are on adjacent rows; within highest-level units, all the cases belonging to the same unit at the next level down are on adjacent rows; and so on, right down to cases belonging to the same level 2 unit being on adjacent rows (within level 3 units). In other words, the data should be sorted by level 1 units within level 2 units within ... within highest-level units. (The exception to this is if you are using MCMC to fit a cross-classified model, since this does not rely on the order of the cases to determine which unit each case belongs to at each level; however, it is still wise to sort the data in this way as far as possible, since when IGLS is run this will give better starting values for MCMC.)
Chapter 8 of the User's Guide to MLwiN (available to download for free from here) gives details of how to sort the data in MLwiN; alternatively, the data can be sorted in another package before importing to MLwiN, but the user should then check that the data is indeed correctly sorted after importing. See also the FAQ: Implausible results or convergence problems. |
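If you sort the data in another package, a quick check like the Python sketch below can confirm the arrangement before importing. It is a hypothetical helper, not an MLwiN feature. Each row is described by its unit IDs from the highest level down to level 2, and a unit is identified by its own ID together with the IDs of the units above it, so IDs may be reused in different higher-level units. Python's `sorted` is stable, so sorting on these IDs keeps the original order of level 1 records within each unit.

```python
def is_correctly_grouped(id_rows):
    """Return True if every unit's rows are adjacent at every level above 1.

    Each element of id_rows is a tuple of IDs ordered from the highest level
    down to level 2, e.g. (district, school) for a 3-level model.
    """
    if not id_rows:
        return True
    n_levels = len(id_rows[0])
    for depth in range(1, n_levels + 1):
        seen = set()
        previous = None
        for ids in id_rows:
            key = ids[:depth]  # identifies one unit at this level
            if key != previous:
                if key in seen:
                    return False  # this unit's rows are split into separate blocks
                seen.add(key)
                previous = key
    return True


def ids_of(record):
    """Hypothetical record layout: (district, school, pupil score)."""
    return record[0], record[1]


records = [(2, 1, 54.0), (1, 2, 61.5), (1, 1, 47.0), (2, 1, 50.5), (1, 1, 58.0)]
print(is_correctly_grouped([ids_of(r) for r in records]))
sorted_records = sorted(records, key=ids_of)  # stable sort
print(is_correctly_grouped([ids_of(r) for r in sorted_records]))
```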
Other questions about worksheets
See also the Large datasets and Getting data in and out sections |
|
| Sample size, significance testing and suitability of data |
| Sample Sizes for Multilevel Models |
Sample Sizes for Multilevel Models - more details |
| How can I calculate confidence intervals in MLwiN? |
You can calculate confidence intervals by multiplying the standard error of each parameter by the appropriate critical value (for example 1.96 for a 95% interval, assuming a Normal sampling distribution) and then subtracting the result from, and adding it to, the parameter estimate. You can do this using the Calculate window (available from the Data Manipulation menu) or the CALC command, in a different package of your choice, or with a pocket calculator. The standard errors can be found in the Equations window, displayed in brackets after the parameter estimates. If you have many parameters, you may find it quicker to use a macro. There is a macro available on our website which will calculate the z-ratios and confidence intervals. If you want to write your own, information on where the standard errors are stored in the worksheet can be found here.
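For a 95% interval the multiplier is the 97.5% point of the standard Normal distribution, about 1.96. The Python sketch below is a hypothetical helper using only the standard library; it performs the same calculation you would do in the Calculate window. The estimate and standard error in the example are made up.

```python
from statistics import NormalDist


def normal_ci(estimate, se, level=0.95):
    """Interval estimate ± z * SE, assuming a Normal sampling distribution."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level = 0.95
    return estimate - z * se, estimate + z * se


# Hypothetical estimate and standard error, as read from the Equations window.
low, high = normal_ci(0.563, 0.012)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```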
If you are using MCMC, then instead of confidence intervals based on standard errors and the assumption that the sampling distribution of each parameter is Normal, you may want confidence intervals derived from quantiles of the chains of parameter estimates. These can be obtained by selecting Trajectories from the Model menu, and then in the window that appears clicking on the graph for one of the parameters (and clicking Yes in answer to the question that appears). This brings up the MCMC diagnostics window which contains some information based on the chain of estimates for that parameter. The information you want is at the bottom, in the second row of the Summary Statistics section. This gives the 2.5%, 5%, 50%, 95% and 97.5% quantiles. If you want different quantiles or if you want to calculate the quantiles for many parameters at once without having to click on each graph individually to bring up the relevant window, then again you might want to write a macro. The MCMC manual ('MCMC Estimation in MLwiN') gives details of where to find the stored chains of parameter estimates and how to split the one long variable containing this information into a separate variable for each parameter (p58-59). |
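If you write your own macro or export the chains to another package, the quantile calculation itself is simple. The Python sketch below assumes the chains have already been split into one list of draws per parameter (as described in the MCMC manual). The parameter names and the simulated draws are hypothetical, and the linear-interpolation rule used here may differ slightly from the one MLwiN uses.

```python
import math
import random


def quantile(sorted_values, p):
    """Quantile by linear interpolation between order statistics (NumPy's default rule)."""
    position = p * (len(sorted_values) - 1)
    lower = math.floor(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def chain_summary(chains, probs=(0.025, 0.05, 0.5, 0.95, 0.975)):
    """Map each parameter name to the requested quantiles of its chain."""
    return {name: [quantile(sorted(draws), p) for p in probs]
            for name, draws in chains.items()}


# Hypothetical chains, already split into one list of draws per parameter.
rng = random.Random(1)
chains = {
    "beta_standlrt": [rng.gauss(0.56, 0.012) for _ in range(5000)],
    "sigma2_u0": [rng.gammavariate(20.0, 0.005) for _ in range(5000)],
}
for name, qs in chain_summary(chains).items():
    print(f"{name}: 95% interval ({qs[0]:.4f}, {qs[-1]:.4f})")
```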
| What significance tests for both random and fixed parameters are available in MLwiN? |
For Normal response models, Wald tests, likelihood ratio tests and t-tests can all be used to test the significance of fixed parameters. They all produce similar conclusions when testing a single parameter. The Wald test can be carried out through the Intervals and Tests window available from the Model drop-down menu (for some examples of tests using this window see How do I test whether two coefficients are equal? and How do I test if an interaction between a continuous and a categorical variable is significant?). The likelihood ratio test statistic is the difference in the -2log-likelihood values of two nested models (the -2log-likelihood is available from column C1091), and the degrees of freedom for the test are the difference in the number of parameters between the two models. The t-test statistic is the ratio of the estimate to its standard error. For the random parameter estimates, we recommend using the Wald test or the likelihood ratio test. For non-linear models, no likelihood values are available because these models are fitted using quasi-likelihood, so the likelihood ratio test is not available; the Wald test can be used for an approximate answer. In some extreme cases, the MCMC and bootstrapping estimation procedures available in MLwiN should be considered. Both methods produce chains of parameter estimates, from which the empirical distribution of, and confidence intervals for, each parameter can be obtained. |
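The likelihood ratio and estimate/SE calculations can be sketched in Python as below. `chi2_sf` is a hypothetical helper that computes chi-squared tail areas for integer degrees of freedom using closed forms, so no statistics package is needed; the -2log-likelihood values and the estimate are made up. The estimate/SE ratio is compared here against the standard Normal distribution, which is a large-sample approximation to the t-test.

```python
import math


def chi2_sf(x, df):
    """Upper tail area P(X > x) of a chi-squared distribution with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series in odd powers of sqrt(x)
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total


def likelihood_ratio_test(minus2ll_simpler, minus2ll_fuller, extra_parameters):
    """LR statistic = difference in -2*log-likelihood between nested models."""
    statistic = minus2ll_simpler - minus2ll_fuller
    return statistic, chi2_sf(statistic, extra_parameters)


def z_test(estimate, standard_error):
    """Ratio of estimate to SE, with a two-sided p-value from the standard Normal."""
    z = estimate / standard_error
    return z, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical values, as might be read from C1091 and the Equations window.
print(likelihood_ratio_test(9357.24, 9316.87, 1))
print(z_test(0.563, 0.012))
```

Squaring the ratio and computing `chi2_sf(z ** 2, 1)` gives the same p-value as `z_test`, which is why a 1 df chi-squared test of a single coefficient agrees with the test based on the estimate divided by its standard error.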
| How do I test whether two coefficients are equal? |
- I have a categorical variable with 6 categories which I have entered into my model as an explanatory variable, and I want to test whether the coefficients for two of the dummy variables are significantly different from each other. How do I do this?
- After running the model, select Intervals and tests from the Model menu. In the window that appears, select fixed (at the bottom of the window) and in the box # of functions type 1 (if it does not already say 1). In the top part of the window type a 1 in the box next to one of the dummy variables and a -1 in the box next to the other (it does not matter which way round you do this). In the box next to constant(k) type 0 (if it does not already say 0). Now press the Calc button at the bottom of the window. The box chi sq, (f-k)=0. (1df) now contains the test statistic, to be compared against the $\chi^2$ distribution with 1 degree of freedom; we can make the comparison using the Tail Areas window from the Basic Statistics menu. The null hypothesis is that there is no difference between the coefficients. (Important note: it may be necessary to resize the column in order to make sure you can see the entire number in the chi sq, (f-k)=0. (1df) box. To do this, put the cursor over the column divider right at the top (in the grey row) and it should become a double arrow; you can then click and drag to resize.)
For example, suppose that we are working with the tutorial dataset supplied with MLwiN and that we have set up in the Equations window a random intercepts model with explanatory variables standlrt and vrband (with vb1 as the reference category). Suppose we want to test whether the coefficients of vb2 and vb3 are equal. We run the model, and then select Intervals and tests from the Model menu. In the window that appears, we select fixed (at the bottom of the window) and in the box # of functions we type 1. In the top part of the window we type a 1 in the box next to fixed : vb2 and a -1 in the box next to fixed : vb3. In the box next to constant(k) we type 0 (which is the default so may already be entered). Then we press the Calc button at the bottom of the window. We see from the box chi sq, (f-k)=0. (1df) that our test statistic is 73.879 (we check it really is 73.879 and not 273.879 or 5673.879 by resizing the column). We select Tail Areas from the Basic Statistics menu. In the window that appears, we leave Chi Squared selected, in the Value box we type 73.879, and in the Degrees of freedom box we type 1. We press Calculate and the Output window appears, giving us a probability of 0.0000000000000000083055 (that is, $8.3055 \times 10^{-18}$). We can thus reject the null hypothesis that there is no difference between the coefficients of vb2 and vb3 and conclude that the coefficients are significantly different from each other.
An important point to note is that we put a 1 next to one of the coefficients we wish to test and a -1 next to the other. This is because the Intervals and tests window adds together each coefficient multiplied by the number entered next to it in the column, and tests whether the result is significantly different from 0. So, writing $\beta_1$ and $\beta_2$ for the two coefficients, by filling in the column in this way we are testing whether $\beta_1 - \beta_2 = 0$, which is equivalent to testing whether $\beta_1 = \beta_2$. We are testing whether there is a difference between the two coefficients, as was our aim. If we instead put a 1 next to each of the coefficients we would be testing whether $\beta_1 + \beta_2 = 0$, which is equivalent to testing whether $\beta_1 = -\beta_2$. We would be testing whether the coefficients had the same magnitude but opposite signs, which is not what we wanted to test.
Note that an alternative way to test whether two categories have equal coefficients is to refit the model so that one of the categories is the reference. In that case the coefficient of the other category will be a contrast between that category and the omitted category, and the usual Z-test can be carried out. |
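The number in the chi sq, (f-k)=0. (1df) box is presumably the standard Wald statistic for a linear combination of coefficients, $(f-k)^2 / \mathrm{var}(f)$, where $f$ is the sum of each estimate multiplied by the number entered next to it. The Python sketch below shows that calculation and its 1 df tail area. `wald_contrast` is a hypothetical helper, and it needs the covariance of the estimates, so the estimates and covariance matrix used here are invented purely for illustration.

```python
import math


def wald_contrast(estimates, covariance, weights, k=0.0):
    """Wald chi-squared (1 df) for H0: sum_i weights[i] * estimates[i] = k.

    The variance of the weighted sum is w' V w, where V is the covariance
    matrix of the estimates.
    """
    n = len(estimates)
    f = sum(w * b for w, b in zip(weights, estimates))
    variance = sum(weights[i] * covariance[i][j] * weights[j]
                   for i in range(n) for j in range(n))
    statistic = (f - k) ** 2 / variance
    p_value = math.erfc(math.sqrt(statistic / 2))  # upper tail of chi-squared, 1 df
    return statistic, p_value


# Hypothetical estimates and covariance matrix for two dummy coefficients.
beta = [0.62, 0.35]
cov = [[0.0009, 0.0003],
       [0.0003, 0.0011]]
print(wald_contrast(beta, cov, [1, -1]))   # H0: the two coefficients are equal

# Tail area for the statistic quoted in the worked example above.
print(math.erfc(math.sqrt(73.879 / 2)))    # about 8.3e-18
```

Putting `[1, 1]` in place of `[1, -1]` would instead test whether the coefficients have the same magnitude but opposite signs.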
| How do I test whether more than 2 coefficients are equal? |
- I have a categorical variable with 6 categories which I have entered into my model as an explanatory variable, and I want to test whether there is a significant difference between any of the coefficients for the dummy variables, or whether all the coefficients are equal. How do I do this?
- To test whether more than 2 coefficients are equal, we use the Intervals and tests window as in the FAQ How do I test whether two coefficients are equal? We proceed as described in the answer to that question, but in the box # of functions we type the number of pairs of coefficients that we need to compare. For example, if we wanted to test whether 3 coefficients $\beta_1$, $\beta_2$ and $\beta_3$ are equal, we would type 2, because we need enough pairs for each coefficient to enter into at least one comparison (so for example we might compare $\beta_1$ with $\beta_2$ and $\beta_1$ with $\beta_3$). If we wanted to test whether 5 coefficients were equal (as in the case of testing whether the coefficients for all dummy variables for a 6-category variable are equal) we would enter 4. In general, we enter $n - 1$, where $n$ is the number of coefficients we want to test.
We fill out the columns that appear in the top part of the window according to the same principle as when testing whether two coefficients are equal: each column corresponds to one comparison between two coefficients, and in that column we enter a 1 in the row corresponding to one of the coefficients and a -1 in the row corresponding to the other. There are many possible ways of choosing the pairs to compare, but one way that will always work is to take the first coefficient and compare it with each of the others. Thus when comparing 3 coefficients we can compare $\beta_1$ with $\beta_2$ in the first column and $\beta_1$ with $\beta_3$ in the second column; and when comparing 5 coefficients we can compare $\beta_1$ with $\beta_2$ in the first column, $\beta_1$ with $\beta_3$ in the second column, $\beta_1$ with $\beta_4$ in the third column, and $\beta_1$ with $\beta_5$ in the fourth column. So when comparing 3 coefficients, supposing that the model consists of cons as the first explanatory variable and the 4-category explanatory variable as the only other explanatory variable, one correct way to fill in the columns (down to the constant(k) row) would be:
| Row | Column 1 | Column 2 |
| cons | 0 | 0 |
| first dummy variable | 1 | 1 |
| second dummy variable | -1 | 0 |
| third dummy variable | 0 | -1 |
| constant(k) | 0 | 0 |
and when comparing 5 coefficients (again with cons and the 6-category explanatory variable as the only explanatory variables in the model), one correct way of filling in the columns would be:
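| Row | Col1 | Col2 | Col3 | Col4 |
| cons | 0 | 0 | 0 | 0 |
| first dummy variable | 1 | 1 | 1 | 1 |
| second dummy variable | -1 | 0 | 0 | 0 |
| third dummy variable | 0 | -1 | 0 | 0 |
| fourth dummy variable | 0 | 0 | -1 | 0 |
| fifth dummy variable | 0 | 0 | 0 | -1 |
| constant(k) | 0 | 0 | 0 | 0 |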
If taking this approach, the 1s should all be in the same row and the -1s should form a diagonal line going downwards from left to right.
As for testing whether 2 coefficients are equal, we put 0 in each column in the constant(k) row (and in any rows corresponding to other explanatory variables) and press Calc.
In the chi sq, (f-k)=0. (1df) row of each column appears a test statistic which allows us to make the comparison between the particular pair of coefficients set up in that column, but the statistic we need to make our test that all the coefficients are equal is to be found near the bottom of the window where it says joint chi sq test(?df) (where the question mark will be the number of comparisons we made in the top part of the window). We can get a p-value for this test statistic using the Tail Areas window as described in the answer to How do I test whether two coefficients are equal? (remembering to input the appropriate value for Degrees of freedom, which will not now be 1). This will be the p-value for the test with null hypothesis that all the coefficients are equal; thus if the p-value is small enough we can reject this hypothesis and conclude that (at least some of) the coefficients are significantly different from each other (although there may be no significant difference between some pairs and we would probably want to investigate further to determine for which pairs there is a significant difference and for which we cannot reject the hypothesis that there is no difference).
What exactly is the test statistic that appears at the bottom of the window next to joint chi sq test (?df), and why does it test that all the coefficients are equal? This test statistic is the test statistic for making all the comparisons specified in the columns simultaneously: for example, in the test described above to compare 3 coefficients, it is the statistic for the test that $\beta_1 = \beta_2$ AND ALSO $\beta_1 = \beta_3$. In other words, when we reject the null hypothesis (that both these equalities hold), this implies that there is a significant difference between $\beta_1$ and $\beta_2$ and/or a significant difference between $\beta_1$ and $\beta_3$ and/or a significant difference between $\beta_2$ and $\beta_3$ (at whatever level of significance we are using). Note that last point: although none of our columns made a comparison between $\beta_2$ and $\beta_3$, if $\beta_2 \neq \beta_3$ then our null hypothesis is not true. This is because $\beta_1 = \beta_2$ and $\beta_1 = \beta_3$ together mean $\beta_2 = \beta_3$. If there is a significant difference between $\beta_2$ and $\beta_3$ then it cannot simultaneously be true that $\beta_1 = \beta_2$ and $\beta_1 = \beta_3$ (even though the test statistics in the chi sq, (f-k)=0. (1df) row may indicate that, when we make the comparisons separately, there is no significant difference between $\beta_1$ and $\beta_2$ and no significant difference between $\beta_1$ and $\beta_3$). Therefore another way to describe the joint test is to say that it is a test of whether $\beta_1 = \beta_2 = \beta_3$. So we can see that the joint test is indeed the test that we wanted: it tests whether all the coefficients are equal.
|
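The joint test can be written as the Wald statistic $(C\hat\beta)^\top (C V C^\top)^{-1} (C\hat\beta)$, where each row of $C$ holds the numbers entered in one column of the window, $V$ is the covariance matrix of the estimates, and constant(k) is 0; we assume this is what joint chi sq test(?df) reports. The Python sketch below uses hypothetical helper names and only the standard library. It builds $C$ with the first-versus-others pattern described above and shows that a different valid choice of pairs gives the same joint statistic. The estimates and covariance matrix are invented.

```python
import math


def chi2_sf(x, df):
    """Upper tail area of a chi-squared distribution with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def first_vs_others(n):
    """One contrast per comparison: coefficient 1 against each of 2..n."""
    return [[1.0 if j == 0 else (-1.0 if j == i else 0.0) for j in range(n)]
            for i in range(1, n)]


def joint_wald(estimates, covariance, contrasts):
    """Joint Wald statistic for H0: C beta = 0, with df = number of contrasts."""
    p = len(estimates)
    d = [sum(c * b for c, b in zip(row, estimates)) for row in contrasts]
    cvc = [[sum(ri[i] * covariance[i][j] * rj[j] for i in range(p) for j in range(p))
            for rj in contrasts] for ri in contrasts]
    statistic = sum(di * yi for di, yi in zip(d, solve(cvc, d)))
    df = len(contrasts)
    return statistic, df, chi2_sf(statistic, df)


# Hypothetical estimates with an identity covariance matrix, for illustration.
beta = [0.0, 1.0, 2.0]
V = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(joint_wald(beta, V, first_vs_others(3)))
# A different valid set of pairs (1 vs 2, 2 vs 3) gives the same joint statistic.
print(joint_wald(beta, V, [[1.0, -1.0, 0.0], [0.0, 1.0, -1.0]]))
```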
| How do I test if an interaction between a continuous and a categorical variable is significant? |
After running the model, select Intervals and tests from the Model menu. In the window that appears, select fixed (at the bottom of the window) and in the box # of functions enter the number of terms in the interaction (which will usually be the number of dummy variables that the categorical variable is entered as). In the top part of the window you will now see a number of columns equal to the number you have just typed in the box. In the first column, enter a 1 in the row corresponding to the first interaction term, in the second column enter a 1 in the row corresponding to the second interaction term, and so on for all the columns. Check that all the other rows in each column from the top to the constant(k) row read 0.000 (if not, change the number to 0). Click the Calc button at the bottom of the window. The numbers in the chi sq, (f-k)=0. (1df) row now give the test statistic for each separate test (that the coefficient of the interaction term is 0). So, for example, the number in this row in the first column is the test statistic for the test that the coefficient of the first interaction term is 0. If we compare any of these statistics to the $\chi^2$ distribution with 1 degree of freedom, we should get results very similar to those obtained by dividing the coefficient of the corresponding interaction term by its standard error and comparing the result against the standard Normal distribution. We use these statistics to test separately whether the coefficient of each interaction term is significantly different from 0. (Important note: it may be necessary to resize the columns in order to make sure you can see the entire numbers in the chi sq, (f-k)=0. (1df) row. To resize a column, put the cursor over the column divider right at the top (in the grey row) and it should become a double arrow; you can then click and drag to resize.) At the bottom of the window there appears a number next to joint chi sq test(?df) = (where the ? is the number of columns at the top of the window).
This is the test statistic for the test of the null hypothesis that all of the coefficients are equal to 0. If the null hypothesis can be rejected then at least one of the coefficients of the interaction terms is significantly different from zero. In this case, it is common to leave all the interaction terms in the model, just as it is common to leave in all dummy variables for a categorical variable when the coefficients of only some of them are found to be significantly different from zero. In some cases, it may be possible to group together categories with interaction coefficients not significantly different from zero (retaining the full set of categories as main effects if these are significantly different from one another). However, as usual, care should be taken to ensure that it is substantively sensible to combine categories.
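To make the arithmetic behind the two kinds of statistic concrete, here is a minimal Python sketch. It assumes (this is not taken from MLwiN's documentation) that the window reports standard Wald statistics with k = 0: each 1 df statistic is the squared estimate divided by its variance, and the joint statistic is b'V^-1 b for the vector b of interaction coefficients and their covariance matrix V. The function names and all numbers are hypothetical, not MLwiN output.

```python
def wald_1df(estimate, variance, k=0.0):
    """1 df Wald statistic for H0: f = k, i.e. ((f - k) / se) squared."""
    return (estimate - k) ** 2 / variance


def wald_joint_2df(b, cov):
    """Joint 2 df Wald statistic b' cov^-1 b for H0: b[0] = b[1] = 0."""
    (v11, v12), (v21, v22) = cov
    det = v11 * v22 - v12 * v21
    if v11 <= 0.0 or det <= 0.0:
        raise ValueError("covariance matrix must be positive definite")
    # Inverse of the 2 x 2 covariance matrix.
    i11, i12, i21, i22 = v22 / det, -v12 / det, -v21 / det, v11 / det
    b1, b2 = b
    return b1 * (i11 * b1 + i12 * b2) + b2 * (i21 * b1 + i22 * b2)


# Hypothetical interaction coefficients and their covariance matrix.
coefs = (0.5, 0.8)
cov = [[0.1, 0.05], [0.05, 0.1]]
separate = [wald_1df(b, cov[i][i]) for i, b in enumerate(coefs)]
joint = wald_joint_2df(coefs, cov)
print("separate 1 df statistics:", [round(s, 3) for s in separate])
print("joint 2 df statistic:", round(joint, 3))
```

With correlated estimates the joint statistic (about 6.53 here) is not simply the sum of the separate statistics (2.5 + 6.4), and it is referred to a distribution with 2 degrees of freedom, so the joint and separate tests can point in different directions, as in the example that follows.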
For example suppose that we are working with the tutorial dataset supplied with MLwiN and that we have set up in the Equations window a random intercepts model with explanatory variables standlrt, vrband (with vb1 as the reference category), and an interaction between standlrt and vrband (again with vb1 as the reference category). We now run the model and select Intervals and tests from the Model menu. In the window that appears, we select fixed (at the bottom of the window) and in the box # of functions we type 2 because there are 2 terms in the interaction, standlrt.vb2 and standlrt.vb3. In the top part of the window there are now two columns. In the first column, we enter a 1 in the fifth row since this row corresponds to standlrt.vb2, and in the second column we enter a 1 in the sixth row since this row corresponds to standlrt.vb3. We then click the Calc button at the bottom of the window. Looking at the bottom of the Intervals and tests window we see joint chi sq test(2df) = 5.141. Using the Tail Areas window, we find that the p-value for this test is 0.076497. Therefore we cannot reject at the 5% level the null hypothesis that the coefficients of both the interaction terms are zero, and we might wish to remove the interaction terms from our model. However, in the chi sq, (f-k)=0. (1df) row we now see 1.745 in the first column and 5.086 in the second column (we resize the columns to check these numbers really are 1.745 and 5.086, and not for example 11.745 and 35.086). Using the Tail Areas window from the Basic Statistics menu, we discover that the p-value for the first column is 0.18651 and the p-value for the second column is 0.024120. Thus looking at the interaction terms separately, we could reject at the 5% level the null hypothesis that the coefficient of standlrt.vb3 is 0, but we could not reject at the 5% level the null hypothesis that the coefficient of standlrt.vb2 is 0. 
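The Tail Areas look-ups in this example can be reproduced outside MLwiN as a check. The Python sketch below uses the closed forms of the chi-squared upper tail for 1 degree of freedom, erfc(sqrt(x/2)), and for 2 degrees of freedom, exp(-x/2). It handles only these two cases, and the function name is hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail p-value of a chi-squared statistic (closed forms for 1 and 2 df only)."""
    if df == 1:
        # P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        # The chi-squared distribution with 2 df is exponential with mean 2.
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only handles 1 or 2 degrees of freedom")


tests = [("joint test", 5.141, 2), ("standlrt.vb2", 1.745, 1), ("standlrt.vb3", 5.086, 1)]
for label, stat, df in tests:
    p = chi2_sf(stat, df)
    verdict = "reject at 5%" if p < 0.05 else "do not reject at 5%"
    print(f"{label}: chi-sq = {stat} on {df} df, p = {p:.5f}, {verdict}")
```

Up to rounding, this reproduces the p-values quoted above (about 0.0765, 0.1865 and 0.0241), and only the standlrt.vb3 test falls below 0.05.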
From a statistical point of view, this might lead us to drop the standlrt.vb2 term from the model while retaining standlrt.vb3. This would imply that the effect of standlrt is the same in categories 1 and 2 of vrband, but different in category 3. If the main effects of vb2 and vb3 are both significantly different from zero, we would retain these in the model, so the model would include main effects of standlrt, vb2 and vb3, plus the interaction standlrt.vb3. Whether we decide to use this model, to drop the interaction terms altogether, or to retain both of them will depend on how much sense it makes substantively to fit a model which specifies the same effect of standlrt in categories 1 and 2 of vrband. |
| Errors and crashes:
Error messages during estimation/implausible results |
| I get an error message when I run my model/ numerical problems |
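The linear multilevel model can be written in the general form

$$Y = X\beta + E, \qquad \hat{\beta} = \left(X^{T}V^{-1}X\right)^{-1}X^{T}V^{-1}Y \qquad (1)$$

where X holds the explanatory variables of the fixed part. The matrix V is the covariance matrix of E; it is constructed from the random parameters (the variances and covariances of the random effects in E), and an estimate of it is used in expression (1). The estimates of the random parameters are also functions of the data and V (Goldstein, 2003).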
The RIGLS/IGLS algorithm iteratively estimates the random and fixed parameters. In some circumstances it is possible to obtain inadmissible estimates for components of the model, and MLwiN will issue one of the following warnings. Depending on the warning, you may need to take action, or you may be able to ignore it and proceed. Note that when carrying out a bootstrap, any of these conditions will result in that particular bootstrap replicate being ignored and an extra failed replicate being registered.
- V has gone negative definite
- The V matrix for a higher level unit or block has become negative definite. This can occur
when, for example, the variance function for a particular combination of explanatory variable values
becomes negative. Common cases are with repeated measures models where high order polynomials are fitted
or in models where the level 1 variance is fitted as a function of a continuous explanatory variable
such as age. MLwiN will automatically approximate V by the nearest positive definite matrix using a
singular value decomposition.
- This problem may only occur during the course of iterations and convergence may eventually be achieved
smoothly. If the problem persists and even if convergence is obtained you should carefully examine the
specification of your model to see whether there is an obvious problem which can be fixed by adding or
removing parameters.
- SSP matrix for fixed (random) part has gone negative definite
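- The sums of squares and products (SSP) matrix for the fixed part, $X^{T}V^{-1}X$ where X holds the fixed-part explanatory variables (or the corresponding matrix for the random part), has become negative definite. This can occur in similar situations to those above, and MLwiN will approximate it by the nearest positive definite matrix.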
- Matrix in constraint procedure has gone negative definite
- When constraints are used, for example in cross classified models and generalised linear models, a further
matrix which should be positive definite is created involving the constraint vectors and in some circumstances
this can become negative definite. The matrix is approximated by the nearest positive definite matrix.
Less is known about why this happens, and you should examine your model and any imposed constraints carefully.
- Random part has gone to zero - cannot continue
- Sometimes, in complex models, all the random parameters obtain zero values during iterations. In this situation estimation cannot continue: you will need to restart, and you should look carefully at your model. This sometimes occurs because the starting values used are inappropriate.
- Try running a simpler model than the one you wish to fit and then add further terms when convergence has
been achieved. If you have already built up your model in this way try restarting with the most complex
model specified.
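As an illustration of the first warning, the Python sketch below uses a hypothetical 2 x 2 covariance matrix for a random intercept and slope whose implied correlation is 1.5, so the variance function goes negative for some values of the explanatory variable. It then repairs the matrix by clipping its negative eigenvalue to a small positive number. Eigenvalue clipping is a common way to find a nearby positive definite matrix; it is a swapped-in technique shown for illustration and is not necessarily the exact singular value decomposition procedure MLwiN uses. The function names are hypothetical.

```python
import math


def variance_function(s00, s01, s11, x):
    """Variance of u0 + u1 * x when cov(u0, u1) = [[s00, s01], [s01, s11]]."""
    return s00 + 2.0 * s01 * x + s11 * x * x


def clip_to_positive_definite(a, b, c, eps=1e-6):
    """Rebuild the symmetric matrix [[a, b], [b, c]] with its eigenvalues clipped at eps."""
    if b == 0.0:
        # Already diagonal: just clip the diagonal entries.
        return max(a, eps), 0.0, max(c, eps)
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    new_a = new_b = new_c = 0.0
    for lam in (mean + radius, mean - radius):
        # (b, lam - a) is an eigenvector of [[a, b], [b, c]] for eigenvalue lam.
        vx, vy = b, lam - a
        norm = math.hypot(vx, vy)
        vx, vy = vx / norm, vy / norm
        lam = max(lam, eps)
        new_a += lam * vx * vx
        new_b += lam * vx * vy
        new_c += lam * vy * vy
    return new_a, new_b, new_c


# Hypothetical estimates: intercept variance 1.0, slope variance 1.0, covariance 1.5.
s00, s01, s11 = 1.0, 1.5, 1.0
print("variance at x = -1:", variance_function(s00, s01, s11, -1.0))
r00, r01, r11 = clip_to_positive_definite(s00, s01, s11)
print("repaired matrix:", round(r00, 4), round(r01, 4), round(r11, 4))
print("repaired variance at x = -1:", variance_function(r00, r01, r11, -1.0))
```

After clipping, the implied variance is positive for every value of x, but the implied correlation is now essentially 1. A boundary estimate like this is the kind of result that should prompt a careful look at the model specification.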
|
| I got an error message 'cannot allocate rmatrix(ws)(like)' |
This message is probably produced because your worksheet is not large enough. See this FAQ for details of how to increase the worksheet size. |
| I tried to run a binomial model and got an error message |
- I (successfully) ran a multinomial model, then cleared the Equations window and set up a binomial model. When I pressed Start, I got an error message which said:
error while obeying batch file C:\Program Files\MLwiN v2.10\discrete\pre at line number 11: name c1180 'H' C1181 'F~(H)' C1182 'H*' C1183 'P' c1184 'MASK' c1189 'Y-VAR'
Duplicate column name.
- This error message occurs because when MLwiN estimates a binomial or multinomial model, it sets up some special columns in the worksheet. When it does this it gives them certain names. The names for the columns used when running a multinomial model are P, -P, H, F~(H), H*, Pi and Y-VAR, and these occupy c1178 to c1183 and c1189. When the model is cleared, these columns are not deleted. If another multinomial model is set up and run, c1178 to c1183 and c1189 get given these names again. This does not cause a problem because the same column is being given the same name. However, if instead a binomial model is set up and run, the names and locations of some of the special columns are different: H, F~(H), H*, P and Y-VAR are put in c1180 to c1183 and c1189. When MLwiN tries to give these names to these columns, it discovers that P is already the name of c1178. Since there can't be two columns with the same name, the error message is produced.
To solve the problem, after you clear the Equations window simply scroll down in the Names window to find c1178 to c1189, highlight these columns and click the Delete button at the top of the window. This removes the contents of the columns and renames them c1178, c1179, and so on. When you set up and run the binomial model MLwiN is then able to properly create the special columns it needs.
Note that the same problem does not occur if you try to run a multinomial model after having run a binomial model. This is because, although c1183 is already named P when MLwiN tries to name c1178 P, c1183 is being given a new name in the same command, and so c1178 is successfully named P without there being two columns with the same name.
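The naming rule described above can be modelled in a few lines of Python. This is a hypothetical simulation of the behaviour the answer describes, not MLwiN's code: worksheet names are held in a dictionary keyed by column number, and one naming command is accepted only if all names are unique once every assignment in that command has been applied.

```python
MULTINOMIAL_NAMES = {1178: "P", 1179: "-P", 1180: "H", 1181: "F~(H)",
                     1182: "H*", 1183: "Pi", 1189: "Y-VAR"}
BINOMIAL_NAMES = {1180: "H", 1181: "F~(H)", 1182: "H*", 1183: "P", 1189: "Y-VAR"}


def name_columns(names, assignments):
    """Apply one naming command: all new names are checked together against the result."""
    result = dict(names)
    result.update(assignments)
    owner = {}
    for column, name in result.items():
        if name in owner:
            raise ValueError(f"Duplicate column name {name!r} (c{owner[name]} and c{column})")
        owner[name] = column
    return result


def delete_columns(names, first, last):
    """Mimic deleting c<first>..c<last> in the Names window: their names are removed."""
    return {c: n for c, n in names.items() if not first <= c <= last}


after_multinomial = name_columns({}, MULTINOMIAL_NAMES)
conflict = None
try:
    name_columns(after_multinomial, BINOMIAL_NAMES)
except ValueError as err:
    conflict = str(err)
print("binomial after multinomial:", conflict)

cleared = delete_columns(after_multinomial, 1178, 1189)
print("binomial after Delete works:", name_columns(cleared, BINOMIAL_NAMES) == BINOMIAL_NAMES)

after_binomial = name_columns({}, BINOMIAL_NAMES)
print("multinomial after binomial, c1183 is now:", name_columns(after_binomial, MULTINOMIAL_NAMES)[1183])
```

The first attempt fails because c1178 keeps the name P while c1183 is also being named P; deleting c1178 to c1189 first removes the clash, and in the reverse order c1183 is renamed within the same command.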
|
| I tried to run a binomial model and got an error message (2) |
- I tried to set up and run a binomial model using a macro, but got the error message 'Variables random at bottom level should not be used in model'
- It sounds as though you have included a command in your macro like this:
setv 1 'cons'
and this is causing the error message that you see. When fitting a binomial model, it is not necessary to specify an error term at level 1. This is because the level 1 error term is already included in the model, with a binomial distribution, when you specify a binomial response (it is attached to a specially created variable, which does not appear elsewhere in the model). When specifying models in the Equations window using the Graphical User Interface, you are not given the option to specify that any variable be random at level 1. Using commands or macros it is of course possible to issue a command that makes this specification, but this causes an error when MLwiN attempts to estimate the model. If you press the Start button after setting up a model like this using commands or a macro, you will see the same error message. If you remove the command "setv 1 'cons'" from your macro, the model should run as expected.
For more information about binomial response models and level 1 residuals, see section C6.4 of the module on binary response models in our online training materials.
|
| When do I allow negative variances? |
A common situation where you may wish to allow negative variances is when carrying out an iterated bootstrap for bias correction. In some cases, allowing a variance (especially a level 1 variance) to go negative during the iterative procedure can prevent estimation failure. For example, if the level 1 variance is very small compared to that at higher levels, the IGLS algorithm can get stuck at a zero estimate of the level 1 variance: a negative level 1 variance is encountered at one iteration and reset to zero, it is reset to zero again at the next iteration, and so convergence appears to have been achieved. Allowing the variance to go negative can avoid this problem and result in a final positive converged value. The most important case is when modelling complex level 1 variation with the variance a quadratic function of an explanatory variable. Here it is perfectly legitimate for a variance parameter to be negative so long as the total variance it implies is not negative. |
| I get an error message 'wrong parameter...' when using MCMC |
- I just tried to use MCMC methods in MLwiN but I always get error messages (wrong parameter...) even if I try to follow the tutorial in Chapter 10 of the MCMC manual.
- This error may be caused by using a comma as the decimal separator. If you do not have the most recent version of MLwiN, you may find that upgrading to this version solves the problem. If you still experience the problem when using the most recent version, then try changing the decimal separator that your computer uses to a '.' (via Regional and Language Options in the Windows Control Panel).
|
| When I get the error message "MCMC error 0315: prior variance matrix is not positive definitive" while running a multinomial model using MCMC, what exactly do I need to do to the numbers in C1096? |
- I'm trying to run a two-level multinomial model using MCMC: the outcome variable has 4 categories (cat 1 n=6168, cat 2 n=1160, cat 3 n=1608, cat 4 n=1393). The dataset comprises 200 areas and 10,329 individuals. When switching the estimation control from mql/pql to MCMC I get the following error message: "MCMC error 0315: prior variance matrix is not positive definitive" followed by "Error in MCMC start-up - estimation halted". I checked HELP in MLwiN and it indicated I needed to manually change a setting at column 1096 but insufficient information is provided as to how to do this. I subsequently sent an email to the multilevel listserv and received the following helpful response from Bill Browne…
- "the MQL method must have produced a variance matrix at a higher level that is non positive definite i.e. it estimates a correlation between random effects outside the range -1 to 1. In C1096 the numbers should correspond to the ones you see in the equations window and are, starting from the highest level, the lower diagonal elements of all the variance matrices. My advice is to run in MQL/PQL then before switching to MCMC go into c1096 and switch the off-diagonal elements to 0 - to check you are doing this correctly, if you have the equations window up next to the data window then when you change the value in the data window the corresponding value in the equations window will change. Once you have changed all the off-diagonal elements to zero switch to MCMC and things should run".
- I followed Bill's advice and MCMC ran fine (which suggests I followed the advice correctly). However, I then added additional variables to my model and again went through the process of generating starting values using mql/pql and then switching to MCMC, and the same two error messages returned. This time, resetting the off-diagonal values to zero in c1096 did not fix the problem, and unfortunately my technical/programming knowledge of MCMC is very limited.
- Bill's advice sounds spot on and I am surprised that it didn't solve the problem the second time round. Note, if you add any additional random effects then the rows of c1096 that you set to zero the second time will differ from the first time. Make sure that the second time you run the model the variance-covariance matrices look as Bill described before starting the MCMC chain. Also check whether any of the on-diagonal terms (i.e. the variances as opposed to covariances) are zero. If so, try changing them to 0.001.
Note that not positive definite does NOT mean the matrix contains some elements which are negative. It is perfectly possible for a matrix which contains only positive elements to not be positive definite. Therefore when following the advice above it is important to change ALL the off-diagonal elements to 0, not just any negative off-diagonal elements.
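The advice above can be sketched in Python. The sketch assumes, as Bill Browne's description suggests, that each variance matrix is stored in c1096 as its lower triangle read row by row; that layout, the function names and the example numbers are assumptions for illustration. A Cholesky factorisation is used as the positive definiteness check because it succeeds only for positive definite matrices.

```python
import math


def unpack_lower(packed, k):
    """Rebuild a symmetric k x k matrix from its lower triangle stored row by row."""
    matrix = [[0.0] * k for _ in range(k)]
    pos = 0
    for i in range(k):
        for j in range(i + 1):
            matrix[i][j] = matrix[j][i] = packed[pos]
            pos += 1
    return matrix


def is_positive_definite(matrix):
    """Cholesky factorisation succeeds only if the matrix is positive definite."""
    k = len(matrix)
    low = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = matrix[i][j] - sum(low[i][p] * low[j][p] for p in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return True


def reset_starting_values(packed, k, floor=0.001):
    """Set every covariance to 0 and any zero variance to `floor`."""
    result = []
    for i in range(k):
        for j in range(i + 1):
            value = packed[len(result)]
            if i != j:
                value = 0.0  # all off-diagonal terms, whatever their sign
            elif value == 0.0:
                value = floor
            result.append(value)
    return result


# Hypothetical 2 x 2 matrix: every element is positive, yet it is not positive definite.
packed = [1.0, 2.0, 1.0]
print(is_positive_definite(unpack_lower(packed, 2)))
fixed = reset_starting_values(packed, 2)
print(fixed, is_positive_definite(unpack_lower(fixed, 2)))
```

The first check fails even though no element is negative, which is why all off-diagonal elements, not only the negative ones, are set to 0.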
|
| I get an error message asking me to 'generate unique ID codes'. How do I do this? |
- I am attempting to fit a 3-level proportional odds model with time (measured in months) as the first level, person (RID) as the second level, and institution (FID) as the third level. In setting up an ordinal model I get the error message
"Unexpanded level 2 id column (Month)must not contain consecutive repeated codes, in multivariate or multinomial models. Use GENErate command to generate unique ID codes".
I do not understand what is needed of the data and it isn't clear from the data manipulation menu how to use the GENE command. Please advise me on this.
- When you fit a multinomial model (whether ordinal or unordered), MLwiN needs your level 1 ID codes to obey the condition that the same value does not appear on adjacent rows of the dataset. So if your level 1 is Month, you can't, for example, have values of 1,2,3 for Individual 1 and 3,4,5 for Individual 2, because then the level 1 ID column will read '1,2,3,3,4,5,…' and that repeated 3 is a problem. (This does not apply to Normal response models, where repeated values like this are not a problem.) To get around this, the simplest thing is to create a new variable to be your level 1 ID which has values 1,2,3,4,...,n where n is the length of your dataset: in other words, a variable which is just a sequence from 1 to the length of the dataset, and which takes no account of any information about individuals or time points. This is what the error message is asking you to do when it tells you to use the GENErate command (more on this in a moment). You might think using a sequence like this as your level 1 ID is a problem because you are discarding all your information about when people were measured. However, this information can still go into the model if you put Month in as an explanatory variable (you will probably want to do this anyway). The explanatory variable is the proper place for this information; all that is needed for the level 1 ID is that it should change from row to row, to signify that each row is a different occasion. It doesn't matter for the level 1 ID how far apart the occasions are or which people are measured on the same occasion.
The reason that repeated values are a problem is that MLwiN needs to create an expanded dataset when you fit a multinomial model, as described in the User's Guide. (With a Normal response model, MLwiN does not expand the dataset). The process that it uses to expand the dataset requires that the level 1 IDs satisfy this condition. In the expanded dataset, a new level 1 is created and the level you specified as level 1 is now level 2, so the level 2 referred to in the error message is the level you specified as level 1.
The sequence to use for level 1 in place of Month can be created either using a command or via the GUI. To create it using the GUI, select 'Generate vector' from the 'Data Manipulation' menu. In the window that appears, select 'Sequence' under 'Type of vector'. Select any free column next to 'Output column', type 1 next to 'Start number', the length of your dataset (for example the length of your Month variable) next to 'End number' and 1 next to 'Step value', and click 'Generate'. To create it using a command, in the Command interface window type 'Gene 1 n 1 c100', replacing n by the length of your dataset and c100 by any free column; for example, if your Month variable is of length 4587 and column 48 is free, you could type 'Gene 1 4587 1 c48'.
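A minimal Python sketch of the condition and the fix, using the Month example from the answer; the function names are hypothetical, and sequence_ids simply produces the same codes 1, 2, ..., n that 'Gene 1 n 1' is described as creating. The new level 1 ID only has to change from row to row, while Month stays in the model as an explanatory variable.

```python
def has_consecutive_repeats(ids):
    """True if the same ID code appears on two adjacent rows."""
    return any(a == b for a, b in zip(ids, ids[1:]))


def sequence_ids(n):
    """The codes 1, 2, ..., n, one per row of the dataset."""
    return list(range(1, n + 1))


month = [1, 2, 3, 3, 4, 5]  # individual 1 measured in months 1-3, individual 2 in months 3-5
print(has_consecutive_repeats(month))  # True: the repeated 3 triggers the error
new_level1 = sequence_ids(len(month))
print(new_level1, has_consecutive_repeats(new_level1))  # [1, 2, 3, 4, 5, 6] False
```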
|
| Implausible results or convergence problems |
You may sometimes obtain estimates for a model which seem to be incorrect. Possibilities are:
- A value is clearly inconsistent with the model setup. This will often arise when you are running models which rely upon macros for implementation, such as MQL/PQL discrete response models. Check the settings carefully and in particular make sure that you have not overwritten any reserved columns or boxes.
- The estimates are unexpected. In this case you may, of course, have a genuinely new result, but you should also make sure that you check every aspect of your model setup carefully. Try taking out parameters and reinserting them and look carefully at residuals and associated Normal plots etc.
- Make sure that you have sorted the data within the worksheet to correspond to the multilevel structure you are modelling.
- It is often a good idea to centre continuous (but not discrete) variables around their grand mean - this can avoid numerical problems.
- Try building up your model in a different way. For example, start with a variance components model and add random coefficients rather than trying to fit many random coefficients from scratch.
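Two of these checks are easy to automate. The Python sketch below (hypothetical helper names) tests one necessary condition for correctly sorted data, namely that in the ID column for a single level each unit's rows form one unbroken block, and performs grand-mean centring of a continuous variable. For a model with several levels you would apply the contiguity check at each level.

```python
def units_are_contiguous(ids):
    """True if each unit's rows form one unbroken block (an ID never reappears later)."""
    seen = set()
    for i, unit in enumerate(ids):
        if i == 0 or unit != ids[i - 1]:
            if unit in seen:
                return False
            seen.add(unit)
    return True


def grand_mean_centre(values):
    """Subtract the overall mean from a continuous variable."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


school = [1, 1, 2, 2, 1]
print(units_are_contiguous(school))          # False: school 1 reappears
print(units_are_contiguous(sorted(school)))  # True
print(grand_mean_centre([10.0, 12.0, 14.0]))  # [-2.0, 0.0, 2.0]
```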
|
| MLwiN is giving the standard errors of parameter estimates as 0, but I know from comparison with other software packages that the standard errors should not be 0 |
- I have been using MLwiN to fit an ordered categorical model. In the first instance I have been doing this for a single level model and have been replicating the analysis in SPSS. When I fit a model including only the ordered response variable SPSS and MLwiN give the same answers more or less. When I add a categorical fixed effect explanatory variable, all but two of the standard errors of the parameter estimates are set to zero in MLWin. They are not in SPSS. Is there any explanation for this? I have suppressed the warning of negative variance in the iterative procedure to obtain convergence, is this the key? Is there any way round it? I am keen to pursue the use of MLWin because it is more powerful and I am particularly interested in the standard errors of the predictions. Thanks.
- You can check whether these standard errors have really been set to zero or whether they have been estimated but are very small. To check this, change the display format for numbers from 3 decimal places to 3 significant figures: from the Options menu, select Numbers (Display precision and missing value code) and change the selection from ‘decimal format’ to ‘signif digit format’; in the ‘# significant digits’ box select 3. If the standard errors are still zero, then you have some kind of estimation problem. One likely reason is that MLwiN is more sensitive than SPSS and Stata and can fail to estimate models that SPSS and Stata provide numbers for. This can be a good thing, as MLwiN will fail when a model is poorly specified, whilst SPSS/Stata will produce results and you may not realise you have a problem. So make sure that your model makes sense and that all the variables in the model are sensible and have sensible distributions. You could also see whether the problem occurs with other explanatory variables and an alternative ordered response. Another possibility is that you have done something wrong in setting up the model in MLwiN. Check very carefully that the summary statistics for your variables look the same as in SPSS, and make sure that you have read the relevant chapters in the User Guide (chapters 9 and 11). Then try repeating your MLwiN/SPSS comparison for the simpler case of a binary response (i.e. collapse your ordered outcome down to two categories). As for ‘suppressing the warning of negative variance in the iterative procedure’, there is no variance (apart from the implicit probit or logit variance) in single level ordered models, so you should not be doing this.
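Python's format specifiers give a quick illustration of why a very small standard error can look like zero in decimal format but not in significant-digit format. The value is hypothetical, and MLwiN's own display may differ in detail (for example, it may not use exponent notation).

```python
standard_error = 0.00004321  # hypothetical: very small but not zero

print(f"{standard_error:.3f}")  # decimal format, 3 places: 0.000
print(f"{standard_error:.3g}")  # 3 significant digits: 4.32e-05
```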
|
| Errors and crashes: Crashes/ problems opening MLwiN or worksheets |
| Why has MLwiN crashed? |
The most common reasons are:
- There is insufficient workspace - remedy - Increase worksheet size
- Too many Windows programs open - remedy - Close some programs
|
| The European convention of using a comma as a decimal point causes MLwiN 2.10 to crash when using the Intervals and Tests window |
Changing the Windows decimal separator to a point (.) resolves this problem. You can do this via the regional settings screen available from the Windows Control Panel. |
| MLwiN fails to open after changing the default worksheet settings |
Default Worksheet size set too high
If you set the default worksheet size to too high a level for the amount of RAM on your computer,
MLwiN will not be able to open on subsequent attempts to launch the software.
If you run into this problem do the following:
- Open the Windows command line (Start > Run)
- Type the following in the Windows command line: 'mlwin /sheetsize 5000'. MLwiN will launch with its default worksheet size (5000k)
- In MLwiN, open the worksheet window (Options > worksheet)
- Set the default worksheet size to a lower level which is appropriate for your computer and then click the use as defaults button. MLwiN will now be able to open on subsequent attempts to launch the software.
|
| When I run MLwiN I get the error message "Pre or Post file does not exist (post)" |
- I have just successfully installed MLwiN. However upon initiation of MLwiN, I now receive the message "Pre or Post file does not exist (post)". Is this OK, or should it be repaired?
- The pre and post files should be located inside a directory called discrete in your MLwiN directory (in C:\Program Files). The first thing is to check whether they are actually there. The file names should be 'PRE' and 'POST'.
If the files do not seem to be there, then try downloading and installing again. If you can see the files, then check that MLwiN is looking in the right place for them: from the Options menu select dIrectories and check that in the Fpath box it gives the path of the discrete folder (e.g. 'C:\Program Files\MLwiN v2.10\discrete') and in the Pre file box it says 'pre' and in the Post file box it says 'post'. If any of this has been mis-specified you can change it by selecting User defined settings above the boxes and then changing the text in the boxes as appropriate and clicking Done.
If this problem has been encountered when running a macro that specifies a path for these files, check that the path is correctly specified in the macro.
|
| I have a worksheet saved on a network drive that I can't open using MLwiN |
It is possible to open, work with and save worksheets on a network drive. However, you will need to have the drive mapped to a drive letter (and to type the filepath in this form, if you are opening/ saving the worksheet via commands). This is because MLwiN does not recognise the more recent notation for accessing a network drive directly. If using the GUI to open and save files, then provided the drive you want to access has been mapped to a drive letter, the procedure is just the same as when opening files saved on your local computer: simply browse to find the network drive (which will probably have its name in the longer more recent notation but with the letter after it in brackets), double click on it just as with a folder, and continue browsing to find your worksheet, then click on the worksheet and click Open. If using commands to open and save files, then for example if you have a network drive mapped to the letter Q and want to open 'file1.ws' in the Data folder in this drive, you would type:
"RETR 'Q:\Data\file1.ws' "
… instead of (for example):
"RETR '\\computername\sharedfolder\Data\file1.ws' ".
If you have a worksheet saved to a network drive which is not mapped to a drive letter then the simplest thing to do is to copy the worksheet to your local computer disk (or a memory stick) and open it from there.
This applies to both MLwiN worksheets and to SPSS, Stata, and Minitab worksheets (if using MLwiN version 2.10). |
| When I try to open the worksheets for the LEMMA course I get an error message |
- I'm taking the LEMMA Course and until yesterday everything was working perfectly, but today I just couldn't open the MLwiN Datafiles. Every time I try I get this message: "Run-time error '5': Invalid procedure call or argument". I've already installed the latest version (2.10 beta 6), but it didn't work. I tried several times uninstalling and reinstalling the software, but it didn't work… Any ideas of what I can do? Many thanks in advance.
- Has anything on your system changed between when you were able to open the worksheets and now? Examples might be other software being installed or removed. It might be worth manually clearing the MLwiN installation directory after removing it, before attempting to reinstall it. This is the directory where you chose to install it (usually C:\Program Files\MLwiN v2.10). If you have an account with administrative privileges on that machine it might also be worth checking to see if you have the same problems when running as that user.
- As I don't have administrator rights on my computer, it is a pain each time I need to install software. Finally I could open the files. As I realized I can open the old example files, what I did was to save the files to my computer and then change the extension from '.wsz' to '.ws'. I know it is not the best, but it worked.
|
| When I double click on one of the worksheets for the LEMMA course, it opens with notepad not MLwiN |
- When I click on the icon for 5.1.wsz in Module 5.1 I seem to get a notepad file that has a lot of blank characters, not a data file.
- First note that the datasets for the LEMMA course will not open with MLwiN version 2.02: you will need version
2.10 or the teaching version (which you can download for free here). Make sure that you have one of these versions of MLwiN. If you do have the appropriate version but are experiencing this problem, then first try reinstalling your copy of MLwiN. If this fails, you can try downloading again and then reinstalling. If that doesn't work or if you cannot do this, there are two options: a work around or a fix.
Option 1: Work around
When you click on the link to the datafile, choose Save to computer instead of Open. Then start MLwiN and open the data file from MLwiN's File menu.
Option 2: Fix
The fix is to reassociate the data file's .wsz extension with MLwiN (so that Windows opens them in MLwiN if you double click them,
or if you choose Open instead of Save to computer when downloading), which is pretty easy and takes just a minute.
Again, when you click on the link to the datafile in your browser choose Save to computer rather than Open, and save for example to My Documents. Then open up My Documents or whichever folder you have saved the file into in Windows Explorer - so you can see the data file's icon. If the cause of the problem with opening the files is the situation this fix is designed for, then it should not have the MLwiN logo on it. (If the data file does have the MLwiN logo then there is probably a different cause of the problem and this fix will probably not work).
The next part of these instructions is written correctly for Windows XP, but probably something similar will work for Windows Vista. Right click on the data file (i.e. click with the right mouse button, not the left one that people usually use), and choose the menu option Open With, then in the submenu select Choose Program. You should get a new window with a list of programs and a tick box labelled "Always use the selected program to open this kind of file" WHICH YOU
SHOULD TICK, then select MLwiN, then click OK. If MLwiN is not in the list of programs, click the Browse button and locate and open the "mlwin.exe" file. It would normally be in the folder C:\Program Files\MLwiN v2\ or C:\Program Files\MLwiN v2.10. (Note that the icon for the program should be pale blue and white, not yellow and dark blue: the program with the yellow and dark blue icon is version 2.02 which will not open the LEMMA datafiles).
After doing this, the data file on your computer should now have the MLwiN logo, and when you double click it should open up in MLwiN, and
when you open files directly from your browser, they should open in MLwiN too.
- I didn't reinstall, but I tried the work around (didn't work), and the fix (which did).
|
| Errors and crashes: Other errors/ problems |
| When trying to plot a graph, I get the error message 'Failed to open Graphics Server' |
- I have worked through Module 5 up to p. 8 (P5.1.2) but when I try to produce the caterpillar plot (clicking 'apply' in the Plots tab), I get the error message: 'Failed to open Graphics Server'. Clicking OK or closing produces the error message: 'The server is already running'.
Clicking OK or closing this message repeats the earlier message, and so on …
- The following may fix this
1) Save your worksheet
2) Close all open copies of MLwiN
3) Open task manager (right click the windows taskbar and click "Task Manager")
4) Select the "Processes" tab
5) Locate and select "gsw32.exe" from the list and click "End Process"
6) Open MLwiN
7) Load your saved worksheet
8) Try producing the graph again
|
| Why has my data become corrupted? |
It is possible that data in the worksheet columns can become corrupted unintentionally for a number of reasons:
- When fitting discrete response models the data are transformed during the course of an iteration and later transformed back to their original values. If a problem occurs which interrupts an iteration the data may remain in their transformed state. If your data have been corrupted you must retrieve a correct worksheet. Corruption of data in this way will often become apparent when highly implausible results are seen
- An abnormal termination while running a generalised linear model or a macro may result in data corruption.
- You may have read data (at input or subsequently) into a protected or reserved column and since these columns are used by MLwiN to store various parameters anything in these columns will be overwritten.
You should always make sure that you have a recent backed-up copy of your worksheet which you can retrieve if you get into this situation. |
| I can't get the LOGO command to work |
The command LOGO <filename> does not work in the current version of MLwiN. However, the commands LOGO <filename> 1 to switch logging on and LOGO 0 to switch logging off should do exactly what the LOGO <filename> command used to do. Note that <filename> needs to be the full path of the file, unless you have set the working directory to the directory that contains the file, by selecting Directories from the Options menu and entering the directory in the current directory box.
To make things clearer, here is an example of using the LOGO command that should work:
ECHO 1
LOGO 'C:\Documents and Settings\All Users\Documents\logfile.txt' 1
SAY 'Hello World'
LOGO 0
(Note that 'ECHO 1' turns on echoing, which means that the commands in macros will be displayed.)
If this does not work, an alternative way to preserve a record of what has been done in a session is to copy the commands from the Command interface window (see When I work with MLwiN using the various windows, is there any way to record the commands that will do the same thing?). Also note that you can save sequences of commands in a macro to execute again later (see How can I easily save and run commands?). |
| I noticed a difference in values I get between versions of MLwiN |
- I fitted a model using version 2.02 and then upgraded to version 2.10 Beta and fitted the model again. All the estimates of the coefficients and variance parameters were the same, but the -2*log(likelihood) value given was different.
- We have found in some rare cases that MLwiN 2.10 Beta (subversions 1-9) produced incorrect likelihoods for models estimated with (R)IGLS. The problem does not apply to models estimated with MCMC (no likelihood is calculated for these models, so the DIC is not affected either). The problem was not present in previous release versions (2.02 and earlier), and it was fixed in MLwiN 2.10.
Our research has found that the problem only occurs in single level models with a large number of cases (>10,000). Furthermore, the problem does not occur if a second level is declared but no variance component is fitted at level 2. If you have fitted a single level model in MLwiN 2.10 Beta (subversions 1-9) with only a single level hierarchy defined, the -2*log(likelihood) should be regarded as suspect and the model should be re-run in MLwiN 2.10. Further details…
|
| Other (apparent) errors/problems |
|
| How do I…? Categorical variables |
| How do I update the category labels for a categorical variable? |
Highlight the variable in the Names window and press the Toggle Categorical button at the top of the window twice. This switches the variable to continuous and back to categorical, and when it is switched back to categorical the category names are re-created. (If you have specific names you want to give the categories you will need to re-enter these by clicking on Categories). The variable will now have one category per distinct value (and any categories which had no observations will have disappeared).
The commands to do this are CATN 0 C (changes the variable C from categorical to continuous) and NTOC C (changes the variable from continuous to categorical, assigning default category labels to the codes). To give specific labels to the categories, instead of NTOC use CATN as described below.
An alternative is to use the command CATN to change just the categories that are wrong. An unwanted category (usually a category with no observations) can be removed using CATN 0 C N, where C is the name of the column containing the categorical variable and N is the number of the unwanted category. This command removes category number N from the list of categories for the variable. For example, if you are working with the tutorial dataset and have deleted all records for which vrband = vb3, you could type CATN 0 'vrband' 3 and vrband would then have two categories, vb1 and vb2.
To assign a category label to a code, use CATN 1 C N name, where C is again the name of the column containing the categorical variable, N is the code to which you wish to assign a label, and name is the label you wish to assign. For example, suppose you have recoded vrband so that all the observations that were in category 3 now have code 4, but you still wish these observations to be in category vb3. (This is not something that would generally be useful to do; the example only illustrates how the command works.) You would type CATN 0 'vrband' 3 to remove the category with code 3, and then CATN 1 'vrband' 4 'vb3' to assign the label vb3 to the category with code 4.
You can also change several categories at once. For example, to assign the label vb3 to code 4 and vb2 to code 5, you can use CATN 1 'vrband' 4 'vb3' 5 'vb2'.
For documentation of CATN, see p18 of the Command Manual, and for documentation of NTOC see p22. |
| I've recoded a categorical variable, but the list of category codes I see when I press Categories has not been updated |
When you recode a categorical variable, the category code information is not automatically updated. So if you recode a variable so as to collapse 4 categories into 3, the variable will still be considered to have 4 categories (though one will have no observations); or if you recode all observations in category 3 to have the value 10, and you do not already have a category with code 10, then category 3 will still have its original code (and will have no observations) and observations with code 10 will not be considered to belong to any category. In order to update the category information after recoding, follow the procedure described in the FAQ How do I update the category labels for a categorical variable? |
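The mismatch described above can be illustrated with a small hypothetical Python sketch (this is not MLwiN code). It assumes the category list behaves like a separate code-to-label mapping that recoding leaves untouched, and uses the vrband labels from the tutorial dataset; the function name stale_category_report is made up for illustration.

```python
# Hypothetical illustration (plain Python, not MLwiN): the category list is
# modelled as a code -> label dict kept separately from the data column,
# so recoding the data does not change it.

def stale_category_report(values, categories):
    """Compare a recoded column with its unchanged category list.

    values     -- list of numeric codes in the column
    categories -- dict mapping code -> label, as it was before recoding
    Returns (codes with no observations, codes with no category).
    """
    present = set(values)
    empty_codes = sorted(code for code in categories if code not in present)
    unlabelled_codes = sorted(code for code in present if code not in categories)
    return empty_codes, unlabelled_codes


# vrband-style example: every observation in category 3 recoded to 10.
categories = {1: "vb1", 2: "vb2", 3: "vb3"}
original = [1, 2, 3, 3, 1]
recoded = [10 if v == 3 else v for v in original]

empty, unlabelled = stale_category_report(recoded, categories)
print(empty, unlabelled)  # [3] [10]
```

Codes in the first list are categories with no observations; codes in the second have observations but belong to no category. Updating the category labels is what brings the two back into line.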
| How do I get rid of an unneeded dummy variable/response category? |
- I entered a categorical variable into my model as an explanatory variable, and although there are no observations in one of the categories of this variable, a dummy variable for this category was still included in the model. How can I remove it without removing all the dummy variables for this categorical variable?
- When you enter a categorical variable as an explanatory variable (or as the response of a multinomial model), MLwiN will enter all the categories of the variable, even if some categories have no observations. Some categories of a variable may have no observations if they already had none in the original dataset you imported the variable from (perhaps a Stata or SPSS data file), if you have recoded the variable so that some categories are combined (for example, if you recode so that all observations in Category 3 are now in Category 2), or if you have deleted from the dataset all records in that category of the variable.
An unneeded dummy explanatory variable can be removed from the model as described in the FAQ How can I constrain fixed parameters to zero?. Alternatively, before entering it into the model, you can update the category labels as described in the FAQ How do I update the category labels for a categorical variable? Now when you enter the variable into a model (whether as an explanatory variable or as the response of a multinomial model), only the new set of categories will be entered: any of the categories which had no observations will no longer be included. Note that if after following this procedure you recode the variable so that again some categories have no observations, or alter the data in any other way so that this happens, the categories with no observations will not automatically disappear: you will need to update the category labels again.
A final possibility, if you know the code of the category which has no observations, is to use CATN to remove just the category/categories with no observations. How to do this is described in the FAQ How do I update the category labels for a categorical variable?; alternatively see p18 of the Command Manual. Since this removes the category from the variable, it will mean both that no dummy for that category is included when the categorical variable is entered as an explanatory variable and that the category will not be included when the categorical variable is used as the response of a multinomial model.
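Why an empty category causes trouble can be seen in a hypothetical Python sketch (not MLwiN code) that builds one 0/1 dummy column per category code; dummy_columns is an illustrative name and the data are made up.

```python
# Hypothetical illustration (plain Python, not MLwiN): build one 0/1 dummy
# column per category code, as happens when every category is entered.

def dummy_columns(values, codes):
    """Return {code: [0/1 indicator for each observation]} for every code."""
    return {code: [1 if v == code else 0 for v in values] for code in codes}


values = [1, 2, 2, 1, 2]   # no observation has code 3 ...
codes = [1, 2, 3]          # ... but the category list still contains 3
dummies = dummy_columns(values, codes)

# The dummy for the empty category is identically zero, so the data carry
# no information about its coefficient; this is why it should be removed.
all_zero = [code for code, column in dummies.items() if not any(column)]
print(all_zero)  # [3]
```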
|
| Other questions about categorical variables |
|
| How do I…? Commands and macros |
| How can I easily save and run commands? |
- Is there an easier way to save and run a set of commands than copying into a text document and pasting one by one into the Command interface?
- Yes, if you have a particular sequence of commands you want to run again, then perhaps the easiest thing is to put them in a macro. From the File menu, select New Macro. In the window that appears you can type your commands. Nothing will happen until you press the Execute button at the bottom; when you press it, all the commands will be carried out. Most commands can be put into a macro in exactly the same way as you enter them in the Command interface window (there are a few exceptions). The macro can be saved at any time by selecting Save Macro or Save Macro as from the File menu (while the macro window is the active window). It is also possible to have several macros open at once.
For more information on macros, see pp 82-84 of the Command Manual.
|
| When I work with MLwiN using the various windows, is there any way to record the commands that will do the same thing? |
- Is there any way I can capture from the Equations window what I am setting up directly into a macro? The re-modelling in the Equations window is time consuming, and if I could obtain this in a macro, I could simply edit some of the macro commands.
- Yes, you can. The easiest way to do this is to open the Command interface window (select Command interface from the Data Manipulation menu) and deselect the User check box at the top. Now when you set up or make changes to your model in the Equations window, the corresponding commands will appear in the Command interface window. This is not limited to the Equations window: the corresponding commands will also appear when you use any other window. Make sure you have the MLwiN Command Manual handy for interpreting the syntax. MLwiN will sometimes also issue some commands that you don't need, and this can cause problems (see Interpreting commands generated by MLwiN for an example of how to pick out the commands you want). The Command Manual has not been updated recently, however, so see also the Manual Supplement for version 2.10, which documents commands for the new features in version 2.10 and also, in section 8.4, includes some commands which work with version 2.02 as well as version 2.10 (not all the commands in that section work with version 2.02; if you are using version 2.02 you will need to experiment to see which ones work). See also the MCMC manual, which includes some examples of writing and using macros that provide useful tips applying more generally than just to MCMC estimation.
|
| I copied commands generated by MLwiN when I was using the GUI and pasted them into a macro, but when I ran the macro I got an error message |
See: Interpreting commands generated by MLwiN |
| Output from my macro is not appearing in the Output window |
To see output in the Output window, you need to turn echoing on. This can be done by putting the line 'echo 1' at the top of your macro, or by typing 'echo 1' in the Command interface window before you start running macros (it only needs to be typed once per session, but it doesn't matter if you type it more times or if it is produced by a macro every time it is run). Note that the ‘print’ command will not do anything unless echoing is on. |
| Can I run my old macros written in MLn under MLwiN? |
Yes, you can simply run the old macros under the new window system. However, MLwiN has modified some commands from MLn and included several new commands. Some toggle commands in MLn now take a parameter to specify the exact mode, such as EXPL, which is now EXPL 1 (or 0), START, which is now START 1 (or 0), LOGO, which is now LOGO 1 (or 0), etc. Therefore, if an error message occurs in MLwiN when running an MLn macro, look up the command on which the error occurred (Command ****, where **** is the command name) in the MLwiN online help to find the correct syntax. See also the missing data web site. |
| Will macros written for version 2.02 work with earlier versions of MLwiN? |
A number of new commands were introduced between versions 1.2 and 2.02, but unfortunately we do not have a list of exactly which ones these were. (Also, a small number of commands do not work in exactly the same way in newer versions as in older ones: for example, some commands, such as EXPL, require a 1 after them in newer versions but not in older versions.) The majority of commands, however, should be identical in 1.2 and 2.02. Please also note that in version 1.2 the default number of worksheet columns was lower, so the model estimates are stored not in c1090 to c1100 as in version 2.02 but in earlier columns.
(See also the FAQ When I work with MLwiN using the various windows, is there any way to record the commands that will do the same thing?) |
| Other questions about commands and macros |
|
| How do I…? Discrete response models |
| How can I specify a binomial response model using a macro? |
The Manual Supplement for version 2.10 (available to download for free) gives an example of running a logistic regression from a macro on pp 86-87, and this is followed by more information about the special commands needed (pp 88-89) which allow you to specify that you have a binomial response, which link function you want to use, the estimation method, and the denominator. |
| What is the command for specifying the denominator when fitting a binomial or multinomial model? |
The command is DOFFs. For documentation see p88 of the MLwiN version 2.10 Beta manual supplement.
Note that although the documentation for the command is in the manual supplement for version 2.10, this command does also work in version 2.02, as do most of the other commands documented under the heading 'Other useful commands for specifying discrete response models in macros'. |
| Why is the log-likelihood value not displayed in the Equation Window when a Binomial or Poisson model has been fitted? |
For these models the likelihood value produced by the IGLS algorithm is an approximation and could be unreliable in some extreme cases. For statistical inference you may use Wald tests from the Intervals and Tests window. You can still get an approximate -2*log(likelihood) value for a model using the command LIKE from the Command interface window. |
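As a rough guide to what a single-parameter Wald test computes, the following Python sketch uses made-up numbers (estimate 0.5, standard error 0.25). It is a hand calculation, not the code behind the Intervals and Tests window, and the function name wald_test_1df is hypothetical.

```python
import math


def wald_test_1df(estimate, std_error, null_value=0.0):
    """Single-parameter Wald test: returns (chi-squared statistic, p-value).

    With 1 degree of freedom the chi-squared upper-tail probability equals
    erfc(sqrt(x / 2)), so no statistics library is needed.
    """
    z = (estimate - null_value) / std_error
    chi2 = z * z
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Made-up estimate and standard error, purely for illustration.
chi2, p = wald_test_1df(0.5, 0.25)
print(round(chi2, 2), round(p, 4))  # 4.0 0.0455
```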
| MLwiN gives different results to SAS and Stata |
- I have a logistic multilevel model (random intercept) and calculated the same model in Stata (xtmelogit), in SAS (proc glimmix) and in MLwiN. The fixed effects are quite similar but the random effects and the intercept differ. They are the same for SAS and Stata but not for MLwiN. In Stata and SAS I do get significant random variation in the intercept but in MLwiN I don't. How could that be? Do you have explanations??
- SAS and Stata use maximum likelihood estimation for discrete response models while MLwiN uses quasi-likelihood methods (see p128 of the User's Guide for more details). The different estimation methods can produce slightly different results, and in the case where results lie close to the borderline for significance it is possible to find parameter values significantly different from 0 using one estimation method but not the other.
It is known that quasi-likelihood methods give estimates for the random parameters which are biased downwards (see Rodríguez and Goldman, 2001). For this reason, we do not recommend that users take results from these estimation methods as final; instead we recommend that they use MCMC estimation (after using quasi-likelihood estimation to get starting values). MCMC estimation does not suffer from this downward bias; note, however, that the parameter chains and other diagnostics should be carefully checked to be sure that the burn-in has been long enough for the chains to move away from what will probably be bad starting values, and that after the burn-in the chains have been run long enough to get accurate estimates. See MCMC estimation in MLwiN for more details of the diagnostics available and how to tell whether burn-ins and chains are long enough, and see our free online training materials (Section C 7.7* and the Technical Appendix to Module 7*) for a more detailed discussion of estimation methods for binary response models.
Reference: Rodríguez G. and Goldman N. (2001) Improved estimation procedures for multilevel models with binary response: a case-study. Journal of the Royal Statistical Society, Series A, 164, 339-355
*You will need to log in to our training materials to view these web pages. Further details about the training materials.
|
| How do I interpret the estimates for the fixed part parameters in a Poisson model? |
The interpretation of the output depends on how you have set your model up, in particular on what you have used as the offsets. If you have set your model up so that the response variable is for example the number of deaths and the offset variable is the log of the number at risk, then the estimates that you see in the Equations window for the coefficients (in the fixed part of the model) will be the effects on the log rate, so that you need to take the exponential of each coefficient to get the effect on the rate. If on the other hand you do as in the example in Chapter 12 of the User's Guide and use as the offset variable the log of the expected number of deaths, then the estimates that you see in the Equations window for the coefficients of the fixed part of the model will be the effects on the log ratio of observed to expected deaths, and the exponentials of the coefficients will be the effects on the ratio of observed to expected deaths. (Note that it is more usual to use the log of the number at risk rather than the log of the expected number of deaths as the offset variable where information on the number at risk is available). So you will need to interpret your results based on how your model is set up. |
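A small Python sketch of the first set-up (log of the number at risk as the offset) shows why the exponential of a coefficient is the effect on the rate. The coefficient values and the function name expected_count are made up for illustration; with the log of the expected number of deaths as the offset, the same arithmetic applies to the ratio of observed to expected deaths instead.

```python
import math


def expected_count(n_at_risk, intercept, slope, x):
    """Poisson model with log(number at risk) as the offset:
    log(mu) = log(n_at_risk) + intercept + slope * x."""
    return math.exp(math.log(n_at_risk) + intercept + slope * x)


# Illustrative (made-up) fixed-part estimates on the log scale.
intercept, slope = -4.0, 0.3

# exp(slope) is the factor by which the rate (count per person at risk)
# is multiplied for a one-unit increase in x.
rate_ratio = math.exp(slope)

rate_x0 = expected_count(1000, intercept, slope, 0) / 1000
rate_x1 = expected_count(1000, intercept, slope, 1) / 1000
print(round(rate_ratio, 4), round(rate_x1 / rate_x0, 4))  # 1.3499 1.3499
```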
| Other questions about discrete response models |
|
| How do I…? Longitudinal data |
| Does MLwiN need repeated measures data to be in long or in wide format? |
- I'm trying to set up a model using repeated measures data (with occasion within student within school). Looking at the example data, I see that level 1 ID has to be unique. Does this mean that the repeated measures have to be stored in the "wide" format?
- No, MLwiN actually requires repeated measures data to be in the long, not the wide, format when it comes to analysis. The User's Guide (chapter 13, section 13.2) shows how to convert data from wide to long format. Using the long format does not mean, however, that the level 1 IDs can no longer be unique. In repeated measures analysis, occasion (time) is regarded as level 1, student (in your case) becomes level 2, and so on. Nor does it matter that the natural level 1 ID, Occasion Number, will have the same value in more than one row of the dataset, since we do not have to use it as our level 1 ID but can create a new ID variable. A unique level 1 ID, which gives a different number to each occasion-person combination, can be created by selecting Generate vector from the Data manipulation menu, selecting Sequence under Type of vector, choosing a free Output column, typing 1 next to Start number, the length of your long dataset next to End number, and 1 next to Step value, and then clicking Generate. This can then be used as the level 1 ID in the model, and a separate variable such as Time, Year or Occasion Number can be used as an explanatory variable.
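The wide-to-long conversion and the Sequence ID can be sketched in Python as follows. This is only an illustration of the idea, not the MLwiN procedure described in the User's Guide; the field names (school, student, score1, score2, level1_id) are hypothetical.

```python
def wide_to_long(rows, occasion_fields):
    """Convert wide records (one row per student) to long records
    (one row per occasion) and add a unique level 1 ID.

    rows            -- list of dicts with 'school', 'student' and one
                       response field per occasion
    occasion_fields -- the response field names, in occasion order
    """
    long_rows = []
    for row in rows:
        for occasion, field in enumerate(occasion_fields, start=1):
            long_rows.append({
                "school": row["school"],
                "student": row["student"],
                "occasion": occasion,  # repeats across students: an explanatory variable
                "score": row[field],
            })
    # Same idea as a Sequence vector from 1 to the length of the long
    # dataset with step 1: every occasion-student row gets its own ID.
    for level1_id, long_row in enumerate(long_rows, start=1):
        long_row["level1_id"] = level1_id
    return long_rows


wide = [
    {"school": 1, "student": 1, "score1": 10, "score2": 12},
    {"school": 1, "student": 2, "score1": 9, "score2": 11},
]
long_data = wide_to_long(wide, ["score1", "score2"])
```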
|
| Is it possible to define an AR process for the random effects at level 1? |
- Is it possible to define an AR process (autocorrelation structure) for the level 1 random effects when we have repeated measures data?
- Yes. For details of how to do this (with an example), see Section 5 on p72 of the MLwiN version 2.10 manual supplement. Note that although this description appears in the MLwiN version 2.10 manual supplement, the process is exactly the same in MLwiN version 2.02.
|
| When should I allow level 1 errors to be autocorrelated? |
Generally, you would allow level 1 error terms to be correlated when the observations are close together in time so that you would expect that an observation will be similar to the previous observation (the phenomenon known as 'autocorrelation'). When the observations are further apart in time so that you expect that, after you have taken the mean response for the individual into account, there is no relationship between an observation and the one before it, the observations are not autocorrelated and you do not need to allow the level 1 error terms to be correlated. What counts as 'close together in time' and what counts as 'further apart in time' depends on exactly what you are measuring. Sometimes it is hard to decide before modelling whether the level 1 error terms should be correlated or not. In that case it is a good idea to fit a model which allows the level 1 error terms to be correlated and compare that model with one which does not allow the level 1 error terms to be correlated to see whether the level 1 error terms do need to be allowed to be correlated. |
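One informal way to see what 'autocorrelation' means for your own data (not a method described in this FAQ, which recommends comparing the two models) is to look at the correlation between each level 1 residual and the next one for the same person, using residuals from a model without correlated level 1 errors. A minimal Python sketch, assuming the residuals have already been exported and grouped by person in time order; the function name is hypothetical.

```python
def lag1_autocorrelation(residuals_by_person):
    """Pooled lag-1 correlation between successive level 1 residuals.

    residuals_by_person -- list of lists, one list per person, each in
                           time order. Pairs never cross from one person
                           to the next.
    """
    pairs = []
    for series in residuals_by_person:
        pairs.extend(zip(series[:-1], series[1:]))
    if not pairs:
        raise ValueError("need at least two occasions for some person")
    n = len(pairs)
    mean_prev = sum(p for p, _ in pairs) / n
    mean_next = sum(q for _, q in pairs) / n
    cov = sum((p - mean_prev) * (q - mean_next) for p, q in pairs)
    var_prev = sum((p - mean_prev) ** 2 for p, _ in pairs)
    var_next = sum((q - mean_next) ** 2 for _, q in pairs)
    if var_prev == 0 or var_next == 0:
        raise ValueError("residuals do not vary; correlation undefined")
    return cov / (var_prev * var_next) ** 0.5


# Made-up residuals that drift slowly within each person.
r = lag1_autocorrelation([[1.0, 2.0, 3.0, 4.0], [-1.0, -2.0, -3.0, -4.0]])
```

A value well above zero suggests that successive observations on the same person are similar after the person's mean has been taken into account; it does not replace fitting both models and comparing them.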
| Where are the time series macros? |
- I have a repeated measures dataset and would like to allow the level 1 error terms to be autocorrelated. I heard there were some macros which would do this, but cannot find them anywhere on your site.
- The macros were removed when version 2.02 was released since they were found to be unstable. We now recommend an alternative method for setting up and running these models. This method is described in detail in Section 5 of the MLwiN 2.10 Beta Manual Supplement which is available from here. (Note that although this appears in the MLwiN 2.10 Beta Manual Supplement, exactly the same method can be used with MLwiN version 2.02). However, this method will only work with a Normal response model. Unfortunately we no longer provide the facility to estimate a binary response model with autocorrelated errors in MLwiN.
|
| How do I…? MCMC |
| Can I change the length of the burn-in without re-running the model? |
- I've run a model using MCMC, and when I looked at the trajectories for the parameters I realised that I had not set a long enough burn-in period. Can I change this retrospectively, or will I have to run the model again specifying the longer burn-in?
- It is possible to get parameter estimates based on a longer burn-in after you have run a model. If you click on the trajectory graph for any parameter and answer Yes to the message Calculate MCMC diagnostics?, the MCMC diagnostics window will appear. If you click on the Diagnostic Settings button at the bottom of this window, a further window appears with a box labelled Increase burn-in length by at the bottom. Type in here the iteration number of the stored monitoring chain up to which you want the burn-in period to run (this gives a total burn-in length equal to the number you type in plus the burn-in length you originally used) and click Apply and Done. The 'posterior mean' estimate in the MCMC diagnostics window is now the parameter estimate under this longer burn-in, and the 'SD' is the standard error of the parameter estimate. You can now click on the trajectory graph for any other parameter and the 'posterior mean' and 'SD' will also be adjusted for the longer burn-in (i.e. you do not need to click the Diagnostic Settings button for each of the other parameters). Note, however, that the estimates displayed above the trajectory graphs in the Trajectories window will not be updated, and nor will the estimates displayed in the Equations window, the estimates stored in c1096-c1099, or the stacked parameter chains in c1090. Note also that the length of the monitoring chain (the iterations after the burn-in period) will be reduced by the same amount that the burn-in is increased.
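The arithmetic behind the longer burn-in can be sketched in Python for a single parameter chain. The chain values, the original burn-in of 500 and the function name are made up, and the SD is assumed here to be the ordinary sample standard deviation of the retained iterations; MLwiN's own calculation may differ in detail.

```python
import statistics


def summarise_after_extra_burnin(chain, extra_burnin):
    """Posterior mean, SD and remaining monitoring length after discarding
    the first `extra_burnin` stored iterations of one parameter's chain."""
    if not 0 <= extra_burnin <= len(chain) - 2:
        raise ValueError("need at least two iterations left after the burn-in")
    kept = chain[extra_burnin:]
    return statistics.mean(kept), statistics.stdev(kept), len(kept)


# Made-up chain whose first three stored iterations have not yet settled.
chain = [5.0, 4.0, 3.0, 1.0, 1.2, 0.8, 1.1, 0.9]
original_burnin = 500  # burn-in used when the model was first run
extra = 3              # number typed into "Increase burn-in length by"

mean, sd, monitoring_length = summarise_after_extra_burnin(chain, extra)
total_burnin = original_burnin + extra  # 503 iterations discarded in total
```

The monitoring chain shrinks from 8 to 5 stored iterations, matching the note above that it is reduced by the same amount that the burn-in is increased.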
|
| Running MCMC from a macro doesn't appear to work |
When you run MCMC from a macro, it can appear as if it gets stuck in the burn-in period. This is due to the behaviour of the display: when MCMC is run from a macro, the 'Burning in' message is displayed just as it is when you use the GUI, but when the burn in is finished and the monitoring chain iterations start, the progress bar that you see when running from the GUI does not appear (and so the 'Burning in' message does not disappear). You can check that the requested iterations have in fact taken place by scrolling down to c1090 in the Names window (this being the column where the stacked chains of parameter estimates are stored). c1090 should contain a number of entries corresponding to the number of requested iterations times the number of parameters.
There is no way to make the progress bar appear at the bottom during the monitoring chain phase; however, you can make the numbers in the Equations window update during estimation, as they do when you use the GUI, by using 'Start 1' instead of 'Start' (see section 8.4.1 of the Manual Supplement for v2.10 for more details). |
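The check on c1090 can be sketched in Python. The length test follows directly from the text above; the order in which parameters are stacked within c1090 is an assumption (all parameters for iteration 1, then all for iteration 2, and so on), so check it against your own output before relying on the split. unstack_chains is a hypothetical name.

```python
def unstack_chains(stacked, n_params):
    """Split a stacked chain column (like c1090) into one chain per parameter.

    ASSUMPTION: the values are stacked iteration by iteration, i.e. all
    parameters for iteration 1, then all parameters for iteration 2, ...
    """
    # The check from the FAQ: length = iterations x parameters.
    if n_params <= 0 or len(stacked) % n_params != 0:
        raise ValueError("column length is not a multiple of the number of parameters")
    n_iterations = len(stacked) // n_params
    chains = [stacked[p::n_params] for p in range(n_params)]
    return chains, n_iterations


# Made-up values: 2 parameters, 3 monitoring iterations.
stacked = [10.0, 20.0, 11.0, 21.0, 12.0, 22.0]
chains, n_iterations = unstack_chains(stacked, 2)
print(n_iterations, chains)  # 3 [[10.0, 11.0, 12.0], [20.0, 21.0, 22.0]]
```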
| Other MCMC questions |
|
| How do I…? Parameters and parameter estimates |
| Where is the model fitting information stored in MLwiN? |
Parameter storage columns: MLwiN routinely stores the random parameter estimates in C1096 (for more information see the question 'Where can I find the random parameter matrix in MLwiN? How do I use this to work out the residual ('unexplained') variance at each level?') and their variance-covariance matrix in C1097. Since the variance-covariance matrix is symmetric, C1097 contains only the elements in its lower triangle. The fixed parameters and their variance-covariance matrix are stored in C1098 and C1099 respectively. These are the final estimates at convergence of the model. The fixed and random parameter estimates for each iteration are stacked and stored. The -2log-likelihood values for each iteration are stacked and stored in C1091 (for Normal response models only). If missing values are defined in the worksheet through the Options window, MLwiN stores a binary code in C1094: 1 for missing and 0 otherwise. Users should not store data in these columns: any data in them will be overwritten without warning during estimation. |
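Since C1097 and C1099 hold only a lower triangle, the full matrix has to be rebuilt before it can be used elsewhere. A Python sketch, assuming the lower triangle is stored row by row ((1,1), (2,1), (2,2), (3,1), ...); that ordering is an assumption, so please check it against a small model before relying on it. The function name is hypothetical.

```python
def full_from_lower_triangle(values):
    """Rebuild a symmetric matrix from its lower triangle.

    ASSUMPTION: the triangle is stored row by row:
    (1,1), (2,1), (2,2), (3,1), (3,2), (3,3), ...
    """
    # Find the dimension k with k(k+1)/2 == len(values).
    k = 0
    while k * (k + 1) // 2 < len(values):
        k += 1
    if k * (k + 1) // 2 != len(values):
        raise ValueError("length is not a triangular number")
    matrix = [[0.0] * k for _ in range(k)]
    pos = 0
    for i in range(k):
        for j in range(i + 1):
            matrix[i][j] = matrix[j][i] = values[pos]  # fill both halves
            pos += 1
    return matrix


# Made-up lower triangle of a 2 x 2 covariance matrix (e.g. copied from C1099).
cov = full_from_lower_triangle([0.04, 0.001, 0.0009])
# Standard errors are the square roots of the diagonal elements.
std_errors = [cov[i][i] ** 0.5 for i in range(len(cov))]
```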
| Is it possible to output the error variance-covariance matrices for parameters? |
Yes. The error variance-covariance matrix for random parameters is stored in c1097 and the error variance-covariance matrix for fixed parameters is stored in c1099 (see the FAQ Where is the model fitting information stored in MLwiN? for more details). Since these matrices are stored as columns in MLwiN, they can be output in the same way as any other column. For MLwiN version 2.10 Beta the simplest way is to use the Copy button in the Names window; for details of how to do this see Section 8.2.4 on p82 of the MLwiN version 2.10 Manual supplement. For MLwiN version 2.02, the way to output columns is using the ASCII text file output option from the File menu; for details of how to do this see our Getting data out of MLwiN page. |
| Is it possible to constrain parameters in MLwiN? |
- When running a model in MLwiN, is it possible to constrain some of the parameters so they are forced to take on specified values instead of being freely estimated?
|
| How can I constrain random parameters to zero? |
- Is there a way of constraining some of the covariances between the random effects to equal 0?
- Yes, it is easy to constrain covariances between random effects to be zero. Simply click on the relevant covariance in the matrix in the Equations window, and in the window that appears asking whether you want to remove the covariance, click Yes. The covariance is then constrained to be zero; you will see that it no longer appears as a coloured number with 3 decimal places but as a single digit 0 in black.
It is also possible to do this using the command CLRE; for documentation of this see p35 of the Command Manual.
See also How can I constrain parameters to values other than zero? and How can I constrain fixed parameters to zero?
|
| How can I constrain fixed parameters to zero? |
It is possible to constrain a fixed parameter to be zero; however it is generally more sensible to remove the fixed parameter from the model (which is equivalent to constraining it to zero). This can be achieved by clicking on the term in the Equations window and unticking the first box, labelled Fixed Parameter, in the window that appears, then clicking Done. Any random effects for that variable will remain in the model (since any ticked boxes below the Fixed Parameter box remain ticked), and if there are no random effects for that variable, it will disappear entirely. Therefore this method will allow you to specify a model with random effects for a variable but no fixed effects for that variable (that is, fixed effects constrained to zero), or to delete the dummy variable for one category of a categorical variable while leaving the dummy variables for the other categories in. Note that it is important when doing either of these things to be sure that this is something that makes sense given your data, model and research question.
It is also possible to remove a fixed parameter from the model using the command FPAR. Type FPAR 0 C, where C is the variable for which you do not want a fixed parameter to appear in the model. For example, if you are working with the tutorial dataset and wish to add vrband as an explanatory variable (with reference category vb1) but only want a dummy for vb2 to appear in the model, then after adding vrband in the usual way, type FPAR 0 'vb3'. FPAR 1 C will put variable C back in the fixed part of the model. For documentation of FPAR, see the Command Manual. |
| How can I constrain parameters to values other than zero? |
NB see also the FAQs How can I constrain random parameters to zero? and How can I constrain fixed parameters to zero?
To do this using the GUI, first set up your model in the Equations window and then select Constrain Parameters from the Model menu. In the window that appears, select random or fixed as appropriate (near the bottom of the window), select a free column from the drop down box store constraint matrix for random [or fixed] parameters in and in the box # of constraints type the number of (fixed or random) parameters you want to constrain. A column will appear at the top of the window for each constraint. Using one column for each parameter, type a 1 in the row corresponding to that parameter, and the number you want to constrain it to in the row to equal. When you have entered all the constraints, click the button at the bottom of the window attach fixed constraints or attach random constraints.
For example, suppose you are working with the tutorial dataset supplied with MLwiN and want to estimate a two level model (with school as level 2 and student as level 1) with a random slope on standlrt, and that you want to constrain the variance of the random slope to be 0.5 and the covariance between intercepts and slopes to be 0.001. (Note that this is just for the purposes of illustration and it should not be inferred that this would be a sensible thing to do or that these would be sensible values to choose in this particular situation). First of all, set up the random slopes model in the Equations window. Then select Constrain Parameters from the Model menu. In the window that appears select random at the bottom because we are constraining variances and covariances which are random parameters. In the box # of constraints type 2 because we are constraining two parameters, the variance of the slopes and the covariance between intercepts and slopes. Now in the first column at the top of the window type 1 in the third row (because this is labelled school: standlrt/standlrt so it refers to the variance of the level 2 random effect on standlrt: i.e. it is the variance of the random slopes) and in the last row type 0.5 (because this is the value we want to constrain the variance to equal). Then in the second column type a 1 in the second row (because this is labelled school: standlrt/cons so it refers to the covariance between the level 2 random effect on cons and the level 2 random effect on standlrt so it is the covariance between intercepts and slopes) and in the last row type 0.001. Then select c11 (or any other free column) from the drop down box store constraint matrix for random parameters in and click the button attach random constraints at the bottom of the window.
Note:
- the constraints remain stored in the column you have chosen (c11 in this case), so if you want to use them again you can click the 'load matrix from' button and select c11 from the drop-down box that appears, rather than entering them in the columns again
- more complicated constraints can also be specified this way: for example you can specify that two parameters should be equal (but not take on any specific value). See the FAQ How can I constrain two parameters to be equal? for how to do this.
- this method can also be used to constrain parameters to zero, though there are quicker and easier ways to constrain parameters to zero via the Equations window (see FAQs How can I constrain random parameters to zero? and How can I constrain fixed parameters to zero?)
There are also commands to do this. For random parameters (the variances and covariances) the command is RCON and documentation can be found on p40 of the Command manual. For fixed parameters (the betas) the command is FCON and documentation can be found on p37 of the Command manual. |
| How can I constrain two parameters to be equal? |
NB see also the FAQs How can I constrain random parameters to zero?, How can I constrain fixed parameters to zero? and How can I constrain parameters to values other than zero?
Constraining two parameters to be equal can be done via the Parameter constraints window, accessed by selecting Constrain Parameters from the Model menu (see the FAQ How can I constrain parameters to values other than zero?). Select fixed or random as appropriate at the bottom of the window, and in the '# of constraints' box type 1 (this is the default, so it may already be entered). In the column labelled #1 at the top of the window, type a 1 next to one of the parameters and a -1 next to the other (it doesn't matter which way round). Next to 'to equal', type 0 (again the default, so it may already be entered). Choose a free column from the 'store constraint matrix for fixed [or random] parameters in' drop-down box and click the 'attach fixed constraints' or 'attach random constraints' button at the bottom of the window. The difference between the two parameters is now constrained to be 0, which is equivalent to constraining the two parameters to be equal.
Alternatively, this can be done via a command. For random parameters (the variances and covariances) the command is RCON and documentation can be found on p40 of the Command manual. For fixed parameters (the betas) the command is FCON and documentation can be found on p37 of the Command manual. |
| Other questions about parameters |
|
| How do I…? Partitioning variance and the intra-cluster correlation |
| Partitioning variation across levels |
Partitioning variation across levels - more information |
| Where can I find the random parameter matrix in MLwiN? How do I use this to work out the residual (‘unexplained’) variance at each level? |
Variances and covariances are stored in column c1096 (see the question Where is the model fitting information stored in MLwiN?). The values from the matrix at each level are stacked in this column, starting with the highest level and ending with the lowest. For each level only the lower triangle of the matrix is stored (since the full matrix is symmetric), and its values are entered row by row. So, for example, if you have set up a two-level model with a random slope (at level 2) on just one variable, c1096 will contain, in this order: the variance of the random intercepts, the covariance between the random intercepts and the random slopes, the variance of the random slopes, and the level 1 variance. The matrices of random parameters can also be seen more clearly in the Equations window. Finally, in MLwiN 2.10 the same information can be found in the Results Table if the model is stored (by clicking ‘Store’ at the bottom of the Equations window and entering a name for the model when prompted). In the Results Table the information appears under ‘Random Part’, in the same order as in c1096, but with row labels showing where the divisions between levels fall and which variable is associated with each random parameter, which makes it easier to understand.
When fitting multilevel models we usually talk about the ‘residual variance’ rather than the ‘unexplained variance’, as this is a more appropriate term. Working out the residual variance at each level is simple for a variance components or random intercepts model: there is only one random parameter at each level, and this is the residual variance at that level. For more complicated models (e.g. random slope models) it is not so straightforward, because the variance at a level then depends on the value of the variable with the random slope, and you need to use MLwiN’s Variance function window (from the Model menu). Details of how to use this are given in Chapter 7 of the User Guide (pp76-92). |
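As an illustration of the storage order described above (not MLwiN code; the function names and numbers are hypothetical), the Python sketch below rebuilds each level's symmetric matrix from a c1096-style column, given the number of random terms at each level, highest level first. It also evaluates the level 2 variance of a random slope model at a chosen value x of the slope variable, using the standard result: intercept variance plus 2x times the covariance plus x² times the slope variance. Inside MLwiN, the Variance function window is the way to obtain such quantities.

```python
from typing import List

Matrix = List[List[float]]


def unstack_levels(stacked: List[float], sizes: List[int]) -> List[Matrix]:
    """Rebuild full symmetric matrices from lower triangles stored row by row.

    sizes gives the number of random terms at each level, highest level first,
    so a two-level model with a random slope at level 2 has sizes [2, 1].
    """
    needed = sum(k * (k + 1) // 2 for k in sizes)
    if needed != len(stacked):
        raise ValueError(f"expected {needed} values, got {len(stacked)}")
    matrices = []
    pos = 0
    for k in sizes:
        m = [[0.0] * k for _ in range(k)]
        for i in range(k):          # row of the lower triangle
            for j in range(i + 1):  # columns 0..i of that row
                m[i][j] = m[j][i] = stacked[pos]
                pos += 1
        matrices.append(m)
    return matrices


def level2_variance(omega_u: Matrix, x: float) -> float:
    """Level 2 variance at x for random intercept + random slope: s00 + 2*x*s10 + x*x*s11."""
    return omega_u[0][0] + 2 * x * omega_u[1][0] + x * x * omega_u[1][1]


# Hypothetical c1096 contents: intercept variance, covariance, slope variance, level 1 variance
stacked = [0.09, 0.02, 0.01, 0.55]
level2, level1 = unstack_levels(stacked, [2, 1])
print(level2)  # [[0.09, 0.02], [0.02, 0.01]]
print(level1)  # [[0.55]]
print(level2_variance(level2, 1.0))
```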
| How do I calculate the amount of variation at each level for nonlinear models? |
Partitioning variation in nonlinear models: unlike some other statistical packages, MLwiN does not calculate the amount of variation at each level in nonlinear multilevel models. Because the model is nonlinear, the level 1 variance is not on the same scale as the level 2 variance, so the terms at the different levels do not add up to give the total variance for the fitted model. Therefore the formula used for Normal response models cannot be used here. There are some other ways to obtain such measures. See further details. |
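For a binary response, one commonly used approximation (often called the latent variable or threshold approach; it is only one of several possible methods, and they can give different answers) treats the response as a thresholded continuous variable whose level 1 variance is fixed by the link function: π²/3 (about 3.29) for a logit link and 1 for a probit link. MLwiN does not compute this for you; the Python sketch below, with a hypothetical function name, just shows the arithmetic for a two-level random intercept model.

```python
import math


def latent_vpc(level2_var: float, link: str = "logit") -> float:
    """VPC under the latent variable (threshold) approximation for a binary response.

    The level 1 variance is not estimated but fixed by the link:
    pi^2 / 3 for a logit link, 1 for a probit link.
    """
    level1_var = {"logit": math.pi ** 2 / 3, "probit": 1.0}[link]
    return level2_var / (level2_var + level1_var)


# Hypothetical level 2 (logit-scale) variance of 0.5
print(round(latent_vpc(0.5), 3))  # 0.132
```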
| What is the intra cluster correlation? |
Sometimes referred to as 'intra class correlation' in survey work, this strictly measures correlation between units within a higher level unit. For simple two-level variance component or random intercept models this is equivalent to the proportion of variance at the higher level, and so equal to the variance partition coefficient (VPC). |
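For a two-level random intercept model with a Normal response, the VPC is the level 2 variance divided by the sum of the level 2 and level 1 variances. This is also the correlation between two level 1 units in the same level 2 unit, since they share the same level 2 random effect and so their covariance is the level 2 variance. A minimal Python sketch (hypothetical function name and values):

```python
def vpc_random_intercept(level2_var: float, level1_var: float) -> float:
    """Variance partition coefficient sigma2_u / (sigma2_u + sigma2_e).

    In a random intercept model this also equals the intra cluster correlation:
    two units in the same cluster share u_j, so their covariance is sigma2_u.
    """
    total = level2_var + level1_var
    if total <= 0:
        raise ValueError("total variance must be positive")
    return level2_var / total


# Hypothetical estimates: level 2 variance 0.1, level 1 variance 0.5
print(round(vpc_random_intercept(0.1, 0.5), 3))  # 0.167
```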
How do I…? Residuals (Also: Watch Residuals slide presentation with voice-over and subtitles) |
| My residuals column is too short |
- I calculated the level 2 residuals for my model (with student as level 1 and tutorial group as level 2). I have 2124 tutorial groups, and as expected I got 2124 residuals. But these residuals are all in the first 2124 rows of the worksheet. Since I have 3 students (level 1 units) per tutorial group (level 2 unit), the first 2124 rows should contain only 2124/3 = 708 of the residuals, with the remaining residuals spread through the rest of the dataset. The rows after 2124 have no value in the residuals columns. How can I get the correct values of the residuals?
- The output that you describe here is correct. MLwiN worksheets are not organised with one row corresponding to one record across the whole worksheet (see the FAQ Does each row correspond to a single record right across the MLwiN worksheet?). When you use the Residuals window to calculate the residuals at any level, MLwiN produces just one entry in each output column for each unit at that level. The output is not padded with missing values in the other rows belonging to that unit in your original dataset; instead the values occupy consecutive places in their column with no gaps. So, for example, the residual in place 10 of c300 will be the residual for tutorial group 10, not the residual for tutorial group 4, even though row 10 of your original dataset corresponds to student 1 of tutorial group 4. The residuals will be in the same order as the tutorial groups in your original dataset, so to see which residual belongs to which tutorial group you will need to create a short column which contains the names/codes of the tutorial groups, with only one entry per tutorial group (so 2124 entries in all). To do this, from the Data Manipulation menu select 'unreplicate'. The Take data window will appear. In the 'Take first entry in blocks defined by' drop-down box, select the column which contains your tutorial groups, and also select this column under 'Input columns'. Under 'Output columns' select any free column. Click 'Add to action list' and 'Execute'. The column you chose under 'Output columns' will now have 2124 entries consisting of the code for each tutorial group in your dataset, and its rows will correspond to the rows of your residuals columns, so you will be able to see which residual corresponds to which tutorial group.
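The 'Take first entry in blocks' step can be sketched in Python as follows (an illustration only; the function name and data are hypothetical). It assumes the data are sorted so that the rows for each tutorial group are contiguous, so taking the first entry of each block gives one code per group, in the same order as the residuals.

```python
from typing import Hashable, List, Sequence


def first_in_blocks(ids: Sequence[Hashable]) -> List[Hashable]:
    """Keep the first entry of each run of identical IDs (one entry per contiguous block)."""
    out: List[Hashable] = []
    for i, unit in enumerate(ids):
        if i == 0 or unit != ids[i - 1]:  # a new block starts here
            out.append(unit)
    return out


# Hypothetical tutorial-group codes, 3 students per group (level 1 length 9)
groups = [101, 101, 101, 205, 205, 205, 307, 307, 307]
residuals = [0.12, -0.40, 0.05]  # one level 2 residual per group, in data order
codes = first_in_blocks(groups)
print(list(zip(codes, residuals)))  # [(101, 0.12), (205, -0.4), (307, 0.05)]
```

Because the function works on blocks rather than unique values, a group code that reappears later in an unsorted dataset would be listed twice, which is why the sorted ordering matters.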
|
| Should the comparative SD output when I calculate the residuals be different for each row? |
Yes, there will be a different comparative SD for each level 2 unit. This is because the comparative SD depends on the number of level 1 units within each level 2 unit and, in the case of slope residuals, also on the value of the explanatory variable the slope is associated with. To see how these things affect the value, see the formula for the 'comparative covariance matrix' given in Harvey Goldstein's book Multilevel Statistical Models on p53 of the 3rd edition or pp58-59 of the 2nd edition (PDF downloadable for free). You can see an example of how the comparative SDs differ between residuals in the caterpillar plot in Chapter 3 of the User's Guide (on p49): the error bars for the residuals are of different lengths. The error bars simply represent the residual +/- 1.96 times the comparative SD calculated by the Residuals window, so this means that the value of the comparative SD is different for each residual. |
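To see why the comparative SD shrinks as a level 2 unit gets larger, here is a simplified Python sketch for a random intercept model. It is an approximation, not MLwiN's formula: it treats all model parameters as known and so ignores uncertainty in the fixed part, giving var(û_j - u_j) = σ²u σ²e / (σ²e + n_j σ²u), where n_j is the number of level 1 units in unit j. MLwiN's comparative SDs may therefore be somewhat larger. The function names and variance values are hypothetical.

```python
import math
from typing import Tuple


def comparative_sd(n_j: int, level2_var: float, level1_var: float) -> float:
    """Approximate comparative SD of a random intercept residual for a unit with n_j level 1 units.

    Uses var(u_hat - u) = s2u * s2e / (s2e + n_j * s2u), treating all parameters
    (including the fixed part) as known.
    """
    return math.sqrt(level2_var * level1_var / (level1_var + n_j * level2_var))


def error_bar(residual: float, comp_sd: float, z: float = 1.96) -> Tuple[float, float]:
    """Residual +/- z times the comparative SD, as drawn in a caterpillar plot."""
    return residual - z * comp_sd, residual + z * comp_sd


# Hypothetical variances: larger level 2 units get narrower error bars
for n in (2, 10, 50):
    sd = comparative_sd(n, 0.1, 0.5)
    print(n, round(sd, 4), error_bar(0.2, sd))
```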
| How do I…? Standard errors |
| Sandwich estimators for standard errors |
Sandwich estimators for standard errors - more information |
| Can MLwiN produce robust standard errors? |
Yes: see this FAQ on sandwich estimators |
| Other questions about standard errors |
|
| How do I…? Using weights |
| Is the weights facility in MLwiN still experimental? |
- I seem to remember that the facility to specify sampling weights (via the Weights option from the Model menu or the WEIGhts command) was introduced as an experimental feature. Is this still the case, or has thorough testing now been completed?
- Yes, this feature is still experimental. We are currently checking that the weights facility works correctly for continuous response models, and will publish the results on our website when finished. Until then, users should continue to regard the weights facility as experimental. Users should also note that MLwiN does not offer the facility to use weights with MCMC estimation: any weights specified will be ignored by MCMC. For this reason we do not recommend the use of weights in discrete response models (since we recommend MCMC estimation for these models), and will not be checking that the weights facility works correctly for discrete response models.
|
| Differential weightings |
Differential weightings - more information |
| How do I use weights in a multiple membership model? |
- I am hoping to use MLwiN to estimate a multiple membership model, but I'm a little confused. My multiple membership factor is "rater", so "rater" is analogous to secondary school in the example in the MLwiN manual. I'm trying to use the "WTCOl" command, but it tells me that "categories in the ID column must run from 1..N with no gaps. Use MLREcode to create consecutive codes". However, this does not make sense for multiple membership models, as there are several ID columns which would all need recoding together.
- We strongly suggest that you estimate cross-classified and multiple membership models using MCMC estimation within MLwiN rather than IGLS. For full details see the MCMC manual, which explains how the data need to be set up, how to choose the weights, and so on.
|
| Miscellaneous |
| Is it possible to fit splines in MLwiN? |
Yes, it is possible to fit splines in MLwiN, but there is no special facility for doing this: it has to be done manually.
- Create an indicator variable for each interval you want to fit a different polynomial to. For example, if the cut points for your spline are 2, 7 and 10, expvar is your explanatory variable, and c20 onwards are all free columns, then type in the Command interface:
calc c20 = 'expvar' < 2
calc c21 = ('expvar' >=2) & ('expvar' < 7)
calc c22 = ('expvar' >= 7) & ('expvar' < 10)
calc c23 = 'expvar' >= 10
name c20 'int1'
name c21 'int2'
name c22 'int3'
name c23 'int4'
(Or you can use the Recode by range window with a new destination variable each time)
- Create a new explanatory variable for each interval by multiplying 'expvar' by the indicator variable:
calc c24 = 'expvar' * 'int1'
calc c25 = 'expvar' * 'int2'
calc c26 = 'expvar' * 'int3'
calc c27 = 'expvar' * 'int4'
name c24 'expvar1'
name c25 'expvar2'
name c26 'expvar3'
name c27 'expvar4'
- Add each explanatory variable as a polynomial (first adding in the indicator variable for that interval to give each polynomial a different intercept)
- Click Add Term at the bottom of the Equations window
- Select int1 and click Done
- Click Add Term again, select expvar1, tick the polynomial box and select the appropriate poly degree from the drop down box. Click Done
- Repeat for the other 3 intervals
- Set up constraints: the spline needs adjacent polynomials to have the same value and the same derivatives (up to the (n-1)th, where n is the degree of the polynomials) at the point where they meet. If there are random effects on any of the terms in the spline, we also need to make sure that the variance at each level is the same for adjacent polynomials at the point where they meet. For details of how to specify constraints on the parameters see the FCON and RCON commands in the Command manual, or in the Help see Introduction to MLwiN > MLwiN Interface > Menu items > Model Menu > Constrained parameters.
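For example, if we have a cubic for each polynomial and have added the terms in the order described above, so that the Equations window shows

$$\text{response}_{ij} = \beta_0\,\text{int1}_{ij} + \beta_1\,\text{expvar1}^1_{ij} + \beta_2\,\text{expvar1}^2_{ij} + \beta_3\,\text{expvar1}^3_{ij} + \beta_4\,\text{int2}_{ij} + \beta_5\,\text{expvar2}^1_{ij} + \beta_6\,\text{expvar2}^2_{ij} + \beta_7\,\text{expvar2}^3_{ij} + \dots + e_{ij}\,\text{cons},$$

and we have a level 2 random effect on each intercept and linear term ($\text{int1}_{ij}$, $\text{expvar1}^1_{ij}$, $\text{int2}_{ij}$, $\text{expvar2}^1_{ij}$, …), then to deal with the first join, between interval 1 and interval 2, we need 4 constraints:
These polynomials meet at expvar = 2. The first constraint is that the values of the polynomials should be the same at this point, so we need

$$\beta_0 + \beta_1 \times 2 + \beta_2 \times 2^2 + \beta_3 \times 2^3 = \beta_4 + \beta_5 \times 2 + \beta_6 \times 2^2 + \beta_7 \times 2^3$$

Our constraint will be $\beta_0 + 2\beta_1 + 4\beta_2 + 8\beta_3 - \beta_4 - 2\beta_5 - 4\beta_6 - 8\beta_7 = 0$.
The second constraint is that the first derivatives of these polynomials are equal at this point, so we need

$$\beta_1 + 2\beta_2 \times 2 + 3\beta_3 \times 2^2 = \beta_5 + 2\beta_6 \times 2 + 3\beta_7 \times 2^2$$

so our constraint will be $\beta_1 + 4\beta_2 + 12\beta_3 - \beta_5 - 4\beta_6 - 12\beta_7 = 0$.
Similarly our third constraint, that the second derivatives of these polynomials are equal at 2, will be $2\beta_2 + 12\beta_3 - 2\beta_6 - 12\beta_7 = 0$.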
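Our fourth constraint is that the level 2 variance due to the random effects on the first polynomial and the level 2 variance due to the random effects on the second polynomial should be equal at expvar = 2. Writing $\sigma^2_{u0}$, $\sigma_{u01}$ and $\sigma^2_{u1}$ for the level 2 variances and covariance of the random effects on int1 and expvar1^1, and $\sigma^2_{u4}$, $\sigma_{u45}$ and $\sigma^2_{u5}$ for those on int2 and expvar2^1, we need

$$\sigma^2_{u0} \times 1^2 + 2\sigma_{u01} \times 1 \times 2 + \sigma^2_{u1} \times 2^2 = \sigma^2_{u4} \times 1^2 + 2\sigma_{u45} \times 1 \times 2 + \sigma^2_{u5} \times 2^2$$

(the covariance carries a factor 2 because the variance of $u_0 + 2u_1$ is $\sigma^2_{u0} + 2 \times 2\,\sigma_{u01} + 2^2\sigma^2_{u1}$). Our constraint will be $\sigma^2_{u0} + 4\sigma_{u01} + 4\sigma^2_{u1} - \sigma^2_{u4} - 4\sigma_{u45} - 4\sigma^2_{u5} = 0$.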
We will have two further similar sets of four constraints to make sure the value, derivatives and level 2 variance are equal at the other cut points.
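The arithmetic above generalises to any cut point c. The Python sketch below (not MLwiN commands; all function names are hypothetical) builds the indicator and interval variables for the cut points 2, 7 and 10, and generates the coefficients for the four constraints at a join. It assumes the fixed parameters of two adjacent pieces are ordered as in the Equations window above (indicator, then powers 1 to 3, for each piece) and the level 2 variance parameters as intercept variance, covariance, slope variance for each piece. The covariance coefficient is 2c because the variance of u0 + c·u1 is σ²u0 + 2cσu01 + c²σ²u1. Each returned row corresponds to one column you would type into the Parameter constraints window, with 0 in the 'to equal' row.

```python
import math
from typing import List


def indicator_columns(x: List[float], knots: List[float]) -> List[List[int]]:
    """One 0/1 column per interval; knots [2, 7, 10] give x<2, 2<=x<7, 7<=x<10, x>=10."""
    edges = [-math.inf] + list(knots) + [math.inf]
    return [[1 if lo <= v < hi else 0 for v in x] for lo, hi in zip(edges, edges[1:])]


def piece_variables(x: List[float], indicators: List[List[int]]) -> List[List[float]]:
    """expvar1, expvar2, ...: x multiplied by each interval's indicator."""
    return [[v * d for v, d in zip(x, col)] for col in indicators]


def join_constraints(c: float, degree: int = 3) -> List[List[float]]:
    """Fixed-part rows equating the value and derivatives 1..degree-1 of two pieces at x = c.

    Coefficients are ordered [b0..b_degree] for the left piece (indicator, x^1..x^degree)
    followed by the same for the right piece; each row should equal 0.
    """
    rows = []
    for k in range(degree):          # k-th derivative; k = 0 is the value itself
        piece = []
        for p in range(degree + 1):  # term x^p; p = 0 is the indicator (intercept)
            # k-th derivative of x^p at c is p!/(p-k)! * c^(p-k), and 0 if p < k
            piece.append(math.perm(p, k) * c ** (p - k) if p >= k else 0)
        rows.append(piece + [-v for v in piece])
    return rows


def variance_join(c: float) -> List[float]:
    """Coefficients on [s2_u0, s_u01, s2_u1, s2_u4, s_u45, s2_u5] equating level 2 variances at c."""
    piece = [1.0, 2.0 * c, float(c) * c]
    return piece + [-v for v in piece]


x = [1.0, 2.0, 7.0, 12.0]
ind = indicator_columns(x, [2, 7, 10])
print(ind)                         # [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
print(piece_variables(x, ind)[1])  # [0.0, 2.0, 0.0, 0.0]
for row in join_constraints(2):
    print(row)
print(variance_join(2))            # [1.0, 4.0, 4.0, -1.0, -4.0, -4.0]
```

A quick sanity check on the design: if both pieces had identical coefficients (a single cubic across the join), every fixed-part row would evaluate to 0.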
|
| My level 2 variables are being treated as level 1 variables |
- I have some level 2 variables (i.e. variables which have the same value for all individuals in the same level 2 unit) which I put in my model as explanatory variables. The Equations window shows some of these with an ij subscript, not a j subscript, indicating that MLwiN thinks these are level 1 variables. How do I tell MLwiN they are level 2 variables?
- MLwiN works out automatically whether a variable is a level 2 variable. It does this by checking whether there is any level 2 unit where not all individuals have the same value of the variable. If there is (at least 1) level 2 unit like this, then it concludes that the variable is a level 1 variable. If there are no level 2 units like this, so that in all level 2 units all individuals have the same value of the variable, then it concludes that the variable is a level 2 variable.
If MLwiN gives one of your level 2 variables an ij subscript in the Equations window, showing that it thinks this variable is a level 1 variable, then you should check for mistakes in your data: a level 2 unit where not all individuals have the same value of the variable. You can do this by inspecting your data, of course, but a quicker and less error-prone method is as follows (to check a level 2 variable called popden):
From the Data Manipulation menu, select Multilevel data manipulations. In the window that appears, under Operation select Minimum and under On blocks defined by select the column which contains your level 2 identifiers. Under Input columns select popden and under Output columns select any free column (say c70). Click Add to action list and Execute. Now under Operation select Maximum and select another free column (say c71), leave everything else the same and click Add to action list and Execute again. Now c70 contains the smallest value of popden for each level 2 unit and c71 contains the largest value of popden for each level 2 unit. If popden is supposed to be a level 2 variable and there are no mistakes in the data, then c70 should be identical to c71. We check this by typing 'calc c72=c71-c70' in the Command interface window (available from the Data Manipulation menu); that is, we calculate the difference between c71 and c70, which we expect to be 0 for every row. We can now check whether the data are correctly entered by looking in the Names window: if c72 is 0 for every row, then in the row for c72 we will see 0 under min and 0 under max. If we do not, this tells us that in at least one case c70 and c71 were not identical, so the minimum and maximum values of popden were not equal for every level 2 unit, i.e. for some level 2 unit different level 1 units had different values of popden.
In order to find out which level 2 unit (and whether this is the case for more than one level 2 unit), we generate a column that will keep track of which row things are on. We do this by selecting Generate vector from the Data Manipulation menu, selecting Sequence under Type of vector, selecting a free column (e.g. c73) next to Output column, typing 1 next to Start number, the length of the dataset next to End number, and 1 next to Step value, and clicking Generate.
Now from the Data Manipulation menu select Select or omit cases, under Condition to select or omit on type c72 != 0, under Input columns select c72 and c73 and under Output columns select two free columns (e.g. c74 and c75), and click Add to action list and Execute. We have now created two new columns which have kept only information from rows where c72 was not 0; c74 contains the value of c72 for those rows and c75 contains the value of c73 for those rows (and will tell us which row of the original dataset the values came from). If we look at c74 and c75 in the Data window we can see the nonzero values of c72 and which row of the original dataset each comes from. Now if we look at these rows in our original dataset (looking at the column with the level 2 IDs and popden) we should be able to see the discrepancy, and edit the data appropriately.
Note that in some cases we may find that we got a nonzero value simply because the value of popden was missing (then the corresponding value in c74 will be MISSING). If the value of popden is missing for all individuals in a level 2 unit, then we will find entries for these individuals in c74 and c75, but this will not stop MLwiN treating popden as a level 2 variable. If, on the other hand, the value of popden is missing for only some individuals in a level 2 unit, then MLwiN will treat popden as a level 1 variable.
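Outside MLwiN, the same minimum/maximum check can be sketched in Python. This is an illustration only: the function name and data are hypothetical, and missing values are assumed to be coded as NaN rather than MLwiN's own missing-value code. It reports every row of each offending level 2 unit, skips units where the variable is missing for everyone, and flags units where it is missing for only some individuals, mirroring the behaviour described above.

```python
import math
from typing import Dict, Hashable, List, Sequence


def inconsistent_units(ids: Sequence[Hashable], values: Sequence[float]) -> Dict[Hashable, List[int]]:
    """Map each level 2 id whose values are not all equal to its 1-based row numbers."""
    rows_by_unit: Dict[Hashable, List[int]] = {}
    for row, unit in enumerate(ids, start=1):
        rows_by_unit.setdefault(unit, []).append(row)

    flagged: Dict[Hashable, List[int]] = {}
    for unit, rows in rows_by_unit.items():
        vals = [values[r - 1] for r in rows]
        missing = [math.isnan(v) for v in vals]
        if all(missing):
            continue  # missing for the whole unit: still consistent with a level 2 variable
        observed = [v for v, m in zip(vals, missing) if not m]
        if any(missing) or min(observed) != max(observed):
            flagged[unit] = rows  # partly missing or genuinely differing values
    return flagged


nan = float("nan")
level2_id = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4]
popden = [5.0, 5.0, 5.0, 3.0, 3.5, 3.0, nan, nan, nan, 7.0, nan]
print(inconsistent_units(level2_id, popden))  # {2: [4, 5, 6], 4: [10, 11]}
```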
|