Fundamentally you have a very large data set (700,000 individuals) and an unordered categorical response variable. MLwiN and other statistical packages are always going to struggle to fit a three-level model to these data. I think you will have to work with a smaller sub sample of these data, at least while you are exploring. When you decide on the final models for your paper, you can always then try to use the whole data.
If you type help runmlwin##quasilikelihood_estimates in the Stata viewer, you will see some comments on MQL1 and PQL2
mql1, mql2, pql1 and pql2 specify the linearization technique for fitting discrete response models by
(R)IGLS. All four quasilikelihood methods are approximate: pql2 is the most accurate but the least stable
and the slowest to converge, mql1 is the least accurate but the most stable and fastest to converge. We
recommend that model exploration is conducted using mql1. For any final model, we recommend fitting the
model using pql2 (or preferably MCMC) as a two stage process. First fit the model using mql1, then refit
the model using pql2 where the initsprevious option (or initsmodel(name) or initsb(matrix)) is specified to
use the mql1 parameter estimates as the starting values for fitting the model using pql2.
So while PQL2 gives less biases estimates than MQL1 it is rather flakey and the algorithm may well fail to converge at all. If after increasing the maximum number of iterations, MLwiN still fails to converge then one alternative is to use the MCMC engine where you use the MQL1 parameter estimates as starting values for the MCMC chains. MCMC does not suffer from the same biases as the quasilikelihood methods. However, MCMC is computationally very intensive and will also be very slow to fit models to such a large data set.
Yes we do report the results even if the IGLS algorithm fails to converge, but we do so with a warning message to this effect. This is no different from Stata's offical xtmixed, xtmelogit and xtmepoisson commands where you can specify estimation options which set a maximum number of iterations. These commands will also report the estimation results at the point where you hit this maximum number of iterations even though the model hasn't fully converged. No you shouldn't report results from models which have not converged in any final analysis.
Centre for Multilevel Modelling