MML Regression

The MML Regression procedure estimates a linear regression in situations where the dependent variable (e.g., proficiency on a subtest) is only partially observed. In such cases, we observe a pattern of responses to items rather than observing proficiency directly. Hence, an individual’s score is represented by a probability distribution over all possible scores instead of a single value for the dependent variable. Parameter estimates are obtained through marginal maximum likelihood.

As in classical linear regression, the dependent variable (e.g., proficiency, denoted q) in MML regression is modeled as a linear combination of independent variables and a normally distributed error term:

qi = a + b1X1i + b2X2i + ... + bkXki + ei

or in matrix notation:

q = Xb + e,  with e ~ N(0, s^2 I)

However, unlike in classical linear regression, proficiency (qi) is only indirectly observed through the item responses, so the dependent variable is represented as a probability distribution rather than a single value. The likelihood function for individual i is given by:

Li(b, s) = ∫ P(xi | q) f(q | Xi b, s^2) dq

where P(xi | q), the conditional likelihood of the observed response pattern xi given a proficiency q, can be calculated directly from the item parameters and the response pattern, and f(q | Xi b, s^2) is the normal density implied by the regression model. Marginal maximum likelihood takes the item parameters as known.
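The conditional likelihood of a response pattern given a proficiency can be sketched as follows. The two-parameter logistic (2PL) item response model used here is an illustrative assumption (the procedure may use a different IRT model), and all names and parameter values are hypothetical; the item parameters are taken as known, as in the text.

```python
import numpy as np

def p_correct_2pl(theta, a, b):
    """Probability of a correct response under a 2PL IRT model
    (illustrative choice) with discrimination a and difficulty b."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def cond_likelihood(responses, theta, a, b):
    """P(response pattern | theta): product over items of
    p^x * (1 - p)^(1 - x), item parameters taken as known."""
    p = p_correct_2pl(theta, a, b)
    return float(np.prod(np.where(responses == 1, p, 1.0 - p)))

# Three items with (hypothetical) known parameters;
# the respondent answered items 1 and 2 correctly, item 3 incorrectly.
a = np.array([1.0, 1.2, 0.8])
b = np.array([-0.5, 0.0, 0.5])
x = np.array([1, 1, 0])
L = cond_likelihood(x, theta=0.3, a=a, b=b)
```

Evaluating this function over a grid of proficiency values yields the quantities that enter the integral above.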

We can evaluate the integral in the above equation through numeric quadrature on equally spaced points. For convenience, we will denote the conditional likelihood P(xi | q) evaluated at quadrature point q for respondent i as piq. The likelihood function for an individual i, therefore, is given by:

Li = [ Σq piq fiq ]^wi

where fiq is the normal density f(q | Xi b, s^2) evaluated at quadrature point q and the wi represent the sample weights. For clarity of exposition, the weights are omitted from the remainder of the discussion, but in practice they carry through as multipliers throughout.
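The quadrature step can be sketched as follows, using numpy and hypothetical array names: approximate each individual's marginal likelihood by a sum over equally spaced points, then form the weighted log pseudo-likelihood (weights entering as multipliers on the log scale).

```python
import numpy as np

def weighted_log_pseudolik(p_iq, mu_i, sigma, theta_q, w_i):
    """Weighted log pseudo-likelihood via quadrature (a sketch).

    p_iq:    (n, Q) conditional likelihoods piq of each response
             pattern at each quadrature point
    mu_i:    (n,) predicted means Xi b
    sigma:   residual standard deviation
    theta_q: (Q,) equally spaced quadrature points
    w_i:     (n,) sample weights
    """
    step = theta_q[1] - theta_q[0]
    # Normal density of each quadrature point around each
    # respondent's predicted mean
    dens = np.exp(-0.5 * ((theta_q[None, :] - mu_i[:, None]) / sigma) ** 2)
    dens /= sigma * np.sqrt(2.0 * np.pi)
    L_i = (p_iq * dens).sum(axis=1) * step  # approximate the integral
    return np.sum(w_i * np.log(L_i))

theta_q = np.linspace(-6.0, 6.0, 61)
p_iq = np.ones((1, theta_q.size))  # uninformative responses: piq = 1
ll = weighted_log_pseudolik(p_iq, mu_i=np.array([0.0]), sigma=1.0,
                            theta_q=theta_q, w_i=np.array([1.0]))
```

With uninformative responses (piq = 1 everywhere), the marginal likelihood reduces to the integral of the normal density, so the log pseudo-likelihood is approximately zero; this is a convenient sanity check on the quadrature grid.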

In a simple random sample, the likelihood of the data is the product of the individual likelihoods across observations; in practice the model is estimated by taking logs and summing across observations. When observations are correlated, as in clustered samples, this function is no longer a true likelihood (a true likelihood would have to incorporate the covariance terms across observations, greatly increasing the complexity of estimation). However, under fairly general conditions, the point estimates obtained by maximizing this pseudo-log-likelihood are consistent; essentially, consistency requires only that the same model hold across sample clusters. The inverse of the estimated information matrix, however, does not provide an acceptable estimate of the standard errors of the estimates. Appropriate standard errors are instead obtained through the Taylor-series approximation method based on the work of Binder (1983).
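The Taylor-series (linearization) idea can be illustrated for ordinary least squares. This sandwich-style estimator, which sums score contributions within clusters before forming the "meat" of the variance, is a simplified analogue of the survey-weighted approach of Binder (1983), not the exact computation the procedure performs; the data and cluster labels are hypothetical.

```python
import numpy as np

def cluster_robust_ols(X, y, cluster):
    """OLS point estimates with linearization (sandwich) standard
    errors that sum score vectors Xi * ei within clusters."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    resid = y - X @ beta
    groups = np.unique(cluster)
    # One summed score vector per cluster
    G = np.vstack([X[cluster == g].T @ resid[cluster == g] for g in groups])
    V = XtX_inv @ (G.T @ G) @ XtX_inv  # bread * meat * bread
    return beta, np.sqrt(np.diag(V))

X = np.column_stack([np.ones(6), np.arange(6.0)])
y = 2.0 + 3.0 * np.arange(6.0)  # exact linear relation, zero residuals
beta, se = cluster_robust_ols(X, y, np.array([0, 0, 1, 1, 2, 2]))
```

Because the residuals here are exactly zero, the estimated coefficients recover the generating values and the linearized standard errors collapse to zero; with real data the clustered "meat" term inflates the variance relative to the naive information-matrix estimate.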

To run MML Regression, left-click on the Statistics menu and select "MML Regression." The following dialogue box will open. Specify the independent variables and the dependent variable. You may also elect to change the design variables, change the starting values, suppress the constant, and select the desired output format.

If you wish to change the default values of the program, click the Advanced button in the bottom left corner, and the Advanced parameters dialogue box will open. You may then edit the values for the quadrature points, minimum, range, subtest weight, and convergence criterion; set the maximum number of iterations allowed for convergence; and change the default optimization method. You may also elect to create a diagnostic log.

When you are finished, click the OK button.

Click the OK button on the MML Regression dialogue box to begin the analysis.

Once the analysis is complete, you may compute predicted values, compute posteriors, or run t-tests on the results.

In NAEP, MML regression models are the models underlying plausible values. To obtain plausible values, NAEP estimates a large MML regression containing every background and contextual variable included in NAEP (known as conditioning variables), plus an additional random component. These marginal estimation procedures circumvent the need for calculating scores for individual students by providing consistent estimates of group or subgroup parameters.
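The final step from an estimated MML regression to plausible values can be sketched as drawing from each respondent's posterior over the quadrature points, where the posterior is proportional to piq times the prior density from the conditioning model. This is an illustrative simplification with hypothetical inputs; the actual NAEP machinery conditions on a large set of background variables and includes an additional random residual component.

```python
import numpy as np

def draw_plausible_values(p_iq, prior_iq, theta_q, n_draws=5, seed=0):
    """Draw plausible values from each respondent's discretized
    posterior: posterior_iq proportional to p_iq * prior_iq.
    Sketch only; NAEP's procedure is more elaborate."""
    rng = np.random.default_rng(seed)
    post = p_iq * prior_iq
    post = post / post.sum(axis=1, keepdims=True)  # normalize rows
    return np.stack([rng.choice(theta_q, size=n_draws, p=row)
                     for row in post])

theta_q = np.array([-1.0, 0.0, 1.0])
p_iq = np.array([[0.0, 1.0, 0.0]])  # all posterior mass at theta = 0
prior = np.ones_like(p_iq)          # flat prior for the sketch
pv = draw_plausible_values(p_iq, prior, theta_q)
```

Averaging statistics over such draws (rather than over point estimates of individual scores) is what lets marginal estimation recover consistent group-level parameters without ever scoring individual students.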