Table Of Contents

- Manual
- Getting Started
- Starting the Program
- Retrieving Data
- Manipulating Data
- The Variable List
- The Variable List Menu
- Filter Observations/Selecting
- Add New Variables
- Delete Variables
- Edit Metadata
- Set Replicate Weights
- New Variable Reserve
- Edit Value Labels
- Dummy Code Categorical Variable
- Collapse Categories of Categorical Variable
- Set Missing Values
- The Expression Evaluator

- Saving and Re-running Actions

- Sampling
- Procedures
- Measurement Models
- MML Models for Test Data
- Other Available Procedures

- Graphics
- Tools
- Estimation Methods
- Optimization Techniques
- Variance Estimation

- Post-hoc Procedures
- More user input instructions
- The User Interface
- Input Instructions
- Options
- Output Precision

- Glossary of Terms and Symbols


MML Composite Regression

Large-scale assessments typically cover domains with multiple sub-domains. For example, a reading test might include two scales: reading for literary experience and reading for information. These scales, measured indirectly through the test items and background variables, are assumed to be correlated (though not perfectly) with one another, and conceptually distinct. A summary measure of one’s reading performance is formed as a weighted average of these two scales. These summary scales are known as *composite scales*.
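As a concrete sketch of how a composite is formed, the snippet below (Python with NumPy; the subscale scores and the 0.55/0.45 weights are made-up values for illustration, not real NAEP quantities) computes a weighted average of two subscale scores:

```python
import numpy as np

# Hypothetical scores on two reading subscales for five examinees
# (illustrative values only, not real assessment data).
literary = np.array([210.0, 305.0, 255.0, 280.0, 190.0])
information = np.array([220.0, 290.0, 260.0, 270.0, 200.0])

# Composite scale: weighted average of the subscales. The weights
# are assumptions for this sketch; in practice they often reflect
# the relative numbers of items and sum to 1.
weights = np.array([0.55, 0.45])
composite = weights[0] * literary + weights[1] * information
print(composite[0])  # ≈ 214.5 for the first examinee
```

The only requirement on the weights is that they are non-negative and sum to 1; the program's advanced options (described below) let you change them from the equal-weight default.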

Procedures developed to accommodate the analysis of composite scales are based on either numerical integration or an approximation to the integrals (Thomas, 1993), and both prove dauntingly slow. MML composite regression provides an efficient alternative procedure for the analysis of composite scales that requires at most two-dimensional numerical integration and no complex approximation, yielding a computationally fast algorithm that does not sacrifice statistical efficiency.

Regression parameters for composite scales are typically obtained either through a simultaneous estimation procedure, or through separate estimation for each scale followed by estimation of the pairwise correlations. The former procedure is usually preferred on the grounds that higher-order integration yields efficiency gains. In simultaneous estimation, calculating the marginal maximum likelihood statistics requires integrating over one dimension for each subscale on the assessment. Such numerical integration becomes intractable when more than three or four dimensions exist.

To address this problem, Thomas (1993) proposed a Laplace approximation to the integrals that solved many of the difficulties associated with the integrated normal approximation. While this approach, which remains in use today, is dramatically faster than numerical integration, it is still dauntingly slow due to the many iterations (100 or more) required to reach convergence by the EM algorithm.

Cohen and Jiang (1998) have shown that the multidimensional integration that poses the computational problem is unnecessary. The composite scale situation is analogous to a Seemingly Unrelated Regressions (SUR) model: a model in which, when the regressions include identical regressors, simultaneous estimation provides no more efficient estimates than estimating each regression separately (Greene, 1993). In other words, no efficiency is gained by GLS estimation, which usually requires more computing steps and more time to obtain the results. Based on this principle, MML composite regression takes a two-stage approach to obtaining robust parameter estimates in a composite scale situation.
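This SUR equivalence is easy to check numerically. The sketch below (our own illustration with NumPy, not code from the program) fits a two-equation system with identical regressors both equation by equation (OLS) and as a stacked feasible-GLS (SUR) system; the two sets of coefficient estimates coincide exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])  # same regressors in both equations
B = np.array([[1.0, 0.5], [2.0, -1.0], [0.3, 0.7]])             # true coefficients, one column per equation
err = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 2.0]], size=n)
Y = X @ B + err                                                  # correlated errors across equations

# Equation-by-equation OLS
beta_ols = np.linalg.lstsq(X, Y, rcond=None)[0]                  # k x 2

# Feasible GLS for the stacked (SUR) system
resid = Y - X @ beta_ols
W = np.linalg.inv(resid.T @ resid / n)                           # inverse of estimated error covariance
y_stack = Y.T.reshape(-1)                                        # [y1; y2], column-major vec of Y
A = np.kron(W, X.T @ X)                                          # stacked GLS normal equations
b = np.kron(W, X.T) @ y_stack
beta_gls = np.linalg.solve(A, b).reshape(2, k).T                 # k x 2

print(np.allclose(beta_ols, beta_gls))  # True: GLS gains nothing with identical regressors
```

The equality holds algebraically, not just approximately: with identical regressors the weighting matrix cancels out of the GLS estimator, which collapses to per-equation OLS.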

In the first stage, a one-dimensional marginal maximum likelihood estimation procedure is applied to each sub-scale separately to obtain the regression coefficients and associated standard errors of each sub-scale. These results are used to infer the conditional mean and variance of each sub-scale. In the second stage, a two-dimensional marginal estimation procedure is used to compute the covariance between each pair of scales.
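The pairwise structure of the second stage can be sketched as follows. This is a deliberately simplified illustration that uses observed residuals in place of the latent proficiencies the actual MML procedure integrates over; the point is only that the covariance matrix is assembled from separate pairwise estimates, so no integral of dimension higher than two is ever needed:

```python
import numpy as np
from itertools import combinations

# Stand-in for stage 1: suppose per-subscale regressions have already
# produced residuals for p = 3 subscales (simulated here; the real
# procedure works with latent, not observed, quantities).
rng = np.random.default_rng(1)
true_cov = np.array([[1.0, 0.4, 0.2],
                     [0.4, 1.0, 0.3],
                     [0.2, 0.3, 1.0]])
resid = rng.multivariate_normal(np.zeros(3), true_cov, size=2000)

# Stage 2: estimate each of the (p - 1) * p / 2 pairwise covariances
# separately and fill in the symmetric covariance matrix.
p = resid.shape[1]
est = np.eye(p)
for i, j in combinations(range(p), 2):
    est[i, j] = est[j, i] = np.cov(resid[:, i], resid[:, j])[0, 1]

print(p * (p - 1) // 2)  # 3 pairwise covariances to estimate for p = 3
```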

We start from a multivariate multiple regression model with *p* latent variables representing proficiency in several closely related subjects. For a sample of *n* examinees, the proficiency variables corresponding to examinee *i* are denoted by θ_{i} = (θ_{i1}, ..., θ_{ip})′, where θ_{ik} is the proficiency variable corresponding to the *k*th sub-scale.

We can, therefore, consider the single-scale probability, instead of the total probability.

The second part of the likelihood function is the probability that an examinee with specific proficiency θ_{i} will be sampled from the population. As in most situations, we assume that this probability has a normal distribution. Since we have multivariate scales, this distribution is a multivariate normal distribution.

Once the mean and standard deviation are obtained for each sub-scale, the only parameters left are the covariances between the sub-scales. Estimating the covariance matrix is equivalent to estimating the covariance between each pair of sub-scales. Among the *p* scales, there are a total of (*p* - 1)*p*/2 covariances that need to be estimated. For a particular pair of scales, denoted by (θ_{1}, θ_{2}), the likelihood function based on the parameters estimated in stage 1 involves only one remaining unknown: the covariance between the two scales.

A two-dimensional marginal maximum likelihood estimation procedure can therefore be applied to obtain the estimate of the covariance θ_{12}, and this estimation remains tractable regardless of the number of sub-scales.
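To see why two-dimensional integration stays cheap, consider the sketch below (an illustrative tensor-product Gauss-Hermite rule of our own, not the program's implementation): a bivariate-normal expectation is evaluated with only 20 × 20 = 400 node evaluations, a cost that does not grow with the number of sub-scales because each pair is handled separately.

```python
import numpy as np

# Gauss-Hermite nodes/weights for integrals against exp(-x^2)
nodes, weights = np.polynomial.hermite.hermgauss(20)

def bivariate_normal_expectation(f, mu, cov):
    """E[f(t1, t2)] for (t1, t2) ~ N(mu, cov), via a tensor-product
    Gauss-Hermite rule: a double sum over 20 x 20 = 400 nodes."""
    L = np.linalg.cholesky(cov)
    total = 0.0
    for xi, wi in zip(nodes, weights):
        for xj, wj in zip(nodes, weights):
            # Change of variables mapping GH nodes to N(mu, cov)
            t = mu + L @ (np.sqrt(2.0) * np.array([xi, xj]))
            total += wi * wj * f(t[0], t[1])
    return total / np.pi  # normalizing constant of the GH weight

mu = np.array([0.0, 0.0])
cov = np.array([[1.0, 0.5], [0.5, 2.0]])
val = bivariate_normal_expectation(lambda a, b: a * b, mu, cov)
print(val)  # ≈ 0.5, the off-diagonal covariance
```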

Cohen, J., & Jiang, T. (1998). *Composite Assessment Scales and NAEP Analyses*. Washington, DC: American Institutes for Research.

Greene, W. H. (1993). *Econometric Analysis* (2nd ed.). New York: Macmillan Publishing Company.

Mislevy, R. J. (1991). Randomization-based inference about latent variables from complex samples. *Psychometrika, 56*, 177-196.

Rubin, D. B. (1987). *Multiple Imputation for Nonresponse in Surveys*. New York, NY: Wiley.

Thomas, N. (1993). Asymptotic corrections for multivariate posterior moments with factored likelihood functions. *Journal of Computational and Graphical Statistics, 2*, 309-322.

To run MML Composite Regression left-click on the **Statistics** Menu and select "MML Composite Regression." The following dialogue box will open:

Specify the independent variable and the dependent variable. Note that the dependent variable is composed of multiple subtests from the same test. By default, all subtests are equally weighted. The advanced options allow users to change these weights. You may also elect to change the design variables, suppress the constant, and select the desired output format.

If you wish to change the default values of the program, click the *Advanced* button in the bottom left corner and the Advanced parameters dialogue box shown here will open:

You may now edit the values for quadrature points, minimum, range, subtest weight, and convergence for each subtest, and change the default optimization method.

When you are finished, click the *OK* button.

Click the *OK* button on the MML Composite Regression dialogue box to begin the analysis.

Once the analysis is completed, you may perform correlations among equations or t-tests on the results.

In NAEP, many assessment domains are composed of multiple subscales. For example, the reading test includes two subscales: reading for literary experience and reading for information; the math test includes five subscales. The NAEP plausible values have been designed to support analysis of composite scales based on Thomas's (1993) approximation to the integrals over multiple dimensions (one for each subscale).