The MML ordinal tables procedure was developed to consistently estimate subpopulation distributions when the groups are defined by the values of an ordinal variable (e.g., age groups or income groups). The general approach assumes that the ordinal variable is a partial observation of an underlying continuous variable, say x*. Rather than observe x* directly, we observe only the range within which it falls (e.g., if x* represents age, we observe x, which might take a few categories such as 16-24, 25-34, and so on). We can estimate the parameters of the joint distribution of (θ, x*) along with the parameters defining the relationship between x (the ordinal variable) and x*. With these estimates in hand, we can infer the distribution of θ within groups while retaining the assumption of a normal population distribution.

Let θ represent an unobservable latent variable that is imperfectly measured by a series of items z (e.g., test items), and let x be an ordinal variable defining the groups to be compared. Assume that the relationship between the items and the latent variable is known. Typically, in large-scale assessment, the items are given to a large enough sample that the parameters of the model relating items to the underlying trait are estimated with sufficient precision to ignore this uncertainty (Mislevy, 1985; 1991). The distribution of θ is assumed a priori to be normal, but the conditional distribution of θ given x is unknown. However, because x is ordinal, we can assume that x is a partially observed realization of a standard normal variate x*, such that

$$x = j \quad \text{if } \tau_{j-1} < x^* \le \tau_j, \qquad j = 1, \dots, J, \quad \tau_0 = -\infty, \; \tau_J = +\infty,$$

and that the relationship between x* and θ is linear, with correlation ρ. This assumption is analogous to that made in the ordered probit model (McKelvey & Zavoina, 1975).
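The data-generating assumptions above can be made concrete with a short simulation. This is a minimal Python sketch, not the MML procedure itself: it assumes that θ and x* are bivariate normal with correlation ρ (one way to realize the linear relationship), codes the categories of x as 0, ..., J−1, and uses hypothetical function names and parameter values.

```python
import bisect
import math
import random


def simulate_ordinal(n, mu, sigma, rho, thresholds, seed=0):
    """Draw n (theta, x_star, x) triples under the assumed latent model.

    theta ~ N(mu, sigma^2); x_star ~ N(0, 1) with corr(theta, x_star) = rho.
    x is 0-based: x = j when thresholds[j-1] < x_star <= thresholds[j].
    """
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        theta = rng.gauss(mu, sigma)
        # x_star | theta ~ N(rho * (theta - mu) / sigma, 1 - rho^2), so x_star is N(0, 1) marginally
        x_star = rho * (theta - mu) / sigma + math.sqrt(1.0 - rho**2) * rng.gauss(0.0, 1.0)
        # count of thresholds strictly below x_star = index of the observed category
        x = bisect.bisect_left(thresholds, x_star)
        out.append((theta, x_star, x))
    return out


def group_summary(draws, n_categories):
    """Observed proportion and mean theta in each category of x (assumes no empty category)."""
    rows = []
    for j in range(n_categories):
        thetas = [t for t, _, x in draws if x == j]
        rows.append((len(thetas) / len(draws), sum(thetas) / len(thetas)))
    return rows
```

For example, `group_summary(simulate_ordinal(20000, 250.0, 50.0, 0.4, [-1.0, 0.0, 1.0]), 4)` gives category proportions close to Φ(−1) ≈ 0.16, 0.34, 0.34, 0.16, with mean θ rising from the lowest to the highest category. Only x and the item responses would be observed in practice; θ and x* are kept here to show what the model assumes.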
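For a sample of N observations, the likelihood function is

$$L(\mu, \sigma, \rho, \tau) = \prod_{i=1}^{N} p(z_i, x_i \mid \mu, \sigma, \rho, \tau),$$

where μ and σ are the mean and standard deviation of θ, ρ is the correlation between θ and x*, and τ is the vector of thresholds on x*. Suppressing the dependence on the parameters on the right-hand side and employing the standard IRT assumption of conditional independence given θ, we can write

$$p(z_i, x_i) = \int p(z_i \mid \theta)\, p(x_i \mid \theta)\, f_1(\theta)\, d\theta,$$

where f1(θ) is the normal density of θ. Recall that the probability of the item responses given θ, p(z | θ), is assumed known. Again invoking the definition of a conditional density, and replacing the observed x with the integral of x* over the range that would yield the observed value of x, yields

$$p(z_i, x_i) = \int p(z_i \mid \theta)\, f_1(\theta) \int_{\tau_{x_i - 1}}^{\tau_{x_i}} f_2(x^* \mid \theta)\, dx^*\, d\theta,$$

where f2(x* | θ) is the conditional density of x* given θ, and τ_{x_i − 1} and τ_{x_i} are the thresholds that bound the observed category x_i.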
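The inner integral, which replaces x by the range of x* that produces it, can be checked numerically. The sketch below is hypothetical Python, assuming (as follows when θ and x* are jointly normal and x* is standard normal) that x* given θ is normal with mean ρ(θ − μ)/σ and standard deviation √(1 − ρ²). It computes p(x = j | θ) in closed form as a difference of normal CDFs and compares it with Simpson's-rule integration of the conditional density between the bounding thresholds. Categories are coded 0-based and the function names are illustrative.

```python
import math


def norm_cdf(v):
    """Standard normal CDF via the error function (handles +/- infinity)."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))


def conditional_moments(theta, mu, sigma, rho):
    """Mean and s.d. of x* given theta when (theta, x*) is bivariate normal and x* ~ N(0, 1)."""
    return rho * (theta - mu) / sigma, math.sqrt(1.0 - rho**2)


def category_prob(j, theta, mu, sigma, rho, thresholds):
    """p(x = j | theta) in closed form: mass of f2(x* | theta) between the cuts bounding j."""
    cuts = [-math.inf] + list(thresholds) + [math.inf]
    m, s = conditional_moments(theta, mu, sigma, rho)
    return norm_cdf((cuts[j + 1] - m) / s) - norm_cdf((cuts[j] - m) / s)


def category_prob_numeric(j, theta, mu, sigma, rho, thresholds, n=2000):
    """Same quantity by Simpson's rule on f2 (n even; infinite ends cut at m +/- 10 s)."""
    cuts = [-math.inf] + list(thresholds) + [math.inf]
    m, s = conditional_moments(theta, mu, sigma, rho)
    lo, hi = max(cuts[j], m - 10 * s), min(cuts[j + 1], m + 10 * s)
    if hi <= lo:
        return 0.0

    def f2(v):
        # conditional normal density of x* given theta
        return math.exp(-0.5 * ((v - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))

    h = (hi - lo) / n
    total = f2(lo) + f2(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f2(lo + k * h)
    return total * h / 3.0
```

For instance, with μ = 250, σ = 50, ρ = 0.6 and thresholds at −1, 0, 1, the closed-form and numerical values agree to many decimal places for any θ, and the four category probabilities sum to one. The closed form is what makes the inner integral cheap enough to evaluate at every quadrature point.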
Because θ and x* are jointly normal, the conditional distribution f2 is also normal, with mean ρ(θ − μ)/σ and variance 1 − ρ², where μ and σ are the mean and standard deviation of θ and ρ is the correlation between θ and x*. The parameters of the distribution f1 (the mean and variance of θ) can also be estimated. By contrast, p(z | θ) is a known function of the data (generally not normal) and can be calculated directly. Because p(z | θ) generally has no convenient closed form, it is prudent to approximate the outer integral over a finite number of points. Defining Q quadrature points θ_q with weights w_q for f1, the likelihood contribution of observation i is approximately

$$p(z_i, x_i) \approx \sum_{q=1}^{Q} w_q\, p(z_i \mid \theta_q) \left[ \Phi\!\left( \frac{\tau_{x_i} - \rho(\theta_q - \mu)/\sigma}{\sqrt{1 - \rho^2}} \right) - \Phi\!\left( \frac{\tau_{x_i - 1} - \rho(\theta_q - \mu)/\sigma}{\sqrt{1 - \rho^2}} \right) \right],$$

where Φ(·) is the standard normal cumulative distribution function, τ_{x_i − 1} and τ_{x_i} are the thresholds bounding the observed category x_i, and τ_0 = −∞ and τ_J = +∞ for the lowest and highest of the J categories. Taking logs and summing over the sample observations yields the log likelihood for the sample data, Σ_i log p(z_i, x_i).

This model yields estimates of the mean and variance of the proficiency distribution, the correlation between that distribution and the underlying distribution of x*, and the thresholds along x* that correspond to the cut points between the levels of x. These estimates can be used to construct “implicit tables,” that is, the table of cell statistics predicted by the estimated bivariate relationship. The simplest way to do this is to calculate the mean and variance of θ within each cell from the (possibly doubly) truncated normal distribution implied by the estimated mean, variance, correlation, and thresholds (see Maddala, 1983, pp. 369-370). Tests of the significance of relationships apparent in the tables reduce to tests of the parameters of the original model. For example, a test of whether the cell means differ reduces to a test of whether the correlation parameter is significant and whether the difference between the threshold values is significant.
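Both computations in this part of the procedure, the quadrature log likelihood and the implicit-table cell statistics, fit in a short, self-contained Python sketch. It rests on several assumptions not stated in the text: a hypothetical two-parameter logistic (2PL) model with known item parameters stands in for p(z | θ), which the text treats as known but does not specify; the quadrature uses evenly spaced points on the standardized θ scale (the text does not name a quadrature rule; Gauss-Hermite would be a common alternative); categories of x are coded 0, ..., J−1; and all function names are illustrative. The cell moments use standard formulas for θ given that x* falls in (a, b] under bivariate normality, the kind of truncated-normal calculation the text attributes to Maddala (1983).

```python
import math


def norm_pdf(v):
    """Standard normal density; 0 at +/- infinity."""
    return 0.0 if math.isinf(v) else math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


def norm_cdf(v):
    """Standard normal CDF via the error function (handles +/- infinity)."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))


def item_likelihood(z, theta, items):
    """Hypothetical p(z | theta): 2PL items with known (a, b), independent given theta."""
    p = 1.0
    for resp, (a, b) in zip(z, items):
        prob = 1.0 / (1.0 + math.exp(-a * (theta - b)))
        p *= prob if resp == 1 else 1.0 - prob
    return p


def log_likelihood(data, mu, sigma, rho, thresholds, items, n_points=41):
    """Quadrature approximation to the sample log likelihood.

    data holds (z, x) pairs: z is a 0/1 response vector, x a 0-based category.
    Points: evenly spaced standardized t on [-5, 5], theta = mu + sigma * t,
    weights proportional to the standard normal density, normalized to sum to 1.
    """
    ts = [-5.0 + 10.0 * k / (n_points - 1) for k in range(n_points)]
    dens = [norm_pdf(t) for t in ts]
    total_density = sum(dens)
    weights = [d / total_density for d in dens]
    cuts = [-math.inf] + list(thresholds) + [math.inf]
    s = math.sqrt(1.0 - rho**2)  # conditional s.d. of x* given theta
    total = 0.0
    for z, x in data:
        lik = 0.0
        for t, w in zip(ts, weights):
            m = rho * t  # rho * (theta - mu) / sigma with theta = mu + sigma * t
            p_x = norm_cdf((cuts[x + 1] - m) / s) - norm_cdf((cuts[x] - m) / s)
            lik += w * item_likelihood(z, mu + sigma * t, items) * p_x
        total += math.log(lik)
    return total


def implicit_table(mu, sigma, rho, thresholds):
    """Predicted (proportion, mean, variance) of theta in each category of x."""
    cuts = [-math.inf] + list(thresholds) + [math.inf]
    rows = []
    for a, b in zip(cuts[:-1], cuts[1:]):
        prop = norm_cdf(b) - norm_cdf(a)
        lam = (norm_pdf(a) - norm_pdf(b)) / prop  # E[x* | a < x* <= b]
        # v * phi(v) -> 0 at +/- infinity; guard against inf * 0 = nan
        a_term = 0.0 if math.isinf(a) else a * norm_pdf(a)
        b_term = 0.0 if math.isinf(b) else b * norm_pdf(b)
        var_xs = 1.0 + (a_term - b_term) / prop - lam**2  # Var[x* | a < x* <= b]
        mean = mu + sigma * rho * lam
        var = sigma**2 * (1.0 - rho**2 + rho**2 * var_xs)
        rows.append((prop, mean, var))
    return rows
```

In practice the log likelihood would be maximized numerically over μ, σ, ρ, and ordered thresholds with a general-purpose optimizer; that step is not shown. A useful check on `implicit_table` is that the cell proportions sum to one, the proportion-weighted cell means return μ, and the within-cell plus between-cell variance returns σ², as the law of total variance requires.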