Table of Contents

- Manual
- Getting Started
- Starting the Program
- Retrieving Data
- Manipulating Data
- The Variable List
- The Variable List Menu
- Filter Observations/Selecting
- Add New Variables
- Delete Variables
- Edit Metadata
- Set Replicate Weights
- New Variable Reserve
- Edit Value Labels
- Dummy Code Categorical Variable
- Collapse Categories of Categorical Variable
- Set Missing Values
- The Expression Evaluator

- Saving and Re-running Actions

- Sampling
- Procedures
- Measurement Models
- MML Models for Test Data
- Other Available Procedures

- Graphics
- Tools
- Estimation Methods
- Optimization Techniques
- Variance Estimation

- Post-hoc Procedures
- More user input instructions
- The User Interface
- Input Instructions
- Options
- Output Precision

- Glossary of Terms and Symbols


Sample Design

Sample design has two aspects: a selection process, which comprises the rules and operations by which some members of the population are included in the sample, and an estimation process for computing sample statistics, which are sample estimates of population values. A good sample design requires balancing several important criteria: (1) goal orientation, meaning the design is oriented to the research objectives, tailored to the survey design, and fitted to the survey conditions; (2) measurability, meaning the design allows valid estimates of sampling variability to be computed from the sample itself; (3) practicality, meaning the theoretical selection model can be translated into a set of easy-to-follow instructions for the field offices; and (4) economy, meaning the survey objectives are met with minimum cost and effort and maximum statistical efficiency. These four criteria frequently conflict, and survey researchers must balance and blend them to obtain a good sample design. No single best tool exists for obtaining a good sample; instead, a variety of sampling techniques and procedures exist, each favoring one or more of these criteria.

The most general goal of sample design is to obtain maximum precision (i.e., minimum variance) for a fixed cost (Kish, 1965). A sample is economical if its precision per unit cost is high or, equivalently, if its cost per unit of precision is low. To meet this objective, samplers typically rely on probability sampling, in which every element in the population has a known, non-zero probability of being selected into the sample. These probabilities are realized through a mechanical operation of randomization.
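To make the economy criterion concrete, here is a minimal worked comparison. It assumes precision is measured as the reciprocal of the sampling variance, so precision per unit cost is 1 / (variance × cost); the design labels, variances, and costs are hypothetical numbers chosen only for illustration.

```python
def precision_per_unit_cost(variance: float, cost: float) -> float:
    """Precision (1 / variance) obtained per unit of cost; higher is more economical."""
    return 1.0 / (variance * cost)


# Hypothetical designs: (sampling variance of the estimate, total cost).
designs = {
    "A": (4.0, 1000.0),  # cheaper but less precise
    "B": (3.0, 1500.0),  # more precise but more expensive
}

scores = {name: precision_per_unit_cost(v, c) for name, (v, c) in designs.items()}
most_economical = max(scores, key=scores.get)
print(most_economical, scores)
```

Design A comes out ahead (1/4000 versus 1/4500) even though B is more precise in absolute terms, which is exactly the trade-off the economy criterion asks samplers to weigh.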

Simple random sampling (srs), in which every possible sample of a given size has the same probability of selection (so that every unit has the same probability of being selected), is the basic selection process. All other procedures can be viewed as modifications of srs that provide more practical, economical, or precise designs. There are five major types of modifications, or sampling methods:
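As a baseline for the modifications that follow, the sketch below draws a simple random sample without replacement using Python's standard `random` module and checks empirically that a given unit's inclusion probability is close to n/N. The frame of 100 numbered units, the sample size, and the seed are hypothetical.

```python
import random


def simple_random_sample(population, n, rng):
    """Draw an srs of n units without replacement (every subset of size n is equally likely)."""
    return rng.sample(population, n)


population = list(range(1, 101))  # hypothetical frame of N = 100 units
n = 10
rng = random.Random(2024)         # fixed seed so the example is reproducible

sample = simple_random_sample(population, n, rng)
inclusion_probability = n / len(population)  # n/N, identical for every unit

# Empirical check: how often is unit 1 selected over many repeated draws?
trials = 20_000
hits = sum(1 in simple_random_sample(population, n, rng) for _ in range(trials))
print(sample, inclusion_probability, hits / trials)
```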

Epsem (equal probability of selection method) sampling describes any sample in which all population elements have an equal probability of selection. This form of sampling is widely used because it yields a self-weighting sample, in which the unweighted sample mean can be used directly as an estimate of the population mean. Epsem sampling can result either from equal probabilities of selection throughout (srs is a special case of epsem) or from variable probabilities that compensate for each other across the several stages of selection (for example, selecting clusters with probability proportional to size and then subsampling elements within them at rates inversely proportional to that size).
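The compensating-probabilities route to epsem can be checked with exact arithmetic. In this sketch, which assumes a two-stage design with hypothetical cluster sizes, `a` clusters are drawn with probability proportional to size (a·M_i / ΣM) and a fixed `b` elements are subsampled in each (b / M_i); the product is the same a·b / ΣM for every element.

```python
from fractions import Fraction

# Hypothetical clusters (e.g., schools) and their sizes M_i (number of students).
cluster_sizes = {"school_1": 400, "school_2": 250, "school_3": 200, "school_4": 150}
a = 2   # clusters drawn at stage 1 with probability proportional to size
b = 20  # elements subsampled within every selected cluster at stage 2
total = sum(cluster_sizes.values())

overall = {}
for name, size in cluster_sizes.items():
    p_stage1 = Fraction(a * size, total)  # PPS: a * M_i / sum(M)
    p_stage2 = Fraction(b, size)          # b / M_i, inversely proportional to size
    assert p_stage1 <= 1, "cluster too large for PPS without certainty handling"
    overall[name] = p_stage1 * p_stage2   # simplifies to a * b / sum(M)

print(overall)  # every value is Fraction(1, 25): the design is epsem
```

Using `Fraction` rather than floats makes the equality of the overall probabilities exact instead of approximate.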

In element sampling, the elements are also the sampling units, so each sampling unit contains exactly one element. Cluster sampling, by contrast, selects groups of elements, called clusters, as sampling units (e.g., schools or classrooms). Subsampling within the selected clusters results in multistage sampling, in which elements are selected through two or more stages of sampling units.
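A two-stage (multistage) selection can be sketched as an srs of clusters followed by an srs of elements within each selected cluster. The frame of six schools with 30 students each is hypothetical; with 3 of 6 schools and 5 of 30 students, each student's overall selection probability is (3/6)·(5/30) = 1/12.

```python
import random


def two_stage_sample(clusters, n_clusters, n_per_cluster, rng):
    """Stage 1: srs of clusters. Stage 2: srs of elements within each selected cluster."""
    selected_clusters = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for c in selected_clusters:
        members = clusters[c]
        k = min(n_per_cluster, len(members))  # take everyone in very small clusters
        sample.extend((c, element) for element in rng.sample(members, k))
    return sample


# Hypothetical frame: 6 schools (clusters) with 30 students (elements) each.
clusters = {f"school_{i}": [f"s{i}_{j}" for j in range(30)] for i in range(1, 7)}
rng = random.Random(7)
sample = two_stage_sample(clusters, n_clusters=3, n_per_cluster=5, rng=rng)
print(sample)
```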

Stratification denotes separate selection from each of several subpopulations, called strata (e.g., defined by race, sex, or region), into which the population is divided.
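A minimal sketch of stratified selection, assuming an independent srs within each stratum and proportional allocation (n_h = n·N_h / N). The region strata and their sizes are hypothetical, and other allocation rules (e.g., deliberate oversampling of some strata) are equally valid designs.

```python
import random


def stratified_sample(strata, n, rng):
    """Independent srs within each stratum, with proportional allocation n_h = n * N_h / N."""
    N = sum(len(units) for units in strata.values())
    sample = {}
    for h, units in strata.items():
        n_h = round(n * len(units) / N)  # rounding may make the total differ slightly from n
        sample[h] = rng.sample(units, n_h)
    return sample


# Hypothetical strata defined by region, with N_h units each (N = 1000).
sizes = {"Northeast": 200, "Southeast": 300, "Central": 250, "West": 250}
strata = {h: [f"{h}_{i}" for i in range(N_h)] for h, N_h in sizes.items()}

sample = stratified_sample(strata, n=40, rng=random.Random(11))
print({h: len(units) for h, units in sample.items()})
# {'Northeast': 8, 'Southeast': 12, 'Central': 10, 'West': 10}
```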

Systematic selection, an alternative to random choice, selects sampling units from a list at a fixed interval of selection *k*, beginning at a random start within the first interval; hence every *k*th sampling unit is selected.
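Systematic selection reduces to a random start followed by a fixed step through the list. This sketch assumes a hypothetical ordered frame of 100 units and an interval of 10; when N is not a multiple of k, the sample size varies by one depending on the start.

```python
import random


def systematic_sample(frame, k, rng):
    """Select every k-th unit of an ordered list, starting from a random start in the first interval."""
    start = rng.randrange(k)  # random start: 0, 1, ..., k - 1 (0-based positions)
    return frame[start::k]


frame = [f"unit_{i}" for i in range(1, 101)]  # hypothetical ordered list, N = 100
sample = systematic_sample(frame, k=10, rng=random.Random(3))
print(sample)  # 10 units, each 10 positions apart on the list
```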

Two-phase sampling, or double sampling, refers to selecting the final sample as a subsample of a larger preselected sample that provides information for improving the final selection. Multiphase sampling extends this to more than two phases of selection.
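The sketch below illustrates two-phase sampling under stated assumptions: phase 1 is a large srs whose units are classified with some cheap information, and phase 2 subsamples a fixed number within each class. The population and the classification rule (`classify`, every fourth unit labelled "high") are hypothetical stand-ins for real phase-1 information.

```python
import random


def two_phase_sample(population, n1, n2_per_group, classify, rng):
    """Phase 1: large srs, classified with cheap information. Phase 2: subsample within each class."""
    phase1 = rng.sample(population, n1)
    groups = {}
    for unit in phase1:
        groups.setdefault(classify(unit), []).append(unit)
    phase2 = []
    for g in sorted(groups):
        members = groups[g]
        phase2.extend(rng.sample(members, min(n2_per_group, len(members))))
    return phase1, phase2


def classify(unit):
    """Hypothetical cheap phase-1 classification, purely for illustration."""
    return "high" if unit % 4 == 0 else "low"


population = list(range(1000))  # hypothetical population of 1,000 units
phase1, phase2 = two_phase_sample(population, n1=200, n2_per_group=10,
                                  classify=classify, rng=random.Random(5))
print(len(phase1), len(phase2))
```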

Kalton, G. (1983). *Introduction to Survey Sampling*. Sage University Papers 35: Quantitative Applications in the Social Sciences.

Kish, L. (1965). *Survey Sampling*. New York: John Wiley and Sons.

Sudman, S. (1976). *Applied Sampling*. New York: Academic Press.

Wallace, L., & Rust, K. F. (1996). Sample design. In N. Allen, D. L. Kline, & C. A. Zelenak (Eds.), *The NAEP 1994 Technical Report* (pp. 69-86). Washington, DC: U.S. Department of Education.

Although NAEP draws several types of samples (national samples, which comprise the main NAEP samples and the long-term trend samples, and state samples), each type is based on a similar complex four-stage design in which students are sampled from selected schools within selected geographic areas across the United States, called primary sampling units (PSUs). The goal of the sample design is to secure a sample from which population characteristics can be estimated with precision while keeping the cost of administration down. NAEP pursues both objectives by using a stratified multistage probability sampling design that includes provisions for sampling certain subpopulations at higher rates. To account for the differential probabilities of selection, and to allow for adjustments for nonresponse, each student is assigned a sampling weight.
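The role of a sampling weight can be illustrated with a generic textbook scheme: a base weight equal to the inverse of the selection probability, followed by a simple weighting-class nonresponse adjustment within a single cell. This is a minimal sketch, not NAEP's actual weighting procedure (which the text does not detail); the students, probabilities, and scores are hypothetical.

```python
def base_weight(selection_probability):
    """Base sampling weight: the inverse of the overall selection probability."""
    return 1.0 / selection_probability


def nonresponse_adjust(weights, responded):
    """Weighting-class adjustment within a single cell: respondents absorb nonrespondents' weight."""
    factor = sum(weights) / sum(w for w, r in zip(weights, responded) if r)
    return [w * factor if r else 0.0 for w, r in zip(weights, responded)]


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical students; the last two come from an oversampled group (twice the rate).
probs = [0.01, 0.01, 0.02, 0.02]
responded = [True, True, True, False]
scores = [250.0, 270.0, 230.0, None]  # no score for the nonrespondent

weights = [base_weight(p) for p in probs]          # 100, 100, 50, 50
adjusted = nonresponse_adjust(weights, responded)  # about 120, 120, 60, 0
resp = [(s, w) for s, w, r in zip(scores, adjusted, responded) if r]
estimate = weighted_mean([s for s, _ in resp], [w for _, w in resp])
print(estimate)  # about 254, versus an unweighted respondent mean of 250
```

The weighted estimate differs from the unweighted mean because the oversampled student counts for half as much as the others, which is the correction for differential selection probabilities described above.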

Another feature of the NAEP sample design is its effect on the estimation of sampling variability. Because of the effects of cluster selection (students within schools, schools within primary sampling units), observations made on different students cannot be assumed to be independent of one another; treating them as independent would bias the estimates of sampling variability. To estimate sampling variability appropriately under a complex design, NAEP uses robust variance estimation techniques such as jackknife repeated replication or Taylor series expansion procedures.
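Jackknife repeated replication can be sketched with the paired-jackknife idea: PSUs are grouped into variance strata of two, and for each stratum a replicate estimate is computed with one PSU dropped and its partner's weight doubled; the variance is the sum of squared deviations of the replicates from the full-sample estimate. This is a minimal illustration under those assumptions, not the exact replicate construction NAEP uses, and the data are hypothetical.

```python
def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def jrr_variance(values, weights, strata, psus):
    """Paired jackknife: one replicate per variance stratum.

    Assumes each variance stratum contains exactly two PSUs, coded 0 and 1.
    In replicate h, PSU 0 of stratum h is dropped and PSU 1 gets double weight.
    """
    full = weighted_mean(values, weights)
    variance = 0.0
    for h in sorted(set(strata)):
        rep_weights = [
            w if s != h else (0.0 if p == 0 else 2.0 * w)
            for w, s, p in zip(weights, strata, psus)
        ]
        variance += (weighted_mean(values, rep_weights) - full) ** 2
    return full, variance


# Hypothetical data: two variance strata, two PSUs each, one student per PSU.
values = [10.0, 14.0, 20.0, 16.0]
weights = [1.0, 1.0, 1.0, 1.0]
strata = [0, 0, 1, 1]
psus = [0, 1, 0, 1]
estimate, variance = jrr_variance(values, weights, strata, psus)
print(estimate, variance, variance ** 0.5)  # 15.0 2.0 1.414...
```

Because whole PSUs are dropped in each replicate, the between-PSU clustering is carried into the variance estimate rather than assumed away.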

The NAEP sample design has four stages of selection:

- selection of geographic PSUs (counties or groups of counties);
- selection of schools within PSUs;
- assignment of session types to schools; and
- selection of students for session types within schools.

Separate samples are drawn for three age classes (age 9/grade 4, age 13/grade 8, and age 17/grades 11 and 12), and for each age class the samples are of two distinct types: cross-sectional samples (i.e., “main samples”) and long-term trend samples. Separate samples of schools are required for the main and long-term trend samples because they differ in the calendar period of test administration, the format of the administration, and the grade and age definitions of the population of interest. In addition, the main samples oversample nonpublic schools and public schools with moderate or high enrollment of minority students, to increase the reliability of estimates for these groups of students. The four stages of selection are described below.

*Selecting primary sampling units (PSUs)*. In the first stage of sampling, the United States is divided into a series of geographic areas called primary sampling units (PSUs). Each PSU meets a minimum size requirement and is typically a metropolitan statistical area (MSA), a single county, or a group of contiguous counties. The PSUs are classified into four regions (Northeast, Southeast, Central, West), each containing about one-fourth of the U.S. population. Within each region, each PSU is classified as either MSA or non-MSA, yielding a total of eight subuniverses of PSUs.

Some PSUs are designated as certainty PSUs because of their size and are automatically included in the sample. The remaining PSUs (called non-certainty PSUs) are stratified by several socioeconomic characteristics within each subuniverse. One PSU is selected from each non-certainty stratum, with probability proportional to size. PSUs from the high-minority subuniverses are sampled at twice the rate of those from the other subuniverses.
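The first-stage logic can be sketched as follows, assuming a hypothetical size threshold for certainty PSUs and one probability-proportional-to-size draw per non-certainty stratum via `random.Random.choices` with size weights. The PSU names, sizes, strata, and threshold are illustrative only, and the doubled sampling rate for high-minority subuniverses is not modeled.

```python
import random


def select_psus(psu_sizes, psu_stratum, certainty_size, rng):
    """Take certainty PSUs outright; draw one PSU per remaining stratum with PPS."""
    certainty = sorted(p for p, size in psu_sizes.items() if size >= certainty_size)
    by_stratum = {}
    for p, size in psu_sizes.items():
        if size < certainty_size:
            by_stratum.setdefault(psu_stratum[p], []).append(p)

    selected = list(certainty)
    prob = {p: 1.0 for p in certainty}  # certainty PSUs: selection probability 1
    for h in sorted(by_stratum):
        members = sorted(by_stratum[h])
        sizes = [psu_sizes[p] for p in members]
        chosen = rng.choices(members, weights=sizes, k=1)[0]  # probability proportional to size
        selected.append(chosen)
        prob[chosen] = psu_sizes[chosen] / sum(sizes)
    return selected, prob


# Hypothetical PSU populations and strata (real stratification variables are richer).
psu_sizes = {"A": 3_000_000, "B": 2_500_000, "C": 200_000, "D": 300_000,
             "E": 100_000, "F": 400_000, "G": 250_000}
psu_stratum = {"C": "s1", "D": "s1", "E": "s2", "F": "s2", "G": "s3"}
selected, prob = select_psus(psu_sizes, psu_stratum, certainty_size=2_000_000,
                             rng=random.Random(1))
print(selected, prob)
```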

*Sampling schools*. In the second stage of sampling, public and private schools within each selected PSU are listed separately for each of the three age classes. An independent sample of schools is selected for each age class, so some schools may be selected for more than one sample. Schools within each PSU are selected without replacement with probabilities proportional to size, with oversampling of private schools and of schools with high minority enrollment.
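One common way to select units without replacement with probability proportional to size is systematic PPS selection along the cumulated measures of size; the text does not say that NAEP uses this exact method, so treat the sketch as an illustration. It assumes hypothetical enrollments as the measure of size and requires every school to be smaller than the sampling interval, so that no school can be hit twice.

```python
import random
from itertools import accumulate


def systematic_pps(units, sizes, n, rng):
    """Systematic PPS without replacement: inclusion probability n * size / total.

    Assumes every size is below the sampling interval; larger units would have to
    be handled as certainty selections first.
    """
    total = sum(sizes)
    interval = total / n
    if max(sizes) >= interval:
        raise ValueError("unit size >= sampling interval; treat it as a certainty selection")
    start = rng.random() * interval  # random start in [0, interval)
    cumulative = list(accumulate(sizes))
    selected, i = [], 0
    for k in range(n):
        point = start + k * interval
        # Advance to the unit whose cumulative size range covers the selection point
        # (the bound check guards against floating-point round-up at the end).
        while i < len(cumulative) - 1 and cumulative[i] <= point:
            i += 1
        selected.append(units[i])
    return selected


# Hypothetical schools in one PSU, with enrollment as the measure of size.
schools = ["s1", "s2", "s3", "s4", "s5", "s6", "s7"]
enrollment = [120, 80, 300, 150, 50, 200, 100]
chosen = systematic_pps(schools, enrollment, n=3, rng=random.Random(42))
inclusion = {s: 3 * m / sum(enrollment) for s, m in zip(schools, enrollment)}
print(chosen, inclusion)
```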

*Assigning assessment sessions to schools*. In the third stage of sampling, schools are assigned a number of assessment sessions based on (1) whether they were selected for the main or long-term trend samples, (2) age class, and (3) school size. Sessions are assigned to schools with three aims in mind: (1) to distribute students to different session types (e.g., reading, U.S. history/geography) across the whole sample for each age class so that the target number of assessed students can be achieved; (2) to maximize the number of different session types that are administered within a given school; and (3) to give each student an equal chance of being selected for a given session type regardless of the number of sessions conducted in the school.

*Sampling students*. In the fourth stage of sampling, a list of all grade-eligible and age-eligible students is prepared for each school for the age class for which the school was selected. Students are selected systematically until the target sample size is reached. In small schools, all eligible students are likely to be included in the sample. No student is assigned to more than one session per school.
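The final stage, and the way stage probabilities combine into a student's base weight, can be sketched as below. The sketch assumes systematic selection with a fractional interval (so exactly `target` students are drawn), takes every student when the roster is no larger than the target, and uses hypothetical PSU and school probabilities; the session-assignment stage is omitted for simplicity.

```python
import random


def sample_students(roster, target, rng):
    """Systematic selection within a school; take everyone when the roster is small.

    Returns the selected students and their within-school selection probability.
    """
    if len(roster) <= target:
        return list(roster), 1.0
    interval = len(roster) / target  # fractional interval yields exactly `target` students
    start = rng.random() * interval
    # min() guards against floating-point round-up past the last position.
    chosen = [roster[min(int(start + k * interval), len(roster) - 1)] for k in range(target)]
    return chosen, target / len(roster)


rng = random.Random(9)
large_school = [f"student_{i}" for i in range(1, 58)]  # hypothetical roster of 57
small_school = [f"student_{i}" for i in range(1, 13)]  # hypothetical roster of 12
chosen, p_within = sample_students(large_school, target=20, rng=rng)
everyone, p_small = sample_students(small_school, target=20, rng=rng)

# Overall probability = product of the stage probabilities (hypothetical values for
# the PSU and school stages); the base weight is its inverse.
p_psu, p_school = 0.5, 0.25
p_overall = p_psu * p_school * p_within
print(len(chosen), p_within, len(everyone), p_small, 1 / p_overall)
```

Here the large-school student's base weight is 1 / (0.5 × 0.25 × 20/57) = 22.8, i.e., each sampled student stands in for about 23 students in the population.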