Jackknife repeated replication (JRR) is a method for estimating the sampling variability of a statistic that takes into account the properties of the sample design. It provides unbiased estimates of the sampling error arising from complex sample selection procedures; reflects the component of sampling error introduced by the use of weighting factors that depend on the sample data obtained; and can be readily adapted to the estimation of sampling errors for parameters estimated using statistical modeling procedures.
The general idea behind the jackknife is to split a single sample into multiple subsamples and use the fluctuation among the subsamples to estimate the overall sampling variability. The full sample is first divided into random groups. Each group is then removed from the full sample in turn, and the records that remain form a subsample. The JRR procedure derives an estimate of the parameter of interest from each subsample and calculates the variance of the full-sample estimate from the variability between the subsample estimates.
The jackknife procedure consists of three steps:
- Forming random groups from the full sample;
- Constructing the replicate weights to be used in calculating the estimate of the parameter of interest for the subsamples; and
- Computing the estimates of variance for the parameter of interest.
1. Forming Random Groups
The random groups must be formed so that each random group has essentially the same sampling design as the parent sample. This ensures that the random group estimator of variance has acceptable statistical properties. In complex surveys this is typically done by using the primary sampling units (PSUs) as the random groups.
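As a minimal sketch of the PSU-based grouping described above, assuming each record carries a PSU identifier, the records can be labeled with a group index as follows. The function name `form_random_groups` and the sample identifiers are hypothetical and serve only as illustration.

```python
def form_random_groups(psu_ids):
    """Assign each record a random-group index, one group per distinct PSU.

    Assumption: every record carries the identifier of the PSU it was
    sampled from, so grouping by PSU preserves the parent design.
    """
    group_of_psu = {}
    for psu in psu_ids:
        if psu not in group_of_psu:
            # Number the groups 0, 1, 2, ... in order of first appearance.
            group_of_psu[psu] = len(group_of_psu)
    groups = [group_of_psu[psu] for psu in psu_ids]
    return groups, len(group_of_psu)


# Hypothetical PSU identifiers for six records.
psu_ids = ["p1", "p1", "p2", "p3", "p3", "p3"]
groups, n_groups = form_random_groups(psu_ids)
```

Here `groups` holds one index per record and `n_groups` corresponds to A in the text that follows.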
2. Constructing Replicate Weights
To produce replicate weights, each of the A random groups should be removed, in turn, from the file, and the remaining random groups weighted. The records belonging to the removed random group should be assigned a replicate weight of 0. For records from the remaining A-1 random groups, the base weight assigned when the full sample was weighted is used as the starting point for computing the replicate weights. The base weights are first multiplied by an adjustment factor A/(A-1) to account for the fact that one random group has been removed. Then, all additional weight adjustments used in the full-sample weighting process should be repeated. Removing each of the A random groups in turn means that these weighting procedures have to be repeated A times to produce the A sets of replicate weights.
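The first part of this step can be sketched as follows. This is an illustration under stated assumptions, not a full weighting system: it applies only the zero weight and the A/(A-1) factor, and it omits the additional full-sample weight adjustments, which would have to be re-run for each replicate. The function name `replicate_weights` and the example data are hypothetical.

```python
def replicate_weights(base_weights, groups, n_groups):
    """Return one list of replicate weights per removed random group.

    Assumption: only the base-weight step is shown; any further
    adjustments from the full-sample weighting are not repeated here.
    """
    factor = n_groups / (n_groups - 1)  # A / (A - 1)
    replicates = []
    for dropped in range(n_groups):
        replicates.append([
            0.0 if g == dropped else w * factor  # removed group gets weight 0
            for w, g in zip(base_weights, groups)
        ])
    return replicates


# Hypothetical data: six records in A = 3 random groups.
base_weights = [10.0, 10.0, 20.0, 20.0, 30.0, 30.0]
groups = [0, 0, 1, 1, 2, 2]
reps = replicate_weights(base_weights, groups, 3)
```

With three groups the factor is 1.5, so in the first replicate the records of group 0 receive weight 0 and every other base weight is multiplied by 1.5.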
3. Computing Estimates of Variance
Let the population parameter $\theta$ be estimated by $\hat{\theta}$, an estimator based on data from the full sample. The aim is to estimate the variance of $\hat{\theta}$ using the jackknife variance estimator $V(\hat{\theta})$.
Assume that A random groups have been constructed. Then, for each group (a = 1, ..., A), $\hat{\theta}_{(a)}$ is calculated based only on the data that remain after omitting the a-th group. For a = 1, ..., A we define the pseudo-values
$$\hat{\theta}_a = A\hat{\theta} - (A-1)\hat{\theta}_{(a)}$$
The jackknife estimator of $\theta$ is the average of these pseudo-values,
$$\bar{\theta} = \frac{1}{A}\sum_{a=1}^{A}\hat{\theta}_a$$
and the jackknife variance estimator, which measures the variability between the subsample estimates, is defined as
$$V(\hat{\theta}) = \frac{1}{A(A-1)}\sum_{a=1}^{A}\left(\hat{\theta}_a - \bar{\theta}\right)^2$$
The jackknife was originally used for bias reduction. In most survey applications, it is used only to capture the sampling variance, and a different formula is used. The replicate estimate $\hat{\theta}_{(a)}$ is computed directly using the appropriate replicate weights (rather than being converted to a pseudo-value):
- If the sample was stratified, the estimator is sometimes called JK2, and the estimate is given by $V = \sum_{a=1}^{A}\left(\hat{\theta}_{(a)} - \hat{\theta}\right)^2$
- If the sample was not stratified, the estimator is sometimes called JK1, and the estimate is given by $V = \frac{A-1}{A}\sum_{a=1}^{A}\left(\hat{\theta}_{(a)} - \hat{\theta}\right)^2$
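The variance formulas above can be sketched in a few lines of Python. This is an illustration, assuming the statistic is a weighted mean and that deviations in JK1 and JK2 are taken about the full-sample estimate. All function names and data are hypothetical. The example replicate weights follow the drop-one-group scheme with factor A/(A-1), so only the pseudo-value and JK1 estimators are applied to them; `jk2_variance` presumes replicate estimates built for a stratified design, which is not constructed here.

```python
def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def pseudo_value_jackknife(full_est, rep_ests):
    """Jackknife estimate and variance computed from pseudo-values."""
    A = len(rep_ests)
    pseudo = [A * full_est - (A - 1) * r for r in rep_ests]
    jk_est = sum(pseudo) / A
    var = sum((p - jk_est) ** 2 for p in pseudo) / (A * (A - 1))
    return jk_est, var


def jk1_variance(full_est, rep_ests):
    """JK1: scaled sum of squared deviations of the replicate estimates."""
    A = len(rep_ests)
    return (A - 1) / A * sum((r - full_est) ** 2 for r in rep_ests)


def jk2_variance(full_est, rep_ests):
    """JK2: unscaled sum of squared deviations.

    Assumption: rep_ests come from replicate weights built for a
    stratified design; that construction is not shown in this sketch.
    """
    return sum((r - full_est) ** 2 for r in rep_ests)


# Hypothetical data: six records, A = 3 random groups, equal base weights.
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
full_weights = [1.0] * 6
rep_weights = [
    [0.0, 0.0, 1.5, 1.5, 1.5, 1.5],  # group 0 removed
    [1.5, 1.5, 0.0, 0.0, 1.5, 1.5],  # group 1 removed
    [1.5, 1.5, 1.5, 1.5, 0.0, 0.0],  # group 2 removed
]

full_est = weighted_mean(values, full_weights)
rep_ests = [weighted_mean(values, w) for w in rep_weights]
jk_est, var_pseudo = pseudo_value_jackknife(full_est, rep_ests)
var_jk1 = jk1_variance(full_est, rep_ests)
```

For this equally weighted mean the pseudo-value variance and the JK1 variance coincide, which is a convenient check on an implementation.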
NAEP uses a jackknife repeated replication (JRR) procedure to estimate the sampling variability of all statistics presented in NAEP reports. This robust variance estimator takes into account sampling variability that arises due to the imprecision in the measurement of individual proficiencies as well as sampling variability resulting from the features of the sample design (stratification, clustering, and weighting).
NAEP uses the JK2 variant.