Jackknife resampling

Statistical method for resampling
[Figure: Schematic of jackknife resampling]

In statistics, the jackknife (jackknife cross-validation) is a cross-validation technique and, therefore, a form of resampling. It is especially useful for bias and variance estimation. The jackknife pre-dates other common resampling methods such as the bootstrap. Given a sample of size $n$, a jackknife estimator can be built by aggregating the parameter estimates from each subsample of size $(n-1)$ obtained by omitting one observation.[1]

The jackknife technique was developed by Maurice Quenouille (1924–1973) from 1949 and refined in 1956. John Tukey expanded on the technique in 1958 and proposed the name "jackknife" because, like a physical jack-knife (a compact folding knife), it is a rough-and-ready tool that can improvise a solution for a variety of problems even though specific problems may be more efficiently solved with a purpose-designed tool.[2]

The jackknife is a linear approximation of the bootstrap.[2]

A simple example: mean estimation

The jackknife estimator of a parameter is found by systematically leaving out each observation from a dataset and calculating the parameter estimate over the remaining observations and then aggregating these calculations.

For example, if the parameter to be estimated is the population mean of a random variable $x$, then for a given set of i.i.d. observations $x_1, \ldots, x_n$ the natural estimator is the sample mean:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{n}\sum_{i\in[n]} x_i,$$

where the last sum uses another way to indicate that the index $i$ runs over the set $[n] = \{1, \ldots, n\}$.

Then we proceed as follows: for each $i \in [n]$ we compute the mean $\bar{x}_{(i)}$ of the jackknife subsample consisting of all but the $i$-th data point; this is called the $i$-th jackknife replicate:

$$\bar{x}_{(i)} = \frac{1}{n-1}\sum_{j\in[n],\, j\neq i} x_j, \qquad i = 1, \dots, n.$$

It may help to think of these $n$ jackknife replicates $\bar{x}_{(1)}, \ldots, \bar{x}_{(n)}$ as giving an approximation of the distribution of the sample mean $\bar{x}$; the larger $n$ is, the better this approximation will be. Finally, to get the jackknife estimator we take the average of these $n$ jackknife replicates:

$$\bar{x}_{\mathrm{jack}} = \frac{1}{n}\sum_{i=1}^{n}\bar{x}_{(i)}.$$
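The construction above can be sketched in a few lines of Python. This is only an illustration: the function name and the sample values are hypothetical and do not come from the cited sources.

```python
def jackknife_mean_replicates(xs):
    """Return the n leave-one-out means of xs (requires n >= 2)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    total = sum(xs)
    # Omitting x_i leaves a sum of (total - x_i) over n - 1 points.
    return [(total - x) / (n - 1) for x in xs]


data = [2.0, 4.0, 4.0, 5.0, 10.0]            # hypothetical sample
replicates = jackknife_mean_replicates(data)  # one replicate per omitted point
x_jack = sum(replicates) / len(replicates)    # average of the replicates
x_bar = sum(data) / len(data)                 # ordinary sample mean, for comparison
print(replicates, x_jack, x_bar)
```

Using the running total avoids recomputing each subsample sum from scratch, so all $n$ replicates cost $O(n)$ operations in total.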

One may ask about the bias and the variance of $\bar{x}_{\mathrm{jack}}$. From the definition of $\bar{x}_{\mathrm{jack}}$ as the average of the jackknife replicates, one could try to calculate both explicitly. The bias is a trivial calculation, but the variance of $\bar{x}_{\mathrm{jack}}$ is more involved since the jackknife replicates are not independent.

For the special case of the mean, one can show explicitly that the jackknife estimate equals the usual estimate:

$$\frac{1}{n}\sum_{i=1}^{n}\bar{x}_{(i)} = \bar{x}.$$

This establishes the identity $\bar{x}_{\mathrm{jack}} = \bar{x}$. Taking expectations, we get $E[\bar{x}_{\mathrm{jack}}] = E[\bar{x}] = E[x]$, so $\bar{x}_{\mathrm{jack}}$ is unbiased, while taking the variance we get $V[\bar{x}_{\mathrm{jack}}] = V[\bar{x}] = V[x]/n$. However, these properties do not generally hold for parameters other than the mean.
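The identity between the jackknife average and the sample mean can be checked directly. Each observation $x_j$ is omitted from exactly one replicate, so it appears in $n-1$ of the $n$ leave-one-out sums. A short derivation along these lines:

```latex
\frac{1}{n}\sum_{i=1}^{n}\bar{x}_{(i)}
  = \frac{1}{n(n-1)}\sum_{i=1}^{n}\sum_{j\neq i} x_j
  = \frac{1}{n(n-1)}\,(n-1)\sum_{j=1}^{n} x_j
  = \frac{1}{n}\sum_{j=1}^{n} x_j
  = \bar{x}.
```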

This simple example of mean estimation only illustrates the construction of a jackknife estimator; the real subtleties (and the usefulness) emerge when estimating other parameters, such as moments higher than the mean or other functionals of the distribution.

$\bar{x}_{\mathrm{jack}}$ could be used to construct an empirical estimate of the bias of $\bar{x}$, namely $\widehat{\operatorname{bias}}(\bar{x})_{\mathrm{jack}} = c(\bar{x}_{\mathrm{jack}} - \bar{x})$ with some suitable factor $c > 0$. In this case we know that $\bar{x}_{\mathrm{jack}} = \bar{x}$, so this construction does not add any meaningful knowledge, but it does give the correct estimate of the bias (which is zero).

A jackknife estimate of the variance of $\bar{x}$ can be calculated from the variance of the jackknife replicates $\bar{x}_{(i)}$:[3][4]

$$\widehat{\operatorname{var}}(\bar{x})_{\mathrm{jack}} = \frac{n-1}{n}\sum_{i=1}^{n}\left(\bar{x}_{(i)} - \bar{x}_{\mathrm{jack}}\right)^{2} = \frac{1}{n(n-1)}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^{2}.$$

The left equality defines the estimator $\widehat{\operatorname{var}}(\bar{x})_{\mathrm{jack}}$ and the right equality is an identity that can be verified directly. Taking expectations, we get $E[\widehat{\operatorname{var}}(\bar{x})_{\mathrm{jack}}] = V[x]/n = V[\bar{x}]$, so this is an unbiased estimator of the variance of $\bar{x}$.
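One way to verify the right equality is to write each leave-one-out mean in terms of the full-sample mean, $\bar{x}_{(i)} = (n\bar{x} - x_i)/(n-1)$, and to use $\bar{x}_{\mathrm{jack}} = \bar{x}$. A sketch of the calculation:

```latex
\bar{x}_{(i)} - \bar{x}_{\mathrm{jack}}
  = \frac{n\bar{x} - x_i}{n-1} - \bar{x}
  = \frac{\bar{x} - x_i}{n-1},
\qquad\text{so}\qquad
\frac{n-1}{n}\sum_{i=1}^{n}\left(\bar{x}_{(i)} - \bar{x}_{\mathrm{jack}}\right)^{2}
  = \frac{n-1}{n}\cdot\frac{1}{(n-1)^{2}}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^{2}
  = \frac{1}{n(n-1)}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^{2}.
```

The last expression is the usual sample variance (with divisor $n-1$) divided by $n$, whose expectation is $V[x]/n$.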

Estimating the bias of an estimator

The jackknife technique can be used to estimate (and correct) the bias of an estimator calculated over the entire sample.

Suppose $\theta$ is the target parameter of interest, which is assumed to be some functional of the distribution of $x$. Based on a finite set of observations $x_1, \ldots, x_n$, which is assumed to consist of i.i.d. copies of $x$, the estimator $\hat{\theta}$ is constructed:

$$\hat{\theta} = f_n(x_1, \ldots, x_n).$$

The value of $\hat{\theta}$ is sample-dependent, so this value will change from one random sample to another.

By definition, the bias of $\hat{\theta}$ is as follows:

$$\operatorname{bias}(\hat{\theta}) = E[\hat{\theta}] - \theta.$$

One may wish to compute several values of $\hat{\theta}$ from several samples, and average them, to calculate an empirical approximation of $E[\hat{\theta}]$, but this is impossible when there are no "other samples" because the entire set of available observations $x_1, \ldots, x_n$ was used to calculate $\hat{\theta}$. In this kind of situation the jackknife resampling technique may be of help.

We construct the jackknife replicates:

$$\begin{aligned}
\hat{\theta}_{(1)} &= f_{n-1}(x_2, x_3, \ldots, x_n) \\
\hat{\theta}_{(2)} &= f_{n-1}(x_1, x_3, \ldots, x_n) \\
&\;\;\vdots \\
\hat{\theta}_{(n)} &= f_{n-1}(x_1, x_2, \ldots, x_{n-1})
\end{aligned}$$

where each replicate is a "leave-one-out" estimate based on the jackknife subsample consisting of all but one of the data points:

$$\hat{\theta}_{(i)} = f_{n-1}(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n), \qquad i = 1, \dots, n.$$

Then we define their average:

$$\hat{\theta}_{\mathrm{jack}} = \frac{1}{n}\sum_{i=1}^{n}\hat{\theta}_{(i)}$$

The jackknife estimate of the bias of $\hat{\theta}$ is given by:

$$\widehat{\operatorname{bias}}(\hat{\theta})_{\mathrm{jack}} = (n-1)\left(\hat{\theta}_{\mathrm{jack}} - \hat{\theta}\right)$$

and the resulting bias-corrected jackknife estimate of $\theta$ is given by:

$$\hat{\theta}_{\mathrm{jack}}^{*} = \hat{\theta} - \widehat{\operatorname{bias}}(\hat{\theta})_{\mathrm{jack}} = n\hat{\theta} - (n-1)\hat{\theta}_{\mathrm{jack}}.$$

This removes the bias in the special case that the bias is $O(n^{-1})$ and reduces it to $O(n^{-2})$ in other cases.[2]
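As an illustration, the bias correction can be applied to the plug-in variance estimator (the one with divisor $n$), which is biased downward by a term of order $1/n$. The following Python sketch uses hypothetical function names and sample values; for this particular estimator the corrected value coincides with the usual sample variance with divisor $n-1$.

```python
def plug_in_variance(xs):
    """Biased variance estimator: mean squared deviation with divisor n."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / n


def jackknife_bias_corrected(estimator, xs):
    """Return (bias-corrected estimate, jackknife bias estimate) for a list xs."""
    n = len(xs)
    theta_hat = estimator(xs)
    # Leave-one-out replicates: the estimator applied to xs without x_i.
    replicates = [estimator(xs[:i] + xs[i + 1:]) for i in range(n)]
    theta_jack = sum(replicates) / n
    bias_hat = (n - 1) * (theta_jack - theta_hat)
    return theta_hat - bias_hat, bias_hat


data = [2.0, 4.0, 4.0, 5.0, 10.0]  # hypothetical sample
corrected, bias_hat = jackknife_bias_corrected(plug_in_variance, data)
print(plug_in_variance(data), bias_hat, corrected)
```

For this sample the plug-in estimate is 7.2, the estimated bias is about -1.8, and the corrected estimate is about 9.0, which matches the sum of squared deviations (36) divided by $n-1 = 4$.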

Estimating the variance of an estimator

The jackknife technique can also be used to estimate the variance of an estimator calculated over the entire sample.
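A minimal Python sketch of such a variance estimate follows. It assumes the standard form $\frac{n-1}{n}\sum_{i}(\hat{\theta}_{(i)} - \hat{\theta}_{\mathrm{jack}})^{2}$, that is, the formula given earlier for the sample mean with general leave-one-out replicates in place of the leave-one-out means. The function names, the `log_mean` example estimator, and the sample values are hypothetical.

```python
import math


def jackknife_variance(estimator, xs):
    """Jackknife variance estimate of estimator(xs) for a list xs (n >= 2)."""
    n = len(xs)
    # Leave-one-out replicates: the estimator applied to xs without x_i.
    replicates = [estimator(xs[:i] + xs[i + 1:]) for i in range(n)]
    theta_jack = sum(replicates) / n
    return (n - 1) / n * sum((r - theta_jack) ** 2 for r in replicates)


def mean(xs):
    return sum(xs) / len(xs)


def log_mean(xs):
    # A smooth nonlinear function of the mean, used only as an example.
    return math.log(mean(xs))


data = [2.0, 4.0, 4.0, 5.0, 10.0]  # hypothetical sample
var_mean = jackknife_variance(mean, data)
var_log_mean = jackknife_variance(log_mean, data)
print(var_mean, math.sqrt(var_mean), var_log_mean)
```

For the mean, the result agrees with the sample variance divided by $n$ (here 9 / 5 = 1.8), as in the identity for the mean case. The square root of the variance estimate gives a jackknife standard error.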

See also


  • Berger, Y.G. (2007). "A jackknife variance estimator for unistage stratified samples with unequal probabilities". Biometrika. 94 (4): 953–964. doi:10.1093/biomet/asm072.
  • Berger, Y.G.; Rao, J.N.K. (2006). "Adjusted jackknife for imputation under unequal probability sampling without replacement". Journal of the Royal Statistical Society, Series B. 68 (3): 531–547. doi:10.1111/j.1467-9868.2006.00555.x.
  • Berger, Y.G.; Skinner, C.J. (2005). "A jackknife variance estimator for unequal probability sampling". Journal of the Royal Statistical Society, Series B. 67 (1): 79–89. doi:10.1111/j.1467-9868.2005.00489.x.
  • Jiang, J.; Lahiri, P.; Wan, S-M. (2002). "A unified jackknife theory for empirical best prediction with M-estimation". The Annals of Statistics. 30 (6): 1782–810. doi:10.1214/aos/1043351257.
  • Jones, H.L. (1974). "Jackknife estimation of functions of stratum means". Biometrika. 61 (2): 343–348. doi:10.2307/2334363. JSTOR 2334363.
  • Kish, L.; Frankel, M.R. (1974). "Inference from complex samples". Journal of the Royal Statistical Society, Series B. 36 (1): 1–37.
  • Krewski, D.; Rao, J.N.K. (1981). "Inference from stratified samples: properties of the linearization, jackknife and balanced repeated replication methods". The Annals of Statistics. 9 (5): 1010–1019. doi:10.1214/aos/1176345580.
  • Quenouille, M.H. (1956). "Notes on bias in estimation". Biometrika. 43 (3–4): 353–360. doi:10.1093/biomet/43.3-4.353.
  • Rao, J.N.K.; Shao, J. (1992). "Jackknife variance estimation with survey data under hot deck imputation". Biometrika. 79 (4): 811–822. doi:10.1093/biomet/79.4.811.
  • Rao, J.N.K.; Wu, C.F.J.; Yue, K. (1992). "Some recent work on resampling methods for complex surveys". Survey Methodology. 18 (2): 209–217.
  • Shao, J.; Tu, D. (1995). The Jackknife and Bootstrap. Springer-Verlag.
  • Tukey, J.W. (1958). "Bias and confidence in not-quite large samples (abstract)". The Annals of Mathematical Statistics. 29 (2): 614.
  • Wu, C.F.J. (1986). "Jackknife, Bootstrap and other resampling methods in regression analysis". The Annals of Statistics. 14 (4): 1261–1295. doi:10.1214/aos/1176350142.


  1. Efron 1982, p. 2.
  2. Cameron & Trivedi 2005, p. 375.
  3. Efron 1982, p. 14.
  4. McIntosh, Avery I. "The Jackknife Estimation Method" (PDF). Boston University. Archived from the original on 2016-05-14. Retrieved 2016-04-30. p. 3.


  • Cameron, Adrian; Trivedi, Pravin K. (2005). Microeconometrics : methods and applications. Cambridge New York: Cambridge University Press. ISBN 9780521848053.
  • Efron, Bradley; Stein, Charles (May 1981). "The Jackknife Estimate of Variance". The Annals of Statistics. 9 (3): 586–596. doi:10.1214/aos/1176345462. JSTOR 2240822.
  • Efron, Bradley (1982). The jackknife, the bootstrap, and other resampling plans. Philadelphia, PA: Society for Industrial and Applied Mathematics. ISBN 9781611970319.
  • Quenouille, Maurice H. (September 1949). "Problems in Plane Sampling". The Annals of Mathematical Statistics. 20 (3): 355–375. doi:10.1214/aoms/1177729989. JSTOR 2236533.
  • Quenouille, Maurice H. (1956). "Notes on Bias in Estimation". Biometrika. 43 (3–4): 353–360. doi:10.1093/biomet/43.3-4.353. JSTOR 2332914.
  • Tukey, John W. (1958). "Bias and confidence in not quite large samples (abstract)". The Annals of Mathematical Statistics. 29 (2): 614. doi:10.1214/aoms/1177706647.