Math 445—Bootstrap Example
Suppose we are interested in the annual incomes of adults in the Fox Cities. Because we don’t have the time and money to take an accurate census, we simply take a random sample of 15 adults and record their incomes (in thousands of dollars). The income distribution and numerical summaries for this sample are shown below.

Variable              N   Mean  StDev  Minimum    Q1  Median     Q3  Maximum
Income (in 1000s $)  15  238.6  351.3      0.0  25.0    82.0  400.0   1200.0
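The columns of the summary table can be computed from the raw incomes with Python’s standard `statistics` module. This is a minimal sketch. The function name `summarize` is hypothetical, and the quartile rule is an assumption: the code uses the default ('exclusive') method of `statistics.quantiles`, which places the pth quantile at position (n + 1)p. Statistical packages disagree on how to compute Q1 and Q3, so quartiles from other software may not match exactly.

```python
import statistics

def summarize(sample):
    """Return the summary statistics shown in the table for a list of numbers.

    Quartiles use statistics.quantiles' default 'exclusive' method, i.e. the
    (n + 1)p position rule; this convention is an assumption and may differ
    from the software that produced the table.
    """
    q1, median, q3 = statistics.quantiles(sample, n=4)
    return {
        "N": len(sample),
        "Mean": statistics.mean(sample),
        "StDev": statistics.stdev(sample),  # sample SD, divisor n - 1
        "Minimum": min(sample),
        "Q1": q1,
        "Median": median,
        "Q3": q3,
        "Maximum": max(sample),
    }
```

For example, `summarize(incomes)`, where `incomes` is a (hypothetical) list holding the 15 recorded values in thousands of dollars, should produce a row like the one in the table, up to rounding and the quartile convention.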
Inference about the Mean
Suppose we want to estimate the average annual income of all adults in the Fox Cities. We can use our sample to construct a confidence interval for the population mean. Because the population standard deviation is unknown, we would normally use a one-sample t confidence interval. Recall, though, that the t-interval procedure assumes the population being sampled is normal, and this condition matters most for small samples like ours (n = 15). The histogram of our sample data makes it clear that the distribution of annual incomes is not normal but strongly skewed. Hence, we shouldn’t use the t interval.
Bootstrap to the rescue! We can treat this sample of 15 incomes as our population and resample from it with replacement, drawing resamples of the same size (n = 15). The histogram below shows 1000 bootstrap mean values. This histogram gives us an estimate of the sampling distribution of the sample mean income (for samples of size 15). Note that this distribution is not normal. So our confidence interval can’t take the standard form

estimate ± multiplier × (standard error),

because we don’t know what the multiplier is (it doesn’t come from a z table or a t table). Instead, we use the 2.5th and 97.5th percentiles of the bootstrapped means as the endpoints of our 95% confidence interval. We have nothing else to go on, so we “pull ourselves up by our bootstraps.”
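Here is a minimal sketch of the resampling and percentile steps described above, written with only the Python standard library. The helper names (`bootstrap_statistics`, `percentile`, `percentile_interval`) are hypothetical. The percentile rule (linear interpolation between order statistics) is one common convention and is assumed here. Other software may interpolate slightly differently.

```python
import random
import statistics

def bootstrap_statistics(sample, stat, B=1000, seed=None):
    """Draw B resamples of size n with replacement; return stat of each."""
    rng = random.Random(seed)
    n = len(sample)
    return [stat(rng.choices(sample, k=n)) for _ in range(B)]

def percentile(values, p):
    """p-th percentile (0 <= p <= 100), interpolating between order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * p / 100
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

def percentile_interval(boot_values, level=0.95):
    """Bootstrap percentile interval, e.g. 2.5th to 97.5th for level=0.95."""
    alpha = 1 - level
    return (percentile(boot_values, 100 * alpha / 2),
            percentile(boot_values, 100 * (1 - alpha / 2)))
```

With `incomes` standing for the list of 15 sample values, `percentile_interval(bootstrap_statistics(incomes, statistics.mean, seed=1))` gives a 95% interval for the mean. Each run (or seed) gives a slightly different interval, so the code will not reproduce the particular simulation reported below.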

For this particular bootstrap simulation, the 95% confidence interval is ($96,500, $426,100). If we had (inappropriately) used the t confidence interval, our 95% interval would have been ($44,000, $433,200). Notice that the lower ends of the two intervals are quite different. Also note that both intervals are very wide (perhaps too wide to be practically useful). When the bootstrap and t intervals disagree substantially, this typically means the parametric conditions of the t method are not met. In that case we also cannot trust the stated confidence level of the t interval.
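The t interval quoted above can be checked from the summary table alone using mean ± t* × s/√n. With n − 1 = 14 degrees of freedom, t* ≈ 2.1448 for 95% confidence. This is the t-table value 2.145, hard-coded here so that the sketch needs only the standard library. The function name `t_interval_from_summary` is hypothetical.

```python
import math

def t_interval_from_summary(mean, sd, n, t_star):
    """One-sample t interval: mean ± t* * sd / sqrt(n)."""
    se = sd / math.sqrt(n)
    return mean - t_star * se, mean + t_star * se

# t* for 95% confidence with 14 degrees of freedom (t table: 2.145)
T_STAR_95_DF14 = 2.1448

# Mean, SD and n taken from the summary table (thousands of dollars)
lo, hi = t_interval_from_summary(238.6, 351.3, 15, T_STAR_95_DF14)
print(f"({lo:.1f}, {hi:.1f})")  # prints (44.1, 433.1)
```

This output agrees with ($44,000, $433,200) up to the rounding of the mean and standard deviation in the table.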
Inference about the Median
It’s clear that the distribution of annual incomes is skewed toward high values (a typical shape for income distributions). In our sample, the mean ($238,600) is nearly three times the median ($82,000). Hence, the median, rather than the mean, is a better measure of typical income. (This simple observation is often overlooked in analyses.)
Suppose we want to estimate the median annual income of all Fox Cities adults. Now we have no simple theory to guide us: there is no result like the central limit theorem that gives us the sampling distribution of the sample median for a sample this small. Hence, the bootstrap (or some other nonparametric approach) is our only option. Included below is a histogram of 1000 bootstrapped median incomes. Notice that the estimated sampling distribution of the sample median is not at all normal.
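The histogram of bootstrapped medians is lumpy for a concrete reason. With n = 15 (an odd number), the median of each resample is its 8th smallest value, and that value is always one of the original 15 incomes. As a result, the bootstrap medians can take at most 15 distinct values. The sketch below tabulates them. The helper names are hypothetical and use only the standard library.

```python
import random
import statistics
from collections import Counter

def bootstrap_medians(sample, B=1000, seed=None):
    """Median of each of B resamples of size n drawn with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    return [statistics.median(rng.choices(sample, k=n)) for _ in range(B)]

def median_value_counts(sample, B=1000, seed=None):
    """Count how often each value occurs as a bootstrap median.

    For odd n every resample median is one of the original observations,
    so the bootstrap distribution is discrete rather than smooth.
    """
    return Counter(bootstrap_medians(sample, B, seed))
```

When this is applied to the 15 incomes, every key in the returned `Counter` is one of the sampled incomes. This explains why the histogram has a few tall spikes instead of a smooth bell shape.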

We can use the standard deviation of these 1000 bootstrapped medians to estimate the standard error of the sample median: $66,800. This gives us an idea of the precision of our estimator. Using the bootstrap percentile method, we can construct a 95% confidence interval for the median annual income of all Fox Cities adults: ($25,000, $400,000). The bootstrap lets us build a confidence interval even though we have no theory to guide us. Unfortunately, this interval is quite wide (perhaps too wide to be practically helpful).
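The two quantities in the previous paragraph, the bootstrap standard error and the percentile interval, can be computed together. Below is a minimal standard-library sketch with a hypothetical function name, `bootstrap_se_and_ci`. Two assumptions apply. First, the standard error is taken as the sample standard deviation (divisor B − 1) of the bootstrap statistics. Second, the percentiles use linear interpolation between order statistics.

```python
import random
import statistics

def bootstrap_se_and_ci(sample, stat, B=1000, level=0.95, seed=None):
    """Bootstrap standard error and percentile interval for stat(sample).

    SE is the standard deviation of the B bootstrap statistics; the interval
    runs from the (1 - level)/2 to the (1 + level)/2 percentile of them.
    """
    rng = random.Random(seed)
    n = len(sample)
    boot = sorted(stat(rng.choices(sample, k=n)) for _ in range(B))
    se = statistics.stdev(boot)

    def pct(q):
        # Linear interpolation between order statistics (one common convention)
        pos = (B - 1) * q
        i = int(pos)
        j = min(i + 1, B - 1)
        return boot[i] + (boot[j] - boot[i]) * (pos - i)

    return se, (pct((1 - level) / 2), pct((1 + level) / 2))
```

For instance, `se, (lo, hi) = bootstrap_se_and_ci(incomes, statistics.median, seed=1)` gives a standard error and a 95% interval for the median. Because the results depend on the random resamples, they will be close to, but generally not identical to, the $66,800 and ($25,000, $400,000) reported above.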
Inference about the First Quartile
Suppose now we’re most interested in the first quartile (25th percentile) of the annual incomes of all Fox Cities adults (or any other percentile, for that matter). Again, we have no theory to guide us. We don’t know the sampling distribution of the sample first quartile, and there’s no central limit theorem we can rely on for sample first quartiles.
Included below
is a histogram of 1000 bootstrapped first quartile incomes. This histogram
gives us an estimate of the sampling distribution of the sample first quartile.
We can use the standard deviation of these 1000 bootstrapped first quartiles to
estimate the standard error of the sample first quartile: $17,200 (this gives
us a sense of the precision of our estimator). We can also use these values to
create a bootstrap (percentile) 95% confidence interval for the first quartile
income of all Fox Cities adults: ($12,000, $82,000).
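The same recipe works for the first quartile; only the statistic changes. In the sketch below, the sample Q1 uses the (n + 1)p position rule (the default 'exclusive' method of `statistics.quantiles`). For n = 15 this rule gives exactly the 4th smallest value. That convention, and the helper names, are assumptions for illustration. As a design choice, the 95% percentile interval is read off the cut points returned by `statistics.quantiles(..., n=40, method="inclusive")`, which are the 2.5%, 5%, ..., 97.5% points of the bootstrap values.

```python
import random
import statistics

def first_quartile(xs):
    """Sample Q1 via the (n + 1)p position rule (statistics.quantiles default).

    For n = 15 this is exactly the 4th smallest value. This convention is an
    assumption; other software may compute quartiles differently.
    """
    return statistics.quantiles(xs, n=4)[0]

def bootstrap_first_quartiles(sample, B=1000, seed=None):
    """Q1 of each of B resamples of size n drawn with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    return [first_quartile(rng.choices(sample, k=n)) for _ in range(B)]

def se_and_percentile_ci95(boot):
    """Bootstrap SE (SD of the bootstrap values) and 95% percentile interval.

    With n=40 and method='inclusive', statistics.quantiles returns 39 cut
    points at 2.5%, 5%, ..., 97.5%; the first and last are the endpoints.
    """
    cuts = statistics.quantiles(boot, n=40, method="inclusive")
    return statistics.stdev(boot), (cuts[0], cuts[-1])
```

For example, `se, (lo, hi) = se_and_percentile_ci95(bootstrap_first_quartiles(incomes, seed=1))` gives the standard error and interval for the first quartile. As with the mean and median, each simulation gives slightly different numbers from the $17,200 and ($12,000, $82,000) reported above.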
