Elementary Statistics: Inference for a Population Mean
(Out of our bubble! We no longer assume we know the population standard
deviation.)
Setting
Suppose we have a random sample of size n from a normal population with
unknown mean, $\mu$, and unknown standard deviation, $\sigma$. (Note: Since
the population standard deviation is unknown, we must use the t-distribution,
not the z-distribution, for inference.)
Confidence Interval
A level C confidence interval for $\mu$ is

$$\bar{x} \pm t^* \frac{s}{\sqrt{n}}$$

- The value $t^*$ comes from the t-distribution (Table D)
with (n – 1) degrees of freedom. It’s the value such that there’s area C
between $-t^*$ and $t^*$. (Now you get to use “magical” Table D to find these
values; no reverse lookup.)
- Recall s is simply our
notation for the sample standard deviation (as defined in Section 1.2).
- Per usual, our confidence is in the method we use,
not in our one particular interval.
- The quantity $s/\sqrt{n}$ is often called the standard error of the sample
mean (it is our estimate of the standard deviation of $\bar{x}$).
- The quantity $t^* s/\sqrt{n}$ is still called the margin of error.
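To make the arithmetic concrete, here is a minimal Python sketch of the interval $\bar{x} \pm t^* s/\sqrt{n}$. The function name, the variable names, and the data are made up for illustration; the critical value is typed in by hand, just as you would look it up in Table D (2.306 is the 95% value for 8 degrees of freedom).

```python
import math
import statistics

def t_confidence_interval(data, t_star):
    """Return (lower, upper) for the interval x_bar +/- t_star * s / sqrt(n).

    t_star is the critical value looked up by hand (e.g., in a t-table)
    for n - 1 degrees of freedom and the desired confidence level C.
    """
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)              # sample standard deviation (divides by n - 1)
    standard_error = s / math.sqrt(n)       # estimated standard deviation of x_bar
    margin_of_error = t_star * standard_error
    return x_bar - margin_of_error, x_bar + margin_of_error

# Made-up data, for illustration only: n = 9, so there are 8 degrees of freedom.
sample = [7, 8, 9, 9, 10, 10, 12, 12, 13]
T_STAR_95_DF8 = 2.306                       # 95% critical value for 8 df
lower, upper = t_confidence_interval(sample, T_STAR_95_DF8)
print(f"95% CI for the mean: ({lower:.3f}, {upper:.3f})")
```

For these numbers the sample mean is 10 and s is 2, so the standard error is 2/3, the margin of error is about 1.537, and the interval is roughly (8.463, 11.537).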
Significance Test
Suppose we want to test $H_0: \mu = \mu_0$. To test this hypothesis, first
calculate the test statistic:

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$

[Note this is simply a standardized value. But since we
estimated the population standard deviation with the sample standard deviation,
we must look up the P-value in the t-table, not the z-table.]
Then determine the P-value using the t-distribution with (n – 1) degrees
of freedom. Recall that the P-value depends on the direction of the
alternative hypothesis (e.g., if the alternative hypothesis is two-sided, then
you need to double the one-tail area beyond the observed value of t).
Finally, interpret the P-value in the words of the
problem and provide a conclusion (which might depend on a given significance
level, $\alpha$). If you find statistical significance, then it’s a good
idea to create a confidence interval to assess the practical significance.
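The steps of the test can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names and data are made up, and instead of a table lookup the tail area of the t-distribution is approximated by numerically integrating its density (Simpson's rule), so that the example needs nothing beyond the standard library.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T >= t) with Simpson's rule (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_a = total * h / 3             # P(0 <= T <= |t|)
    tail = 0.5 - area_0_to_a                # P(T >= |t|), by symmetry about 0
    return tail if t >= 0 else 1 - tail

def one_sample_t_test(data, mu_0, alternative="two-sided"):
    """Return (t statistic, P-value) for H0: mu = mu_0."""
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)              # sample standard deviation
    t_stat = (x_bar - mu_0) / (s / math.sqrt(n))
    df = n - 1
    if alternative == "greater":            # Ha: mu > mu_0
        p_value = t_upper_tail(t_stat, df)
    elif alternative == "less":             # Ha: mu < mu_0
        p_value = t_upper_tail(-t_stat, df)
    elif alternative == "two-sided":        # Ha: mu != mu_0, double the one-tail area
        p_value = 2 * t_upper_tail(abs(t_stat), df)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")
    return t_stat, p_value

# Made-up data, for illustration only.
sample = [7, 8, 9, 9, 10, 10, 12, 12, 13]
t_stat, p_value = one_sample_t_test(sample, mu_0=9)
print(f"t = {t_stat:.2f}, two-sided P-value = {p_value:.3f}")
```

With these made-up numbers the sample mean is 10 and s is 2, so t = 1.5 with 8 degrees of freedom, and the two-sided P-value is about 0.17; the one-sided ("greater") P-value is half of that.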
Important Notes:
- The t-inference procedures
require that the population being sampled follows a normal distribution.
We can check this condition by looking at a graph of the sample data.
Also, as n gets larger, we can relax the condition of normality of the
population (since, by the CLT, the sampling distribution of $\bar{x}$
becomes more nearly normal, and since s gets closer to $\sigma$). Here is our
rule of thumb for checking the normality condition of the t-test
(depending on the sample size):
- If $n < 15$, then we must be very careful and only use the
t-procedures if the sample-data distribution is close to normal (which
indicates the population distribution is close to normal);
- If $n \ge 15$, then we have more “wiggle room” and we can use the
t-procedures even if our sample-data distribution is somewhat skewed (but
we shouldn’t use the procedures if the sample-data distribution is strongly
skewed or has extreme outliers);
- If $n \ge 40$, then we have lots of “wiggle room” and we can use the
t-procedures even if our sample-data distribution is strongly skewed (but
we shouldn’t use the procedures if there are extreme outliers).
- General note: Regardless of the sample size, if the
sample-data distribution looks fairly normal, then the normality
condition on the population has been met.
- For a given set of data, if
it’s inappropriate (based on the above rule-of-thumb) to use the
t-procedures, then there are other statistical options (e.g.,
non-parametric tests, bootstrap procedures); we just don’t have time to
discuss these procedures this term (but you can take Math 217: Applied
Statistical Methods if you want to learn more!).
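To see why the normality condition matters for small samples, here is a rough simulation sketch. Everything in it is an assumption chosen for illustration: samples of size n = 9, a standard normal population versus a strongly right-skewed exponential population, and the 95% critical value 2.306 for 8 degrees of freedom. It repeatedly builds the interval $\bar{x} \pm t^* s/\sqrt{n}$ and records how often the interval captures the true mean.

```python
import math
import random

T_STAR_95_DF8 = 2.306   # 95% critical value for n - 1 = 8 degrees of freedom

def interval_covers(sample, true_mean, t_star):
    """True if x_bar +/- t_star * s / sqrt(n) contains true_mean."""
    n = len(sample)
    x_bar = sum(sample) / n
    s = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))
    margin = t_star * s / math.sqrt(n)
    return abs(x_bar - true_mean) <= margin

def coverage(draw, true_mean, n=9, reps=20000, seed=1):
    """Fraction of simulated samples whose t-interval captures true_mean."""
    rng = random.Random(seed)           # fixed seed so the run is repeatable
    hits = 0
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        if interval_covers(sample, true_mean, T_STAR_95_DF8):
            hits += 1
    return hits / reps

# Normal population with mean 0 versus exponential population with mean 1.
normal_cov = coverage(lambda rng: rng.gauss(0.0, 1.0), true_mean=0.0)
skewed_cov = coverage(lambda rng: rng.expovariate(1.0), true_mean=1.0)
print(f"normal population:  {normal_cov:.3f}")
print(f"skewed population:  {skewed_cov:.3f}")
```

For the normal population the capture rate should land close to the advertised 95% (our confidence is in the method, not in any one interval). For the strongly skewed population it typically falls noticeably short at this small sample size, which is the reason for being careful when n is small.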