Example 3 (Paired t-test)
A study was conducted on the effect of a special class designed to improve
children’s verbal skills. Each of 41 children took a verbal skills test twice,
once before and once after a 3-week period in the class. From the sample, the
“after score – before score” differences have mean 0.645 and standard
deviation 1.527.
Hypotheses
Note that this is not a two-sample problem;
it’s a paired-data problem. We have one sample of 41 children, with two
measurements on each of them. The sample data have already been differenced and
we’re provided with the sample mean and standard deviation of the differences.
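To make the differencing step concrete, here is a minimal Python sketch. The study's raw scores are not given, so the scores below are made up purely for illustration, and the names `before`, `after`, `diffs`, `d_bar`, and `s_d` are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical scores for five children (NOT the study's data).
before = [10, 12, 9, 11, 13]
after = [11, 12, 11, 12, 13]

# One difference per child: after score minus before score.
diffs = [a - b for a, b in zip(after, before)]

d_bar = mean(diffs)   # sample mean of the differences
s_d = stdev(diffs)    # sample standard deviation (n - 1 denominator)

print(diffs, d_bar, round(s_d, 3))
```

Once the differences are formed, the analysis uses only this single list of numbers, which is why the paired t-test is a one-sample procedure.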
Let $\mu_d$ be the average verbal improvement of the population of all
children if they took the pre- and post-class test. Then we want to test the
hypotheses $H_0: \mu_d = 0$ versus $H_a: \mu_d > 0$.
[Note: In a paired t-test, the null hypothesis is always $\mu_d = 0$. In
this case, the alternative is one-sided, because the special class was designed
to improve children’s verbal skills.]
Check Conditions of the Test
We do
not know the population standard deviation of differences, so we should use a
paired t-test (not a z-test). Hence, we need to check the normality condition
of the t-test. We aren’t given any information about the distribution of our
sample data. Since we have a large (larger than 40) sample size, we can relax
the normality condition and use the t-test even if the sample data are skewed.
Still, we should find out if the sample-data distribution shows extreme
outliers (or any other odd features), before we provide a final conclusion.
Test Statistic
The test statistic is $t = \frac{0.645 - 0}{1.527/\sqrt{41}} = 2.705$.
(That is, our sample average improvement is 2.705 standard errors
above the null-hypothesized value of the population average.)
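The arithmetic behind the test statistic can be checked with a short Python sketch. It uses only the summary numbers quoted in the example; the variable names are illustrative choices, not notation from the original.

```python
import math

n = 41          # number of children (pairs)
d_bar = 0.645   # sample mean of the differences
s_d = 1.527     # sample standard deviation of the differences
mu_0 = 0.0      # null-hypothesized population mean difference

std_err = s_d / math.sqrt(n)        # standard error of the mean difference
t_stat = (d_bar - mu_0) / std_err   # one-sample t statistic on the differences

print(round(std_err, 4), round(t_stat, 3))
```

The standard error comes out near 0.238, so a sample mean of 0.645 sits about 2.7 standard errors above 0.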
P-value
We must use the t-distribution with (41 – 1) = 40 degrees of freedom to find
the P-value. From Table D, the P-value is (since it’s a one-sided test)
$P(T \ge 2.705) \approx 0.005$, where $T$ has a t-distribution with 40 degrees
of freedom.
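As a cross-check on the table lookup, the upper-tail probability can be approximated numerically. The sketch below is one possible approach, not the method used in the example: it integrates the t density with Simpson's rule using only the standard library. The helper names `t_pdf` and `upper_tail` are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def upper_tail(t, df, steps=2000):
    """P(T >= t) for t >= 0, via Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # By symmetry, the area to the right of 0 is 0.5.
    return 0.5 - total * h / 3

p_value = upper_tail(2.705, 40)   # one-sided P-value for t = 2.705, 40 df
print(round(p_value, 4))
```

Because the alternative is one-sided, only the upper tail is used; a two-sided test would double this value.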
Definition of P-value and Conclusion
Definition of P-value: If the average verbal improvement of the population of
all children is 0, then there’s only a 0.005 chance of getting our sample
average improvement (0.645) or a larger sample average improvement.
Conclusion: Because data like ours would be so unlikely if the population
average improvement were 0, we have strong evidence that the population average
improvement is greater than 0. That is, these results are statistically
significant even at the 0.01 significance level.
Practical Significance
These results are strongly statistically significant, but are they
practically significant? The 95% t confidence interval for the population mean
improvement is (0.163, 1.127). There are no units given, since the test is
standardized, so it’s hard to gauge the practical importance, but this range of
possible average improvements seems quite small (especially if the test is out
of, say, 20 or more questions). We need to talk to the teachers, but it seems
like this might be a case of statistical significance without practical
significance.
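The interval (0.163, 1.127) can be reproduced from the summary statistics. This sketch assumes the critical value t* = 2.021, the standard table value for 95% confidence with 40 degrees of freedom; that number is not quoted in the example itself.

```python
import math

n = 41          # number of children (pairs)
d_bar = 0.645   # sample mean of the differences
s_d = 1.527     # sample standard deviation of the differences
t_star = 2.021  # assumed table value: 95% critical value for 40 df

margin = t_star * s_d / math.sqrt(n)     # margin of error
ci = (d_bar - margin, d_bar + margin)    # 95% t confidence interval

print(tuple(round(x, 3) for x in ci))
```

The margin of error is about 0.48, so even the upper end of the interval is an average gain of only a little over one point.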
Causality?
Can the significant improvement be solely attributed to the special class? No,
because the study design did not include a control group (so there are many
potential confounding variables, including the possible improvement simply
based on taking the test a second time). This brings us back full circle to
data collection. How cool!