7.8
a. If these are responses from parents with ADHD children, they will probably pile up at the high end of the scale and trail off toward the low end. That is, they will most likely be skewed to the left (long tail on the low end).
b. The t procedures assume the population being sampled is normal, but they are robust to deviations from this assumption. Here the sample size is very large (n = 282, certainly larger than 40), so we can use the t confidence interval even if the population is strongly skewed, as long as there are no extreme outliers. Outliers are unlikely here, because the variable only takes the values 0 – 5, which leaves little room for outlying values.
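The rough sample-size rule that these solutions lean on (here, and again in 7.12, 7.31 and 7.36) can be written out as a small decision function. This is a sketch that paraphrases the rule, assuming the usual textbook cutoffs of 15 and 40; the function name and its yes/no inputs are hypothetical, and deciding whether the data show "strong skew" or "extreme outliers" still requires looking at a graph.

```python
def t_procedures_ok(n, looks_close_to_normal, strong_skew, extreme_outliers):
    """Rough guide for one-sample t procedures (assumed cutoffs: 15 and 40)."""
    if extreme_outliers:
        return False                  # outliers are a problem at any sample size
    if n >= 40:
        return True                   # large n: acceptable even for skewed data
    if n >= 15:
        return not strong_skew        # moderate n: acceptable unless strongly skewed
    return looks_close_to_normal      # small n: data should look close to normal


# 7.8: n = 282, possibly strongly skewed, no extreme outliers on a 0-5 scale
print(t_procedures_ok(282, looks_close_to_normal=False, strong_skew=True,
                      extreme_outliers=False))
```

For 7.31 (n = 5, skewed differences) the same function returns `False`, matching the conclusion reached there.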
c. Because the sample size is 282, there are 281 degrees of freedom for the t distribution. Using Table D with 100 degrees of freedom (the closest entry to 281), $t^* = 2.626$ for 99% confidence. Then the 99% confidence interval for the population mean is

$$\bar{x} \pm t^* \frac{s}{\sqrt{n}} = \bar{x} \pm 2.626\,\frac{s}{\sqrt{282}},$$

which gives (2.06, 2.38). Hence, we are 99% confident the mean response to this question from all parents of ADHD children is between 2.06 and 2.38.
Our confidence is in the method we use to create the interval. If the sampling
were (hypothetically) done repeatedly, then 99% of the intervals created would
contain the true population mean response.
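The repeated-sampling interpretation of confidence can be checked by simulation. The sketch below draws many samples from a normal population whose mean is known, builds a 99% t interval from each, and reports the fraction that capture the true mean. The population (mean 2.2, standard deviation 1.0) and the sample size 30 are illustrative assumptions, not the ADHD data; t* = 2.756 is the Table D value for df = 29 and 99% confidence.

```python
import math
import random
import statistics


def coverage_rate(mu, sigma, n, t_star, reps, seed=1):
    """Fraction of simulated t intervals xbar +/- t* s/sqrt(n) that contain mu."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.mean(sample)
        margin = t_star * statistics.stdev(sample) / math.sqrt(n)
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / reps


# Illustrative population (not the ADHD data): normal, mean 2.2, sd 1.0.
# n = 30 gives df = 29, and Table D lists t* = 2.756 for 99% confidence.
print(coverage_rate(mu=2.2, sigma=1.0, n=30, t_star=2.756, reps=2000))
```

The printed fraction should land close to 0.99, varying slightly with the seed. Any single interval either contains the true mean or it does not; the 99% describes the long-run behavior of the method.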
d. Since
this isn’t a random sample of all parents with ADHD children, we probably
shouldn’t generalize to this population. Instead, we can generalize to the
population of parents in
7.12
The collected data are paired: we have last month's and this month's sales for the same 40 stores. Hence, we are really interested in the differences, and the paired t-test reduces the problem to a one-sample problem about the differences.
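A minimal sketch of the reduction to a one-sample problem: form the differences, then apply the one-sample t statistic to them. The store sales figures below are hypothetical (the 40 stores' actual data are not shown in this solution), and the sign convention d = second − first is an assumption.

```python
import math
import statistics


def paired_t(first, second):
    """One-sample t statistic on the differences d = second - first (H0: mean d = 0)."""
    if len(first) != len(second):
        raise ValueError("paired data must have the same length")
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    dbar = statistics.mean(diffs)   # sample mean of the differences
    s_d = statistics.stdev(diffs)   # sample standard deviation (n - 1 divisor)
    t = dbar / (s_d / math.sqrt(n))
    return dbar, s_d, t, n - 1      # last value is the degrees of freedom


# Hypothetical sales (in $1000s) for four stores: last month, then this month.
last_month = [10.0, 12.0, 9.0, 11.0]
this_month = [11.0, 12.5, 9.5, 12.0]
print(paired_t(last_month, this_month))
```

Nothing about the two original columns survives past the first line of the function: every later step sees only the single list of differences.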
a. Let $\mu$ be the mean difference in sales between this month and last month for all stores. Then we want to test the hypotheses

$$H_0: \mu = 0 \qquad H_a: \mu \neq 0$$

(The question asks whether average sales for all stores are different from last month. Hence, the alternative is two-sided.)
b. First we should check the normality assumption. The sample size is large (n = 40), so according to our rough rule, we can use the t-test even if the distribution of the sample differences is strongly skewed. That said, we should verify that there are no extreme outliers.
The test statistic is $t = \dfrac{\bar{x} - 0}{s/\sqrt{n}}$ with $n = 40$. To find the p-value for the test, we use the t distribution with 40 – 1 = 39 degrees of freedom (Table D has no entry for 39 degrees of freedom, so we use the closest entry, 40). From Table D, the p-value $2P(T \ge 2.003)$ is between 0.05 and 0.10, since the area to the right of 2.003 is between 0.025 and 0.05. Note: Although a t-curve picture isn't included with this solution, I encourage you to draw one.
Assuming the mean difference in sales for all stores is 0, there is between a 5% and 10% chance of observing our sample mean difference or a more extreme mean difference. While our data are somewhat unlikely, they are not unlikely enough for us to doubt that the mean difference is 0 at the 0.05 significance level.
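Reading a p-value range off Table D amounts to finding the two critical values that bracket |t| in the appropriate row. The sketch below hard-codes a subset of the df = 40 row (upper-tail probabilities 0.10 down to 0.0005); the function name is hypothetical, and the result is only a bracket, not an exact p-value.

```python
# Subset of the df = 40 row of Table D: (upper-tail probability, critical value t*)
DF40_ROW = [(0.10, 1.303), (0.05, 1.684), (0.025, 2.021),
            (0.01, 2.423), (0.005, 2.704), (0.0005, 3.551)]


def bracket_p_value(t, row, two_sided=False):
    """Bound P(T >= |t|) between two tail probabilities from one Table D row.

    The row must be ordered by increasing t*. A bound of 0.0 or 0.5 (one-sided)
    means |t| fell outside the listed critical values.
    """
    t = abs(t)
    lower, upper = 0.0, 0.5   # for t >= 0 the one-sided tail area is at most 0.5
    for p, t_star in row:
        if t >= t_star:
            upper = p         # |t| is past this critical value: tail area <= p
        else:
            lower = p         # first critical value beyond |t|: tail area > p
            break
    factor = 2 if two_sided else 1   # two-sided p-value doubles both bounds
    return lower * factor, upper * factor


print(bracket_p_value(2.003, DF40_ROW, two_sided=True))
```

For the store data this reproduces the bracket above: the one-sided area lies between 0.025 and 0.05, so the two-sided p-value lies between 0.05 and 0.10.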
c. If we had found statistically significant results above, this would only have told us that the mean difference in sales for all stores is different from 0. This significance test is about the population mean, not about individual values. Hence, we would not be able to draw a conclusion about the performance of individual stores (e.g., the mean difference in sales could be positive even while some individual stores have losses).
7.31
Let $\mu$ be the average difference (after – before) in vitamin C content in all batches of Haitian gruel. Then, using a paired t-test, we want to test

$$H_0: \mu = 0 \qquad H_a: \mu < 0$$
We need to create the differences for the sample:
-53, -52, -57, -52, -61
Then the sample mean of the differences is $\bar{x} = -55$, and the sample standard deviation of the differences is $s \approx 3.937$.
It is difficult to check the normality assumption with so few observations, but because the sample size is so small, we need to think carefully about whether a t-test should be used. A quick dotplot of the 5 sample points indicates a left skew (the value -61 trails off on the low end). Hence, based on our rough rule, we really should not use the t-test (a nonparametric test, which we didn't cover in class, could be used instead). We can carry out the test for practice, but we must realize that the normality assumption may be violated.
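A text dotplot makes the shape of the five differences visible. This sketch prints one row per value between the smallest and largest difference, then compares the mean and median; a mean below the median is only a rough signal of a longer low-end tail, not a formal test of skewness.

```python
import statistics

diffs = [-53, -52, -57, -52, -61]  # after - before, one per batch

# Text dotplot: one row per value in the observed range, one '*' per observation.
for value in range(min(diffs), max(diffs) + 1):
    print(f"{value:4d} | " + "*" * diffs.count(value))

mean = statistics.mean(diffs)
median = statistics.median(diffs)
print(mean, median)  # mean below median: the tail is longer on the low end
```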
The test statistic is

$$t = \frac{\bar{x} - 0}{s/\sqrt{n}} = \frac{-55}{3.937/\sqrt{5}} \approx -31.238.$$

To find the p-value for the test, we use the t distribution with 5 – 1 = 4 degrees of freedom. From Table D, the p-value $P(T \le -31.238)$ is less than 0.0005. In fact, since -31.238 is so much less than -8.610, we know the p-value is much, much less than 0.0005, and we can think of it as essentially 0. Note: Although a t-curve picture isn't included with this solution, I encourage you to draw one.
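The summary statistics and the test statistic in this solution can be reproduced with Python's standard `statistics` module; `statistics.stdev` uses the n – 1 divisor, matching the sample standard deviation.

```python
import math
import statistics

diffs = [-53, -52, -57, -52, -61]        # after - before
n = len(diffs)
xbar = statistics.mean(diffs)            # sample mean of the differences
s = statistics.stdev(diffs)              # sample standard deviation (n - 1 divisor)
t = (xbar - 0) / (s / math.sqrt(n))      # one-sample t statistic for H0: mu = 0
print(xbar, round(s, 3), round(t, 3))
```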
Assuming the mean difference (after – before) is 0, there is essentially no chance of observing our sample mean difference or a smaller mean difference. Hence, we have very strong evidence (at any significance level) that the average difference in vitamin C is negative (i.e., the gruel loses vitamin C, on average, from cooking).
Even though we weren’t sure the normality assumption was met for this test, the data very strongly indicate a loss of vitamin C on average (even without doing the test), so we can feel confident in our conclusion.
7.33
Let $\mu$ be the mean increase in charges for the population of credit card holders. This is a matched-pairs design, so we can apply the one-sample t procedures to the differences. Since the sample size is so large (n = 500), we may use the t procedures regardless of the shape of the sample distribution (unless there are extreme outliers).
a. The hypotheses are

$$H_0: \mu = 0 \qquad H_a: \mu > 0$$

Then the test statistic is $t = \dfrac{\bar{x} - 0}{s/\sqrt{500}} = 47.318$. To find the p-value for the test, we use the t distribution with 500 – 1 = 499 degrees of freedom (Table D has no entry for 499 degrees of freedom, so we use the closest entry, 100). From Table D, the p-value $P(T \ge 47.318)$ is less than 0.0005. In fact, since 47.318 is so much more than 3.390, we know the p-value is much, much less than 0.0005, and we can think of it as essentially 0. Note: Although a t-curve picture isn't included with this solution, I encourage you to draw one.
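The summary statistics for this exercise are not listed in the solution. The sketch below assumes x̄ = 565 and s = 267; these values were recovered by working backward from the reported t = 47.318 and the 95% interval in part b, so treat them as assumptions rather than given data.

```python
import math


def one_sample_t(xbar, s, n, mu0=0.0):
    """One-sample t statistic for H0: mu = mu0, plus its degrees of freedom."""
    return (xbar - mu0) / (s / math.sqrt(n)), n - 1


# Assumed summary statistics (back-calculated, not taken from the data): 565 and 267.
t, df = one_sample_t(xbar=565, s=267, n=500)
print(round(t, 3), df)
```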
Assuming the mean increase for all credit card holders is $0, the chance of observing our sample mean difference or a larger mean difference is essentially 0. Our data are obviously extremely unlikely, and therefore we have strong evidence that the mean increase is actually larger than $0. The results are definitely statistically significant at the 1% level, since our p-value is much less than 0.01.
b. Using 100 degrees of freedom (since 499 degrees of freedom isn't included in Table D), t* = 1.984 for 95% confidence. Then the 95% confidence interval is

$$\bar{x} \pm t^* \frac{s}{\sqrt{500}} = \bar{x} \pm 1.984\,\frac{s}{\sqrt{500}},$$

which gives ($541.31, $588.69). We are 95% confident the mean increase in charges for all credit card holders is between $541.31 and $588.69. Our confidence is in the method we use to create the interval. If the sampling were (hypothetically) done repeatedly, then 95% of the confidence intervals created would contain the true mean increase.
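The interval computation can be written as a small helper, with t* passed in from Table D rather than computed. Here x̄ = 565 and s = 267 are assumptions, recovered by working backward from the reported interval (the solution does not list them).

```python
import math


def t_interval(xbar, s, n, t_star):
    """Confidence interval xbar +/- t* s / sqrt(n), with t* read from Table D."""
    margin = t_star * s / math.sqrt(n)
    return xbar - margin, xbar + margin


# Assumed xbar = 565 and s = 267; t* = 1.984 from the df = 100 row, 95% confidence.
low, high = t_interval(565, 267, 500, 1.984)
print(f"({low:.2f}, {high:.2f})")
```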
c. Since the sample size is so large, s will be a good estimate of $\sigma$ (regardless of the shape of the original population), and the Central Limit Theorem says the distribution of $\bar{x}$ will be approximately normal (regardless of the shape of the original distribution). The only potential problem is the presence of extreme outliers, but these are prevented by credit limits. Therefore, we can use the t procedures.
d. There needs to be a control group. They could randomly select customers who will get the special offer, and randomly select customers that will not get the offer (the control group). Then they can compare the mean increases for these two groups.
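A minimal sketch of the random assignment described in part d: shuffle a list of customer IDs and split it in half. The IDs, the group sizes, and the even split are hypothetical choices; in practice the bank's own customer list would replace `range(1000)`.

```python
import random


def assign_groups(customer_ids, seed=None):
    """Randomly split customers into an offer (treatment) group and a control group."""
    rng = random.Random(seed)        # seed only for a reproducible illustration
    shuffled = list(customer_ids)
    rng.shuffle(shuffled)            # random order, so assignment ignores customer traits
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


offer, control = assign_groups(range(1000), seed=7)
print(len(offer), len(control))
```

Because chance alone decides who gets the offer, a difference in mean increases between the two groups can be attributed to the offer rather than to seasonal spending that affects everyone.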
7.34
It is known that phosphate levels vary normally, so our assumption of a normal population is met. Therefore, the t-procedures may be used.
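a. For this set of 6 data points, we compute the sample mean $\bar{x}$ and the sample standard deviation $s$. Then the standard error is

$$SE_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{s}{\sqrt{6}}.$$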
b. Using 5 degrees of freedom, $t^* = 2.015$ for 90% confidence. Then the 90% confidence interval is $\bar{x} \pm t^* \dfrac{s}{\sqrt{6}}$, which gives (4.82, 5.91). We are 90% confident the patient's mean phosphate level is between 4.82 and 5.91 mg/dl. Our confidence is in the method we use to create the interval. If the sampling were (hypothetically) done repeatedly, then 90% of the intervals created would contain the patient's true mean phosphate level.
7.35
Let $\mu$ be the patient's true mean phosphate level. We test

$$H_0: \mu = 4.8 \qquad H_a: \mu > 4.8$$
Then the test statistic is $t = \dfrac{\bar{x} - 4.8}{s/\sqrt{6}}$. To find the p-value for the test, we use the t distribution with 6 – 1 = 5 degrees of freedom. From Table D, the p-value $P(T \ge t)$ is between 0.025 and 0.05. Note: Although a t-curve picture isn't included with this solution, I encourage you to draw one.
Assuming the patient’s mean phosphate level is 4.8 mg/dl, the chance of observing our sample mean or a larger sample mean is between 0.025 and 0.05. This provides fairly strong evidence that the patient has a higher than normal average phosphate level. The results are statistically significant at the 5% level, but not at the 1% level.
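The interval in 7.34 and the test in 7.35 should agree with each other. As a rough consistency check, the sketch backs out x̄ and the standard error from the 90% interval and recomputes t for H0: μ = 4.8. The endpoints 4.82 and 5.91 are rounded, so the recovered values are approximate; Table D's df = 5 row gives 2.015 and 2.571 for upper-tail areas 0.05 and 0.025.

```python
# Recover xbar and the standard error from the rounded 90% interval (4.82, 5.91).
low, high = 4.82, 5.91
t_star_90 = 2.015                    # Table D, df = 5, 90% confidence
xbar = (low + high) / 2              # the interval is centered at the sample mean
se = (high - low) / 2 / t_star_90    # margin of error divided by t*
t = (xbar - 4.8) / se                # test statistic for H0: mu = 4.8

# With df = 5, the one-sided p-value is in (0.025, 0.05) exactly when 2.015 < t < 2.571.
print(round(xbar, 3), round(se, 4), round(t, 2), 2.015 < t < 2.571)
```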
7.36
a. Using 27 – 1 = 26 degrees of freedom, $t^* = 2.056$ for 95% confidence. Then the 95% confidence interval is $\bar{x} \pm t^* \dfrac{s}{\sqrt{27}}$, which gives (111.2, 118.6). We are 95% confident that the mean blood pressure is between 111.2 and 118.6. Our confidence is in the method we use to create the interval. If the sampling were (hypothetically) done repeatedly, then 95% of the intervals created would contain the true mean blood pressure.
b. The one-sample t-procedures require a random sample from a normal population. This is not a random sample of men, so we must think carefully about what population they represent. Because the sample size is 27, we may use the t-procedures except in the presence of strong skewness or outliers in the sample data. Therefore, we should graph the data before carrying out the procedures.