Math 445 – Assignment 4 Solutions

 

CH8.20 (WA; 5 points)

We want to estimate the proportion of all American young people (ages 6 to 19 years old) who are seriously overweight. Furthermore, we’ll do this estimation with a 99% confidence interval. Because this is a confidence interval for a population proportion, we must be careful with the particular CI we choose (recall, the usual Wald interval has coverage problems and the score interval is much more accurate). But, in this case, the sample size is very large (n = 4722), so the Wald and Score intervals will probably be very close to each other (see the argument on page 388 of the textbook about why this happens for large n). We’ll determine both intervals.

 

Score Interval

For 99% confidence, $z_{0.005} = 2.575$. Also, for this particular sample, $\hat{p} = 0.15$ and $n = 4722$.

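Then, using the formula for the score interval, the lower endpoint of the confidence interval is

$\dfrac{\hat{p} + \frac{z^2}{2n} - z\sqrt{\frac{\hat{p}(1-\hat{p})}{n} + \frac{z^2}{4n^2}}}{1 + \frac{z^2}{n}} \approx 0.137.$

(Within these calculations, you can see that in the very-large n case, the score interval, practically speaking, reduces to the Wald interval: the terms $z^2/(2n)$, $z^2/(4n^2)$, and $z^2/n$ are all close to 0.)

And the upper endpoint is

$\dfrac{\hat{p} + \frac{z^2}{2n} + z\sqrt{\frac{\hat{p}(1-\hat{p})}{n} + \frac{z^2}{4n^2}}}{1 + \frac{z^2}{n}} \approx 0.164.$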

So the Score confidence interval is (0.137, 0.164).

 

Wald Interval

The Wald 99% confidence interval is $\hat{p} \pm z\sqrt{\hat{p}(1-\hat{p})/n}$, which for this sample is approximately (0.137, 0.163).


Note these intervals are essentially the same, and to two decimal places they are exactly the same.
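
As a quick numerical check, both intervals can be recomputed with a short Python sketch. It assumes a sample proportion of 0.15 (a value inferred from the reported endpoints, not stated explicitly above) and the table value z = 2.575; the function names are illustrative only.

```python
from math import sqrt


def wald_interval(p_hat, n, z):
    """Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def score_interval(p_hat, n, z):
    """Score interval for a population proportion."""
    center = p_hat + z**2 / (2 * n)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    denominator = 1 + z**2 / n
    return (center - half_width) / denominator, (center + half_width) / denominator


n = 4722
z = 2.575      # table value for 99% confidence
p_hat = 0.15   # ASSUMPTION: inferred from the reported interval endpoints

score = score_interval(p_hat, n, z)
wald = wald_interval(p_hat, n, z)
print("Score:", tuple(round(x, 3) for x in score))
print("Wald: ", tuple(round(x, 3) for x in wald))
```

With n this large, the correction terms in the score interval are tiny, which is why the two results agree to two decimal places.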

 

Interpretation:

We are 99% confident that the proportion of all American kids who are seriously overweight is between 0.14 and 0.16. Our confidence is in the method we used to create this interval. That is, if the sampling were (hypothetically) done repeatedly, then 99% of the confidence intervals created would contain the true proportion of seriously overweight kids. (We hope this is one of those times!)

 

 

 

CH8.28 (WT; 10 points)

  a. In this case, the population standard deviation is unknown and it must be estimated with the sample standard deviation. Then technically, the resulting quantity has a t distribution (not a z distribution). But since the sample size is so large (n = 131), there will be very little difference between the t distribution (with 130 df) and the z distribution. Hence, we can simply use the z distribution. Then a 99% confidence interval for the mean weight of backpacks carried by all sixth graders is (12.69 pounds, 14.97 pounds).

 

We are 99% confident that the mean weight of backpacks carried by all sixth graders is between 12.69 pounds and 14.97 pounds. Our confidence is in the method we used to create this interval. That is, if the sampling were (hypothetically) done repeatedly, then 99% of the confidence intervals created would contain the true mean backpack weight of all sixth graders. (We hope this is one of those times!)

 

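  b. We know the 95% confidence interval for the population mean backpack weight as a percentage of body weight is (13.62, 15.89). Because the sample mean is always at the center of the confidence interval, we know $\bar{x} = (13.62 + 15.89)/2 = 14.755$. Also the margin of error is $(15.89 - 13.62)/2 = 1.135$. Hence, the estimated standard error is $s/\sqrt{n} = 1.135/1.96 \approx 0.579$.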

 

For a 99% confidence interval, the only thing that changes is the z value (from 1.96 to 2.575). Hence, the 99% confidence interval is $14.755 \pm 2.575(1.135/1.96)$, or (13.26%, 16.25%).

 

  c. Both the 95% and 99% confidence intervals for mean backpack weight as a percentage of body weight for all sixth graders are completely above 10%. That is, we are highly confident that the mean backpack weight as a percentage of body weight is actually greater than the recommended amount.
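
The conversion from the 95% interval to the 99% interval in part b can be sketched in a few lines of Python. The function name is illustrative; the inputs are the interval endpoints and z values quoted above.

```python
def rescale_interval(lower, upper, z_old, z_new):
    """Convert a z-based interval from one confidence level to another."""
    center = (lower + upper) / 2          # the sample mean
    margin = (upper - lower) / 2          # old margin of error
    standard_error = margin / z_old       # margin = z * standard error
    new_margin = z_new * standard_error
    return center - new_margin, center + new_margin


lower_99, upper_99 = rescale_interval(13.62, 15.89, z_old=1.96, z_new=2.575)
print(round(lower_99, 2), round(upper_99, 2))
```

Only the multiplier changes, so the interval stays centered at the same sample mean and simply widens.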

 

 

CH8.34 (WA; 5 points)

  a. For this sample of ACT scores, the histogram and normality plot are shown below.

 

 

The histogram shows a mostly mound-shaped distribution (with the exception of the interval at 28), and the normality plot doesn’t show any big deviations from normality. Hence, it’s plausible that these data come from a normal distribution. (The t confidence interval has a condition of normality, so it’s important that we check.)

 

  b. The numerical summaries for the sample of ACT scores are shown below.

 

Variable    N    Mean  StDev  Minimum      Q1  Median      Q3  Maximum

ACT score  20  25.050  2.690   19.750  23.313  24.750  27.563   30.000

 

For 95% confidence and 19 degrees of freedom, $t_{0.025,19} = 2.093$. So a 95% confidence interval for the average ACT score of all college freshmen in calculus is $25.050 \pm 2.093(2.690/\sqrt{20})$, or (23.79, 26.31).

  c. Note the entire interval determined in part b is above 21. If we can think of our sample as representative of all college freshmen in calculus, then it appears that they have a better average ACT score than the general freshman class.
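
The interval in part b can be reproduced from the Minitab summaries with a small Python sketch. The critical value 2.093 is entered by hand from a t table (the standard library has no t quantile function), and the function name is illustrative.

```python
from math import sqrt


def t_interval(mean, sd, n, t_crit):
    """One-sample t interval: mean +/- t_crit * sd / sqrt(n)."""
    margin = t_crit * sd / sqrt(n)
    return mean - margin, mean + margin


# Summary statistics from the Minitab output; t value for 19 df from a table.
act_lower, act_upper = t_interval(mean=25.050, sd=2.690, n=20, t_crit=2.093)
print(round(act_lower, 2), round(act_upper, 2))
```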

 

 

CH8.43 (WT; 5 points)

Assume that $X_1, \ldots, X_n$ is a random sample from a $N(\mu, \sigma^2)$ distribution, and let $X_{n+1}$ be a new observation from the same distribution. We showed in class that $\dfrac{\bar{X} - X_{n+1}}{\sigma\sqrt{1 + 1/n}}$ has a standard normal distribution. Furthermore, we’ve previously shown that $(n-1)S^2/\sigma^2$ has a Chi-squared distribution with (n-1) degrees of freedom. Finally, we know that $\bar{X}$ and $S^2$ are independent random variables (and since $X_{n+1}$ is a new observation, it’s also independent of $S^2$). Then we know the following quantity has a t distribution with (n-1) degrees of freedom, since it’s the ratio of independent random variables (a N(0,1) r.v. divided by the square root of a Chi-squared r.v. divided by its degrees of freedom):

$T = \dfrac{(\bar{X} - X_{n+1}) / (\sigma\sqrt{1 + 1/n})}{\sqrt{\dfrac{(n-1)S^2/\sigma^2}{n-1}}}$

There is much cancellation in this quantity (the (n-1)s cancel, the sigmas cancel), and it reduces to $T = \dfrac{\bar{X} - X_{n+1}}{S\sqrt{1 + 1/n}}$. Hence, T (on which the prediction interval is based) has a t distribution with (n-1) degrees of freedom.

 

 

CH8.50 (WT; 5 points)

  a. The histogram of my 1000 bootstrap medians is shown below.

 

 

  b. The standard deviation of the 1000 bootstrap medians is our estimate of the standard error of the sample median (based on samples of size 22). For my 1000 bootstrap medians, the standard deviation is 1.049 hours. If we naively assume that the distribution sampled from is normal, then we can use the t distribution to form a 95% confidence interval for the population median: $10 \pm 2.080(1.049)$, or (7.8 hours, 12.2 hours). This interval is centered around the median, 10 hours, for our sample of study times, and uses the t-value, 2.080, with area 0.025 to the right for a t-distribution with 21 degrees of freedom.

 

  c. From the histogram in part a, it is easy to see that the sampling distribution of the sample median is definitely non-normal. The distribution takes on only a few values and doesn’t have equal “tails” in both directions. Hence, the interval we created in part b is inappropriate.

 

  d. The bootstrap percentile confidence interval takes as endpoints the 2.5th percentile of the 1000 bootstrap medians and the 97.5th percentile of the 1000 bootstrap medians. For my sample of bootstrap medians, the 95% bootstrap confidence interval is (7.5 hours, 10.25 hours).

 

  e. The histogram of the original sample of student study times is shown below. The distribution is clearly not symmetric. Hence, the mean and the median are not the same. For non-symmetric distributions, the median is often a better measure of “center” or typical value than is the mean (because the mean is affected by extreme values in either tail). In this case, the median study time is a better measure of typical study time.
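
The bootstrap steps in parts a, b, and d can be sketched in Python as follows. The 22 study times below are hypothetical placeholders (the assignment's data are not reproduced here), so the printed numbers will not match the values quoted above; the function names and seed are also illustrative.

```python
import random
import statistics

# HYPOTHETICAL study times in hours for 22 students (not the assignment's data).
study_times = [2, 3, 5, 6, 7.5, 7.5, 8, 9, 10, 10, 10,
               10.5, 12, 14, 15, 15, 18, 20, 22, 25, 30, 40]


def bootstrap_medians(data, n_boot=1000, seed=445):
    """Resample the data with replacement and record each resample's median."""
    rng = random.Random(seed)
    return [statistics.median(rng.choices(data, k=len(data)))
            for _ in range(n_boot)]


def percentile_interval(values, level=0.95):
    """Bootstrap percentile interval from the sorted bootstrap statistics."""
    ordered = sorted(values)
    alpha = 1 - level
    lower_index = round(alpha / 2 * len(ordered))
    upper_index = round((1 - alpha / 2) * len(ordered)) - 1
    return ordered[lower_index], ordered[upper_index]


medians = bootstrap_medians(study_times)
boot_se = statistics.stdev(medians)          # estimated standard error of the median
boot_lower, boot_upper = percentile_interval(medians)
print("bootstrap SE:", round(boot_se, 3))
print("95% percentile interval:", (boot_lower, boot_upper))
```

Because the percentile interval reads its endpoints directly from the bootstrap distribution, it does not rely on that distribution being symmetric or normal.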

 

 

 

CH8.76 (WA; 10 points)

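  a. By definition of the median, $P(X_i \le \tilde{\mu}) = 0.5$ for each i. Also, since the Xs are independent and identically distributed, $P(X_1 \le \tilde{\mu}, \ldots, X_n \le \tilde{\mu}) = (0.5)^n$.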

 

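  b. By definition of the smallest and largest order statistics and because the Xs are independent and identically distributed, we have

$P(\max_i X_i \le \tilde{\mu}) = P(\text{all } X_i \le \tilde{\mu}) = (0.5)^n$

and

$P(\min_i X_i \ge \tilde{\mu}) = P(\text{all } X_i \ge \tilde{\mu}) = (0.5)^n.$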

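  c. $P(\min_i X_i < \tilde{\mu} < \max_i X_i) = 1 - P(\max_i X_i \le \tilde{\mu}) - P(\min_i X_i \ge \tilde{\mu}) = 1 - 2(0.5)^n = 1 - (0.5)^{n-1}$. Hence, the interval $(\min_i x_i, \max_i x_i)$ has a confidence level of $100[1 - (0.5)^{n-1}]\%$.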

 

  d. Since n = 10, the confidence level is $1 - (0.5)^9 \approx 0.998$. Then the 99.8% confidence interval for the median based on part c is (28.7 minutes, 42.0 minutes). We can also determine the 99.8% t confidence interval for the mean/median, assuming the population that is sampled from is normal. (For such a small set of data, it’s difficult to determine if the distribution looks normal. The dotplot and normality plot do not show any big deviations from normality, though, so it’s plausible to assume the population is normal.) From Minitab, we have the following numerical summaries of the times for the anesthetic to work:

 

Variable            N   Mean  StDev  Minimum     Q1  Median     Q3  Maximum

Time (in minutes)  10  34.45   4.29    28.70  30.88   34.35  37.73    42.00

 

For 99.8% confidence and 9 degrees of freedom, the corresponding t value is $t_{0.001,9} = 4.297$. Then a 99.8% t confidence interval for the mean/median is $34.45 \pm 4.297(4.29/\sqrt{10})$, or (28.62 minutes, 40.28 minutes).

 

Important Notes: The t interval is slightly narrower (about 11.7 minutes wide versus 13.3 minutes). This is because the t interval makes use of the normality condition (which seems reasonable in this case). If the normality condition isn’t met, though, the non-parametric approach in part c would be better (since the t interval is no longer a confidence interval for the median).
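
The confidence level of the (minimum, maximum) interval can be checked both from the formula and by simulation. The sketch below is illustrative: it simulates from a standard normal population (median 0) purely as a convenient test case, and the function names, seed, and number of repetitions are arbitrary choices.

```python
import random


def minmax_confidence_level(n):
    """Confidence level of (min, max) as an interval for the median."""
    return 1 - 0.5 ** (n - 1)


def simulated_coverage(n, reps=20000, seed=1):
    """Fraction of simulated samples whose (min, max) contains the true median 0."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if min(sample) < 0.0 < max(sample):
            hits += 1
    return hits / reps


level = minmax_confidence_level(10)
coverage = simulated_coverage(10)
print(round(level, 4), round(coverage, 4))
```

The formula depends only on n, not on the shape of the population, which is what makes this interval non-parametric.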

 

 

CH9.7 (WT; 5 points)

A Type I error occurs if we conclude the power plant is non-compliant with regulations, when, in fact, the plant really is compliant. A Type II error occurs if we do not have evidence of non-compliance, yet the plant really is breaking the regulations. Reasonable arguments can be made that the more serious error is either of these (depending on your personal views). If you think the Type II error is most serious, then you can reconstruct the test accordingly, so that the null hypothesis $H_0$ states that the plant is non-compliant and the alternative hypothesis $H_a$ states that the plant is compliant. With this reconstructed test, a Type I error occurs if we conclude the power plant is compliant, when, in fact, the plant really isn’t. Remember we can then “control” this error rate by using a small significance level.

 

 

CH9.25 (WA; 5 points)

Statement of hypotheses

Suppose $\mu$ is the mean IQ score for all first-graders in this school. We want to test the hypotheses $H_0: \mu = 100$ and $H_a: \mu \ne 100$.

 

Check of conditions

For this problem, we are told the distribution of first-grade IQ scores for this school follows a normal distribution with known population standard deviation $\sigma$. Then we can use a z-test (since we know the distribution of the sample mean is exactly normal).

 

Calculation of the Test Statistic

For this sample of 10 IQ scores, we first compute the sample mean $\bar{x}$.

 

So our test statistic is $z = \dfrac{\bar{x} - 100}{\sigma/\sqrt{10}} = 3.37$. That is, our particular sample average is 3.37 standard errors above the null-hypothesized mean. Note: A z-distribution picture should be included with this solution (the only reason it isn’t is because Word cannot draw it).

 

Calculation of the P-value

This is a two-sided test, so “more extreme” than our test statistic is both above 3.37 and below -3.37 on a standard-normal distribution. Hence, p-value $= 2P(Z \ge 3.37) \approx 2(0.0004) = 0.0008$.
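
The two-sided p-value can also be computed directly rather than read from a table. This is a minimal sketch using Python's standard library; the function name is illustrative, and the exact value differs slightly from the table-based figure because the table rounds the tail area to four decimal places.

```python
from statistics import NormalDist


def two_sided_p_value(z):
    """Two-sided p-value for a z statistic: 2 * P(Z >= |z|)."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


p_value = two_sided_p_value(3.37)
print(round(p_value, 4))
```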

 

Interpretation of the Results in the Context of the Problem

Assuming the average IQ score for first-graders at this school is 100, there is only a 0.0008 chance of getting our particular sample average IQ or a more extreme average IQ. This is very surprising and provides very strong evidence that the average IQ score for this school is different from the national average. Clearly, these results are statistically significant at the 0.05 significance level (since our p-value is so much smaller than 0.05).

 

But are these results practically significant? A 95% confidence interval for the average IQ score for first graders at this school is $\bar{x} \pm 1.96(\sigma/\sqrt{10})$. The IQ test is standardized, so the scores include no units. This interval seems, practically speaking, substantially higher than 100. But an IQ-test expert should be consulted to verify that these results are of practical importance.

 

 

CH9.27 (WA; 10 points)

Statement of hypotheses

Suppose $\mu$ is the population mean weight for all Pepperidge Farm bagels. We want to test the hypotheses $H_0: \mu = 113$ and $H_a: \mu < 113$.

 

Check of conditions

Since the population standard deviation is unknown, we must use a t-test, but this test has the condition that the population from which we sampled follows a normal distribution. For such a small sample size, it’s especially important for us to check this condition. That said, because there are only 6 bagel weights, it’s very difficult to tell if they seem to follow a normal distribution (such is the life of a practicing statistician!). Included below are a dotplot and normal-probability plot of the sample bagel weights. The dotplot indicates some deviation from normality (two of the observations seem separated from the others), but the normality plot doesn’t indicate a significant deviation from normality. Hence, we can tentatively proceed with a t-test and feel reasonably good about the conclusions.

 

 

Calculation of the Test Statistic

Numerical summaries provided by Minitab:

 

Variable                  N    Mean  StDev

Bagel Weights (in grams)  6  112.97   4.29

 

From our sample, the test statistic is $t = \dfrac{112.97 - 113}{4.29/\sqrt{6}} \approx -0.02$. That is, our particular sample average is only 0.02 estimated standard errors below the null-hypothesized mean. Even for a t-distribution (with “fatter tails” than the standard normal), this does not seem surprising. Note: A t-distribution picture should be included with this solution (the only reason it isn’t is because Word cannot draw it).

 

Calculation of the P-value

We know our test statistic has a t-distribution with $n - 1 = 5$ degrees of freedom. This is a one-sided test, so “more extreme” than our test statistic is only below -0.02 on a t-distribution with 5 df. Hence, p-value $= P(T_5 \le -0.02)$. Note that 0.02 is not listed on Table A.5, but from Table A.5, we can say this p-value is much greater than 0.10 (Minitab can give us the exact p-value: 0.49).

 

Interpretation of the Results in the Context of the Problem

Assuming the average weight of all Pepperidge Farm bagels is 113 grams, there is a 49% chance of getting our particular sample average weight or a more extreme average weight. This is not at all surprising and provides no evidence that the average weight is smaller than 113 grams. The results are not statistically significant at any reasonable significance level. (Since the results are not statistically significant, we don’t need to explore the practical importance.)

 

Part b

Now suppose we know the population of bagel weights follows a normal distribution with $\sigma = 4$. Then we can use a z-test, not a t-test. We want to perform a one-sided test: $H_0: \mu = 113$ and $H_a: \mu < 113$. We want to test at the 0.05 significance level. Assuming the true mean bagel weight is 110 grams, what is the probability that our test rejects the null hypothesis, if our test is based on only 6 observations?

 

Note: Normal curve pictures should be included with this solution (the only reason they aren’t is because Word cannot draw them).

 

First we must quantify what it means to “reject $H_0$” (recall this depends on the significance level and the alternative hypothesis). We reject only for small values of the sample average (and our significance level is 0.05). Then we “reject $H_0$” when our test statistic, z, is less than -1.645.

In terms of $\bar{x}$, we “reject $H_0$” when $\bar{x} < 113 - 1.645(4/\sqrt{6}) \approx 110.31$.

Then, power $= P(\bar{X} < 110.31 \mid \mu = 110) = P\left(Z < \dfrac{110.31 - 110}{4/\sqrt{6}}\right) = P(Z < 0.19) = 0.5753$. Note: This power, 0.5753, isn’t very high, mainly because there are so few observations in the sample.
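
The power calculation can be sketched in Python as follows. It assumes a population standard deviation of 4 grams (the value consistent with the power of 0.5753 reported above); the function name is illustrative. The result is close to, but not identical to, 0.5753 because the hand calculation rounds z to 0.19 before using the table.

```python
from math import sqrt
from statistics import NormalDist


def power_lower_tailed_z(mu0, mu_true, sigma, n, alpha=0.05):
    """Power of a lower-tailed z-test of H0: mu = mu0 when the true mean is mu_true."""
    standard_error = sigma / sqrt(n)
    z_alpha = NormalDist().inv_cdf(1 - alpha)      # about 1.645 for alpha = 0.05
    cutoff = mu0 - z_alpha * standard_error        # reject H0 when x-bar < cutoff
    return NormalDist(mu_true, standard_error).cdf(cutoff)


# sigma = 4 is an ASSUMPTION inferred from the reported power.
bagel_power = power_lower_tailed_z(mu0=113, mu_true=110, sigma=4, n=6)
print(round(bagel_power, 2))
```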

 

Part c

Same conditions as in part b, but now we want to determine how large our sample must be in order to bring the power up to 0.95.

 

Note: Normal curve pictures should be included with this solution (the only reason they aren’t is because Word cannot draw them).

 

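First we must quantify what it means to “reject $H_0$” (recall this depends on the significance level and the alternative hypothesis). We reject only for small values of the sample average (and our significance level is 0.05). Then we “reject $H_0$” when our test statistic, z, is less than -1.645.

In terms of $\bar{x}$, we “reject $H_0$” when $\bar{x} < 113 - 1.645(4/\sqrt{n})$.

From this we can determine the power: $P(\bar{X} < 113 - 1.645(4/\sqrt{n}) \mid \mu = 110) = P(Z < -1.645 + 3\sqrt{n}/4)$.

We want this power to be 0.95. From the standard normal table (Table A.3), we know $P(Z < 1.645) = 0.95$.

Hence, we must find n such that $-1.645 + 3\sqrt{n}/4 = 1.645$. Solving gives $\sqrt{n} = 4(3.29)/3 \approx 4.39$, so $n \approx 19.2$, which we round up to 20.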

 

So they only need to weigh a sample of 20 bagels in order to have a power of 0.95 to detect a true mean weight of 110 grams. (So they need 14 additional bagels in order to raise the power from 0.58 to 0.95.)
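
The sample-size step can be written as a small Python helper. This is a sketch under the same assumptions as the hand calculation (one-sided z-test, known standard deviation taken to be 4 grams); the function name is illustrative.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(mu0, mu_alt, sigma, alpha=0.05, power=0.95):
    """Smallest n giving the requested power for a one-sided z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # critical value for the test
    z_power = NormalDist().inv_cdf(power)       # quantile matching the target power
    n_exact = (sigma * (z_alpha + z_power) / abs(mu0 - mu_alt)) ** 2
    return ceil(n_exact)


# sigma = 4 is an ASSUMPTION carried over from part b.
bagel_n = required_sample_size(mu0=113, mu_alt=110, sigma=4)
print(bagel_n)
```

Rounding up rather than to the nearest integer guarantees that the achieved power is at least the target.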

 

 

 

CH9.32 (WT; 10 points)

Statement of hypotheses

Suppose $\mu$ is the population mean reading for all radon detectors of this type. We want to test the hypotheses $H_0: \mu = 100$ and $H_a: \mu \ne 100$.

Check of conditions

Since the population standard deviation is unknown, we must use a t-test, but this test has the condition that the population from which we sampled follows a normal distribution. For such a small sample size, it’s especially important for us to check this condition. That said, because there are only 12 radon readings, it’s difficult to tell if they seem to follow a normal distribution (such is the life of a practicing statistician!). Included below are a dotplot and normal-probability plot of the sample radon readings. Neither indicates a deviation from normality. Hence, we can proceed with a t-test and feel reasonably good about the conclusions.

 

Calculation of the Test Statistic

Numerical summaries provided by Minitab:

 

Variable                N   Mean  StDev

Radon Reading (pCi/L)  12  98.37   6.11

 

From our sample, the test statistic is $t = \dfrac{98.37 - 100}{6.11/\sqrt{12}} \approx -0.924$. That is, our particular sample average is 0.924 estimated standard errors below the null-hypothesized mean. Even for a t-distribution (with “fatter tails” than the standard normal), this does not seem surprising. Note: A t-distribution picture should be included with this solution (the only reason it isn’t is because Word cannot draw it).

 

Calculation of the P-value

We know our test statistic has a t-distribution with $n - 1 = 11$ degrees of freedom. This is a two-sided test, so “more extreme” than our test statistic is both below -0.924 and above 0.924 on a t-distribution with 11 df. Hence, p-value $= 2P(T_{11} \ge 0.924)$. Note that 0.924 is not listed on Table A.5, but from Table A.5, we can say this p-value is greater than 2(0.10)=0.20 (Minitab can give us the exact p-value: 2(0.187)=0.374).

 

Interpretation of the Results in the Context of the Problem

Assuming the average radon reading for all detectors of this type is 100 pCi/L, there is a 37.4% chance of getting our particular sample average reading or a more extreme average reading. This is not at all surprising and provides no evidence that the average reading is different from 100 pCi/L. The results are not statistically significant at any reasonable significance level. (Since the results are not statistically significant, we don’t need to explore the practical importance.)

 

Part b

Now suppose we know the population of radon readings follows a normal distribution with known standard deviation $\sigma$. Then we can use a z-test, not a t-test. Furthermore, suppose we want to perform a one-sided test: $H_0: \mu = 100$ and $H_a: \mu < 100$. And we want the Type II error rate to be only 0.10 when the true mean radon reading is 95 pCi/L (that is, we want a high power of 0.90 to detect this particular deviation from the null hypothesis).

 

Note: Normal curve pictures should be included with this solution (the only reason they aren’t is because Word cannot draw them).

 

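First we must quantify what it means to “fail to reject $H_0$” (recall this depends on the significance level and the alternative hypothesis). We reject only for small values of the sample average (and our significance level is 0.05). Then we “fail to reject $H_0$” when our test statistic, z, is larger than -1.645.

In terms of $\bar{x}$, we “fail to reject $H_0$” when $\bar{x} > 100 - 1.645(\sigma/\sqrt{n})$.

From this we can determine the Type II error rate, $\beta$: $\beta = P(\bar{X} > 100 - 1.645(\sigma/\sqrt{n}) \mid \mu = 95) = P(Z > -1.645 + 5\sqrt{n}/\sigma)$.

We want this error rate to be only 0.10. From the standard normal table (Table A.3), we know $P(Z > 1.28) \approx 0.10$.

Hence, we must find n such that $-1.645 + 5\sqrt{n}/\sigma = 1.28$, that is, $n = [\sigma(1.645 + 1.28)/5]^2$, rounded up to the next whole number.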

 

So they only need to test 20 radon detectors in order to have a power of 0.9 to detect a true mean reading of 95 pCi/L.
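
The claim that 20 detectors suffice can be checked by computing the Type II error rate directly for nearby sample sizes. The sketch below is illustrative only: the standard deviation of 7.5 pCi/L is an assumed value chosen for the example (the value used in the solution is not reproduced in the text above), although it is consistent with the reported answer of n = 20. The function name is also hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error_lower_tailed(mu0, mu_alt, sigma, n, alpha=0.05):
    """P(fail to reject H0) for a lower-tailed z-test when the true mean is mu_alt."""
    standard_error = sigma / sqrt(n)
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    cutoff = mu0 - z_alpha * standard_error     # fail to reject when x-bar > cutoff
    return 1 - NormalDist(mu_alt, standard_error).cdf(cutoff)


ASSUMED_SIGMA = 7.5  # ASSUMPTION for illustration; not stated in the text above

beta_19 = type_ii_error_lower_tailed(100, 95, ASSUMED_SIGMA, n=19)
beta_20 = type_ii_error_lower_tailed(100, 95, ASSUMED_SIGMA, n=20)
print(round(beta_19, 3), round(beta_20, 3))
```

Under this assumed standard deviation, 19 detectors leave the error rate just above 0.10 while 20 bring it below, which is why the answer is rounded up.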