Math 207 – Solutions to Assignment 1
1.53
a. The distribution of state popular vote for Bush is skewed to the right (long right
tail). There are three high vote values that could be considered outliers—these
states are Florida, Texas, and California (3 of the top 4 most populous
states). The distribution of state percent vote for Bush is approximately
symmetric and mound shaped. There are no outliers among the percent votes.
c. The distribution of state popular vote is naturally skewed right because of the nature
of state populations: there are many mid-sized state populations, yet there are
a few very populous states—hence the long right tail. Determining vote percents
takes away the impact of state population. The percent votes pile up in the
center and taper off equally on both sides (since there isn’t a natural
boundary on either side).
1.54
a. The distribution of heights of biostatistics students is bimodal (notice the large
dip in histogram height at 68 inches) and approximately symmetric.
b. The bimodal nature of the distribution of heights is somewhat unusual.
c. Most likely, the two peaks in the distribution represent female heights (lower peak) and
male heights (higher peak), as females, on average, are shorter than males (and
the class probably consists of both men and women).
Additional note: The distribution of heights of female adults is
typically symmetric and mound-shaped (i.e., follows an approximately normal
distribution), and the same typically holds for the distribution of male heights.
The histogram shown in this problem seems to
be a combination of these two normal distributions, where the female heights
are centered at a lower value (although the variation of both groups looks
roughly the same).
1.60
a. The sizes/volumes of the items in the graph increase as the calories increase, but
they don’t seem to increase in proportion
to the increase in calories. For example, there is only a 5-calorie difference
between a can of Coke and a bottle of Budweiser, yet the two pictures show a
much larger difference in terms of volume.
b. The bar chart is shown below. [Note: I
created this graph in Minitab (to make it easier to post the solutions), but
you were asked to do the graph by hand.] This chart displays the calories in
America’s favorite foods more accurately, because the heights of the bars are the
only things that change, and they change in exact proportion to the number of calories.

1.61
The distributions of both sets of
algebra exam scores are slightly skewed left (slightly longer left tails). The
distribution of scores for students with laptops has a slightly higher
center/middle value and slightly less spread. Perhaps more importantly, a much
higher percentage of students with laptops scored 80 or better on the final
exam. The students weren’t randomly assigned to treatments (laptop or no
laptop), though, so we cannot say if laptop use caused better grades.
2.55
a.
Generic Brand numbers (ordered):
24 25 25 25 26 26 26 26 26 27 27 28 28 28
The median is in position (14 + 1)(.5) = 7.5, which is between the two middle 26 values.
Hence, the median number is 26. The first and third quartiles are in positions
(14 + 1)(.25) = 3.75 and (14 + 1)(.75) = 11.25, respectively. Hence, the
quartiles are 25 (three-quarters of the way between the third and fourth ordered
values, both of which are 25) and 27.25 (one-quarter of the way between the
eleventh value, 27, and the twelfth value, 28), respectively.
For the
generic brand, the median, quartiles, and IQR number of raisins are 26, 25,
27.25, and 2.25, respectively. (Note:
The values are counts, so no units need be included.)
Sunmaid numbers (ordered):
22 24 24 24 24 25 25 27 28 28 28 28 29 30
The median is in position (14 + 1)(.5) = 7.5, which is between the values 25 and 27.
Hence, the median number is 26. The first and third quartiles are in positions
(14 + 1)(.25) = 3.75 and (14 + 1)(.75) = 11.25, respectively. Hence, the
quartiles are 24 (three-quarters of the way between the third and fourth ordered
values, both of which are 24) and 28 (one-quarter of the way between the eleventh
and twelfth ordered values, both of which are 28), respectively.
For the Sunmaid brand, the median, quartiles, and IQR number of
raisins are 26, 24, 28, and 4, respectively. (Note: The values are counts, so no units need be included.)
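The position arithmetic in part a can be checked with a short script. This is only a sketch of the (n + 1)p position rule used above; the function names value_at_position and summarize are made up for this illustration and are not from the textbook or Minitab.

```python
def value_at_position(sorted_values, position):
    """Value at a 1-based, possibly fractional, position (linear interpolation)."""
    lower = int(position)              # 1-based index of the value just below
    fraction = position - lower        # how far to move toward the next value
    if lower < 1:
        return sorted_values[0]
    if lower >= len(sorted_values):
        return sorted_values[-1]
    below = sorted_values[lower - 1]
    above = sorted_values[lower]
    return below + fraction * (above - below)


def summarize(values):
    """Median, quartiles, and IQR using positions p(n + 1)."""
    data = sorted(values)
    n = len(data)
    q1 = value_at_position(data, 0.25 * (n + 1))
    median = value_at_position(data, 0.50 * (n + 1))
    q3 = value_at_position(data, 0.75 * (n + 1))
    return {"median": median, "q1": q1, "q3": q3, "iqr": q3 - q1}


generic = [24, 25, 25, 25, 26, 26, 26, 26, 26, 27, 27, 28, 28, 28]
sunmaid = [22, 24, 24, 24, 24, 25, 25, 27, 28, 28, 28, 28, 29, 30]

print("Generic:", summarize(generic))
print("Sunmaid:", summarize(sunmaid))
```

Other software may use a different quartile rule, so results from a statistics package can differ slightly from the hand calculation.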
b.
The boxplots are shown below. Note: I created this graph in Minitab (to make it easier to post
the solutions), but you were asked to do the graph by hand.

c. The stem-and-leaf plots are shown below.
Generic Brand, Number of Raisins (leaf unit = 0.1) | Sunmaid Brand, Number of Raisins (leaf unit = 0.1)
   |           | 22 | 0
   |           | 23 |
24 | 0         | 24 | 0 0 0 0
25 | 0 0 0     | 25 | 0 0
26 | 0 0 0 0 0 | 26 |
27 | 0 0       | 27 | 0
28 | 0 0 0     | 28 | 0 0 0 0
   |           | 29 | 0
   |           | 30 | 0
The stem-and-leaf plots and boxplots
show the same distribution shapes (although the stem plots give more detail, as
they report all the individual values). The distributions of number of raisins
for both brands are approximately symmetric, centered around 26, yet the
distribution for Sunmaid raisins has more variability
(more spread).
d. As mentioned in part c, the median number of
raisins in a 0.5-oz box is the same for both brands, but there is more
variability in the number of raisins for the Sunmaid
brand. If each of the boxes weighs the same, yet there’s more variation in the
number of raisins per box made by Sunmaid, this means
there’s more variability in the sizes of the individual Sunmaid
raisins (as compared to the generic brand).
2.58
As provided on the homework
assignment, the mean and standard deviation time to recurrence are 8.37 months
and 7.67 months, respectively.
a. The numbers of observations within the given intervals are shown in the table below.
Interval (values in months) | Number of Observations | Percent of Observations | Percent According to Empirical Rule | Percent According to Tchebysheff
(0.70, 16.04) | 37 | 74% | 68% | at least 0%
(-6.97, 23.71) | 47 | 94% | 95% | at least 75%
(-14.64, 31.38) | 49 | 98% | 99.7% | at least 88.9%
b. The percentages agree with Tchebysheff’s rule (recall that the
rule simply gives a lower bound), but they don’t agree with the Empirical Rule
(especially at one standard deviation away from the mean).
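The intervals and the Tchebysheff lower bounds can be reproduced from the given mean and standard deviation. The sketch below takes the observation counts (37, 47, 49) from the solution as given, since the raw recurrence times are not listed here. The sample size of 50 is an assumption, inferred from 37 observations being 74%. The helper names are made up for this illustration.

```python
def interval(mean, sd, k):
    """Return (mean - k*sd, mean + k*sd), rounded to two decimals."""
    return (round(mean - k * sd, 2), round(mean + k * sd, 2))


def tchebysheff_lower_bound(k):
    """Minimum percent of observations within k standard deviations (k >= 1)."""
    return 100 * (1 - 1 / k**2)


mean, sd = 8.37, 7.67                     # given on the homework assignment
n = 50                                    # assumption: inferred from 37 being 74%
counts = {1: 37, 2: 47, 3: 49}            # counts reported in the solution
empirical = {1: 68.0, 2: 95.0, 3: 99.7}   # Empirical Rule percentages

rows = []
for k in (1, 2, 3):
    observed = 100 * counts[k] / n
    bound = round(tchebysheff_lower_bound(k), 1)
    rows.append((k, interval(mean, sd, k), observed, empirical[k], bound))

for k, bounds, observed, rule, bound in rows:
    print(f"k={k}: {bounds}, observed {observed}%, "
          f"Empirical Rule {rule}%, Tchebysheff at least {bound}%")
```

Every observed percentage is at or above its Tchebysheff bound, while the gap from the Empirical Rule is largest at k = 1 (74% versus 68%).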
c. The Empirical Rule only applies to mound-shaped
distributions. The distribution of times to recurrence seems like it would be
skewed right. There is a natural boundary at 0 months (there can’t be a
negative amount of time to recurrence) and some people would fall near that
boundary, yet there are other people (perhaps just a few) who would have very
long periods of time between illnesses. Because the distribution of recurrence
times is probably skewed, not mound-shaped, the Empirical Rule doesn’t apply. (Note: The distribution of times to
recurrence is indeed skewed; see the histogram below. You didn’t have to
include the histogram as part of your solution, but it’s a helpful piece of the
overall analysis.)

2.64
a. The mean and
standard deviation were given as part of the homework assignment: 6.85 hours
and 1.01 hours, respectively.
b. The z-score for the largest value in the
sample (8.5 hours) is z = (8.5 – 6.85)/1.01 ≈ 1.63. Hence, this particular value of sleep hours
is 1.63 standard deviations above the mean. While this value is not typical, it
is certainly not unusual.
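As a quick check of the arithmetic in part b, the z-score can be computed from the rounded mean and standard deviation given in part a. This is a minimal sketch; the function name z_score is made up for this illustration.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd


largest = 8.5               # largest sleep value in the sample (hours)
mean, sd = 6.85, 1.01       # rounded summary values given in part a

z = z_score(largest, mean, sd)
print(round(z, 2))
```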
d.
Sample of Sleep Hours (ordered): 5 6 6 6.75 7 7 7 7.25 8 8.50
The positions of the first quartile, median, and third quartile are 2.75, 5.5, and
8.25, respectively. Hence, the first quartile is 6 hours; the median is 7
hours; the third quartile is 7.25 + 0.25(8 – 7.25) ≈ 7.44 hours.
Then the interquartile range is 7.44 – 6 = 1.44 hours,
and 1.5(IQR) = 1.5(1.44) = 2.16 hours. Then
the lower and upper fences are (6 – 2.16) = 3.84 hours and (7.44 + 2.16) = 9.6
hours, respectively. Because all observations fall within this range, there are
no suspected outliers. This agrees with the
answer to part b: the largest sample value of 8.5 hours is not unusually large
(i.e., not a suspected outlier). The boxplot is shown below.
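The quartile and fence calculations can be sketched in code as follows. The helper names are made up for this illustration, and the script uses the same (n + 1)p position rule as the hand calculation. It does not round intermediate values, so the third quartile comes out as 7.4375 and the fences as about 3.84 and 9.59 rather than the rounded 3.84 and 9.6 above; the conclusion is the same.

```python
def value_at_position(sorted_values, position):
    """Value at a 1-based, possibly fractional, position (linear interpolation)."""
    lower = int(position)
    fraction = position - lower
    if lower < 1:
        return sorted_values[0]
    if lower >= len(sorted_values):
        return sorted_values[-1]
    below = sorted_values[lower - 1]
    above = sorted_values[lower]
    return below + fraction * (above - below)


def fences(values):
    """Quartiles, IQR, 1.5*IQR fences, and any values outside the fences."""
    data = sorted(values)
    n = len(data)
    q1 = value_at_position(data, 0.25 * (n + 1))
    q3 = value_at_position(data, 0.75 * (n + 1))
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return {"q1": q1, "q3": q3, "iqr": iqr,
            "lower_fence": lower_fence, "upper_fence": upper_fence,
            "outliers": outliers}


sleep_hours = [5, 6, 6, 6.75, 7, 7, 7, 7.25, 8, 8.5]
result = fences(sleep_hours)
print(result)
```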
