Math 207—Solution to Assignment 2
1.33
a. The distribution of ages of death for people in general is typically skewed
to the left. A small percentage of people die early, but most live past 60
years old. Because it is very unusual to live to be 100, there is a natural
upper boundary for the ages of death, so there will be a clump of values at the
high end of the graph and a longer tail at the low end.
The distribution of ages of death for the
presidents may not be skewed left, though. In order to be elected president, a
person must be at least 35 years old and is typically even older than that.
Hence, the long left tail present in the distribution of ages of death for the
general population will not exist for the distribution of ages of death for the
presidents. Therefore, the distribution of ages of death for the presidents may
be symmetric or even skewed to the right.
b. The stem-and-leaf plot of ages of death is shown below (note this isn’t the
exact plot Minitab produced; I deleted the left column that cumulatively counts
the observations). The shape of the distribution of ages of death of US
presidents is somewhat symmetric (although slightly skewed right). This is not
surprising based on my explanation above.
Stem-and-Leaf Display: Ages of Death of US Presidents
N = 38, Leaf Unit = 1.0
4 69
5 3
5 6678
6 003344
6 567778
7 0111234
7 7889
8 013
8 58
9 003
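The shape described in part b can be checked numerically by decoding the display back into raw ages. The following is a minimal sketch, assuming the rows above are transcribed correctly; the names `ROWS` and `decode` are illustrative and not part of the Minitab output.

```python
from statistics import mean, median

# Split-stem rows transcribed from the display above (leaf unit = 1.0).
# Each tuple is (stem, string of leaf digits).
ROWS = [
    (4, "69"),
    (5, "3"),
    (5, "6678"),
    (6, "003344"),
    (6, "567778"),
    (7, "0111234"),
    (7, "7889"),
    (8, "013"),
    (8, "58"),
    (9, "003"),
]


def decode(rows):
    """Turn (stem, leaves) rows into a flat list of ages."""
    return [10 * stem + int(digit) for stem, leaves in rows for digit in leaves]


ages = decode(ROWS)
print("N      =", len(ages))            # 38, matching the display header
print("median =", median(ages))         # 69.0
print("mean   =", round(mean(ages), 1)) # 69.8
```

The mean comes out slightly above the median, which is consistent with a roughly symmetric distribution that has a slight right skew.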
c. The five youngest presidents at the time of death were Kennedy, Garfield,
Polk, Lincoln, and Arthur. The common trait of three of these presidents
(Lincoln, Garfield, and Kennedy) is that they were assassinated.
1.49
a. Before looking at the time series plot, I would guess there has been
improvement (i.e., a decrease) in winning times over the years, since there
have naturally been improvements in training, breeding, etc. The time plot of
winning times is shown below. Interestingly, there is no downward trend shown
in the graph, and there is a lot of year-to-year variation in the winning
times. It appears that year is not an important factor in describing winning
time. [The data in this new edition of the textbook go to 2007, whereas the
data I included for the assignment only go to 2004. My graphics are based on
the older set of data. If you noticed this problem and augmented my data with
the times from 2005-2007, good for you! Then your graphs will look slightly
different from mine.]

b. Because we now know that time (i.e., year) doesn’t help predict winning race
time, we can look at the winning times in a one-variable graph. Winning time is
a quantitative variable, so any of the following graphs would be appropriate to
show the distribution. The distribution of winning race times is approximately
symmetric around 122 seconds. There is one very fast time (119.2 seconds) and
two especially slow times (125 seconds), but they don’t seem to be highly
unusual (they still fit with the overall pattern in the rest of the data).
Stem-and-Leaf Display: Winning Times (in seconds) for Kentucky Derby Races (1950-2004)
N = 55, Leaf Unit = 0.10
119 2
119 9
120 0123
120
121 00111113334444
121
122 000000111112222222344
122
123 000122223
123
124 000
124
125 00


1.65
Because car color is a categorical variable, it would be appropriate to
use either a pie chart or a bar chart as a graphical display. Both types of
graphs are shown below. (Note: To create a bar chart in Minitab from a summary
table, choose “values from a table” from the dropdown “Bars represent” menu in
the Bar Chart dialog box.)


Silver is the most popular car color, with gray, blue, and black
following closely as the next most popular colors. The least popular color is
yellow/gold (perhaps too bright and showy?).
Another data clarification: These graphs are based on the data from the
new version of the textbook. But the correct version of this data file wasn’t
on the share drive until Wednesday. Hence, your graphs might look different
from mine.
2.47
a. The five-number summary (from Minitab) for the mercury concentrations in the
dolphin livers is shown below. The units on all the numbers are
micrograms/gram.
Descriptive Statistics: Mercury Concentration (micrograms/gram)
Variable         Minimum     Q1  Median     Q3  Maximum
Mercury (mcg/g)     1.70  130.5   246.5  317.5    485.0
b. The boxplot of the mercury concentrations is shown below. (By default,
Minitab creates boxplots vertically, but I transposed the graph as I think it’s
easier to read. Either way, vertical or horizontal, is okay.)

c. According to our suspected outlier criterion (i.e., values lying more than
1.5 × IQR below the first quartile or above the third quartile), there are no
outliers. (If there were outliers, they would be denoted with an asterisk in
the boxplot.) That said, there are four unusually small mercury concentrations,
which can be seen in the dotplot below. Because the overall spread of the
distribution of concentrations is so large and because there are four unusual
observations (as opposed to one or two), these observations were not flagged as
outliers.
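The 1.5 × IQR check in part c can be worked through numerically from the five-number summary in part a. This is a minimal sketch; the function name `outlier_fences` is illustrative, and the quartiles are simply the Minitab values quoted above (other software may compute quartiles slightly differently).

```python
def outlier_fences(q1, q3, k=1.5):
    """Return the (lower, upper) fences of the k x IQR outlier rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Five-number summary values from part a (micrograms/gram).
minimum, q1, q3, maximum = 1.70, 130.5, 317.5, 485.0

lower, upper = outlier_fences(q1, q3)
print("IQR         =", q3 - q1)    # 187.0
print("lower fence =", lower)      # -150.0
print("upper fence =", upper)      # 598.0

# A value is a suspected outlier only if it falls outside the fences.
has_outliers = minimum < lower or maximum > upper
print("any suspected outliers?", has_outliers)  # False
```

Because the lower fence is negative and concentrations cannot be, no small value could ever be flagged here, which is why the four low observations are not marked on the boxplot.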

d. Knowing the first four dolphins were less than 3 years old explains why
their mercury concentrations were so much smaller than the rest of the
dolphins: they had fewer years to accumulate mercury in their livers.
3.31
a. In this experiment
there is one qualitative/categorical variable: Kiln Site (Llanederyn,
Island Thorns, or Ashley Rails). Additionally there is one quantitative
variable: aluminum-oxide level (within a pottery piece)—unfortunately the
textbook doesn’t provide units of measurement for the aluminum oxide, so the
numbers are simply “floating in space” with no context.
b. The entire distribution of aluminum-oxide levels for the Llanederyn site is
fully below the distributions from the other sites. Hence, the Llanederyn site
produced pottery with lower levels of aluminum oxide. The variation in levels,
as measured by the interquartile range, is also smallest for the Llanederyn
site. The distributions of aluminum-oxide levels for the two other sites
(Island Thorns and Ashley Rails) are fairly similar, each with a median around
18 and roughly the same variability in measurements (these two distributions,
though, seem to be skewed in opposite directions).
4.14
Four equally
qualified runners, John (J), Bill (B), Ed (E), and Dave (D), run a 100-meter
sprint, and the order of finish is recorded. In this case, the order of finish does matter.
a. S = {JBED, JBDE, JEBD, JEDB, JDBE, JDEB, BJED, BJDE, BEJD, BEDJ, BDJE, BDEJ,
EJBD, EJDB, EBJD, EBDJ, EDJB, EDBJ, DJBE, DJEB, DBJE, DBEJ, DEJB, DEBJ}. Hence,
there are 24 simple events in the sample space.
Note: the problem simply asks for the number of simple events, and does
not explicitly ask you to write out the sample space. You can use counting
rules to obtain the number of simple events:
$4! = 4 \cdot 3 \cdot 2 \cdot 1 = 24$ (i.e., there are 24 permutations of 4
people).
b. If the runners are equally qualified, then all the points in the sample
space should be equally likely (i.e., each has probability $1/24$).
c. There are six ways for Dave to win the race (this can be verified by looking
at the sample space or via a counting-method calculation: with Dave fixed in
first place, the other three runners can finish in $3! = 6$ orders), so the
probability is $6/24 = 1/4$.
d. There are two ways for Dave to win and John to place second (this can be
verified by looking at the sample space or through calculation: with Dave first
and John second, the other two runners can finish in $2! = 2$ orders), so the
probability is $2/24 = 1/12$.
e. There are six ways for Ed to finish last (this can be verified by looking at
the sample space or through calculation: with Ed fixed in last place, the other
three runners can finish in $3! = 6$ orders), so the probability is
$6/24 = 1/4$.
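The counts and probabilities in parts a through e can also be checked by brute-force enumeration. This is a minimal sketch, assuming all finishing orders are equally likely as stated in part b; the helper `probability` and the variable names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

# J = John, B = Bill, E = Ed, D = Dave; each outcome is a finishing order.
RUNNERS = "JBED"
sample_space = ["".join(order) for order in permutations(RUNNERS)]


def probability(event):
    """P(event) when every outcome in the sample space is equally likely."""
    favorable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favorable), len(sample_space))


p_dave_wins = probability(lambda o: o[0] == "D")        # part c
p_dave_then_john = probability(lambda o: o[:2] == "DJ")  # part d
p_ed_last = probability(lambda o: o[-1] == "E")          # part e

print(len(sample_space))   # 24
print(p_dave_wins)         # 1/4
print(p_dave_then_john)    # 1/12
print(p_ed_last)           # 1/4
```

Using `Fraction` keeps the answers exact, so they can be compared directly with the hand calculations.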
Additional Problem
a. For each additional square foot of living area in a house, the predicted
selling price increases by $196.20. Or, perhaps more meaningfully, for each
additional 100 square feet of living area in a house, the predicted selling
price increases by $19,620.
b. The regression line explains 85.4% of the variation in house selling price.
This is a fairly high level of explained variation—the regression line fits the
data pretty well.
c. Even though the regression line fits the data fairly well, there is a slight
pattern of curvature in the residuals. Hence, a curve would be a better summary
of the relationship in these data than a straight line. In fact, when a
quadratic regression is fit, $R^2$ increases from 85.4% to 89.4%.