Math 207—Solution to Assignment 2

 

1.33

a.      The distribution of ages of death for people in general is typically skewed to the left. A small percentage of people die early, but most live past 60 years old. Since it’s very unusual to live to be 100, there is a natural boundary for the ages of death (so there will be a clump of age values high on the graph and a longer tail low on the graph).

 

The distribution of ages of death for the presidents may not be skewed left, though. In order to be elected president, a person must be at least 35 years old and is typically even older than that. Hence, the long left tail present in the distribution of ages of death for the general population will not exist for the distribution of ages of death for the presidents. Therefore, the distribution of ages of death for the presidents may be symmetric or even skewed to the right.

 

b.      The stem-and-leaf plot of ages of death is shown below (note this isn’t the exact plot Minitab produced—I deleted the left column that cumulatively counts the observations). The shape of the distribution of ages of death of US presidents is somewhat symmetric (although slightly skewed right). This is not surprising based on my explanation above.

 

Stem-and-Leaf Display: Ages of Death of US Presidents

N  = 38, Leaf Unit = 1.0

 

       4  69

       5  3

       5  6678

       6  003344

       6  567778

       7  0111234

       7  7889

       8  013

       8  58

       9  003
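
If you want to check the display without Minitab, here is a minimal Python sketch that rebuilds it. The `ages` list is simply the 38 values read back off the plot above, and `split_stem_and_leaf` is a helper name I made up for illustration; it is not what Minitab does internally, and unlike Minitab it skips half-stems that have no leaves.

```python
from collections import defaultdict

# Ages at death read back off the stem-and-leaf display above (N = 38).
ages = [46, 49, 53, 56, 56, 57, 58, 60, 60, 63, 63, 64, 64,
        65, 66, 67, 67, 67, 68, 70, 71, 71, 71, 72, 73, 74,
        77, 78, 78, 79, 80, 81, 83, 85, 88, 90, 90, 93]

def split_stem_and_leaf(values):
    """Return display rows, splitting each stem into leaves 0-4 and 5-9."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        # Key is (stem, upper-half flag) so the lower half sorts first.
        rows[(stem, leaf >= 5)].append(str(leaf))
    return [f"{stem}  {''.join(leaves)}"
            for (stem, _), leaves in sorted(rows.items())]

display = split_stem_and_leaf(ages)
print("\n".join(display))
```

The printed rows should match the display above line for line, which is a quick way to confirm that no observation was dropped when the cumulative-count column was deleted.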

 

c.       The five youngest presidents at the time of death were Kennedy, Garfield, Polk, Lincoln, and Arthur. The common trait of three of these presidents (Lincoln, Garfield, and Kennedy) is that they were assassinated.

 

1.49

a.      Before looking at the time series plot, I would guess there’s been improvement (i.e., a decrease) in winning times over the years (since there’s naturally been improvement in training, breeding, etc.). The time plot of winning times is shown below. Interestingly, there is no downward trend shown in the graph, and there is a lot of year-to-year variation in the winning times. It appears that year is not an important factor in describing winning time. [The data in this new edition of the textbook go to 2007, whereas the data I included for the assignment only go to 2004. My graphics are based on the older set of data. If you noticed this problem and augmented my data with the times from 2005-2007, good for you! Then your graphs will look slightly different from mine.]

 

 

b.      Because we now know time (i.e., year) doesn’t help predict winning race time, we can look at the winning times in a one-variable graph. Winning time is a quantitative variable, so any of the following graphs would be appropriate to show the distribution. The distribution of winning race times is approximately symmetric around 122 seconds. There is one very fast time (119.2 seconds) and two especially slow times (125 seconds), but they don’t seem to be highly unusual (they still fit with the overall pattern in the rest of the data).

 

Stem-and-Leaf Display: Winning Times (in seconds) for Kentucky Derby Races (1950-2004)

N  = 55, Leaf Unit = 0.10

 

119  2

119  9

120  0123

120

121  00111113334444

121

122  000000111112222222344

122

123  000122223

123

124  000

124

125  00
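
As a rough check on the "symmetric around 122 seconds" description, the following Python sketch reads the leaves back into data values and finds the median and the extremes. The `rows` dictionary is just my transcription of the display above (empty half-stems left out), and working in tenths of a second is my own choice to avoid floating-point rounding.

```python
from statistics import median

# Stems and leaves copied from the display above; empty half-stems omitted.
rows = {
    119: "29",
    120: "0123",
    121: "00111113334444",
    122: "000000111112222222344",
    123: "000122223",
    124: "000",
    125: "00",
}

# Leaf unit is 0.10, so each value is stem + leaf/10. Store it in tenths.
tenths = [stem * 10 + int(leaf)
          for stem, leaves in rows.items()
          for leaf in leaves]

print(len(tenths))                          # number of races in the display
print(median(tenths) / 10)                  # median winning time in seconds
print(min(tenths) / 10, max(tenths) / 10)   # fastest and slowest times
```

The median lands just above 122 seconds, and the fastest and slowest times sit about the same distance from it, which is consistent with the roughly symmetric shape described above.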

 

  

 

1.65

Because car color is a categorical variable, it would be appropriate to use either a pie chart or a bar chart as a graphical display. Both types of graphs are shown below. (Note: To create a bar chart in Minitab from a summary table, choose “values from a table” from the dropdown “Bars represent” menu in the Bar Chart dialog box.)

 

 

Silver is the most popular car color, with gray, blue, and black following closely as the next most popular colors. The least-popular color is yellow/gold (perhaps too bright and showy?). Another data clarification: These graphs are based on the data from the new version of the textbook, but the correct version of this data file wasn’t on the share drive until Wednesday. Hence, your graphs might look different from mine.

2.47

a.      The five-number summary (from Minitab) for the mercury concentrations in the dolphin livers is shown below. The units on all the numbers are micrograms/gram.

 

Descriptive Statistics: Mercury Concentration (micrograms/gram)

Variable           Minimum      Q1   Median       Q3   Maximum

Mercury (mcg/g)       1.70   130.5    246.5    317.5     485.0

 

b.      The boxplot of the mercury concentrations is shown below. (By default, Minitab creates boxplots vertically, but I transposed the graph as I think it’s easier to read. Either way, vertical or horizontal, is okay.)

 

      

 

c.       According to our suspected outlier criterion (i.e., values lying more than 1.5 × IQR below the first quartile or above the third quartile), there are no outliers. Here IQR = 317.5 − 130.5 = 187, so the cutoffs are 130.5 − 280.5 = −150 and 317.5 + 280.5 = 598, and every observation falls between them. (If there were outliers, they would be denoted with an asterisk in the boxplot.) That said, there are four unusually small mercury concentrations, which can be seen in the dotplot below. Because the overall spread of the distribution of concentrations is so large and because there are four unusual observations (as opposed to one or two), these observations were not flagged as outliers.

 

     

 

d.      Knowing the first four dolphins were less than 3 years old explains why their mercury concentrations were so much smaller than the rest of the dolphins—they had fewer years to accumulate mercury in their livers.
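
The 1.5 × IQR check from part (c) is easy to reproduce by hand or in a few lines of Python. This is only a sketch using the quartiles from the Minitab summary in part (a); the variable names are mine. Because only the minimum and maximum need to be compared with the cutoffs, the five-number summary is enough and the raw data are not needed.

```python
# Values from the five-number summary in part (a), in micrograms/gram.
minimum, q1, q3, maximum = 1.70, 130.5, 317.5, 485.0

iqr = q3 - q1                      # interquartile range
lower_fence = q1 - 1.5 * iqr       # cutoff for suspected low outliers
upper_fence = q3 + 1.5 * iqr       # cutoff for suspected high outliers

# If the extremes are inside the fences, no observation can be flagged.
has_low_outlier = minimum < lower_fence
has_high_outlier = maximum > upper_fence

print(iqr, lower_fence, upper_fence)
print(has_low_outlier, has_high_outlier)
```

Note that the lower cutoff comes out negative, so no concentration (which cannot be below zero) could ever be flagged as a low outlier in this data set, no matter how small.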

 

3.31

a.      In this experiment there is one qualitative/categorical variable: Kiln Site (Llanederyn, Island Thorns, or Ashley Rails). Additionally there is one quantitative variable: aluminum-oxide level (within a pottery piece)—unfortunately the textbook doesn’t provide units of measurement for the aluminum oxide, so the numbers are simply “floating in space” with no context.

 

b.      The entire distribution of aluminum-oxide levels for the Llanederyn site lies below the distributions from the other sites. Hence, the Llanederyn site produced pottery with lower levels of aluminum oxide. The variation in levels, as measured by the interquartile range, is also smallest for the Llanederyn site. The distributions of aluminum-oxide levels for the two other sites (Island Thorns and Ashley Rails) are fairly similar: each has a median around 18 and roughly the same variability in measurements (these two distributions, though, seem to be skewed in opposite directions).

 

4.14

Four equally qualified runners, John (J), Bill (B), Ed (E), and Dave (D), run a 100-meter sprint, and the order of finish is recorded. In this case, the order of finish does matter.

 

a.      S = {JBED, JBDE, JEBD, JEDB, JDBE, JDEB, BJED, BJDE, BEJD, BEDJ, BDJE, BDEJ, EJBD, EJDB, EBJD, EBDJ, EDJB, EDBJ, DJBE, DJEB, DBJE, DBEJ, DEJB, DEBJ}. Hence, there are 24 simple events in the sample space.

 

Note: the problem simply asks for the number of simple events, and does not explicitly ask you to write out the sample space. You can use counting rules to obtain the number of simple events: 4! = 4 × 3 × 2 × 1 = 24 (i.e., there are 24 permutations of 4 people).

 

b.      If the runners are equally qualified, then all the points in the sample space should be equally likely (i.e., have probability 1/24).

 

c.       There are six ways for Dave to win the race (this can be verified by looking at the sample space or via a counting-method calculation: with Dave fixed in first place, the other three runners can finish in 3! = 6 orders), so the probability is 6/24 = 1/4.

 

d.      There are two ways for Dave to win and John to place second (this can be verified by looking at the sample space or through calculation: with the first two places fixed, the other two runners can finish in 2! = 2 orders), so the probability is 2/24 = 1/12.

 

e.      There are six ways for Ed to finish last (this can be verified by looking at the sample space or through calculation: with Ed fixed in last place, the other three runners can finish in 3! = 6 orders), so the probability is 6/24 = 1/4.
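
If you would rather not list the sample space by hand, parts (a) through (e) can be checked by brute force. The sketch below is one way to do it in Python; the `probability` helper and the variable names are my own, and it relies on the assumption from part (b) that all finishing orders are equally likely.

```python
from fractions import Fraction
from itertools import permutations

runners = "JBED"  # John, Bill, Ed, Dave

# Every possible order of finish, written as a 4-letter string such as "DJBE".
sample_space = ["".join(order) for order in permutations(runners)]

def probability(event):
    """P(event) when every finishing order is equally likely."""
    favorable = [order for order in sample_space if event(order)]
    return Fraction(len(favorable), len(sample_space))

p_dave_wins = probability(lambda o: o[0] == "D")          # part (c)
p_dave_then_john = probability(lambda o: o[:2] == "DJ")   # part (d)
p_ed_last = probability(lambda o: o[-1] == "E")           # part (e)

print(len(sample_space))
print(p_dave_wins, p_dave_then_john, p_ed_last)
```

Using `Fraction` keeps the answers as exact fractions rather than decimals, so they can be compared directly with the hand calculations.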

 

Additional Problem

 

a.      For each additional square foot of living area in a house, the predicted selling price increases by $196.20. Or, perhaps more meaningfully, for each additional 100 square feet of living area in a house, the predicted selling price increases by $19,620.

 

b.      The regression line explains 85.4% of the variation in house selling price. This is a fairly high level of explained variation—the regression line fits the data pretty well.

 

c.       Even though the regression line fits the data fairly well, there is a slight pattern of curvature in the residuals. Hence, a curve would be a better summary for the relationship in these data than a straight line. In fact, when a quadratic regression is fit, R² increases from 85.4% to 89.4%.
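
The slope interpretation in part (a) is just multiplication, sketched below in Python. Only the slope of 196.2 dollars per square foot comes from the regression output; the function name is hypothetical. Because the intercept is not used, this gives changes in predicted price, not the predicted prices themselves.

```python
from decimal import Decimal

# Slope of the fitted line from part (a), in dollars per square foot.
SLOPE_DOLLARS_PER_SQFT = Decimal("196.2")

def predicted_price_change(extra_sqft):
    """Change in predicted selling price for a change in living area."""
    return SLOPE_DOLLARS_PER_SQFT * Decimal(extra_sqft)

print(predicted_price_change(1))     # one extra square foot
print(predicted_price_change(100))   # one hundred extra square feet
```

I used `Decimal` rather than ordinary floats only so the dollar amounts print without rounding noise.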