Math 207 – Solutions to Assignment 1

 

1.53

a.      The distribution of state popular vote for Bush is skewed to the right (long right tail). There are three high vote values that could be considered outliers—these states are Florida, Texas, and California (3 of the top 4 most populous states). The distribution of state percent vote for Bush is approximately symmetric and mound shaped. There are no outliers among the percent votes.

 

c.       The distribution of state popular vote is naturally skewed right because of the nature of state populations: there are many mid-sized state populations, yet there are a few very populous states—hence the long right tail. Determining vote percents takes away the impact of state population. The percent votes pile up in the center and taper off equally on both sides (since there isn’t a natural boundary on either side).

 

1.54

a.      The distribution of heights of biostatistics students is bimodal (notice the large dip in histogram height at 68 inches) and approximately symmetric.

 

b.      The bimodal nature of the distribution of heights is somewhat unusual.

 

c.       Most likely, the two peaks in the distribution represent female heights (lower peak) and male heights (higher peak), as females, on average, are shorter than males (and the class probably consists of both men and women). Additional note: The distribution of heights of female adults is typically symmetric and mound-shaped (i.e., it follows an approximately normal distribution), and the same typically holds for the distribution of male heights. The histogram shown in this problem seems to be a combination of these two normal distributions, with the female heights centered at a lower value (although the variation of the two groups looks roughly the same).

 

1.60

a.      The sizes/volumes of the items in the graph increase as the calories increase, but they don’t seem to increase in proportion to the increase in calories. For example, there is only a 5-calorie difference between a can of Coke and a bottle of Budweiser, yet the two pictures show a much larger difference in terms of volume.

 

b.      The bar chart is shown below. [Note: I created this graph in Minitab (to make it easier to post the solutions), but you were asked to do the graph by hand.] This chart more accurately displays the calories in America’s favorite foods, as the heights of the bars are the only things to change and they change exactly with the number of calories.

 


 

1.61

The distributions of both sets of algebra exam scores are slightly skewed left (slightly longer left tails). The distribution of scores for students with laptops has a slightly higher center/middle value and slightly less spread. Perhaps more importantly, a much higher percentage of students with laptops scored 80 or better on the final exam. The students weren’t randomly assigned to treatments (laptop or no laptop), though, so we cannot say if laptop use caused better grades.

 

2.55

a.      Generic Brand numbers (ordered): 24  25  25  25  26  26  26  26  26  27  27  28  28  28

The median is in position (14 + 1)(.5) = 7.5, which is between the 7th and 8th ordered values (both 26). Hence, the median number is 26. The first and third quartiles are in positions (14 + 1)(.25) = 3.75 and (14 + 1)(.75) = 11.25, respectively. Hence, the quartiles are 25 (three-quarters of the way between the 3rd and 4th ordered values, both of which are 25) and 27.25 (one-quarter of the way between the 11th value, 27, and the 12th value, 28), respectively.

 

For the generic brand, the median, quartiles, and IQR number of raisins are 26, 25, 27.25, and 2.25, respectively. (Note: The values are counts, so no units need be included.)
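The (n + 1)p position rule used above can be sketched in Python as follows. This is a minimal illustration, assuming linear interpolation between neighbouring ordered values exactly as in the hand calculation; the helper name `quantile_np1` is hypothetical and not part of any course software.

```python
import math


def quantile_np1(data, p):
    """Hypothetical helper: p-th quantile via the (n + 1)p position rule.

    The position (n + 1)p is 1-based; its whole-number part picks an ordered
    value and its fractional part says how far to move toward the next one.
    """
    xs = sorted(data)
    n = len(xs)
    pos = (n + 1) * p
    lower = math.floor(pos)   # e.g. 3 for position 3.75
    frac = pos - lower        # e.g. 0.75 for position 3.75
    if lower < 1:             # position before the first value: use the minimum
        return float(xs[0])
    if lower >= n:            # position at or past the last value: use the maximum
        return float(xs[-1])
    return xs[lower - 1] + frac * (xs[lower] - xs[lower - 1])


generic = [24, 25, 25, 25, 26, 26, 26, 26, 26, 27, 27, 28, 28, 28]
q1, median, q3 = (quantile_np1(generic, p) for p in (0.25, 0.5, 0.75))
iqr = q3 - q1
print(q1, median, q3, iqr)  # 25.0 26.0 27.25 2.25
```

The positions 3.75, 7.5, and 11.25 are exact in floating point, so the sketch reproduces the hand-computed quartiles and IQR without rounding.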

 

Sunmaid numbers (ordered): 22  24  24  24  24  25  25  27  28  28  28  28  29  30

The median is in position (14 + 1)(.5) = 7.5, which is between the 7th and 8th ordered values, 25 and 27. Hence, the median number is 26. The first and third quartiles are in positions (14 + 1)(.25) = 3.75 and (14 + 1)(.75) = 11.25, respectively. Hence, the quartiles are 24 (three-quarters of the way between the 3rd and 4th ordered values, both of which are 24) and 28 (one-quarter of the way between the 11th and 12th ordered values, both of which are 28), respectively.

 

For the Sunmaid brand, the median, quartiles, and IQR number of raisins are 26, 24, 28, and 4, respectively. (Note: The values are counts, so no units need be included.)
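As a cross-check, Python's standard library can reproduce the Sunmaid quartiles. `statistics.quantiles` with `method="exclusive"` (its default) places the i-th of the n - 1 cut points at position i(m + 1)/n for m data values and interpolates linearly, which matches the (n + 1)p rule used here. Other software, including some calculators and spreadsheets, may use a different quartile convention and give slightly different values.

```python
import statistics

sunmaid = [22, 24, 24, 24, 24, 25, 25, 27, 28, 28, 28, 28, 29, 30]

# n=4 asks for the three cut points Q1, median, Q3; "exclusive" is the
# (n + 1)p positional rule with linear interpolation between ordered values.
q1, median, q3 = statistics.quantiles(sunmaid, n=4, method="exclusive")
print(q1, median, q3, q3 - q1)  # 24.0 26.0 28.0 4.0
```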

 

b.      The boxplots are shown below. Note: I created this graph in Minitab (to make it easier to post the solutions), but you were asked to do the graph by hand.

 

 

c.       The stem-and-leaf plots are shown below.

Generic Brand, Number of Raisins (leaf unit = 0.1)

24 | 0
25 | 0 0 0
26 | 0 0 0 0 0
27 | 0 0
28 | 0 0 0

Sunmaid Brand, Number of Raisins (leaf unit = 0.1)

22 | 0
23 |
24 | 0 0 0 0
25 | 0 0
26 |
27 | 0
28 | 0 0 0 0
29 | 0
30 | 0

The stem-and-leaf plots and boxplots show the same distribution shapes (although the stem plots give more detail, as they report all the individual values). The distributions of the number of raisins for both brands are approximately symmetric and centered around 26, yet the distribution for Sunmaid raisins has more variability (more spread).
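The stem-and-leaf layout (whole number as the stem, tenths digit as the leaf, leaf unit = 0.1) can be generated with a short sketch like the one below. The function name `stem_and_leaf` is hypothetical; it assumes nonnegative values and keeps empty stems such as 23 and 26 so the gaps in the Sunmaid data remain visible.

```python
def stem_and_leaf(values):
    """Hypothetical helper: stem = whole number, leaf = tenths digit.

    Assumes nonnegative values (floor division would misplace negatives).
    """
    # Work in tenths so each value splits cleanly into stem and leaf digits.
    tenths = sorted(round(v * 10) for v in values)
    rows = {}
    for t in tenths:
        rows.setdefault(t // 10, []).append(t % 10)
    # Include every stem between the smallest and largest, even empty ones.
    return [
        f"{stem} |" + "".join(f" {leaf}" for leaf in rows.get(stem, []))
        for stem in range(min(rows), max(rows) + 1)
    ]


sunmaid = [22, 24, 24, 24, 24, 25, 25, 27, 28, 28, 28, 28, 29, 30]
for line in stem_and_leaf(sunmaid):
    print(line)
```

Because every raisin count is a whole number, every leaf comes out as 0; the plot's shape is carried entirely by how many leaves each stem receives.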

 

d.      As noted in parts a and c, the median number of raisins in a 0.5-oz box is the same for both brands, but there is more variability in the number of raisins for the Sunmaid brand. If each of the boxes weighs the same, yet there’s more variation in the number of raisins per box made by Sunmaid, this suggests there’s more variability in the sizes of the individual Sunmaid raisins (as compared to the generic brand).

 

2.58

As provided on the homework assignment, the mean and standard deviation of the time to recurrence are 8.37 months and 7.67 months, respectively.

 

a.      The numbers of observations within the given intervals are shown in the table below.

 

 

Interval (months) | Number of Observations | Percent of Observations | Percent According to Empirical Rule | Percent According to Tchebysheff
x̄ ± s: (0.70, 16.04) | 37 | 74% | 68% | at least 0%
x̄ ± 2s: (-6.97, 23.71) | 47 | 94% | 95% | at least 75%
x̄ ± 3s: (-14.64, 31.38) | 49 | 98% | 99.7% | at least 88.9%

 

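b.      The percentages agree with Tchebysheff’s rule (recall that the rule gives only a lower bound, and each observed percentage is at least that bound), but they don’t agree with the Empirical Rule, especially within one standard deviation of the mean (74% observed versus the 68% the Empirical Rule predicts).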
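The comparison columns of the table can be recomputed from the summary statistics alone. The sketch below assumes n = 50 (implied by 37 observations making up 74%) and takes the observed counts from the table, since the raw recurrence times are not reproduced here. The helper name `compare_rules` is hypothetical.

```python
EMPIRICAL = {1: 68.0, 2: 95.0, 3: 99.7}  # Empirical Rule percentages for k = 1, 2, 3


def compare_rules(mean, sd, counts, n):
    """Hypothetical helper: one row per k = 1, 2, 3 standard deviations.

    Each row is (k, lower, upper, observed %, Empirical Rule %, Tchebysheff bound %).
    """
    rows = []
    for k, count in zip((1, 2, 3), counts):
        lower, upper = mean - k * sd, mean + k * sd
        observed = 100 * count / n
        tchebysheff = 100 * (1 - 1 / k**2)  # at least 1 - 1/k^2 of the data
        rows.append((k, round(lower, 2), round(upper, 2), observed,
                     EMPIRICAL[k], round(tchebysheff, 1)))
    return rows


for row in compare_rules(8.37, 7.67, counts=(37, 47, 49), n=50):
    print(row)
```

Every observed percentage meets its Tchebysheff bound, while the k = 1 row shows the clearest departure from the Empirical Rule.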

 

c.       The Empirical Rule applies only to mound-shaped distributions. The distribution of times to recurrence would likely be skewed right. There is a natural boundary at 0 months (there can’t be a negative amount of time to recurrence) and some people would fall near that boundary, yet there are other people (perhaps just a few) who would have very long periods of time between illnesses. Because the distribution of recurrence times is probably skewed, not mound-shaped, the Empirical Rule doesn’t apply. (Note: The distribution of times to recurrence is indeed skewed; see the histogram below. You didn’t have to include the histogram as part of your solution, but it’s a helpful piece of the overall analysis.)

 

 

 

 

 


 

2.64

a.      The mean and standard deviation were given as part of the homework assignment: 6.85 hours and 1.01 hours, respectively.

 

b.      The z-score for the largest value in the sample is $z = \frac{8.5 - 6.85}{1.01} \approx 1.63$. Hence, this particular value of sleep hours is 1.63 standard deviations above the mean. Although this value is fairly high, it is not unusual, since it lies less than 2 standard deviations from the mean.
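The z-score arithmetic can be checked with a few lines of Python. The cutoffs below (|z| > 2 somewhat unusual, |z| > 3 very unusual) are an assumed rule of thumb used only for labeling; the helper name `z_score` is hypothetical.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


z = z_score(8.5, 6.85, 1.01)
# Assumed rule of thumb: |z| > 2 is somewhat unusual, |z| > 3 very unusual.
if abs(z) > 3:
    label = "very unusual"
elif abs(z) > 2:
    label = "somewhat unusual"
else:
    label = "not unusual"
print(round(z, 2), label)  # 1.63 not unusual
```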

 

d.        Sample of Sleep Hours (ordered):   5   6   6   6.75   7   7   7   7.25   8   8.50

The positions of the first quartile, median, and third quartile are (10 + 1)(.25) = 2.75, (10 + 1)(.5) = 5.5, and (10 + 1)(.75) = 8.25, respectively. Hence, the first quartile is 6 hours; the median is 7 hours; and the third quartile is $Q_3 = 7.25 + 0.25(8 - 7.25) = 7.4375 \approx 7.44$ hours. Then the interquartile range is IQR = 7.44 – 6 = 1.44 hours, and $1.5 \times \text{IQR} = 1.5(1.44) = 2.16$ hours. The lower and upper fences are therefore (6 – 2.16) = 3.84 hours and (7.44 + 2.16) = 9.60 hours, respectively. Because all observations fall within this range, there are no suspected outliers. This agrees with the answer to part b: the largest sample value of 8.5 hours is not unusually large (i.e., not a suspected outlier). The boxplot is shown below.
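The quartile and fence calculation can be sketched with Python's standard library. This assumes `statistics.quantiles` with `method="exclusive"`, which uses the same (n + 1)p positions as the hand calculation, and the usual 1.5 × IQR fences for suspected outliers.

```python
import statistics

sleep = [5, 6, 6, 6.75, 7, 7, 7, 7.25, 8, 8.5]

# Quartiles by the (n + 1)p positional rule ("exclusive" method).
q1, median, q3 = statistics.quantiles(sleep, n=4, method="exclusive")
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
# A value outside the fences would be flagged as a suspected outlier.
outliers = [x for x in sleep if x < lower_fence or x > upper_fence]
print(q1, median, q3)            # 6.0 7.0 7.4375
print(lower_fence, upper_fence)  # 3.84375 9.59375
print(outliers)                  # []
```

Without rounding Q3 to 7.44 first, the fences come out as 3.84375 and 9.59375 hours rather than 3.84 and 9.60; the conclusion (no suspected outliers) is the same.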