Elementary Statistics Problems – One-variable Graphics and Numerical Summaries
- Included below are the reported ACT scores for a previous
class of Math 117 students.
31 26 25 28 29 28 28 31 25 29 30
28 28 29 26 29 28 27 30 32
- Create a stem-and-leaf plot of the ACT scores. (Note that
many different stem-and-leaf plots can be made from these data,
depending on the leaf unit and how the stems are split. Choose the
representation that seems to best show the distribution.)
- Describe the distribution of the ACT scores (think about
overall pattern—shape, center, and spread—and any deviations from the
pattern). Would the distribution best be characterized as approximately
symmetric, skewed left, or skewed right?
- Calculate the five-number summary for the ACT scores.
- According to the rule on page 48 of the textbook, are
there any suspected outliers?
- Suppose you found an outlier. Would it be okay to simply
throw it out of the data set? (That is, if an outlier is detected, what
issues should be considered?)
- Create a boxplot of the ACT scores.
- Shown below is a clustered bar chart showing peanut butter
preferences by sex for a previous class of Math 117 students. How is this
graph potentially misleading (specifically when comparing female and male
preferences)? What change could be made to improve the graphical
representation?

- Two different sections of a statistics course took the
same exam. The distributions of exam scores (separated by section) are shown
in the dotplots below. The value labels along the horizontal axis are
purposely left off, as you need not do any calculations to answer
the following questions.

- Is the mean score for Section A the same, bigger, or smaller
than the mean score for Section B? Explain your answer.
- Is the standard deviation of scores for Section A the
same, bigger, or smaller than the standard deviation for Section B?
Explain your answer.
- In this case, how is the standard deviation a more
informative measure of variability than the range is?
- In the past, Math 117 students completed a survey on the
first day of class (we didn’t have time to do it this term). Shown below
is the histogram of responses to one of the questions. Consider the following
variables: hours of sleep on a typical weeknight, amount of money carried in
coins (in dollars), a randomly selected integer from 0 to 9, and
height in inches. Which of these variables do you think is depicted in the
histogram? Give reasons why your answer is correct, and why the other
answers are incorrect.

- A statistics course has two different sections (call them
Class A and Class B). Recently, an exam was given in the course (both
classes took the same exam). For Class A, the sample mean and standard
deviation were x̄ = [value missing] points and s = 10.0 points, respectively.
For Class B, the sample mean and standard deviation were x̄ = [value missing]
points and s = 15.0 points, respectively. Furthermore, Class A contains 30
students and Class B contains 20 students.
- What is the sample mean score for both classes combined?
(Note the class sizes are different.)
- Now only consider Class A. Suppose the professor decides
to increase each score by 10% and then add 5 points. What is the linear
transformation that will accomplish this task? What is the new mean of
the scores? What is the new standard deviation of the scores?
- Reconsider the original information given on the
two classes. The professor wants both classes to have a mean of 75.0
points and a standard deviation of 5.0 points. Find the appropriate
linear transformation for each class. (Recall that measures of
center/location, such as the mean, are affected by both additive and
multiplicative constants, but measures of spread, such as the standard
deviation, are only affected by multiplicative constants.)
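The rules recalled in the parenthetical above can be written out as a small sketch. For a linear transformation y = a + b·x, the mean becomes a + b·(mean of x) and the standard deviation becomes |b|·(standard deviation of x); a combined mean weights each class mean by its class size. The function names are hypothetical, and the two class means used here are placeholders chosen only for illustration, since the actual means are not shown in the problem text.

```python
def combined_mean(n_a, mean_a, n_b, mean_b):
    """Mean of the pooled scores: each class mean weighted by class size."""
    return (n_a * mean_a + n_b * mean_b) / (n_a + n_b)


def transform_summary(mean, sd, a, b):
    """Mean and SD of y = a + b*x, given the mean and SD of x."""
    return a + b * mean, abs(b) * sd


def rescale_to(mean, sd, target_mean, target_sd):
    """Find (a, b), with b > 0, so y = a + b*x has the target mean and SD."""
    b = target_sd / sd            # only the multiplier changes the spread
    a = target_mean - b * mean    # the shift then fixes the center
    return a, b


# HYPOTHETICAL class means (placeholders, NOT the values from the problem).
mean_a, mean_b = 80.0, 70.0
sd_a, sd_b = 10.0, 15.0   # standard deviations from the problem statement
n_a, n_b = 30, 20         # class sizes from the problem statement

pooled = combined_mean(n_a, mean_a, n_b, mean_b)

# "Increase each score by 10% and then add 5 points" is y = 5 + 1.10*x.
new_mean_a, new_sd_a = transform_summary(mean_a, sd_a, a=5, b=1.10)

# Transformations giving each class a mean of 75.0 and an SD of 5.0.
a_A, b_A = rescale_to(mean_a, sd_a, 75.0, 5.0)
a_B, b_B = rescale_to(mean_b, sd_b, 75.0, 5.0)

print(pooled, new_mean_a, new_sd_a)
print((a_A, b_A), (a_B, b_B))
```

Note that the multiplier b depends only on the two standard deviations, so it can be found from the given values of s alone; the shift a is the part that needs the class mean.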
In problems 6–9, a part (or parts) of the given analysis, graph, calculation,
or interpretation is incorrect. You need to determine what is incorrect and
why it is incorrect.
- A local business has 11 employees. The incomes of the
employees are $19,000, $25,000, $34,000, $45,000, $27,000, $63,000,
$23,000, $31,000, $42,000, $61,000, and $31,000. Bubba creates the
following stem-and-leaf plot (with multiple errors):
1 | 9
2 | 5 7 3
3 | 4 1 1
4 | 5 2
6 | 3 1
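For comparison with Bubba's plot, here is a minimal sketch of how a stem-and-leaf plot might be built programmatically. It follows one common convention (leaves sorted within each stem, every stem between the smallest and largest listed even if empty, and a key stating the units); the function name and the choice of a $1,000 leaf unit are assumptions made for illustration.

```python
from collections import defaultdict

# Incomes copied from the problem statement.
incomes = [19000, 25000, 34000, 45000, 27000, 63000,
           23000, 31000, 42000, 61000, 31000]


def stem_and_leaf(values, leaf_unit):
    """Return the lines of a stem-and-leaf plot.

    Each value is truncated to a whole number of leaf units; the last
    digit of that number is the leaf and the remaining digits are the stem.
    """
    leaves = defaultdict(list)
    for v in sorted(values):              # sorting keeps leaves in order
        units = v // leaf_unit
        leaves[units // 10].append(units % 10)
    # List every stem in the range so that gaps in the data stay visible.
    stems = range(min(leaves), max(leaves) + 1)
    return [f"{s} | {' '.join(str(d) for d in leaves[s])}".rstrip()
            for s in stems]


lines = stem_and_leaf(incomes, leaf_unit=1000)
print("Key: 1 | 9 = $19,000")
print("\n".join(lines))
```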
- A sample of 100 Lawrence students is selected. The GPA,
height, and home state are recorded for each of the students. Bubba wants
to graphically display the distributions of these variables. He decides to
create stem-and-leaf plots of the distributions of GPA and height, and to
create a histogram of the home states.
- The Roller Coaster Database maintains a web site (www.rcdb.com)
with data on roller coasters around the world. Some of the recorded data
include whether the coaster is made of steel or wood and the maximum speed
achieved by the coaster, in miles per hour. The boxplots below display the
distributions of speed by type of coaster for 305 active coasters in the United States.

Bubba makes the following statements (poor Bubba
is often confused):
· The average maximum speed for steel roller coasters is 50 mph.
· Because the first quartile is so close to the median, the
distribution of maximum speeds for steel coasters is skewed to the left.
· The median maximum speed for wooden coasters is higher than the
median for steel coasters.
· From the boxplots, we can tell there are more steel roller
coasters in the sample.
· A higher percentage of steel coasters have maximum speeds above
50 mph (as compared to wooden coasters).
- Bubba plans to find the yearly income data (in dollars)
for a random sample of people in the United States. He speculates that the
mean income will be less than the median income. Furthermore, he thinks
the best numerical summary of the data will be to present the mean and
standard deviation.
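How the mean and median respond to a long right tail can be explored numerically. The sketch below reuses the 11 employee incomes from the stem-and-leaf problem above rather than real national data, and then replaces the largest income with a hypothetical $630,000 value (invented purely for illustration) to show which summary moves.

```python
from statistics import mean, median

# Incomes of the 11 employees from the earlier stem-and-leaf problem.
incomes = [19000, 25000, 34000, 45000, 27000, 63000,
           23000, 31000, 42000, 61000, 31000]

avg = mean(incomes)
mid = median(incomes)

# HYPOTHETICAL: swap the $63,000 income for one very large income.
with_high_earner = [630000 if x == 63000 else x for x in incomes]
avg_high = mean(with_high_earner)
mid_high = median(with_high_earner)

print(f"original:         mean = {avg:,.0f}, median = {mid:,.0f}")
print(f"with high earner: mean = {avg_high:,.0f}, median = {mid_high:,.0f}")
```

Comparing the two printed lines shows how much each measure of center is pulled by a single extreme value, which bears on both of Bubba's claims.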