Elementary Statistics Problems – One-variable Graphics and Numerical Summaries

 

  1. Included below are the reported ACT scores for a previous class of Math 117 students.

 

       31  26  25  28  29  28  28  31  25  29  30  28  28  29  26  29  28  27  30  32

 

 

    1. Create a stem-and-leaf plot of the ACT scores. (Note there are many stem-and-leaf plots that can be made from these data, depending on leaf unit and how the leaves are split. Choose the representation that seems to best show the distribution.)

 

 

    2. Describe the distribution of the ACT scores (think about the overall pattern: shape, center, and spread, plus any deviations from that pattern). Would the distribution best be characterized as approximately symmetric, skewed left, or skewed right?

 

 

    3. Calculate the five-number summary for the ACT scores.

 

 

    4. According to the rule on page 48 of the textbook, are there any suspected outliers?

 

 

    5. Suppose you found an outlier. Would it be okay to simply throw it out of the data set? (That is, if an outlier is detected, what issues should be considered?)

 

 

    6. Create a boxplot of the ACT scores.
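
The stem-and-leaf plot, five-number summary, and outlier check asked for above can also be explored numerically. The following is a minimal Python sketch, not the textbook's prescribed method. The function names (`stem_and_leaf`, `five_number_summary`, `outlier_fences`) are hypothetical. Quartiles are taken as the medians of the lower and upper halves of the sorted data, a convention that differs between textbooks and software. The outlier rule is assumed to be the common 1.5 × IQR criterion, which may or may not be the rule on page 48.

```python
from statistics import median

# ACT scores as listed in problem 1.
ACT = [31, 26, 25, 28, 29, 28, 28, 31, 25, 29,
       30, 28, 28, 29, 26, 29, 28, 27, 30, 32]


def stem_and_leaf(values, split=1):
    """Return stem-and-leaf rows for nonnegative integers (leaf unit = 1).

    `split` is the number of rows per stem: 1, 2, or 5.
    """
    if split not in (1, 2, 5):
        raise ValueError("split must be 1, 2, or 5")
    rows = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        sub = leaf * split // 10  # which row of the stem this leaf falls in
        rows.setdefault((stem, sub), []).append(leaf)
    first, last = min(rows), max(rows)
    lines = []
    for stem in range(first[0], last[0] + 1):
        for sub in range(split):
            if first <= (stem, sub) <= last:  # keep empty rows inside the range
                leaves = " ".join(str(leaf) for leaf in rows.get((stem, sub), []))
                lines.append(f"{stem} | {leaves}".rstrip())
    return lines


def five_number_summary(values):
    """Min, Q1, median, Q3, max, with quartiles as medians of the two halves."""
    xs = sorted(values)
    if len(xs) < 2:
        raise ValueError("need at least two values")
    half = len(xs) // 2  # the middle value is excluded from both halves when n is odd
    return xs[0], median(xs[:half]), median(xs), median(xs[-half:]), xs[-1]


def outlier_fences(q1, q3, k=1.5):
    """Lower and upper fences of the (assumed) k x IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


if __name__ == "__main__":
    print("\n".join(stem_and_leaf(ACT, split=5)))
    low, q1, med, q3, high = five_number_summary(ACT)
    low_fence, high_fence = outlier_fences(q1, q3)
    suspects = [x for x in ACT if x < low_fence or x > high_fence]
    print(low, q1, med, q3, high, suspects)
```

Changing `split` to 1 or 2 shows how the choice of rows per stem affects how clearly the shape of the distribution appears.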

 

 

 

  2. The clustered bar chart below shows peanut butter preferences by sex for a previous class of Math 117 students. How is this graph potentially misleading (specifically when comparing female and male preferences)? What change could be made to improve the graphical representation?

  [Figure: clustered bar chart of peanut butter preferences by sex]

 

 

 

 

 

 


  3. Two different sections of a statistics course took the same exam. The distributions of exam scores (separated by section) are shown in the dotplots below. The value labels along the horizontal axis are purposely left off, as you need not do any calculations to answer the following questions.

  [Figure: dotplots of exam scores for Section A and Section B]

 

 

    1. Is the mean score for Section A the same as, bigger than, or smaller than the mean score for Section B? Explain your answer.

 

 

    2. Is the standard deviation of scores for Section A the same as, bigger than, or smaller than the standard deviation for Section B? Explain your answer.

 

 

    3. In this case, how is the standard deviation a more informative measure of variability than the range?
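
To see how the range and the standard deviation can respond differently to the same data, it may help to experiment with small data sets. The sketch below does not use the sections' actual scores: the two lists are made up for illustration and are constructed to share the same mean and the same range. Whether the two sections in the dotplots behave this way is an assumption to check against the plots.

```python
from statistics import mean, stdev

# HYPOTHETICAL scores, not read from the dotplots.
# Both sets have mean 5 and range 10 by construction.
clustered = [0, 5, 5, 5, 5, 10]    # most values sit near the center
polarized = [0, 0, 0, 10, 10, 10]  # every value sits at an extreme

for name, xs in (("clustered", clustered), ("polarized", polarized)):
    value_range = max(xs) - min(xs)
    print(name, "mean:", mean(xs), "range:", value_range,
          "sd:", round(stdev(xs), 2))
```

The range uses only the two most extreme observations, while `stdev` uses every observation's distance from the mean.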

 

 

  4. In the past, Math 117 students completed a survey on the first day of class (we didn’t have time to do it this term). Shown below is the histogram of responses to one of the questions. Consider the following variables: hours of sleep on a typical weeknight, monetary amount (in dollars) of coins carried, a randomly selected integer from 0 to 9, and height in inches. Which of these variables do you think is depicted in the histogram? Give reasons why your answer is correct and why the other answers are incorrect.

  [Figure: histogram of responses to one survey question]

 

 


  5. A statistics course has two different sections (call them Class A and Class B). Recently, an exam was given in the course (both classes took the same exam). For Class A, the sample mean and standard deviation were x̄ = [value missing] points and s = 10.0 points, respectively. For Class B, the sample mean and standard deviation were x̄ = [value missing] points and s = 15.0 points, respectively. Furthermore, Class A contains 30 students and Class B contains 20 students.

 

    1. What is the sample mean score for both classes combined? (Note the class sizes are different.)

 

 

 

    2. Now consider only Class A. Suppose the professor decides to increase each score by 10% and then add 5 points. What is the linear transformation that will accomplish this task? What is the new mean of the scores? What is the new standard deviation of the scores?

 

 

 

    3. Reconsider the original information given on the two classes. The professor wants both classes to have a mean of 75.0 points and a standard deviation of 5.0 points. Find the appropriate linear transformation for each class. (Recall that measures of center/location, such as the mean, are affected by both additive and multiplicative constants, but measures of spread, such as the standard deviation, are only affected by multiplicative constants.)
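
The three calculations in this problem (a combined mean, the effect of a linear transformation a + b·x on the mean and standard deviation, and the constants needed to hit a target mean and standard deviation) can be sketched in Python as follows. The function names are hypothetical. The class sizes and standard deviations come from the problem statement, but the two class means used in the demonstration are placeholder values chosen only so the code runs; they are not the values from the problem.

```python
def combined_mean(n_a, mean_a, n_b, mean_b):
    """Mean of the pooled scores: each class mean is weighted by its size."""
    return (n_a * mean_a + n_b * mean_b) / (n_a + n_b)


def transform_summary(mean, sd, a, b):
    """Mean and standard deviation of a + b*x, given those of x."""
    # The additive constant a shifts the mean but leaves the spread alone.
    return a + b * mean, abs(b) * sd


def matching_transform(mean, sd, target_mean, target_sd):
    """Constants (a, b), with b > 0, so that a + b*x has the target mean and sd."""
    b = target_sd / sd            # the multiplier alone fixes the spread
    a = target_mean - b * mean    # the shift then fixes the center
    return a, b


if __name__ == "__main__":
    # Sizes and standard deviations are from the problem statement.
    # The two means are HYPOTHETICAL placeholders, not the problem's values.
    n_a, sd_a, mean_a = 30, 10.0, 70.0
    n_b, sd_b, mean_b = 20, 15.0, 80.0

    print(combined_mean(n_a, mean_a, n_b, mean_b))
    a, b = matching_transform(mean_b, sd_b, 75.0, 5.0)
    print(a, b, transform_summary(mean_b, sd_b, a, b))
```

Feeding the output of `matching_transform` back into `transform_summary` is a quick way to confirm that a proposed transformation really produces the target mean and standard deviation.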

 

 

 

 

 

 

In problems 6 – 9, a part (or parts) of the given analysis, graph, calculation, or interpretation is incorrect. You need to determine what is incorrect and why it is incorrect.

 

  6. A local business has 11 employees. The incomes of the employees are $19,000, $25,000, $34,000, $45,000, $27,000, $63,000, $23,000, $31,000, $42,000, $61,000, and $31,000. Bubba creates the following stem-and-leaf plot (with multiple errors):

 

1 | 9

2 | 5 7 3

3 | 4 1 1

4 | 5 2

6 | 3 1

 

 

 

 

  7. A sample of 100 Lawrence students is selected. The GPA, height, and home state are recorded for each of the students. Bubba wants to graphically display the distributions of these variables. He decides to create stem-and-leaf plots of the distributions of GPA and height, and to create a histogram of the home states.

  8. The Roller Coaster Database maintains a website (www.rcdb.com) with data on roller coasters around the world. Some of the recorded data include whether the coaster is made of steel or wood and the maximum speed achieved by the coaster, in miles per hour. The boxplots below display the distributions of speed by type of coaster for 305 active coasters in the United States.

  [Figure: side-by-side boxplots of maximum speed (mph) for steel and wooden coasters]

 

 

Bubba makes the following statements (poor Bubba is often confused):

 

- The average maximum speed for steel roller coasters is 50 mph.
- Because the first quartile is so close to the median, the distribution of maximum speeds for steel coasters is skewed to the left.
- The median maximum speed for wooden coasters is higher than the median for steel coasters.
- From the boxplots, we can tell there are more steel roller coasters in the sample.
- A higher percentage of steel coasters have maximum speeds above 50 mph (as compared to wooden coasters).

 

 

 

  9. Bubba plans to find the yearly income data (in dollars) for a random sample of people in the United States. He speculates that the mean income will be less than the median income. Furthermore, he thinks the best numerical summary of the data will be to present the mean and standard deviation.
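
One way to probe a claim about how the mean and median compare is to compute both on actual income figures. The sketch below uses the 11 employee incomes listed earlier in this worksheet purely as a small stand-in; they are not a random sample of U.S. incomes, so any pattern they show is only suggestive.

```python
from statistics import mean, median, stdev

# The 11 employee incomes listed earlier, used only as a small stand-in.
incomes = [19000, 25000, 34000, 45000, 27000, 63000,
           23000, 31000, 42000, 61000, 31000]

print("mean:  ", round(mean(incomes)))
print("median:", median(incomes))
print("sd:    ", round(stdev(incomes)))
```

Appending one very large income to the list and rerunning shows how strongly each summary reacts to a single extreme value.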