Math 217 Computer Lab – One-Variable Graphics and Descriptive Statistics

 

Getting the Needed Files

Double click on the My Computer icon on the desktop. Then double click on the campus_share on 'curtis' (U:) drive and then on the Class_Share folder. Finally, double click on the Math folder and then on the Math445 folder (just for this lab; next week we’ll have our own Math217 folder). In this folder are the Minitab files we will use in today’s lab: BodyTemp.MPJ, DrinkingStudy.MPJ, and WineSales.MPJ.

 

We cannot all work from these shared files at once (only one person can access them at a time). Thus, each of you needs to copy the three files to your personal account. You can do this by highlighting all the files (click on the first one, then shift-click on the last one; this should highlight them all). Then press Ctrl-C to copy the files. Now open the My Documents folder on the desktop (this is the My Documents folder of your personal account). Once you are in the My Documents folder, press Ctrl-V to paste the three files into your account.

 

Now open the Minitab software (from the Start menu select Programs>Class Programs and then Minitab>Minitab15). Then open the first file (DrinkingStudy.MPJ) in Minitab: go to the File menu and choose Open Project.

 

Description of DrinkingStudy.MPJ

In 1994 the Harvard School of Public Health published a college alcohol study. Samples of students from 140 four-year colleges were asked questions about their alcohol consumption (demographic information was also collected). The 10,904 responses are included in this data set. The students answered questions based on their behavior in the last 30 days (e.g., they recorded how often they drove after drinking within the last 30 days).

 

Analysis

We will use tallies, pie charts, and bar charts to summarize the categorical variables. Suppose we simply want to know the proportions of males and females in the sample (as numbers, not a graph). To get a tally, go to the Stat menu and select Tables>Tally Individual Variables. Enter Sex as the variable (either double click on the Sex variable in the left column, or highlight the Sex variable and then click on the Select button). Then check the boxes for both Counts and Percents. Finally, click on the OK button. The results are shown in the session window. Is there an equal number of males and females in the sample? How might this affect the rest of our analysis?
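
For readers who want to see the arithmetic behind a tally, here is a minimal Python sketch (Python is not part of this lab; this only illustrates the calculation). It assumes the variable is available as a plain list of category labels, and the five values are made up, not taken from DrinkingStudy.MPJ. Each category's percent is its count divided by the total number of responses, times 100.

```python
from collections import Counter


def tally(values):
    """Return {category: (count, percent)} for a list of category labels."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: (n, 100 * n / total) for category, n in counts.items()}


# Hypothetical responses for illustration only (not the DrinkingStudy data).
sex = ["Female", "Male", "Female", "Female", "Male"]
for category, (count, percent) in sorted(tally(sex).items()):
    print(f"{category:<8}{count:>5}{percent:>8.2f}")
```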

 

To create a pie chart go to the Graph menu and select Pie Chart. A dialog box will appear. Click on the box under Categorical variables and then select the Driven After Drinking Alcohol variable. Now click on the Labels button and type in a title (putting an informative title on all your graphs is very important!). Then click on the Slice Labels folder tab and select Frequency and Percent (by default, Minitab shows the pie chart with no associated numbers, but it’s nice to have both the visual pie slices and the number summaries). Finally, click on the OK button (and then the OK button again).

 

The pie chart will appear as its own window. Notice there are four pie slices, including a group of 52 students for whom we have no information (a blank as a response indicates missing data). If appropriate, we can create a pie chart that reports only on the people who responded to the question (this might not be reasonable if there was a high percentage of non-response). To do this, go back to the pie chart dialog box. Click on the Data Options button and then the Group Options folder tab. Uncheck the Include missing as a group box. The pie chart Minitab creates will now have only three slices, corresponding to the three answer choices.
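
Whether the missing group is included changes the denominator of every slice's percent. A minimal Python sketch of that effect, assuming an empty string marks a missing response (as a blank does in the worksheet); the answer labels "Choice A/B/C" and the eight responses are hypothetical placeholders, not the actual survey answers:

```python
from collections import Counter


def slice_percents(values, include_missing=True):
    """Percent of responses in each category; an empty string marks a missing response."""
    if not include_missing:
        values = [v for v in values if v != ""]
    counts = Counter(values)
    total = sum(counts.values())
    return {category: round(100 * n / total, 1) for category, n in counts.items()}


# Hypothetical answers; "" stands for a blank (missing) response.
answers = ["Choice A", "Choice A", "Choice B", "", "Choice A", "Choice C", "", "Choice B"]
print(slice_percents(answers))                         # missing kept as its own slice
print(slice_percents(answers, include_missing=False))  # respondents only
```

With the missing group included, the two blanks take 25% of the pie; once they are dropped, each remaining percent is computed out of 6 respondents instead of 8, so every slice grows.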

 

Once a graph has been created, you can edit it. To change the title, you simply double click on the title. Then a dialog box will appear, allowing you to change the text, font, style, size, and alignment (if you simply click once on the title, then you are allowed to move its position). If you click once on one of the pie slices, then click on the pie again, and finally right click and select Edit Pie, then you can change the fill color of that pie slice (using the Custom fill pattern). There are many other editing options under the Editor menu.

 

To make a bar chart, go to the Graph menu and select Bar Chart (and select Simple, which is the default). Again, a dialog box will appear. Select Served as a Designated Driver as the Categorical variable. Click on the Labels button and title the graph. Then click on the Data Labels folder tab. By default, no labels are placed on top of the bars. If you want each bar’s count placed on top of it, select “Use y-value labels.” This can sometimes be helpful, but at other times it can make the graph too busy; decide what works best for the specific analysis and presentation.

 

Within the bar chart dialog box, we have the same option (as with the pie chart) to exclude missing data. Suppose, though, we want to include the missing data in the graph but label that category better. To do this, double click somewhere on the horizontal axis of the bar chart. This brings up an Edit Scale dialog box. Now click on the Labels folder tab and choose Specified (rather than Automatic). For the first label, type Missing Data between the quotation marks (where there is currently only a blank). Then click on the OK button. The label on the graph should change.

 

Description of BodyTemp.MPJ

In 1992, a study was done to investigate the average body temperature of healthy adults (and compare this to the standard of 98.6 degrees Fahrenheit). The results were published in the Journal of the American Medical Association. The study included 130 subjects (aged 18 through 40 years), including 65 men and 65 women. All were healthy volunteers recruited by the Center for Vaccine Development, University of Maryland School of Medicine. This dataset contains information (sex, resting heart rate, and body temperature) for the 130 subjects included in the study.  

 

Analysis

The Sex variable is categorical (0 – male, 1 – female), but the labels “0” and “1” are not very meaningful, so we will recode the variable. From the Data menu, select Code>Numeric to Text. Select the Sex variable for the Code data from column and select the Sex variable as the Into column. (Important note: always be careful when overwriting a column. Sometimes it’s best to create a new column rather than overwrite one; in this case we want a more meaningfully labeled column, so overwriting is fine.) For the first Original value, type in “0” (no quotation marks) and for the first New value, type in “male” (no quotation marks). For the second Original value, type in “1” (no quotation marks) and for the second New value, type in “female” (no quotation marks). Then click on the OK button (this is always your last step). The Sex column has now been recoded.
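
The recoding step is just a lookup table from original values to new values. A minimal Python sketch, assuming the column is a list of 0/1 codes; the four values and the helper name code_numeric_to_text are hypothetical, chosen only to echo the menu command. Raising an error for an unlisted code is a choice made for this sketch, not necessarily Minitab's behavior.

```python
# Coding table mirroring the dialog: Original value -> New value.
SEX_CODES = {0: "male", 1: "female"}


def code_numeric_to_text(column, codes):
    """Return a new list in which each numeric code is replaced by its text label.

    A value missing from the coding table raises KeyError here; this is a choice
    made for the sketch, not necessarily what Minitab does with unlisted values.
    """
    return [codes[value] for value in column]


sex = [0, 1, 1, 0]  # hypothetical codes, not the BodyTemp worksheet
sex_text = code_numeric_to_text(sex, SEX_CODES)  # new "column"; sex itself is unchanged
print(sex_text)
```

Writing the result to a new name instead of reassigning sex mirrors the advice about creating a new column; `sex = code_numeric_to_text(sex, SEX_CODES)` would be the equivalent of overwriting.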

 

To make a histogram of the heart rate variable, select Histogram from the Graph menu (select the Simple histogram, which is the default). In the dialog box, select Heart Rate as the variable to graph. Then title the graph (through the Labels button). By default, Minitab creates a frequency histogram. If you want a percent histogram (what the book calls a relative frequency histogram), click on the Scale button, then the Y-Scale Type folder tab, and select Percent. In words, how would you describe the distribution of resting heart rates?
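
A percent histogram has the same bars as a frequency histogram; only the y-axis changes, because each bin's count is divided by the number of observations. The Python sketch below assumes equal-width bins that include their left edge and start at a value you choose. Minitab picks its own bins (and labels them by midpoint by default), so its bars may differ. The ten heart rates are hypothetical.

```python
from collections import Counter


def histogram_table(values, start, width):
    """Rows of (low, high, count, percent) for bins [low, high) of equal width.

    Empty bins between the smallest and largest values are kept, with a count of 0,
    because a histogram shows them as gaps.
    """
    bins = Counter((v - start) // width for v in values)
    n = len(values)
    rows = []
    for k in range(min(bins), max(bins) + 1):
        low = start + k * width
        rows.append((low, low + width, bins[k], 100 * bins[k] / n))
    return rows


# Hypothetical resting heart rates (beats per minute), for illustration only.
rates = [62, 64, 68, 70, 71, 73, 74, 75, 78, 82]
for low, high, count, percent in histogram_table(rates, start=60, width=5):
    print(f"[{low}, {high}): frequency {count}, percent {percent:.0f}%")
```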

 

To get descriptive statistics for the heart rate variable, go to the Stat menu and select Basic Statistics>Display Descriptive Statistics. In the dialog box, select the Heart Rate variable. Note that you can choose which statistics are calculated through the Statistics... button. Do these descriptive statistics corroborate your description of the heart rate distribution based on the histogram?
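
To see where the main numbers come from, here is a minimal Python sketch of a five-number summary plus mean and standard deviation. It uses `statistics.quantiles` with its default "exclusive" method, which places Q1 and Q3 at positions (n + 1)/4 and 3(n + 1)/4 with linear interpolation; we believe Minitab uses the same convention, but that is an assumption worth checking against your session window. The heart rates are hypothetical.

```python
import statistics


def describe(values):
    """Summary statistics similar to (not identical to) Minitab's default output."""
    # Default method="exclusive": quartile positions (n + 1)/4, 2(n + 1)/4, 3(n + 1)/4.
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "N": len(values),
        "Mean": statistics.mean(values),
        "StDev": statistics.stdev(values),  # sample standard deviation (divides by n - 1)
        "Minimum": min(values),
        "Q1": q1,
        "Median": median,
        "Q3": q3,
        "Maximum": max(values),
    }


rates = [62, 64, 68, 70, 71, 73, 74, 75, 78, 82]  # hypothetical heart rates
for name, value in describe(rates).items():
    print(f"{name:>8}: {value:.2f}")
```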

 

To create a stem-and-leaf plot of the heart rate variable, go to the Graph menu and select Stem-and-Leaf. Select the Heart Rate variable in the dialog box. The stem-and-leaf plot appears in the session window. Note that by changing the increment in the dialog box, you can change how many lines each stem is split into. (Minitab chooses to split each stem into five lines to spread out the distribution for visual inspection. You can split each stem into only two lines by entering an increment of 5 in the dialog box. Which plot do you think best depicts the distribution?)
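
The increment is the width of the range of values that shares one line of the plot. A simplified Python sketch, assuming non-negative whole numbers with stems 10 units wide: an increment of 10 gives one line per stem, 5 gives two lines (leaves 0-4 and 5-9), and 2 gives five lines. Unlike Minitab's output, this sketch omits the depth column, and the data are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values, increment=5):
    """Return the lines of a simplified stem-and-leaf plot for non-negative whole numbers."""
    if increment not in (1, 2, 5, 10):
        raise ValueError("increment must be 1, 2, 5, or 10")
    rows = defaultdict(str)
    for v in sorted(values):
        rows[v // increment] += str(v % 10)  # each line covers `increment` consecutive values
    lines = []
    for row in range(min(rows), max(rows) + 1):  # keep empty lines inside the range
        stem = row * increment // 10
        lines.append(f"{stem:>3} | {rows[row]}")
    return lines


rates = [62, 64, 68, 70, 71, 73, 74, 75, 78, 82]  # hypothetical heart rates
print("\n".join(stem_and_leaf(rates, increment=5)))
```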

 

It seems reasonable that the distributions of heart rates and body temperatures might differ for males and females, and, if so, we should analyze the data separately. To investigate this question, we can create dotplots and boxplots separately for the two sexes. To create the dotplots, select Dotplot from the Graph menu and select the One Y, With Groups graph. (Note that one of the decisions you must make while doing exploratory analysis with Minitab is which graphs are most effective for your particular data and the setup of your data in the worksheet.) In the dialog box, select Body Temperature as the Graph variable, and Sex as the Categorical variable for grouping. Then title the graph (through the Labels button). The dotplots appear in their own window. How would you describe the two distributions of body temperatures (are there differences or similarities)? Suppose there is a certain point that you would like to identify. From the Editor menu select Brush, and then click on the point. The row of the observation will be identified, and you can look up that subject in the worksheet (this can be especially helpful when investigating outliers).

 

To create the boxplots, select Boxplot from the Graph menu and select the One Y, With Groups graph. In the dialog box, select Body Temperature as the Graph variable, and Sex as the Categorical variable for grouping. Then click on the Scale button, and check the “Transpose value and category scales” box (this creates horizontal rather than vertical boxplots, which I think are easier to read). Finally, title the graph (through the Labels button). Note that a few female body temperatures are marked with asterisks in the plots. These are flagged as suspected outliers (according to the rule we discussed in Section 1.2 of the textbook). The Editor>Brush function can be used to identify which subjects have these outlying body temperatures.
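
A minimal Python sketch of the outlier check, assuming the textbook's rule is the common 1.5 × IQR rule (a value is flagged if it lies below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR) and that quartiles use the (n + 1)-position method of `statistics.quantiles`. It reports 1-based row numbers so flagged values can be looked up in a worksheet, much like Brush. The ten temperatures are hypothetical.

```python
import statistics


def flag_outliers(values):
    """Return (row, value) pairs outside the 1.5 x IQR fences; rows are numbered from 1."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [(row, v) for row, v in enumerate(values, start=1)
            if v < low_fence or v > high_fence]


temps = [97.8, 98.0, 98.2, 98.3, 98.4, 98.6, 98.7, 98.8, 99.0, 100.8]  # hypothetical
print(flag_outliers(temps))
```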

 

Note that it is also possible to get descriptive statistics for the body temperatures separately for each sex. To do so, simply select Body Temperature as the Variable and Sex as the By variable in the Display Descriptive Statistics dialog box. The summary statistics for each sex are then shown next to each other for easy comparison. Do these numerical summaries corroborate the conclusions you drew from the graphical displays?
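
Using a By variable amounts to splitting the values by group label and summarizing each piece. A minimal Python sketch, assuming the measurements and group labels are two parallel lists of the same length; the six temperatures and labels are hypothetical, and only a few of Minitab's statistics are reproduced.

```python
import statistics
from collections import defaultdict


def describe_by(values, groups):
    """N, mean, standard deviation, and median of values, computed separately per group."""
    if len(values) != len(groups):
        raise ValueError("values and groups must have the same length")
    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)
    return {
        group: {
            "N": len(vals),
            "Mean": statistics.mean(vals),
            "StDev": statistics.stdev(vals),  # needs at least two values per group
            "Median": statistics.median(vals),
        }
        for group, vals in sorted(by_group.items())
    }


temps = [98.2, 97.9, 98.6, 98.4, 98.0, 98.8]  # hypothetical body temperatures
sex = ["male", "female", "female", "male", "male", "female"]
for group, stats in describe_by(temps, sex).items():
    print(group, {name: round(value, 2) for name, value in stats.items()})
```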

 

It’s also possible to get separate histograms based on sex (use the Simple Histogram, but from the Multiple Graphs button select “In separate panels of the same graph” and then select Sex as the “By variable”).

 

Description of WineSales.MPJ

This data file includes the monthly sales of dry white wine (in thousands of liters) in Australia from January 1980 to July 1995.

 

Analysis

Obviously time is a potential factor related to wine sales, so we should initially consider time in our analysis (if, after the analysis, it appears that time has no impact on wine sales, then we can ignore it and use a simple one-variable graph, like a histogram). To create a time series plot of wine sales, select Time Series Plot from the Graph menu (select the Simple graph). In the dialog box, select White Wine Sales as the Series and then title the graph through the Labels button. Now click on the Time/Scale button. This dialog box allows you to label the x-axis in a more meaningful way. Instead of “Index” as the time scale, choose “Calendar: Month Year.” Then for the start values enter 1 for the month and 1980 for the year.
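
Because the observations are equally spaced months, the Calendar: Month Year labels follow from the start month and year alone. A minimal Python sketch of that labeling, assuming one observation per month with no gaps; the function name and the three-letter month abbreviations are our own choices, not Minitab's.

```python
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def month_year_labels(count, start_month=1, start_year=1980):
    """Label `count` consecutive monthly observations, starting at start_month/start_year."""
    labels = []
    for i in range(count):
        offset = (start_month - 1) + i  # months elapsed since January of start_year
        year = start_year + offset // 12
        month = offset % 12  # 0 = January
        labels.append(f"{MONTHS[month]} {year}")
    return labels


# January 1980 through July 1995 is 15 * 12 + 7 = 187 monthly observations.
labels = month_year_labels(187)
print(labels[0], labels[12], labels[-1])
```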

 

What do you see in the time series plot? How would you analyze the relationship between time and white wine sales in Australia?

 

Important Note: The time series plot in Minitab assumes equal increments between the data points. Some time series have unequal increments, and for these the time series plot can be graphically misleading. If you encounter a time series with unequal increments, you can still create a time plot by using the Graph>Scatterplot>With Connect Line option (entering your time series as the y variable and the actual times as the x variable).

 

Additional Notes

 

  • To print a graph, simply choose Print Graph from the File menu (this only works when the graph is the active window). If a full-page graph isn’t desired, Minitab graphs can be copied and then pasted (and resized) within a Microsoft Word document (this is ideal for reports and can also be useful for homework).

 

  • Minitab has excellent help resources. If you’re stumped about how to use a Minitab function, you can click on the Help button (located in all dialog boxes), or you can search the help index.

 

  • Please do not turn off your brain when using statistical software. It’s important to think carefully about the types of variables, the data setup, and the research questions before doing exploratory data analysis.