Math 445 Computer Lab – One-Variable Graphics and Descriptive Statistics

 

Getting the Needed Files

Double click on the My Computer icon on the desktop. Then double click on the campus_share on 'curtis' (U:) drive and then the Class_Share folder. Finally, double click on the Math folder and then the math_445 folder. This folder contains the Minitab files we will use in today’s lab: BodyDimensions.MPJ, BodyTemp.MPJ, and WineSales.MPJ.

 

We cannot all work from these shared files at once (only one person can access them at a time), so each of you needs to copy the three files to your personal account. To do this, highlight all the files (click on the first one, then shift-click on the last one; this should highlight them all). Then press Ctrl-C to copy the files. Now open the My Documents folder on the desktop (this is the My Documents folder of your personal account). Once you are in the My Documents folder, press Ctrl-V to paste the three files into your account.

 

Now open the Minitab software (from the Start menu select Programs>Class Programs and then Minitab>Minitab15). Then open the first file (BodyTemp.MPJ) in Minitab: go to the File menu and choose Open Project.

 

Description of BodyTemp.MPJ

In 1992, a study was done to investigate the average body temperature of healthy adults (and compare this to the standard of 98.6 degrees Fahrenheit). The results were published in the Journal of the American Medical Association. The study included 130 subjects (aged 18 through 40 years), including 65 men and 65 women. All were healthy volunteers recruited by the Center for Vaccine Development, University of Maryland School of Medicine. This dataset contains information (sex, resting heart rate, and body temperature) for the 130 subjects included in the study.  

 

Analysis

The Sex variable is categorical (0 = male, 1 = female), but the labels “0” and “1” are not very meaningful, so we will recode the variable. From the Data menu, select Code>Numeric to Text. Select the Sex variable for the Code data from column and select the Sex variable as the Into column (important note: always be careful when overwriting a column; sometimes it’s best to create a new column rather than overwriting. In this case we want a more meaningfully labeled column, so overwriting is fine). For the first Original value, type in “0” (no quotation marks) and for the first New value, type in “male” (no quotation marks). For the second Original value, type in “1” (no quotation marks) and for the second New value, type in “female” (no quotation marks). Then click on the OK button (this is always your last step). The Sex column has now been recoded. (We’ll briefly discuss in class how to tally or graph the categorical variable, Sex. That’s not a very interesting question for this data set, though, as we’re given the tallies up front.)
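
For those who like to see the logic spelled out, the recoding step can be sketched in a few lines of Python. This is only an illustration of the idea, not what Minitab runs internally; the five codes below are made up, and the names sex_codes, code_to_label, and sex_labels are hypothetical.

```python
# Made-up Sex column using the worksheet's 0/1 coding (not the study data).
sex_codes = [0, 1, 1, 0, 1]

# Original value -> New value, mirroring the entries typed into the dialog box.
code_to_label = {0: "male", 1: "female"}

# Store the result in a new list so the original codes are not overwritten.
sex_labels = [code_to_label[code] for code in sex_codes]

print(sex_labels)
```

Writing to a new list is the "create a new column" option mentioned above; overwriting would mean assigning the result back to sex_codes.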

 

To make a histogram of the heart rate variable, select Histogram from the Graph menu (select the Simple histogram, which is the default). In the dialog box, select Heart Rate as the variable to graph. Then title the graph (through the Labels button). By default, Minitab will create a frequency histogram. If you want to create a percent histogram (what the book calls a relative frequency histogram), then click on the Scale button, go to the Y-Scale Type folder tab, and choose Percent. In words, how would you describe the distribution of resting heart rates?
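
The difference between a frequency histogram and a percent histogram is only the scale of the vertical axis: each percent is 100 times the bin count divided by the number of observations. Here is a minimal Python sketch of that arithmetic. The heart rates are made up, and the bin width of 5 and the binning rule are assumptions for illustration that need not match Minitab's default bins.

```python
from collections import Counter

# Made-up resting heart rates in beats per minute (not the study data).
heart_rates = [62, 68, 70, 71, 73, 74, 75, 78, 80, 86]
bin_width = 5  # assumed bin width, chosen only for this illustration

# Assign each value to the lower edge of its bin, e.g. 73 falls in the bin starting at 70.
bin_counts = Counter((rate // bin_width) * bin_width for rate in heart_rates)

n = len(heart_rates)
for lower_edge in sorted(bin_counts):
    frequency = bin_counts[lower_edge]
    percent = 100 * frequency / n
    upper_edge = lower_edge + bin_width - 1
    print(f"{lower_edge}-{upper_edge}: frequency = {frequency}, percent = {percent:.0f}%")
```

Both versions have the same shape; only the axis labels change.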

 

To get descriptive statistics for the heart rate variable, go to the Stat menu and select Basic Statistics then Display Descriptive Statistics. In the dialog box, select the Heart Rate variable. Note you can choose the statistics to be calculated by going through the Statistics... button. Do these descriptive statistics corroborate your description of the heart rate distribution based on the histogram?
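
If you want to check a few of these statistics by hand, Python's standard statistics module computes the usual ones. The sketch below uses made-up heart rates, not the study data. Note that software packages use slightly different quartile conventions, so the quartiles from this sketch may not agree exactly with another package's output.

```python
import statistics

# Made-up resting heart rates in beats per minute (not the study data).
heart_rates = [62, 68, 70, 71, 73, 74, 75, 78, 80, 86]

mean = statistics.mean(heart_rates)
median = statistics.median(heart_rates)
stdev = statistics.stdev(heart_rates)  # sample standard deviation (divides by n - 1)

# Cut points that split the sorted data into four equal-probability groups.
q1, q2, q3 = statistics.quantiles(heart_rates, n=4)

print(f"n = {len(heart_rates)}, mean = {mean}, median = {median}")
print(f"standard deviation = {stdev:.2f}, Q1 = {q1}, Q3 = {q3}")
```

Comparing the mean with the median is a quick numerical check on skewness: in a roughly symmetric distribution the two are close.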

 

To create a stem-and-leaf plot of the heart rate variable, go to the Graph menu and select Stem-and-Leaf. Select the heart rate variable in the dialog box. The stem-and-leaf plot appears in the session window. Note that by changing the increment (in the dialog box) you can change the number of split stems in the plot. (Minitab chooses to break the stems into five rows to effectively spread out the distribution for visual inspection. You can break the stems into only two rows by choosing an increment of 5 from the dialog box. Which graph do you think best depicts the distribution?) In class, we’ll also briefly discuss how to create and interpret a single dotplot or boxplot, and rough guidelines on when to use what graph.
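
To see what the increment does, here is a small Python sketch of a stem-and-leaf display for whole-number data with a leaf unit of 1. It is a simplified illustration, not Minitab's algorithm: the function name stem_and_leaf is hypothetical, the data are made up, the increment is assumed to be 10, 5, or 2, and rows with no leaves are skipped.

```python
from collections import defaultdict

def stem_and_leaf(values, increment=10):
    """Return display rows for non-negative whole numbers (leaf unit 1).

    increment is the width of each row: 10 gives one row per stem,
    5 gives two rows per stem, and 2 gives five rows per stem.
    """
    rows = defaultdict(list)
    for value in sorted(values):
        row_start = (value // increment) * increment  # smallest value the row can hold
        rows[row_start].append(str(value % 10))       # the leaf is the ones digit
    # The stem is the tens digit (and above) of the row's starting value.
    return [f"{start // 10:>2} | {''.join(leaves)}" for start, leaves in sorted(rows.items())]

# Made-up resting heart rates in beats per minute (not the study data).
heart_rates = [62, 68, 70, 71, 73, 74, 75, 78, 80, 86]

for row in stem_and_leaf(heart_rates, increment=5):
    print(row)
```

Calling the function with increment=10 instead collapses each pair of rows into a single row per stem, which is the same trade-off you see when changing the increment in the dialog box.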

 

It seems reasonable that the distributions of heart rates and body temperatures might be different for males and females, and, if so, we should analyze the data separately. To investigate this question, we can create dotplots and boxplots separately for the different sexes. To create the dotplots, select Dotplot from the Graph menu and select the One Y, With Groups graph. (Note that one of the decisions you must make while doing exploratory analysis with Minitab is what graphs are most effective for your particular data and the set-up of your data in the worksheet.) In the dialog box, select Body Temperature as the Graph variable, and Sex as the Categorical variable for grouping. Then title the graph (through the Labels button). The dotplots appear in their own window. How would you describe the two distributions of body temperatures (are there differences or similarities)? Suppose there is a certain point that you would like to identify. From the Editor menu select Brush, and then click on a point. The row of the observation will be identified and you can look up the subject in the worksheet (this can be especially helpful when investigating outliers).

 

To create the boxplots, select Boxplot from the Graph menu and select the One Y, With Groups graph. In the dialog box, select Body Temperature as the Graph variable, and Sex as the Categorical variable for grouping. Then click on the Scale button, and check the “Transpose value and category scales” box (this creates horizontal rather than vertical boxplots, which I think are easier to read). Finally, title the graph (through the Labels button). Note that a few female body temperatures are marked with asterisks in the plots. These are flagged as suspected outliers. The Editor>Brush function can be used to identify which subjects have these outlying body temperatures. (Then, for example, if they are simply mis-coded values, they can be changed.)
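
A common convention for flagging suspected outliers in a boxplot is the 1.5 × IQR rule: a value is flagged if it lies more than 1.5 interquartile ranges below the first quartile or above the third quartile. The Python sketch below applies that rule, under the assumption that this is the rule in use; the function name flag_outliers is hypothetical, the temperatures are made up, and quartile conventions differ a little between packages, so borderline points may be flagged differently elsewhere.

```python
import statistics

def flag_outliers(values):
    """Return (row, value) pairs lying beyond the 1.5 * IQR fences."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Rows are numbered from 1, the way worksheet rows are.
    return [
        (row, value)
        for row, value in enumerate(values, start=1)
        if value < lower_fence or value > upper_fence
    ]

# Made-up body temperatures in degrees Fahrenheit (not the study data).
temperatures = [98.0, 98.2, 98.4, 98.4, 98.6, 98.7, 98.8, 99.0, 100.8]

flagged = flag_outliers(temperatures)
print(flagged)
```

Returning the row number along with the value plays the same role as brushing a point: it tells you where to look in the worksheet.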

 

Note it is also possible to get descriptive statistics for the body temperatures, separately for the sexes. To do so, simply select Body Temperature as the Variable and Sex as the By variable in the descriptive-statistics dialog box. Then the summary statistics for each sex are shown next to each other for easy comparison. Do these numerical summaries corroborate your conclusion when comparing the distributions based on separate graphical displays?
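
The "By variable" idea is simply to group the rows by sex first and then summarize each group. A minimal Python sketch, using six made-up records rather than the study data (the names records, by_sex, and summary are hypothetical):

```python
import statistics
from collections import defaultdict

# Made-up (sex, body temperature) rows in degrees Fahrenheit (not the study data).
records = [
    ("male", 97.8), ("female", 98.2), ("male", 98.1),
    ("female", 98.4), ("male", 98.4), ("female", 98.9),
]

# Group the temperatures by the value of the By variable.
by_sex = defaultdict(list)
for sex, temperature in records:
    by_sex[sex].append(temperature)

# Compute the same statistics within each group.
summary = {
    sex: {
        "n": len(temps),
        "mean": statistics.mean(temps),
        "median": statistics.median(temps),
    }
    for sex, temps in by_sex.items()
}

for sex, stats in summary.items():
    print(f"{sex}: n = {stats['n']}, mean = {stats['mean']:.2f}, median = {stats['median']}")
```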

 

It’s also possible to get separate histograms based on sex (use the Simple Histogram, but from the Multiple Graphs button select “In separate panels of the same graph” and then select Sex as the “By variable”). Note that when comparing groups of different sizes, you should definitely use percent (not frequency) histograms, because bar heights in a frequency histogram depend on group size as well as on the shape of the distribution. Our data set, though, has equal numbers of men and women, so we can use either type of histogram.

 

(In class, we’ll also briefly foreshadow regression and significance testing by creating scatterplots and performing a t-test on the “status quo” hypothesis of a 98.6 degree body temperature, on average.)
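
As a rough preview of that t-test, the test statistic measures how far the sample mean is from the hypothesized mean in units of standard error: t = (sample mean − 98.6) / (s / √n), with n − 1 degrees of freedom. The Python sketch below computes only the statistic, using five made-up temperatures rather than the study data; converting t to a p-value requires the t distribution, which is left to the software.

```python
import math
import statistics

# Made-up body temperatures in degrees Fahrenheit (not the study data).
temperatures = [97.8, 98.0, 98.2, 98.4, 98.6]
hypothesized_mean = 98.6  # the "status quo" value being tested

n = len(temperatures)
sample_mean = statistics.mean(temperatures)
sample_sd = statistics.stdev(temperatures)   # sample standard deviation (n - 1)
standard_error = sample_sd / math.sqrt(n)

t_statistic = (sample_mean - hypothesized_mean) / standard_error
degrees_of_freedom = n - 1

print(f"t = {t_statistic:.3f} with {degrees_of_freedom} degrees of freedom")
```

A negative t simply means the sample mean falls below 98.6; how surprising that is depends on the p-value.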

 

Description of WineSales.MPJ

This data file includes the monthly sales of dry white wine (in thousands of liters) in Australia from January 1980 to July 1995.

 

Analysis

Obviously time is a potential factor related to wine sales, so we should initially consider time in our analysis (if, after the analysis, it appears that time has no impact on wine sales, then we can ignore it and use a simple one-variable graph, like a histogram). To create a time series plot of wine sales, select Time Series Plot from the Graph menu (select the Simple graph). In the dialog box, select White Wine Sales as the Series and then title the graph through the Labels button. Now click on the Time/Scale button. This dialog box allows you to label the x-axis in a more meaningful way. Instead of “Index” as the time scale, choose “Calendar: Month Year.” Then for the start values enter 1 for the month and 1980 for the year.
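
Behind the scenes, the calendar labeling just counts forward one month per row from the start values. The Python sketch below shows that bookkeeping. It assumes one observation per month with none missing, in which case January 1980 through July 1995 spans 15 × 12 + 7 = 187 months; the function name month_year_labels and the label format are hypothetical.

```python
def month_year_labels(start_month, start_year, count):
    """Return 'MM/YYYY' labels for `count` consecutive months."""
    labels = []
    for index in range(count):
        months_since_january = (start_month - 1) + index
        year = start_year + months_since_january // 12
        month = months_since_january % 12 + 1
        labels.append(f"{month:02d}/{year}")
    return labels

# Assumed series length: one row per month from January 1980 through July 1995.
labels = month_year_labels(start_month=1, start_year=1980, count=187)

print(labels[0], "...", labels[-1])
```

If the last label does not match the last month you expect, the worksheet probably has missing or extra rows.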

 

What do you see in the time series plot? How would you analyze the relationship between time and white wine sales in Australia?

 

Important Note: The time series plot in Minitab assumes equal increments between the data points. Some time series have unequal increments, and then using the time series plot can be graphically misleading. If you encounter a time series with unequal increments, you can still create a time plot by using the Graph>Scatterplot>With Connect Line option (entering your time series as the y variable and the actual time increments as the x variable).
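
Before choosing between the two kinds of plot, it helps to check whether the time stamps really are equally spaced. A minimal Python sketch of that check follows; the function name has_equal_increments is hypothetical, the time values are made up, and whole-number times are assumed so that the gaps can be compared exactly.

```python
def has_equal_increments(times):
    """Return True if consecutive time points are all the same distance apart."""
    gaps = {later - earlier for earlier, later in zip(times, times[1:])}
    return len(gaps) <= 1

# Made-up measurement times, in months since the start of a study.
monthly_times = [1, 2, 3, 4, 5]
irregular_times = [1, 2, 4, 7, 12]

print(has_equal_increments(monthly_times))    # safe to plot against the index
print(has_equal_increments(irregular_times))  # plot against the actual times instead
```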

 

Description of BodyDimensions.MPJ

In a 2003 study, body girth (circumference) measurements (in cm) and skeletal diameter measurements (in cm), as well as age (in years), weight (in kg), height (in cm), and sex were measured on 507 physically active individuals (247 men and 260 women).

Analysis

This data set is very rich, in that there are many variables and many questions to investigate. You can, for example, 1) look at single variables (graphically and numerically), 2) compare the distributions of a variable based on sex, or 3) use a scatterplot to consider the relationship between two variables. Furthermore, suppose you want to analyze the data separately for men and women. Then from the Data menu select Split Worksheet. In the dialog box, select the Sex variable as the “By variable”. Minitab will then create two new worksheets separating the data by sex (note that it will also keep the original worksheet intact). The highlighted worksheet is the active worksheet, and it’s the active worksheet that Minitab will work with (be sure to label graphs appropriately).

 

Have fun!

 

 

Additional Notes

 

  • To print out a graph, simply choose Print Graph from the File menu (this only works for a graph that is the active window). If a full-page graph isn’t desired (which it typically isn’t; be green!), the Minitab graphs can be copied and then pasted (and resized) within a Microsoft Word document (this is ideal for reports and can also be useful for homework).

 

  • Minitab has excellent help resources. If you’re stumped about how to use a Minitab function, you can click on the Help button (located in all dialog boxes), or you can search the help index.

 

  • Please do not turn off your brain when using statistical software. It’s important for you to think carefully about the types of variables, the data set-up, and the research questions before doing exploratory data analysis.