Double click on
the My Computer icon on the desktop. Then double click on the campus_share on 'curtis' (U:) drive and then the Class_Share
folder. Finally, double click on the Math
folder and then the math_445 folder. What you see in this folder are the
Minitab files we will use in today’s lab: BodyDimensions.MPJ, BodyTemp.MPJ, and
WineSales.MPJ.
Because only one person can access these shared files at a time, we
cannot all work from them as a class.
Thus, you each need to copy the three files to your personal account. You can
do this by highlighting all the files (click on the first one, then
shift-click on the last one; this should highlight them all). Then press Ctrl-C
to copy the files. Now open the My Documents folder on the desktop (this
is the My Documents folder of your personal account). Once you are in
the My Documents folder, press Ctrl-V to paste the three files into your
account.
Now open the
Minitab software (from the Start menu select Programs>Class
Programs and then Minitab>Minitab15). Then open BodyTemp.MPJ, the first file
we will use, in Minitab: go to the File menu and
choose Open Project.
In 1992, a
study was done to investigate the average body temperature of healthy adults
(and compare this to the standard of 98.6 degrees Fahrenheit). The results were
published in the Journal of the American
Medical Association. The study included 130 subjects (aged 18 through 40 years),
including 65 men and 65 women. All were healthy volunteers recruited by the
Center for Vaccine Development, University of Maryland School of Medicine. This
dataset contains information (sex, resting heart rate, and body temperature)
for the 130 subjects included in the study.
The Sex variable is categorical (0 – male, 1
– female), but the labels “0” and “1” are not very meaningful, so we will recode the variable. From the Data menu, select Code>Numeric to Text. Select the Sex variable for the Code
data from column and select the Sex variable as the Into column (important note: always be careful when overwriting a
column; sometimes it’s actually best to create a new column rather than
overwriting—in this case we want a more meaningfully labeled column, so
overwriting is fine). For the first Original
value, type in “0” (no quotation marks) and for the first New value, type in “male” (no quotation
marks). For the second Original value,
type in “1” (no quotation marks) and for the second New value, type in “female” (no quotation marks). Then click on the
OK button (this is always your last
step). The Sex column has now been recoded. (We’ll briefly discuss in class how to tally or
graph the categorical variable, Sex.
That’s not a very interesting question for this data set, though, as we’re
given the tallies up front.)
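Outside Minitab, the same recoding is just a lookup table applied to every entry of the column. The Python sketch below is only an illustration: the names `SEX_CODES` and `recode` are made up, and the codes are invented values, not the study data. It writes the labels into a new list rather than overwriting the original, which is the safer choice mentioned above.

```python
# Hypothetical mapping mirroring the Code > Numeric to Text entries:
# original value 0 -> "male", original value 1 -> "female".
SEX_CODES = {0: "male", 1: "female"}


def recode(values, mapping):
    """Return a new list with each code replaced by its text label.

    Codes missing from the mapping are kept unchanged, so an unexpected
    code (say, a typo such as 2) stays visible instead of disappearing.
    """
    return [mapping.get(v, v) for v in values]


sex = [0, 1, 1, 0]                # made-up codes, not the study data
sex_labels = recode(sex, SEX_CODES)
print(sex_labels)                 # ['male', 'female', 'female', 'male']
```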
To make a histogram
of the heart rate variable select Histogram from the Graph menu
(select the Simple histogram, which
is the default). In the dialog box, select Heart Rate as the variable to
graph. Then title the graph (through the Labels
button). By default, Minitab will create a frequency histogram. If you want to
create a percent histogram (what the book calls a relative frequency histogram),
click on the Scale button, then the Y-Scale Type tab, and select Percent. In
words, how would you describe the distribution of resting heart rates?
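A frequency histogram and a percent histogram differ only in the y-axis: each bar's count is divided by the total number of observations and multiplied by 100. Here is a minimal Python sketch, assuming fixed-width bins that you choose yourself. Minitab picks its own bins, so its bars will generally not line up with these. The function `histogram_counts` and the heart rates are hypothetical.

```python
from collections import Counter


def histogram_counts(values, start, width):
    """Bin values into [start + k*width, start + (k+1)*width) intervals.

    Returns two dicts keyed by each non-empty bin's lower edge:
    the frequency (count) and the percent (100 * count / n).
    """
    n = len(values)
    edges = (start + width * ((v - start) // width) for v in values)
    freq = dict(sorted(Counter(edges).items()))
    percent = {edge: 100 * count / n for edge, count in freq.items()}
    return freq, percent


heart_rates = [62, 65, 68, 70, 71, 74, 78]    # hypothetical values
freq, percent = histogram_counts(heart_rates, start=60, width=5)
print(freq)      # {60: 1, 65: 2, 70: 3, 75: 1}
```

The bar shapes are identical in both versions. Only the axis scale changes, which is why the choice matters only when comparing groups of different sizes.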
To get descriptive
statistics for the heart rate variable, go to the Stat menu and
select Basic Statistics then Display Descriptive Statistics. In
the dialog box, select the Heart Rate
variable. Note you can choose the statistics to be calculated by going through
the Statistics... button. Do these
descriptive statistics corroborate your description of the heart rate distribution
based on the histogram?
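For reference, you can reproduce the default summaries with Python's `statistics` module. This is a sketch, not Minitab's own algorithm. The `exclusive` quartile method places Q1 at position (n + 1)/4 in the sorted data, interpolating between neighbors. We believe this matches Minitab's quartile convention, but treat that as an assumption and compare against the Session window output. The function name `describe` and the data values are hypothetical.

```python
import statistics


def describe(values):
    """Sample summaries similar to Display Descriptive Statistics.

    Quartiles use statistics.quantiles(method="exclusive"), which puts
    Q1 at position (n + 1)/4 of the sorted data and interpolates; this
    is assumed (not verified here) to match Minitab's convention.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return {
        "N": len(values),
        "Mean": statistics.mean(values),
        "StDev": statistics.stdev(values),   # sample SD, divides by n - 1
        "Minimum": min(values),
        "Q1": q1,
        "Median": median,
        "Q3": q3,
        "Maximum": max(values),
    }


heart_rates = [70, 72, 68, 75, 80, 66, 74]   # hypothetical values
for name, value in describe(heart_rates).items():
    print(f"{name:>8}: {value:.2f}")
```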
To create a stem-and-leaf
plot of the heart rate variable, go to the Graph menu and select Stem-and-Leaf.
Select the heart rate variable in the dialog box. The stem-and-leaf plot
appears in the Session window. Note that by changing the increment (in the dialog
box) you can change the number of split stems in the plot. (For these data, Minitab
breaks each stem into five rows, which spreads out the distribution for
visual inspection. You can break each stem into only two rows by choosing an increment
of 5 in the dialog box. Which plot do you think best depicts the
distribution?) In class, we’ll also briefly discuss how to create and interpret
a single dotplot or boxplot,
and rough guidelines on when to use what graph.
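The effect of the increment is easiest to see by building a plot by hand. The sketch below assumes non-negative whole numbers with a leaf unit of 1 (stem = tens digit, leaf = ones digit), which covers resting heart rates. An increment of 10 gives one row per stem, 5 gives two, and 2 gives five. Unlike Minitab's output, it omits the depth column on the left. The function `stem_and_leaf` and the sample values are hypothetical.

```python
def stem_and_leaf(values, increment=2):
    """Return the rows of a stem-and-leaf plot as strings.

    Assumes non-negative whole numbers with a leaf unit of 1, so the stem
    is the tens digit(s) and the leaf is the ones digit. increment=10 gives
    one row per stem, 5 gives two (leaves 0-4 and 5-9), and 2 gives five
    (leaves 0-1, 2-3, 4-5, 6-7, 8-9).
    """
    if increment not in (2, 5, 10):
        raise ValueError("this sketch supports increments of 2, 5, or 10")
    rows = {}
    for v in sorted(values):
        low = v - v % increment            # lower edge of the row holding v
        rows.setdefault(low, []).append(v % 10)
    lines = []
    for low in range(min(rows), max(rows) + 1, increment):  # keep empty rows
        leaves = "".join(str(leaf) for leaf in rows.get(low, []))
        lines.append(f"{low // 10:>3} | {leaves}")
    return lines


for line in stem_and_leaf([58, 61, 64, 65, 67, 72], increment=5):
    print(line)
```

With increment 5, the stem 6 appears twice: once for leaves 0 to 4 and once for leaves 5 to 9.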
It seems
reasonable that the distribution of heart rates and body temperatures might be
different for males and females, and, if so, we should analyze the data
separately. To investigate this question, we can create dotplots
and boxplots separately
for the different sexes. To create the dotplots,
select Dotplot from the Graph menu and
select the One Y, With Groups graph. (Note that one of the
decisions you must make while doing exploratory analysis with Minitab is what
graphs are most effective for your particular data and the set-up of your data
in the worksheet.) In the dialog box, select Body Temperature as the Graph variable, and Sex as the Categorical
variable for grouping. Then title the graph (through the Labels button). The dotplots
appear in their own window. How would you describe the two distributions of
body temperatures (are there differences or similarities)? Suppose there is a certain
point that you would like to identify. From the Editor menu select Brush,
and then click on a point. The row of the observation will be identified and
you can look up the subject in the worksheet (this can be especially helpful
when investigating outliers).
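A "One Y, With Groups" graph works from two columns of equal length: the measurement and the group label sit in the same row. Keeping each value's row number is what lets you trace a point back to the worksheet, as Brush does. Here is a small Python sketch of that bookkeeping. The function name and values are hypothetical.

```python
def group_rows(groups, values):
    """Pair each value with its 1-based worksheet row, collected by group label."""
    by_group = {}
    for row, (label, value) in enumerate(zip(groups, values), start=1):
        by_group.setdefault(label, []).append((row, value))
    return by_group


sex = ["male", "female", "male", "female"]        # hypothetical columns
body_temp = [98.2, 99.1, 97.9, 98.6]
for label, pairs in group_rows(sex, body_temp).items():
    print(label, pairs)
```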
To create the boxplots, select Boxplot
from the Graph menu and select the One Y, With Groups graph. In the dialog
box, select Body Temperature as the Graph
variable, and Sex as the Categorical variable for grouping. Then click
on the Scale button, and check the
“Transpose value and category scales” box (this creates horizontal rather than
vertical boxplots, which I think are easier to read).
Finally, title the graph (through the Labels
button). Note that a few female body temperatures are marked with asterisks in
the plots. These are flagged as suspected outliers. The Editor>Brush function can be used to identify which subjects
have these outlying body temperatures. (Then, for example, if any are simply miscoded values, they can be corrected.)
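The asterisks come from an outlier rule based on the interquartile range (IQR = Q3 - Q1): points below Q1 - 1.5*IQR or above Q3 + 1.5*IQR are flagged. This is the standard boxplot rule and, as far as we know, the one Minitab's boxplots use. The quartile convention below, `method="exclusive"` with positions based on n + 1, is an assumption about how Minitab computes quartiles. The sketch returns 1-based row numbers, which is the same information Brush gives you. The function name and the temperatures are hypothetical.

```python
import statistics


def flag_outliers(values):
    """Return 1-based rows whose value lies outside Q1 - 1.5*IQR .. Q3 + 1.5*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="exclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [row for row, v in enumerate(values, start=1) if v < low or v > high]


temps = [98.4, 97.8, 98.8, 100.8, 98.0, 98.6, 98.2]   # hypothetical values
print(flag_outliers(temps))   # [4]
```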
Note it is also
possible to get descriptive statistics for
the body temperatures, separately for
the sexes. To do so, simply select Body
Temperature as the Variable and Sex as the By variable in the descriptive-statistics dialog box. Then the
summary statistics for each sex are shown next to each other for easy
comparison. Do these numerical summaries corroborate the conclusions you drew
from the side-by-side graphical displays?
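Summaries by group amount to splitting the values by the By variable and summarizing each piece. Here is a minimal Python sketch with hypothetical names and made-up temperatures. Note that `statistics.stdev` needs at least two values per group.

```python
import statistics


def describe_by(groups, values):
    """N, mean, and sample SD of values for each group label (sorted by label)."""
    buckets = {}
    for label, value in zip(groups, values):
        buckets.setdefault(label, []).append(value)
    return {
        label: {
            "N": len(vs),
            "Mean": statistics.mean(vs),
            "StDev": statistics.stdev(vs),   # needs at least 2 values per group
        }
        for label, vs in sorted(buckets.items())
    }


sex = ["male", "female", "male", "female"]      # hypothetical columns
body_temp = [97.0, 98.0, 99.0, 100.0]
print(describe_by(sex, body_temp))
```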
It’s also
possible to get separate histograms
based on sex (use the Simple
Histogram, but from the Multiple
Graphs button select “In separate panels of the same graph” and then select
Sex as the “By variable”). Note that when comparing groups of different sizes,
you should use percent (not frequency) histograms, since raw counts make the larger group's bars look taller. Our data set,
though, has equal numbers of men and women, so either type of
histogram works.
(In class,
we’ll also briefly foreshadow regression and significance testing by creating scatterplots and performing a t-test of the “status quo”
hypothesis that average body temperature is 98.6 degrees.)
This data file
includes the monthly sales of dry white wine (in thousands of liters) in
Obviously time
is a potential factor related to wine sales, so we should initially consider
time in our analysis (if, after the analysis, it appears that time has no
impact on wine sales, then we can ignore it and use a simple one-variable
graph, like a histogram). To create a time series plot of wine sales,
select Time Series Plot from the Graph menu (select the Simple graph). In the dialog box, select
White Wine Sales as the Series
and then title the graph through the Labels
button. Now click on the Time/Scale
button. This dialog box allows you to label the x-axis in a more meaningful way. Instead of “Index” as the time
scale, choose “Calendar: Month Year.” Then for the start values enter 1 for the
month and 1980 for the year.
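The Calendar: Month Year option simply replaces the index 1, 2, 3, ... with consecutive month labels, starting from the values you enter. Here is a Python sketch of that mapping, assuming monthly data that starts in January 1980 as above. The function name and the "Mon YYYY" label format are our own choices, not Minitab's.

```python
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def month_year_labels(n, start_month=1, start_year=1980):
    """Label n consecutive monthly observations, starting at start_month/start_year."""
    labels = []
    for i in range(n):
        years_ahead, month_index = divmod(start_month - 1 + i, 12)
        labels.append(f"{MONTHS[month_index]} {start_year + years_ahead}")
    return labels


print(month_year_labels(14)[-3:])   # ['Dec 1980', 'Jan 1981', 'Feb 1981']
```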
What do you see
in the time series plot? How would you analyze the relationship between time
and white wine sales in
Important Note: The time series plot in Minitab assumes
equal increments between the data points. Some time series have unequal
increments, and then using the time series plot can be graphically misleading.
If you encounter a time series with unequal increments, you can still create a
time plot by using the Graph>Scatterplot>With Connect Line option (entering your
time series as the y variable and the
actual time increments as the x
variable).
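Before choosing between a time series plot and a connected scatterplot, it helps to check whether the time points really are equally spaced. Here is a small hypothetical helper in Python. The tolerance `tol` guards against rounding in decimal time values.

```python
def has_equal_increments(times, tol=1e-9):
    """True if consecutive time points are equally spaced (within tol)."""
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    # all() on an empty sequence is True, so 0 or 1 time points count as equal.
    return all(abs(gap - gaps[0]) <= tol for gap in gaps)


print(has_equal_increments([1980.0, 1980.5, 1981.0]))   # True
print(has_equal_increments([1, 2, 4, 7]))               # False
```

If it returns False, put the actual times on the x-axis (Scatterplot with Connect Line) rather than relying on the index.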
In
a 2003 study, body girth (circumference) measurements (in cm) and skeletal
diameter measurements (in cm), as well as age (in years), weight (in kg),
height (in cm), and sex were measured on 507 physically active individuals (247
men and 260 women).
This data set
is very rich, in that there are many variables and many questions to
investigate. You can, for example, 1) look at single variables (graphically and
numerically), 2) compare the distributions of a variable based on sex, 3) or
use a scatterplot to consider the relationship
between two variables. Furthermore, Suppose you want to analyze the data separately for men and women. Then from the Data
menu select Split Worksheet. In the dialog box, select the Sex
variable as the “By variable”. Minitab will then create two new worksheets
separating the data by sex (note that it will also keep the original worksheet
intact). The highlighted worksheet will be the active worksheet and it’s the
active worksheet that Minitab will work with (be sure to label graphs
appropriately).
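Split Worksheet is essentially a group-by on the rows. The Python sketch below represents each row as a dictionary keyed by column name. The column names `Sex` and `Height` and their values are hypothetical and may not match those in BodyDimensions.MPJ. Like Minitab, the sketch leaves the original list of rows untouched.

```python
def split_worksheet(rows, by):
    """Split row dictionaries into one list per value of column `by`.

    Each row is copied, so the original list (the "original worksheet")
    is left intact, as Minitab's Split Worksheet does.
    """
    parts = {}
    for row in rows:
        parts.setdefault(row[by], []).append(dict(row))
    return parts


worksheet = [                                   # hypothetical rows
    {"Sex": "male", "Height": 180},
    {"Sex": "female", "Height": 165},
    {"Sex": "male", "Height": 175},
]
by_sex = split_worksheet(worksheet, "Sex")
print(len(by_sex["male"]), len(by_sex["female"]))   # 2 1
```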
Have fun!
Additional Notes