Double click on the My
Computer icon on the desktop. Then double click on the campus_share on 'curtis' (U:) drive and then the Class_Share
folder. Finally, double click on the Math
folder and then the Math445 folder (just for this lab—next week we’ll
have our own Math217 folder). What you see in this folder are the Minitab files
we will use in today’s lab: BodyTemp.MPJ,
DrinkingStudy.MPJ, and WineSales.MPJ.
As a class, we cannot all
open these shared files at once (only one person can access them at a time). Thus, you
each need to copy the three files to your personal account. You can do this by
simply highlighting all the files (click on the first one, then shift-click on
the last one—this should highlight them all). Then press Ctrl-C to copy the
files. Now open the My Documents folder on the desktop (this is the My
Documents folder of your personal account). Once you are in the My
Documents folder, hit Ctrl-V to paste the three files into your account.
Now open the Minitab
software (from the Start menu select Programs>Class Programs
and then Minitab>Minitab15). Then open the first file (DrinkingStudy.MPJ) in Minitab: go to the File menu and choose Open Project.
In 1994 the Harvard
School of Public Health published a college alcohol study. Samples of students
from 140 four-year colleges were asked questions about their alcohol
consumption (demographic information was also collected). The 10,904 responses
are included in this data set. The students answered questions based on their
behavior in the last 30 days (e.g.,
they recorded how often they drove after drinking within the last 30 days).
We will use tallies, pie
charts, and bar charts to summarize the categorical variables. Suppose we
simply want to know the proportion of males and females in the sample (in
numbers, not a graph). To get a tally
go to the Stat menu and select Tables>Tally Individual Variables.
Enter Sex as the variable (this can
be done by double clicking on the Sex variable
in the left column or by highlighting the Sex
variable and then clicking on the Select button). Then check the boxes
for both counts and percents. Finally, click on the OK button. The results are shown in the session window. Are there
equal numbers of males and females in the sample? How might this affect the
rest of our analysis?
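If you want to check a tally outside Minitab, the same counts and percents can be sketched in a few lines of Python. This is only an illustration: the function name tally is our own, and the short list of responses is hypothetical, not the study data.

```python
from collections import Counter


def tally(values):
    """Return {category: (count, percent)} for a list of categorical values."""
    counts = Counter(values)
    total = sum(counts.values())
    return {cat: (n, 100.0 * n / total) for cat, n in counts.items()}


# Hypothetical responses, NOT the actual study data.
sex = ["female", "male", "female", "female", "male"]

# Print one row per category: label, count, percent.
for category, (count, percent) in sorted(tally(sex).items()):
    print(f"{category:<8}{count:>6}{percent:>8.2f}")
```

Each row corresponds to one line of the Minitab tally (category, count, percent).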
To create a pie chart
go to the Graph menu and select Pie Chart. A dialog box will
appear. Click on the box under Categorical variables and then select the
Driven After
Drinking Alcohol variable. Now click on the Labels button and type in a title (putting an informative title on
all your graphs is very important!). Then click on the Slice Labels folder tab and select Frequency and Percent (by
default, Minitab shows the pie chart with no associated numbers, but it’s nice
to have both the visual pie slices and the number summaries). Finally, click on
the OK button (and then the OK
button again).
The pie chart will appear
as its own window. Notice there are four pie slices, including a group of 52
students for which we have no information (a blank as a response indicates
missing data). If applicable, we can create a pie chart that reports only on
people who responded to the question (this might not be a reasonable thing to
do if there was a high percentage of non-response). To do this, go back to the
pie chart dialog box. Click on the Data
Options button and then the Group
Options folder tab. Uncheck the box, Include missing as a group. The pie
chart Minitab creates will now only have 3 slices, corresponding to the three
answer choices.
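To see why dropping the missing group changes every percentage, here is a small Python sketch. The answer labels and data are hypothetical (the actual answer choices are in the worksheet), and the function name percents is our own.

```python
from collections import Counter


def percents(values, include_missing=True):
    """Percent of each category; blank strings are treated as missing."""
    kept = [v if v != "" else "(missing)" for v in values]
    if not include_missing:
        # Drop missing responses, so the denominator shrinks too.
        kept = [v for v in kept if v != "(missing)"]
    counts = Counter(kept)
    total = len(kept)
    return {cat: 100.0 * n / total for cat, n in counts.items()}


# Hypothetical answers; "" marks a blank (missing) response.
answers = ["never", "once", "", "never", "twice or more", "never", "once", ""]

print(percents(answers))                         # four groups, including missing
print(percents(answers, include_missing=False))  # three groups only
```

Notice that the percentage for every remaining category rises once the missing responses are removed from the total.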
Once a graph has been
created, you can edit it. To change
the title, you simply double click on the title. Then a dialog box will appear,
allowing you to change the text, font, style, size, and alignment (if you
simply click once on the title, then you are allowed to move its position). If
you click once on one of the pie slices, then click on the pie again, and
finally right click and select Edit Pie,
then you can change the fill color of that pie slice (using the Custom fill pattern). There are many
other editing options under the Editor
menu.
To make a bar chart
go to the Graph menu and select Bar Chart (and select Simple, which is the default). Again, a
dialog box will appear. Select Served as
a Designated Driver as the Categorical
variable. Click on the Labels
button, and title the graph. Then click on the Data Labels folder tab. The default setting is for no labels to be
placed on top of the bars of the graph. If you want the count placed on top of
the bar, then select “Use y-value labels” (this can sometimes be a helpful
feature, but at other times can make the graph too busy—decide what works best
for the specific analysis and presentation).
Within the bar chart
dialog box, we have the same option (as with the pie chart) to exclude missing
data. Suppose, though, we want to include the missing data in the graph, but
want to better label that category. To do this, double click somewhere on the
horizontal axis of the bar chart. This brings up an Edit Scale dialog box. Now click on the Labels folder tab, and choose Specified
(rather than Automatic). For the
first label, instead of simply a blank in the quotation marks type in “Missing
Data.” Then click on the OK button.
This should make the label change on the graph.
In 1992, a study was done
to investigate the average body temperature of healthy adults (and compare this
to the standard of 98.6 degrees Fahrenheit). The results were published in the
Journal of the American Medical Association. The study included 130 subjects
(aged 18 through 40 years), including 65 men and 65 women. All were healthy
volunteers recruited by the Center for Vaccine Development, University of
Maryland School of Medicine. This dataset contains information (sex, resting
heart rate, and body temperature) for the 130 subjects included in the study.
The Sex variable is categorical (0 – male, 1 – female), but the labels
“0” and “1” are not very meaningful, so we will recode the variable. From the Data
menu, select Code>Numeric to Text.
Select the Sex variable for the Code data from column and select the Sex variable as the Into column (important note: always be careful when overwriting a
column; sometimes it’s actually best to create a new column rather than
overwriting—in this case we want a more meaningfully labeled column, so
overwriting is fine). For the first Original value, type in “0” (no quotation marks) and for the first New value, type in “male” (no quotation
marks). For the second Original value, type in “1” (no quotation marks) and for the second
New value, type in “female” (no
quotation marks). Then click on the OK
button (this is always your last step). The Sex
column has now been recoded.
To make
a histogram of the heart rate variable select Histogram from the Graph
menu (select the Simple histogram,
which is the default). In the
dialog box, select Heart Rate as the variable to graph. Then title the
graph (through the Labels button). By
default, Minitab will create a frequency histogram. If you want to create a
percent histogram (what the book calls a relative frequency histogram), then
click on the Scale button and then
the Y-Scale Type folder tab, and select Percent. In
words, how would you describe the distribution of resting heart rates?
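The difference between a frequency histogram and a percent (relative frequency) histogram is only the vertical scale: each count is divided by the total. A rough Python sketch, using hypothetical heart rates and bin settings of our own choosing (Minitab picks its own bins):

```python
def histogram(values, start, width, n_bins):
    """Return (counts, percents) for bins [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        i = int((v - start) // width)
        if 0 <= i < n_bins:  # values outside the bin range are ignored
            counts[i] += 1
    total = sum(counts)
    percents = [100.0 * c / total for c in counts]
    return counts, percents


heart_rates = [62, 68, 70, 71, 73, 74, 75, 78, 80, 86]  # hypothetical
counts, percents = histogram(heart_rates, start=60, width=10, n_bins=3)
print(counts)    # bar heights of the frequency histogram
print(percents)  # bar heights of the percent histogram
```

The shape of the two histograms is identical; only the axis labels differ.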
To get descriptive
statistics for the heart rate variable, go to the Stat menu and
select Basic Statistics then Display Descriptive Statistics. In
the dialog box, select the Heart Rate
variable. Note you can choose the statistics to be calculated by going through
the Statistics... button. Do these
descriptive statistics corroborate your description of the heart rate distribution
based on the histogram?
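For comparison, here is a sketch of similar summary statistics using Python's standard statistics module. The data are hypothetical and the function name describe is our own. Quartile conventions differ between packages, so the quartiles here may not match Minitab's output exactly.

```python
import statistics


def describe(values):
    """Basic summary statistics for a list of numbers."""
    # quantiles(n=4) returns the three cut points Q1, median, Q3
    # (Python's default "exclusive" method; other software may differ).
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


heart_rates = [62, 68, 70, 71, 73, 74, 75, 78, 80, 86]  # hypothetical
summary = describe(heart_rates)
for name, value in summary.items():
    print(f"{name:<8}{value}")
```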
To create a stem-and-leaf
plot of the heart rate variable, go to the Graph menu and select Stem-and-Leaf.
Select the heart rate variable in the dialog box. The stem-and-leaf plot
appears in the session window. Note that by changing the increment (in the
dialog box) you can change the number of split stems in the plot. (Minitab
chooses to break the stems into five rows to effectively spread out the distribution
for visual inspection. You can break the stems into only two rows by choosing
an increment of 5 from the dialog box. Which graph do you think best depicts
the distribution?)
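The effect of the increment can be seen in a small text-only sketch. This is our own simplified Python version for non-negative whole numbers (stem = tens digit, leaf = ones digit), not Minitab's algorithm, and the data are hypothetical.

```python
def stem_and_leaf(values, increment=10):
    """Text stem-and-leaf plot for non-negative integers (stem = tens, leaf = ones).

    increment is the span of values covered by one row: 10 gives one row per
    stem, 5 gives two rows per stem, and 2 gives five rows per stem.
    """
    data = sorted(values)
    first = data[0] // increment
    last = data[-1] // increment
    rows = []
    for r in range(first, last + 1):
        low = r * increment
        # Collect the ones digits of all values that fall in this row.
        leaves = "".join(str(v % 10) for v in data if low <= v < low + increment)
        rows.append(f"{low // 10:>3} | {leaves}")
    return rows


heart_rates = [62, 68, 70, 71, 73, 74, 75, 78, 80, 86]  # hypothetical
for inc in (10, 5):
    print(f"increment = {inc}")
    print("\n".join(stem_and_leaf(heart_rates, inc)))
```

A smaller increment spreads the same leaves over more rows, which can reveal or hide detail in the shape of the distribution.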
It seems reasonable that
the distribution of heart rates and body temperatures might be different for
males and females, and, if so, we should analyze the data separately. To
investigate this question, we can create dotplots and
boxplots separately
for the different sexes. To create the dotplots,
select Dotplot from the Graph menu and
select the One Y, With Groups graph. (Note that one of the
decisions you must make while doing exploratory analysis with Minitab is what
graphs are most effective for your particular data and the set-up of your data
in the worksheet.) In the dialog box, select Body Temperature as the Graph variable, and Sex as the Categorical
variable for grouping. Then title the graph (through the Labels button). The dotplots
appear in their own window. How would you describe the two distributions of
body temperatures (are there differences or similarities)? Suppose there is a certain
point that you would like to identify. From the Editor menu select Brush,
and then click on a point. The row of the observation will be identified and
you can look up the subject in the worksheet (this can be especially helpful
when investigating outliers).
To create the boxplots, select Boxplot
from the Graph menu and select the One Y, With Groups graph. In the dialog
box, select Body Temperature as the Graph
variable, and Sex as the Categorical variable for grouping. Then click
on the Scale button, and check the
“Transpose value and category scales” box (this creates horizontal rather than
vertical boxplots, which I think are easier to read).
Finally, title the graph (through the Labels
button). Note that a few female body temperatures are marked with asterisks in
the plots. These are flagged as suspected outliers (according to the rule we discussed
in Section 1.2 of the textbook). The Editor>Brush
function can be used to identify which subjects have these outlying body
temperatures.
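As an illustration of how points get flagged, the sketch below assumes the rule in question is the common 1.5 x IQR rule (a point is suspect if it lies more than 1.5 interquartile ranges below Q1 or above Q3). That is an assumption on our part; check it against the textbook. The temperatures are hypothetical, and Python's quartiles may differ slightly from Minitab's.

```python
import statistics


def suspected_outliers(values):
    """Return values more than 1.5 * IQR below Q1 or above Q3 (assumed rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical body temperatures (degrees Fahrenheit), NOT the study data.
temps = [97.8, 98.0, 98.2, 98.2, 98.4, 98.4, 98.6, 98.7, 98.8, 100.8]
print(suspected_outliers(temps))
```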
Note it is also possible
to get descriptive statistics for the body
temperatures, separately for the sexes. To do so, simply select Body Temperature as the Variable and Sex as the By variable in the descriptive-statistics
dialog box. Then the summary statistics for each sex are shown next to each
other for easy comparison. Do these numerical summaries corroborate your
conclusion when comparing the distributions based on separate graphical
displays?
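The "By variable" idea amounts to splitting the data by group and then summarizing each piece. A minimal Python sketch with hypothetical data (the function name describe_by is our own):

```python
import statistics
from collections import defaultdict


def describe_by(values, groups):
    """Summarize values separately for each group label."""
    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)
    return {
        group: {
            "n": len(vs),
            "mean": statistics.mean(vs),
            "median": statistics.median(vs),
        }
        for group, vs in by_group.items()
    }


# Hypothetical rows: one temperature and one sex label per subject.
temps = [98.0, 98.4, 98.2, 98.8, 97.9, 98.6]
sex = ["male", "female", "male", "female", "male", "female"]

by_sex = describe_by(temps, sex)
for group, stats in sorted(by_sex.items()):
    print(group, stats)
```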
It’s also possible to get
separate histograms based on sex
(use the Simple Histogram, but from
the Multiple Graphs button select “In
separate panels of the same graph” and then select Sex as the “By variable”).
This data file includes the
monthly sales of dry white wine (in thousands of liters) in
Obviously time is a
potential factor related to wine sales, so we should initially consider time in
our analysis (if, after the analysis, it appears that time has no impact on
wine sales, then we can ignore it and use a simple one-variable graph, like a
histogram). To create a time series plot of wine sales, select Time
Series Plot from the Graph menu (select the Simple graph). In the dialog box, select White Wine Sales as
the Series and then title the graph
through the Labels button. Now click
on the Time/Scale button. This dialog
box allows you to label the x-axis in
a more meaningful way. Instead of “Index” as the time scale, choose “Calendar:
Month Year.” Then for the start values enter 1 for the month and 1980 for the
year.
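Behind the scenes, the calendar scale simply attaches consecutive month and year labels to the rows, starting from the values you enter. A small Python sketch of that labeling (the function name and the label format are our own choices, not Minitab's):

```python
def month_labels(n, start_month=1, start_year=1980):
    """Return n consecutive 'YYYY-MM' labels beginning at the given month and year."""
    labels = []
    for i in range(n):
        offset = start_month - 1 + i          # months elapsed since January of start_year
        month = offset % 12 + 1
        year = start_year + offset // 12
        labels.append(f"{year}-{month:02d}")
    return labels


print(month_labels(14))
```

This only works as a labeling scheme because the observations are assumed to be one month apart with no gaps.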
What do you see in the
time series plot? How would you analyze the relationship between time and white
wine sales in
Important Note: The time series plot in Minitab assumes equal increments between the
data points. Some time series have unequal increments, and then using the time
series plot can be graphically misleading. If you encounter a time series with
unequal increments, you can still create a time plot by using the Graph>Scatterplot>With
Connect Line option (entering your time series as the y variable and the actual time increments as the x variable).
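One quick way to decide between the two plots is to check whether the gaps between successive time values are all the same. A minimal Python sketch, with a function name and tolerance of our own choosing:

```python
def has_equal_increments(times, tol=1e-9):
    """True if every gap between successive time values is the same (within tol)."""
    gaps = [b - a for a, b in zip(times, times[1:])]
    return all(abs(g - gaps[0]) <= tol for g in gaps)


print(has_equal_increments([1, 2, 3, 4]))  # evenly spaced: a time series plot is fine
print(has_equal_increments([1, 2, 4, 8]))  # uneven: use a scatterplot with connect line
```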
Additional Notes