Double-click on the My Computer icon on the desktop. Then double-click on the
campus_share on 'curtis' (U:) drive and then the Class_Share folder. Finally,
double-click on the Math folder and then the Math217 folder. If you don’t
already have copies in your account, make copies of DrinkingStudy.MPJ and
MedDisposalData.xls (the Excel data file for your project).
Now open the Minitab
software (from the Start menu select Programs>Class Programs
and then Minitab>Minitab15). Then open the first file (DrinkingStudy.MPJ) in Minitab: go to the File menu and choose Open Project.
In 1994 the Harvard
School of Public Health published a college alcohol study. Samples of students
from 140 four-year colleges were asked questions about their alcohol
consumption (demographic information was also collected). The 10,904 responses
are included in this data set. The students answered questions based on their behavior
in the last 30 days (e.g., they
recorded how often they drove after drinking within the last 30 days).
There are many categorical variables in this data set. Suppose we wonder if
Sex and Driven After Drinking Alcohol are dependent (i.e., if
there is some kind of relationship between the variables). To perform a chi-square significance test of independence, go to
the Stat menu and choose Tables>Cross Tabulation and Chi-Square. Enter Sex as the row variable and
Driven After Drinking Alcohol as the column variable (note: it doesn’t matter which variables go in the rows and
columns—the test will be the same; but you might want to think about how the
table will most easily be read). Then click on the Chi-Square button. Select Chi-Square analysis, Expected cell
counts, and each cell’s contribution to the Chi-Square statistic (note you can
also select residuals—these, like the contribution to the Chi-square statistic,
give you an indication of which cells show the biggest difference from
independence; if you look at residuals, be sure they are standardized).
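If you’d like to see the arithmetic behind this output, here is a minimal Python sketch of the Pearson chi-square calculation: expected counts, each cell’s contribution, and the overall statistic. The counts and the three column categories are made up for illustration (they are not the survey data), and the function name is hypothetical.

```python
# Hypothetical counts for illustration only (NOT the survey data).
# Rows: Sex (Female, Male).
# Columns: Driven After Drinking Alcohol (assumed categories:
# "not at all", "once", "twice or more").
observed = [
    [60, 25, 15],
    [40, 30, 30],
]


def pearson_chi_square(table):
    """Return expected counts, per-cell contributions, statistic, and df."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Expected count under independence: (row total * column total) / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # Each cell's contribution: (observed - expected)^2 / expected
    contributions = [
        [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(table, expected)
    ]

    statistic = sum(sum(row) for row in contributions)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, contributions, statistic, df


expected, contributions, statistic, df = pearson_chi_square(observed)
print("Expected counts:", expected)
print("Contributions:  ", contributions)
print("Chi-square statistic:", round(statistic, 3), "with df =", df)
```

Cells with the largest contributions are the ones that depart most from what independence would predict.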
The output goes to the
Session window. Notice that the missing-data category is included in the table
(this doesn’t impact the test at all, but it makes for a messier table). It’s
good to know how many data are missing, but it’s also nice to have a compact
table. Hence, go back to the previous dialog box and from the Options button
choose “Display missing values for no variables.”
Now we can interpret the
results. (Note: the Pearson Chi-square
is the statistic we discussed in class, and is the most commonly used
statistic—so this is what you should refer to in your report. The Likelihood
Ratio Chi-square is asymptotically equivalent to the Pearson Chi-square, yet
it’s calculated differently. Don’t worry about the LR Chi-square.) Do we have
evidence of a relationship between these variables? What cells seem to
contribute the most to the dependence? Go back to the previous dialog box to
get row and column percents for the table (you can remove all the Chi-square
display at this point). What particular relationship do you notice? (Important reminder: Significant results
from a chi-square test indicate some relationship between the two variables,
but this does not give any indication of whether one variable causes change in
the other.)
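Row and column percents are just each count divided by its row total or column total. The sketch below shows the calculation in Python on made-up counts (not the survey data); the category layout is an assumption for illustration.

```python
# Hypothetical counts for illustration only (NOT the survey data).
# Rows: Sex (Female, Male); columns: three assumed response categories.
observed = [
    [60, 25, 15],
    [40, 30, 30],
]

# Row percents: each cell as a percentage of its row total.
row_percents = [[100 * o / sum(row) for o in row] for row in observed]

# Column percents: each cell as a percentage of its column total.
col_totals = [sum(col) for col in zip(*observed)]
col_percents = [
    [100 * o / c for o, c in zip(row, col_totals)] for row in observed
]

for label, row in zip(["Female", "Male"], row_percents):
    print(label, "row %:", [round(p, 1) for p in row])
for label, row in zip(["Female", "Male"], col_percents):
    print(label, "col %:", [round(p, 1) for p in row])
```

Comparing the row percents across rows (or the column percents across columns) is what lets you describe the direction of a relationship, which the chi-square statistic alone does not show.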
Suppose we want to collapse a variable from 3 categories to 2
categories (we need to think carefully before doing this, as we don’t want
to lose important information). Label column 8 as “Driven After
5 Drinks?” (this title isn’t quite as accurate, but we can’t name two columns
the same thing; another option is to use the same column name, but create a new
worksheet). Now we’ll recode so this new variable only has the answers “no” or
“yes.” From the Data menu select Code>Text to Text. Enter the original
variable (Driven After 5 or More Drinks) as the “copy
from” column and the new variable (Driven After 5 Drinks?) as the column that
receives the recoded values. For the first
original value type “not at all” (you must include quotes, because it’s more
than one word) and for the new value type “no” (no quotes needed). For the
second original value type “once” (no quotes needed, since it’s a single word)
and for the new value type “yes” (no quotes needed). For the last original value
type “twice or more” (quotes needed) and for the new value type “yes” (no
quotes needed). Important note: you
must type things in—including capital and lower-case letters—just as they
appear in the Minitab column.
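The recoding above amounts to a lookup table from old values to new values. Here is a minimal Python sketch of the same idea; the sample responses are made up, and leaving unmatched values as None is a choice of this sketch, not a claim about what Minitab does.

```python
# Mapping from the original three categories to the collapsed two.
recode = {
    "not at all": "no",
    "once": "yes",
    "twice or more": "yes",
}

# Hypothetical responses for illustration; the last one has the wrong case.
original = ["not at all", "once", "twice or more", "not at all", "Once"]

# Matching is exact (including capitalization), so "Once" is not recoded.
# In this sketch, unmatched values become None so they are easy to spot.
recoded = [recode.get(value) for value in original]

print(recoded)
```

The mismatch on "Once" illustrates why the values must be typed exactly as they appear in the column.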
Now we can analyze the
2x2 table of gender and driven after 5 drinks. Go back to the Cross Tabulation
and Chi-Square dialog box. Select Sex
as the row variable and Driven After 5 Drinks? as the column
variable. Choose the appropriate output from the Chi-Square button. Now click
on the Other Stats button, and choose Fisher’s
exact test for 2x2 tables. (Truth in advertising: Fisher’s test is based on
an exact distribution, the hypergeometric distribution, whereas the chi-square
test of independence relies on an approximation: its test statistic only
approximately follows a chi-square distribution. So Fisher’s test can be used
even when expected counts are small, and it is a good check against the
chi-square results. The downside is that Minitab offers it only for 2x2 tables.)
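To see where the hypergeometric distribution comes in, here is a minimal Python sketch of a two-sided Fisher’s exact test. It holds the row and column totals fixed, computes the exact probability of every possible table, and adds up the probabilities of tables no more likely than the one observed. The counts are made up, the function name is hypothetical, and this two-sided rule is one common convention; it is an assumption that Minitab uses the same one.

```python
from math import comb


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact p-value for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x,
        # given the fixed row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)   # smallest possible top-left count
    highest = min(row1, col1)      # largest possible top-left count
    p_observed = prob(a)

    # Sum the probabilities of all tables at most as likely as the observed
    # one (small relative tolerance guards against floating-point ties).
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Hypothetical 2x2 counts for illustration only (NOT the survey data).
# Rows: Sex (Female, Male); columns: Driven After 5 Drinks? (no, yes).
p_value = fisher_exact_two_sided([[3, 1], [1, 3]])
print(round(p_value, 4))
```

No chi-square approximation is involved, which is why small cell counts are not a problem for this test.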
Consider the output. Both
the Chi-square test and Fisher’s test show P-values of essentially 0. Hence, we
have incredibly strong evidence that there is a relationship between these two
variables. Now go back to the previous dialog box, de-select all the Chi-square
analysis, and select row and column percentages. Clearly, females are much less
likely to drive after 5 or more drinks than are males.
Lastly, suppose we’re interested in the relationship between
Driven After Drinking Alcohol and Served as a
Designated Driver, yet we want to look separately for males and females.
For this we can use the “layer” option in the Chi-square dialog box. Choose
Driven After Drinking Alcohol as the row variable,
Served as a Designated Driver as the column variable, and Sex as the layer
variable. First, display only the counts, but then select the appropriate
options from the Chi-Square button.
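A layer variable simply splits the data into one cross-tabulation per category of that variable. The Python sketch below builds such layered tables from made-up records (not the survey data); the two-category answers are an assumption for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical records for illustration only:
# (Sex, Driven After Drinking Alcohol, Served as a Designated Driver)
records = [
    ("Female", "no", "yes"),
    ("Female", "no", "yes"),
    ("Female", "yes", "no"),
    ("Male", "yes", "yes"),
    ("Male", "no", "no"),
    ("Male", "yes", "yes"),
    ("Male", "yes", "no"),
]

# One table of (row category, column category) counts per layer (Sex).
tables = defaultdict(Counter)
for sex, driven, designated in records:
    tables[sex][(driven, designated)] += 1

for sex, table in tables.items():
    print(sex)
    for (driven, designated), count in sorted(table.items()):
        print(f"  driven={driven}, designated={designated}: {count}")
```

Each layer’s table is then analyzed on its own, which is why the output contains a separate chi-square test for each category of Sex.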
Look at the output.
Notice the warnings Minitab gives about small expected counts (this is one of
the few, perhaps only, places in Minitab where you get this kind of warning).
What category of Sex is this for? It’s for the missing data, so it’s not a big
deal. Scroll up to see the rest of the output. Separately for males and
females, there is strong evidence of a relationship between Driven After Drinking Alcohol and Served as a Designated Driver.
Now go back to the previous dialog box, turn off the Chi-square analysis and
turn on the row and column percents. How would you describe the nature of the
relationship?
Important note about multiple comparisons (this applies to your project): Suppose we want
to make inferences from a data set and we perform 20 chi-square tests. We’d like
the family/overall Type I error rate to be at most 0.05, so (using the Bonferroni correction) we should use a 0.05/20 = 0.0025
significance level for each individual test.
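The short Python sketch below works through the Bonferroni arithmetic and shows why the correction matters. The comparison figures assume the 20 tests are independent, which is a simplifying assumption of this sketch; the Bonferroni bound itself does not require independence.

```python
family_alpha = 0.05   # desired overall (family) Type I error rate
num_tests = 20        # number of chi-square tests performed

# Bonferroni: divide the family rate evenly among the individual tests.
per_test_alpha = family_alpha / num_tests

# Chance of at least one false positive if every test used 0.05 instead
# (ASSUMES independent tests and all null hypotheses true).
uncorrected_rate = 1 - (1 - family_alpha) ** num_tests

# Same calculation using the Bonferroni per-test level.
corrected_rate = 1 - (1 - per_test_alpha) ** num_tests

print("Per-test significance level:", per_test_alpha)
print("Family error rate without correction:", round(uncorrected_rate, 3))
print("Family error rate with correction:   ", round(corrected_rate, 3))
```

Under these assumptions, running all 20 tests at the 0.05 level would give roughly a 64% chance of at least one false positive, while the corrected level keeps the family rate just under 0.05.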
Project Data Set
As a class, we’ll copy
the medicine-disposal survey data from Excel to Minitab. Once we do the
copying, we must double-check that we have all the columns and appropriate
column headings (we’ll need to make some fixes in class).
Some important first steps: