Math 217 Computer Lab – Inference in Two-way Tables

 

Getting the Needed Files

Double-click the My Computer icon on the desktop. Then double-click the campus_share on 'curtis' (U:) drive, then the Class_Share folder, then the Math folder, and finally the Math217 folder. If you don’t already have copies in your account, make copies of DrinkingStudy.MPJ and MedDisposalData.xls (the Excel data file for your project).

 

Now open the Minitab software (from the Start menu select Programs>Class Programs and then Minitab>Minitab15). Then open the first file (DrinkingStudy.MPJ) in Minitab: go to the File menu and choose Open Project.

 

Description of DrinkingStudy.MPJ

In 1994 the Harvard School of Public Health published a college alcohol study. Samples of students from 140 four-year colleges were asked questions about their alcohol consumption (demographic information was also collected). The 10,904 responses are included in this data set. The students answered questions based on their behavior in the last 30 days (e.g., they recorded how often they drove after drinking within the last 30 days).

 

Analysis

There are many categorical variables in this data set. Suppose we wonder if Sex and Driven After Drinking Alcohol are dependent variables (i.e., if there is some kind of relationship between the variables). To perform a chi-square significance test of independence, go to the Stat menu and choose Tables>Cross Tabulation and Chi-Square. Enter Sex as the row variable and Driven After Drinking Alcohol as the column variable (note: it doesn’t matter which variables go in the rows and columns—the test will be the same; but you might want to think about how the table will most easily be read). Then click on the Chi-Square button. Select Chi-Square analysis, Expected cell counts, and each cell’s contribution to the Chi-Square statistic (note you can also select residuals—these, like the contribution to the Chi-square statistic, give you an indication of which cells show the biggest difference from independence; if you look at residuals, be sure they are standardized).
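To see what Minitab computes behind the Chi-Square button, here is a minimal Python sketch of the expected counts, each cell's contribution, and the Pearson chi-square statistic. The counts are hypothetical (they are not taken from DrinkingStudy.MPJ), and the function name chi_square_table is ours, not Minitab's.

```python
import math

# Hypothetical counts (NOT taken from DrinkingStudy.MPJ).
# Rows are Sex; columns are the Driven After Drinking Alcohol responses.
observed = {
    "female": {"not at all": 300, "once": 60, "twice or more": 40},
    "male":   {"not at all": 200, "once": 90, "twice or more": 110},
}

def chi_square_table(table):
    """Return (expected counts, cell contributions, statistic, df) for a two-way table."""
    rows = list(table)
    cols = list(next(iter(table.values())))
    row_tot = {r: sum(table[r].values()) for r in rows}
    col_tot = {c: sum(table[r][c] for r in rows) for c in cols}
    n = sum(row_tot.values())
    # Expected count under independence: (row total * column total) / grand total.
    expected = {r: {c: row_tot[r] * col_tot[c] / n for c in cols} for r in rows}
    # Contribution of each cell: (observed - expected)^2 / expected.
    contrib = {r: {c: (table[r][c] - expected[r][c]) ** 2 / expected[r][c]
                   for c in cols} for r in rows}
    stat = sum(contrib[r][c] for r in rows for c in cols)
    df = (len(rows) - 1) * (len(cols) - 1)
    return expected, contrib, stat, df

expected, contributions, statistic, df = chi_square_table(observed)

# For df = 2 ONLY, the chi-square upper-tail probability is exactly exp(-x/2).
p_value = math.exp(-statistic / 2)

for sex in observed:
    for answer in observed[sex]:
        print(sex, answer, round(expected[sex][answer], 1),
              round(contributions[sex][answer], 2))
print("Pearson chi-square =", round(statistic, 2), "df =", df, "p =", p_value)
```

The cells with the largest contributions are the ones that depart most from what independence would predict. The closed-form P-value used here works only for a table with 2 degrees of freedom; other table sizes need a chi-square table or software.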

 

The output goes to the Session window. Notice that the missing-data category is included in the table (this doesn’t impact the test at all, but it makes for a messier table). It’s good to know how many data are missing, but it’s also nice to have a compact table. Hence, go back to the previous dialog box and from the Options button choose “Display missing values for no variables.”

 

Now we can interpret the results. (Note: the Pearson Chi-square is the statistic we discussed in class, and is the most commonly used statistic—so this is what you should refer to in your report. The Likelihood Ratio Chi-square is asymptotically equivalent to the Pearson Chi-square, yet it’s calculated differently. Don’t worry about the LR Chi-square.) Do we have evidence of a relationship between these variables? What cells seem to contribute the most to the dependence? Go back to the previous dialog box to get row and column percents for the table (you can remove all the Chi-square display at this point). What particular relationship do you notice? (Important reminder: Significant results from a chi-square test indicate some relationship between the two variables, but this does not give any indication of whether one variable causes change in the other.)
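For the curious, the sketch below computes the Pearson and Likelihood Ratio statistics side by side so you can see that they are calculated differently yet usually land close together in large samples. The counts are hypothetical (not the study data), and the function name pearson_and_lr is ours.

```python
import math

# Hypothetical 2x3 table of counts (not the study data).
observed = [[300, 60, 40],
            [200, 90, 110]]

def pearson_and_lr(table):
    """Return (Pearson chi-square, likelihood ratio chi-square) for a table of counts."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    pearson = 0.0
    lr = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n
            pearson += (obs - exp) ** 2 / exp          # sum of (O - E)^2 / E
            if obs > 0:                                # a zero cell adds nothing to LR
                lr += 2 * obs * math.log(obs / exp)    # sum of 2 * O * ln(O / E)
    return pearson, lr

pearson, lr = pearson_and_lr(observed)
print("Pearson:", round(pearson, 2), "Likelihood ratio:", round(lr, 2))
```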

 

Suppose we want to collapse a variable from 3 categories to 2 categories (we need to think carefully before doing this, as we don’t want to lose important information). Label column 8 as “Driven After 5 Drinks?” (this title isn’t quite as accurate, but we can’t give two columns the same name; another option is to use the same column name in a new worksheet). Now we’ll recode so this new variable has only the answers “no” and “yes.” From the Data menu select Code>Text to Text. Enter the original variable (Driven After 5 or More Drinks) as the “copy from” column and the new variable (Driven After 5 Drinks?) as the column that receives the recoded values. For the first original value type “not at all” (you must include quotes, because it’s more than one word) and for the new value type “no” (no quotes needed). For the second original value type “once” (no quotes needed, since it’s a single word) and for the new value type “yes” (no quotes needed). For the last original value type “twice or more” (quotes needed) and for the new value type “yes” (no quotes needed). Important note: you must type each value, including capital and lower-case letters, exactly as it appears in the Minitab column.
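The same recoding idea can be sketched in Python as a lookup from old values to new values. The list of responses below is hypothetical, and None stands in for a missing value.

```python
# Mapping from the original responses to the collapsed yes/no variable.
recode = {"not at all": "no", "once": "yes", "twice or more": "yes"}

# Hypothetical column of responses; None stands in for a missing value.
driven_after_5 = ["not at all", "once", "twice or more", None, "not at all"]

# Keys must match the stored text exactly, including capitalization.
# In this sketch, any value with no exact match (including None) becomes None.
driven_after_5_yes_no = [recode.get(value) for value in driven_after_5]

print(driven_after_5_yes_no)
```

As in Minitab, a value typed with different capitalization (for example “Once”) would not match the mapping.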

 

Now we can analyze the 2x2 table of Sex and Driven After 5 Drinks?. Go back to the Cross Tabulation and Chi-Square dialog box. Select Sex as the row variable and Driven After 5 Drinks? as the column variable. Choose the appropriate output from the Chi-Square button. Now click on the Other Stats button, and choose Fisher’s exact test for 2x2 tables. (Truth in advertising: Fisher’s test is based on an exact distribution, the hypergeometric distribution, whereas the chi-square test of independence relies on a chi-square approximation to the distribution of its test statistic. So Fisher’s test can be used at any sample size and is a good check against the chi-square results. The downside is that this option is available only for 2x2 tables.)
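To make the hypergeometric idea concrete, here is a minimal sketch of a two-sided Fisher exact P-value. It uses one common convention: add up the probabilities of every table (with the same margins) that is no more likely than the observed one. We have not checked that Minitab uses exactly this convention, so treat that as an assumption. The function name and the small example table are ours.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)    # smallest possible top-left count
    highest = min(row1, col1)       # largest possible top-left count
    p_observed = prob(a)
    # Sum over all tables that are at least as unlikely as the observed one
    # (the small factor guards against floating-point ties).
    return sum(p for p in (prob(x) for x in range(lowest, highest + 1))
               if p <= p_observed * (1 + 1e-9))

# Small hypothetical table: rows are two groups, columns are yes / no.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p_value, 4))
```

With counts this small the chi-square approximation would be questionable, which is exactly when the exact test earns its keep.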

 

Consider the output. Both the Chi-square test and Fisher’s test show P-values of essentially 0. Hence, we have very strong evidence that there is a relationship between these two variables. Now go back to the previous dialog box, de-select all the Chi-square analysis, and select row and column percentages. The row percentages show that females are much less likely than males to drive after 5 or more drinks.
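Row and column percentages answer different questions, so it helps to see how each is computed. The sketch below uses hypothetical counts (not the study data); the function names are ours.

```python
# Hypothetical 2x2 counts (not the study data): rows = Sex, columns = no / yes.
counts = {"female": {"no": 450, "yes": 50},
          "male":   {"no": 400, "yes": 100}}

def row_percents(table):
    """Each cell as a percentage of its row total."""
    return {r: {c: 100 * v / sum(cells.values()) for c, v in cells.items()}
            for r, cells in table.items()}

def column_percents(table):
    """Each cell as a percentage of its column total."""
    col_tot = {}
    for cells in table.values():
        for c, v in cells.items():
            col_tot[c] = col_tot.get(c, 0) + v
    return {r: {c: 100 * v / col_tot[c] for c, v in cells.items()}
            for r, cells in table.items()}

row_pct = row_percents(counts)
col_pct = column_percents(counts)
print("Percent of each sex answering yes:",
      row_pct["female"]["yes"], row_pct["male"]["yes"])
print("Percent of yes answers from females:", round(col_pct["female"]["yes"], 1))
```

Row percentages compare the sexes (what share of each sex answered yes); column percentages describe who makes up each answer group.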

 

Lastly, suppose we’re interested in the relationship between Driven After Drinking Alcohol and Served as a Designated Driver, yet we want to look separately for males and females. For this we can use the “layer” option in the Chi-square dialog box. Choose Driven After Drinking Alcohol as the row variable, Served as a Designated Driver as the column variable, and Sex as the layer variable. First, display only the counts, but then select the appropriate options from the Chi-Square button.

 

Look at the output. Notice the warnings Minitab gives about small expected counts (this is one of the few, perhaps only, places in Minitab where you get this kind of warning). What category of Sex is this for? It’s for the missing data, so it’s not a big deal. Scroll up to see the rest of the output. Separately for males and females, there is strong evidence of a relationship between Driven After Drinking Alcohol and Served as a Designated Driver. Now go back to the previous dialog box, turn off the Chi-square analysis and turn on the row and column percents. How would you describe the nature of the relationship?
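The layer idea amounts to splitting the data by the layer variable and running a separate chi-square analysis within each piece. Here is a minimal sketch with hypothetical records (not the study data); it also counts cells whose expected count is below 5, a common rule-of-thumb threshold that we are assuming here rather than quoting from Minitab.

```python
from collections import defaultdict

# Hypothetical records of (layer, row answer, column answer); not the study data.
records = (
    [("female", "no", "no")] * 30 + [("female", "no", "yes")] * 20 +
    [("female", "yes", "no")] * 10 + [("female", "yes", "yes")] * 40 +
    [("male", "no", "no")] * 25 + [("male", "no", "yes")] * 25 +
    [("male", "yes", "no")] * 15 + [("male", "yes", "yes")] * 35
)

def layered_chi_square(data):
    """Pearson chi-square and number of small expected counts, per layer."""
    tables = defaultdict(lambda: defaultdict(int))
    for layer, r, c in data:
        tables[layer][(r, c)] += 1          # one two-way table per layer
    results = {}
    for layer, cells in tables.items():
        n = sum(cells.values())
        row_tot, col_tot = defaultdict(int), defaultdict(int)
        for (r, c), count in cells.items():
            row_tot[r] += count
            col_tot[c] += count
        stat, small = 0.0, 0
        for r in row_tot:
            for c in col_tot:
                expected = row_tot[r] * col_tot[c] / n
                stat += (cells.get((r, c), 0) - expected) ** 2 / expected
                small += expected < 5       # assumed threshold of 5
        results[layer] = (stat, small)
    return results

results = layered_chi_square(records)
for layer, (stat, small) in results.items():
    print(layer, "chi-square =", round(stat, 2), "small expected counts:", small)
```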

 

Important note about multiple comparisons (this applies to your project): Suppose we want to make inference from a data set and we perform 20 chi-square tests. We’d like the family (overall) Type I error rate to be at most 0.05, so, using the Bonferroni correction, we should use a significance level of 0.05/20 = 0.0025 for each individual test.
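The arithmetic, and the reason the correction matters, can be sketched as follows. The calculation of the uncorrected rate assumes the 20 tests are independent, which is an assumption for illustration only.

```python
family_alpha = 0.05   # desired overall (family) Type I error rate
num_tests = 20

# Bonferroni: divide the family rate evenly among the tests.
per_test_alpha = family_alpha / num_tests

# If every test used 0.05 instead, and the tests were independent (an assumption),
# the chance of at least one false positive would be:
uncorrected_family_rate = 1 - (1 - family_alpha) ** num_tests

# With the correction, the family rate is at most num_tests * per_test_alpha.
bonferroni_bound = num_tests * per_test_alpha

print("Per-test significance level:", per_test_alpha)
print("Family rate without correction:", round(uncorrected_family_rate, 4))
print("Family rate bound with correction:", round(bonferroni_bound, 4))
```

Without the correction, the chance of at least one false positive across 20 independent tests is roughly 64%, far above the 5% we intended.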

 

 

 

Project Data Set

As a class, we’ll copy the medicine-disposal survey data from Excel to Minitab. Once we do the copying, we must double-check that we have all the columns and appropriate column headings (we’ll need to make some fixes in class).

 

Some important first steps: