Math 217 Computer Lab: One-Factor and Two-Factor ANOVA Using Minitab

 

Getting the Needed Files

Double-click the My Computer icon on the desktop. Then double-click the campus_share on 'curtis' (U:) drive and then the Class_Share folder. Finally, double-click the Math folder and then the Math_217 folder. Among other things, this folder contains the four Minitab project files we will use in today’s lab: Cleanliness.MPJ, IronContent.MPJ, RatWeightGain.MPJ and ReactionTime.MPJ.

 

We cannot all work from these shared files at the same time (only one person can access them at a time). Thus, each of you needs to copy the four files to your personal account. To do this, highlight the files (click on the first one, then Ctrl-click on the others; this should highlight all four). Then press Ctrl-C to copy the files. Now open the My Documents folder on the desktop (this is the My Documents folder of your personal account). Once you are in the My Documents folder, press Ctrl-V to paste the four files into your account.

 

Now open the Minitab software (from the Start menu select Programs>Class Programs and then Minitab>Minitab15). From the File menu choose Open Project and select the RatWeightGain.MPJ file from your My Documents folder.

 

Description of RatWeightGain.MPJ

An experiment was designed and run to measure the effect of diet type on the weight gain of rats. Four diets were used: 1 = low-protein beef, 2 = high-protein beef, 3 = low-protein cereal, and 4 = high-protein cereal. Forty rats were each randomly assigned to one of these diets (10 rats for each diet). After a certain period of time, the weight gain (in grams) was recorded for each rat.

 

Analysis

For this one-factor experiment, an obvious question is whether the average weight gain is the same for all four diets, or whether there are significant differences between diets, on average. One-factor ANOVA can be used to answer this question.

 

In the one-factor situation, it’s easy to look at the data descriptively first, before doing formal significance testing. First create boxplots of the weight gains, separately for each diet type (Graph>Boxplot>One Y With Groups, and select Diet as the grouping variable). How would you compare these distributions of weight gains? Does the constant-variance assumption seem to be met? Also determine numerical summaries for the weight gains, separately for each diet type (Stat>Basic Statistics>Display Descriptive Statistics, and select Diet as the “By variable”). By our rule of thumb (based on sample standard deviations), is the constant-variance assumption met?
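The rule-of-thumb check compares the largest and smallest group standard deviations. The sketch below assumes the common form of the rule (largest sample SD less than twice the smallest); the threshold of 2, the function name and the numbers are illustrative assumptions, not Minitab output or the RatWeightGain data.

```python
from statistics import stdev

def sd_ratio_check(groups, max_ratio=2.0):
    """Compare the largest and smallest sample standard deviations.

    groups maps a group label (e.g. a diet number) to that group's responses.
    Returns (ratio, ok), where ok is True when ratio < max_ratio.
    The default threshold of 2 is an assumed version of the rule of thumb.
    """
    sds = [stdev(values) for values in groups.values()]
    ratio = max(sds) / min(sds)
    return ratio, ratio < max_ratio

# Hypothetical numbers for illustration only (not the RatWeightGain data).
example = {1: [10, 12, 14], 2: [20, 23, 26]}
ratio, ok = sd_ratio_check(example)
print(f"largest SD / smallest SD = {ratio:.2f}; rule of thumb met: {ok}")
```

Here the two SDs are 2 and 3, so the ratio is 1.5 and the rule of thumb would be met.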

 

Now we’re ready to perform the ANOVA analysis (we’ll check the normality assumption in a moment). From the Stat menu select ANOVA>One-Way (our data are “stacked,” so we use this procedure rather than the “unstacked” version). Weight Gain is the response variable and Diet is the factor. Now click on the Graphs button. Notice you can select boxplots of the data (that is, you can do your first exploratory look from within the ANOVA procedure). Recall that we use graphs of the residuals to check the normality assumption, so select a histogram and a normal plot of the residuals. Now click on the Comparisons button. If we have evidence against the overall ANOVA null hypothesis (that is, against the average weight gains of all diets being equal), then we will want to make pair-wise comparisons. We can select these comparisons now and then ignore them if we don’t have evidence of a difference in means. Select Tukey’s procedure (with the default family error rate of 5%). Also select Fisher’s procedure. Recall that this procedure makes no adjustment for multiple comparisons and should only be used to look at the results descriptively (for example, to help set up another, more focused experiment).

 

First check the normality assumption: do the histogram and normal plot of the residuals suggest that the weight gains within each diet are plausibly normally distributed? Since this assumption appears reasonable, we can look at the results. First consider the overall ANOVA F test. The p-value is 0.023, which indicates fairly strong evidence that the average weight gain differs for at least two of the diets (the result is significant at the 0.05 level, but not at the more stringent 0.01 level).
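To make the overall F test concrete, here is a minimal sketch of how the one-way ANOVA table pieces are computed: F is the factor mean square divided by the error mean square (MSE). The function name and toy data are hypothetical. With 4 diets and 40 rats, the degrees of freedom would be 3 and 36. The p-value Minitab reports is the upper-tail area of the F distribution beyond the observed F; the Python standard library has no F distribution, so the sketch stops at F (if SciPy is available, `scipy.stats.f.sf(F, df1, df2)` gives that tail area).

```python
from statistics import mean

def one_way_anova(groups):
    """Return the pieces of a one-way ANOVA table.

    groups maps each factor level (e.g. a diet) to its list of responses.
    """
    all_values = [y for values in groups.values() for y in values]
    grand_mean = mean(all_values)
    k = len(groups)            # number of factor levels (treatments)
    n_total = len(all_values)  # total number of observations
    # Factor (between-treatment) sum of squares
    ss_factor = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
    # Error (within-treatment) sum of squares
    ss_error = sum((y - mean(v)) ** 2 for v in groups.values() for y in v)
    df_factor, df_error = k - 1, n_total - k
    ms_factor, ms_error = ss_factor / df_factor, ss_error / df_error
    return {"df": (df_factor, df_error), "SS": (ss_factor, ss_error),
            "MSE": ms_error, "F": ms_factor / ms_error}

# Hypothetical toy data (not from the lab files).
table = one_way_anova({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})
print(table)
```

For the toy data, SS(factor) = 54 on 2 df and SS(error) = 6 on 6 df, so F = 27 / 1 = 27.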

 

Because the overall ANOVA null hypothesis is rejected, we can now make pair-wise comparisons. What do Tukey’s 95% simultaneous confidence intervals tell you? There appears to be a significant difference in average weight gain between Diet 1 (low-protein beef) and Diet 2 (high-protein beef): rats on Diet 2 seem to gain more weight, on average, than rats on Diet 1. We also need to consider whether this difference is large enough to be of practical significance.

 

If we simply look at the confidence intervals descriptively, we can consider the intervals given by Fisher’s method. These intervals additionally suggest a possible difference in average weight gain between Diet 2 and each of Diets 3 and 4. Perhaps these three diets (or perhaps just Diets 2 and 4) could be studied again in another experiment (depending on the research questions of interest).
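Tukey’s and Fisher’s intervals have the same form, (difference in sample means) ± (critical value) × (standard error of the difference); they differ only in the critical value. The sketch below assumes equal group sizes and takes the critical value as an input, because the standard library has no t or studentized-range quantiles: for Fisher’s method it is a t quantile with N − k error df, and for Tukey’s method it is the studentized range quantile q(k, N − k) divided by √2. All names and numbers are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pairwise_intervals(means, n_per_group, mse, crit):
    """Intervals for mean[b] - mean[a] for every pair of levels a < b.

    Assumes equal sample sizes n_per_group in every group.
    crit is the multiplier applied to the standard error of a difference:
      Fisher's LSD: crit = t quantile with N - k error degrees of freedom
      Tukey:        crit = studentized range quantile q(k, N - k) / sqrt(2)
    """
    se = sqrt(mse * (1 / n_per_group + 1 / n_per_group))
    intervals = {}
    for a, b in combinations(sorted(means), 2):
        diff = means[b] - means[a]
        intervals[(a, b)] = (diff - crit * se, diff + crit * se)
    return intervals

def excludes_zero(interval):
    """A difference is declared significant when its interval excludes 0."""
    low, high = interval
    return low > 0 or high < 0

# Hypothetical means, MSE and critical value (not the lab's output).
ivals = pairwise_intervals({1: 10.0, 2: 14.0, 3: 11.0}, n_per_group=10, mse=5.0, crit=2.0)
for pair, interval in ivals.items():
    print(pair, interval, excludes_zero(interval))
```

Because Tukey’s critical value is larger than Fisher’s for the same data, Tukey’s intervals are wider, which is why Fisher’s method can flag differences (such as Diet 2 versus Diets 3 and 4) that Tukey’s method does not.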

 

Description of ReactionTime.MPJ

Suppose an experiment is done where the response variable is the time (in minutes) for a certain chemical reaction to occur and the one factor is temperature (60, 80, or 100 degrees Fahrenheit). The data in this Minitab project are results from this hypothetical experiment.

Analysis

Before running a formal ANOVA analysis, we should first look descriptively at our data. Create boxplots of the reaction times, separately for each temperature setting. Does the constant-variance assumption seem to be met? In what way is it systematically not met? (Also look at numerical summaries for reaction times, separately for each temperature setting. Note that our rule of thumb is not met when comparing the largest sample standard deviation to the smallest sample standard deviation.)

When the spread of the responses within treatments increases as the treatment means increase, a logarithm transformation often helps stabilize the variance. Name column C3 in your worksheet “Log Reaction Time.” Then from the Calc menu choose Calculator. Store the result in Log Reaction Time, and enter an expression that takes the natural logarithm of the Reaction Time variable.
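Why the log helps: if one group’s values are roughly a constant multiple c of another’s, their SDs also differ by the factor c, but since log(c·y) = log c + log y, on the log scale the groups differ only by a shift and have the same spread. The sketch below uses made-up data built to show this exactly; real data will only approximate it.

```python
from math import log
from statistics import stdev

def sd_ratio(groups):
    """Largest group standard deviation divided by the smallest."""
    sds = [stdev(values) for values in groups.values()]
    return max(sds) / min(sds)

# Hypothetical data in which each observation in the high-mean group is
# 10 times the corresponding low-mean observation, so the SD grows with the mean.
times = {"low-mean group": [1.0, 2.0, 4.0], "high-mean group": [10.0, 20.0, 40.0]}
log_times = {g: [log(t) for t in values] for g, values in times.items()}

print(round(sd_ratio(times), 2))      # raw scale: about 10
print(round(sd_ratio(log_times), 2))  # log scale: about 1
```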

Now create boxplots of the log reaction times, separately for each temperature setting. Notice that the variability is now more similar. Also, get numerical summaries for log reaction times, separately for each temperature setting, and note our rule of thumb is now met (just barely).

Use the Stat>ANOVA>One-Way procedure to perform the ANOVA analysis where Log Reaction Time is the response variable (get appropriate plots of the residuals and choose Tukey’s multiple comparisons).

Looking at graphs of the residuals, does the normality assumption seem reasonable? Since this assumption seems to be met, now consider the overall F test. Is it significant? Since the F test is significant, we can consider the pair-wise comparisons. The results of Tukey’s procedure indicate a significant difference in average log reaction times between temperatures 60 and 100 degrees.

Note that all the inference is about the average of the log reaction times, not about the reaction times themselves, which makes interpretation more difficult. We can say there is a significant difference in average log reaction times between temperatures 60 and 100 degrees, but because the inference is on the log scale, this result may not be very informative. If transforming the data makes interpretation difficult (or meaningless), then nonparametric methods, which keep the data in their original form, should be used instead.
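One common way to translate a log-scale result back to the original units: a difference in mean natural-log times, once exponentiated, is the ratio of the geometric means of the two groups. Exponentiating the endpoints of a confidence interval for that difference gives an interval for the ratio of population geometric means. The sketch assumes natural logs (as in the Calculator step) and uses made-up numbers.

```python
from math import exp, log
from statistics import geometric_mean, mean

# Hypothetical reaction times for two temperature settings (not the lab data).
group_a = [2.0, 4.0, 8.0]
group_b = [1.0, 2.0, 4.0]

# The ANOVA/Tukey analysis works with differences of mean log times.
diff_in_mean_logs = mean([log(t) for t in group_a]) - mean([log(t) for t in group_b])

# Back-transforming gives a ratio of geometric means on the original scale.
ratio = exp(diff_in_mean_logs)
ratio_gm = geometric_mean(group_a) / geometric_mean(group_b)
print(round(ratio, 6), round(ratio_gm, 6))  # both 2.0
```

Here a difference of ln 2 in mean log times corresponds to group A’s geometric mean time being twice group B’s.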

 

Description of Cleanliness.MPJ

An experiment was conducted to gauge the effect of both temperature setting and detergent type on the cleanliness of soiled t-shirts put through a washing cycle. Eighteen identically soiled t-shirts were each randomly assigned to a treatment (that is, to a combination of temperature setting and detergent). Three wash-cycle temperatures were used (cold, warm, and hot), and two detergents were considered (Detergent A and Detergent B). The response variable is a cleanliness rating on a 1–10 scale. There are 3 × 2 = 6 treatments in total, and since there are 18 t-shirts, we have 3 replications within each treatment. The numerical results from the experiment are in this data file.
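The handout does not say how the random assignment was carried out; the sketch below shows one simple way to assign 18 units to 6 treatments with 3 replications each, by shuffling a list that contains each treatment 3 times. The seed and variable names are hypothetical choices for reproducibility.

```python
import random
from itertools import product

temperatures = ["cold", "warm", "hot"]
detergents = ["A", "B"]
reps = 3

treatments = list(product(temperatures, detergents))  # 3 x 2 = 6 treatments
slots = [t for t in treatments for _ in range(reps)]   # 6 x 3 = 18 slots
random.seed(217)  # hypothetical seed, only so the example is reproducible
random.shuffle(slots)

# Shirt number -> (temperature, detergent)
assignment = {shirt: slots[shirt - 1] for shirt in range(1, 19)}
for shirt, treatment in assignment.items():
    print(shirt, treatment)
```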

 

Analysis

We already considered these data in an in-class example, so we know the conclusions, but now we’ll see how to use Minitab to perform the analysis.

 

Because we have two factors, we want to perform a two-way ANOVA analysis. Minitab has a specific two-way ANOVA procedure, but it doesn’t include all the analyses we want (in particular, interaction plots and multiple comparisons). Hence, we will use the general linear model option (Stat>ANOVA>General Linear Model), which handles two-way ANOVA as well as more complicated models (e.g., models that include covariates).

 

Choose Cleanliness Score as your response variable. Now we need to specify our “model.” We have replications, so we can test for an interaction (which is always good practice). Hence, our full model includes both factors plus their interaction. Select both individual factors (Temperature and Detergent) to be in the model. Then add an additional term, Temperature*Detergent (Minitab will recognize this as an interaction). Now click on the Graphs button and select a normal plot of residuals and residuals versus fits (we’ll use these graphs to check the normality and constant-variance assumptions). Next, click on the Factor Plots button. Notice you can ask for both main effects plots and interaction plots. Choose Temperature and Detergent as the factors for the interaction plot (you can also choose main effects plots). Finally, click on the Comparisons button. If the interaction effect is not significant but one or both main effects are, then we want to make pair-wise comparisons using Tukey’s method. Select Temperature and Detergent as the terms for the pair-wise comparisons (Tukey’s method is the default choice). Also, uncheck the Test option; we’ll simply use the confidence intervals, as we discussed in class.
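To see what the full model’s sums of squares measure, here is a minimal sketch for a balanced two-factor design with interaction. It assumes every cell has the same number of replicates (true for the Cleanliness data, with 3 per cell); in that balanced case these values should agree with the Adj SS column. For the Cleanliness design the degrees of freedom would be 2 (Temperature), 1 (Detergent), 2 (interaction) and 12 (error). The function name and toy data are hypothetical.

```python
from statistics import mean

def two_way_ss(data):
    """Sums of squares for a balanced two-factor design with interaction.

    data maps (level_of_A, level_of_B) -> list of replicate responses;
    every cell must have the same number of replicates.
    """
    a_levels = sorted({a for a, _ in data})
    b_levels = sorted({b for _, b in data})
    r = len(next(iter(data.values())))  # replicates per cell
    cell = {key: mean(values) for key, values in data.items()}
    grand = mean(y for values in data.values() for y in values)
    mean_a = {a: mean(cell[(a, b)] for b in b_levels) for a in a_levels}
    mean_b = {b: mean(cell[(a, b)] for a in a_levels) for b in b_levels}
    ss_a = r * len(b_levels) * sum((mean_a[a] - grand) ** 2 for a in a_levels)
    ss_b = r * len(a_levels) * sum((mean_b[b] - grand) ** 2 for b in b_levels)
    # Interaction: how far each cell mean is from the purely additive prediction
    ss_ab = r * sum((cell[(a, b)] - mean_a[a] - mean_b[b] + grand) ** 2
                    for a in a_levels for b in b_levels)
    ss_e = sum((y - cell[key]) ** 2 for key, values in data.items() for y in values)
    df = {"A": len(a_levels) - 1, "B": len(b_levels) - 1}
    df["AB"] = df["A"] * df["B"]
    df["Error"] = len(a_levels) * len(b_levels) * (r - 1)
    return {"A": ss_a, "B": ss_b, "AB": ss_ab, "Error": ss_e}, df

# Hypothetical balanced 2 x 2 data with 2 replicates per cell (not the Cleanliness data).
example = {("low", "x"): [1, 3], ("low", "y"): [3, 5],
           ("high", "x"): [5, 7], ("high", "y"): [11, 13]}
ss, df = two_way_ss(example)
print(ss, df)
```

For the toy data the four sums of squares (72, 32, 8 and 8) add up to the total sum of squares, 120, as they must in a balanced design.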

 

Remember the order of analysis steps: 1) check the conditions (normality and constant variance) using plots of the residuals; if the conditions appear to be met, then 2) check whether the interaction effect is significant; if it is, interpret the results using the interaction plot (and possibly interpret the main effects, if appropriate); if the interaction effect is not significant, then 3) check whether the main effects are significant; if either or both main effects are significant, then 4) use Tukey’s method to see specifically where the significant differences lie.
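The order of analysis steps can be written as a small decision function. This is only a sketch: the 0.05 cutoff, the function name and the wording of the returned actions are assumptions, and the first input is a judgment you make from the residual plots, not something computed.

```python
def next_steps(conditions_ok, p_interaction, p_main, alpha=0.05):
    """Return the next action(s) following the lab's order of analysis steps.

    conditions_ok: judgment from the residual plots (normality, constant variance)
    p_interaction: p-value for the interaction term
    p_main: dict mapping each factor name to its main-effect p-value
    """
    if not conditions_ok:
        return ["conditions not met: do not rely on the ANOVA results"]
    if p_interaction < alpha:
        return ["interpret the interaction plot (and main effects, if appropriate)"]
    significant = [name for name, p in p_main.items() if p < alpha]
    if not significant:
        return ["no significant effects"]
    return [f"use Tukey's method to compare the levels of {name}" for name in significant]

# Hypothetical p-values, not the Cleanliness results.
print(next_steps(True, 0.40, {"Temperature": 0.001, "Detergent": 0.30}))
```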

 

Important note: In the general linear model procedure, the Adj SS is exactly the same as the sum of squares we’ve talked about for one- and two-way ANOVA (you can ignore the Seq SS).

 

 

Description of IronContent.MPJ

Iron-deficiency anemia is the most common form of malnutrition in developing countries, affecting about 50% of children and women and 25% of men. Iron pots for cooking food have traditionally been used in many of these countries, but they have been largely replaced by aluminum pots, which are cheaper and lighter. Some research has suggested that food cooked in iron pots will contain more iron than food cooked in other types of pots. One study designed to investigate this issue compared the iron content of several Ethiopian foods cooked in aluminum, clay, and iron pots, so the two factors are type of pot and type of food. The response variable is the iron content of the food, measured in milligrams of iron per 100 grams of cooked food. There are four replications in each combination of factor levels. The numerical results of the experiment are in this data file.

Analysis

Note that it’s possible to look first at boxplots of the iron content for the separate combinations of factor levels (that is, to look at the data descriptively before performing a formal ANOVA analysis). From the Graph menu select Boxplot>One Y With Groups. Select Iron Content as the graph variable and then select both Type of Pot and Type of Food as the categorical variables. What do you notice in these plots? Does the constant-variance assumption seem reasonable? (We’ll follow up on this assumption with a residual plot.) Note that each of these boxplots is based on only 4 observations.

Perform a two-way ANOVA analysis (using the general linear model procedure in Minitab), including an interaction effect in your model.

Do the assumptions seem to be met? Normality seems somewhat plausible, but the constant-variance assumption is clearly not met (notice the large changes in spread within the residual plot). Because the constant-variance assumption isn’t violated in a systematic way, it may be difficult to find a transformation of the response variable that stabilizes the variance (neither a log nor a square-root transformation works here).
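With the interaction term in the model, each observation’s fitted value is simply its cell mean, so the residuals-versus-fits plot shows, for each cell, the deviations of its replicates from the cell mean. The sketch below computes those fits and residuals for made-up cells (labels P1, P2, F1 are hypothetical) to show how unequal spread across cells appears in that plot.

```python
from statistics import mean, stdev

def fits_and_residuals(data):
    """Fitted values and residuals for the two-way model with interaction.

    data maps (pot, food) cells to their replicate responses. With the
    interaction term in the model, each observation's fitted value is its
    cell mean, so the residual is the observation minus the cell mean.
    """
    result = {}
    for cell, values in data.items():
        fit = mean(values)
        result[cell] = (fit, [y - fit for y in values])
    return result

# Hypothetical cells with 4 replicates each (not the IronContent data).
data = {("P1", "F1"): [1, 2, 3, 2], ("P2", "F1"): [10, 20, 30, 20]}
for cell, (fit, residuals) in fits_and_residuals(data).items():
    print(cell, fit, residuals, round(stdev(residuals), 2))
```

In this toy example the residual SD in the second cell is 10 times that in the first, which would show up as a much wider vertical band at that fitted value.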

Because an assumption is fairly severely violated, we really shouldn’t use the ANOVA procedures. Simply as an example of interpretation, though, we’ll discuss the rest of the results. In this case, it seems reasonable to interpret both the significant interaction and the significant main effects (why?). What does the interaction plot tell us? Average iron content for food cooked in iron pots was higher than for food cooked in the other pots, but particularly so for meat.
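An interaction plot is a picture of the cell means: one line per level of one factor, plotted across the levels of the other. The sketch below builds that table of cell means and the vertical gap between two lines at each level; roughly constant gaps correspond to roughly parallel lines, while changing gaps are what an interaction looks like. Whether the interaction is significant still comes from the F test. Labels and numbers are hypothetical.

```python
from statistics import mean

def interaction_profile(data, a_levels, b_levels):
    """Cell means arranged as one line per level of factor A across levels of B."""
    return {a: [mean(data[(a, b)]) for b in b_levels] for a in a_levels}

def gaps(profile, a1, a2):
    """Vertical distance between two lines at each level of factor B.

    Roughly constant gaps mean roughly parallel lines (little interaction);
    gaps that change noticeably are what an interaction looks like.
    """
    return [y1 - y2 for y1, y2 in zip(profile[a1], profile[a2])]

# Hypothetical cell data with 2 replicates per cell (not the IronContent data).
data = {("P1", "F1"): [1, 3], ("P1", "F2"): [2, 4],
        ("P2", "F1"): [2, 4], ("P2", "F2"): [6, 8]}
profile = interaction_profile(data, ["P1", "P2"], ["F1", "F2"])
print(profile)
print(gaps(profile, "P2", "P1"))
```

In this toy example P2 is higher than P1 at both levels of the second factor, but the gap grows from 1 to 4, the same kind of pattern as iron pots being higher overall but particularly so for one food.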