Math 445 Computer Lab: One-Factor and Two-Factor ANOVA Using Minitab

 

Getting the Needed Files

Double-click on the My Computer icon on the desktop. Then double-click on the campus_share on 'curtis' (U:) drive and then the Class_Share folder. Finally, double-click on the Math folder and then the math_445 folder. Among the files in this folder are the Minitab projects we will use in today’s lab: Cleanliness.MPJ, IronContent.MPJ, RatWeightGain.MPJ, and ReactionTime.MPJ. Copy these files into your account and then double-click on the RatWeightGain.MPJ file (this will both start Minitab and open that particular file).

 

 

Description of RatWeightGain.MPJ

An experiment was designed and run to measure the effect of diet type on the weight gain of rats. Four diets were used: 1 = low-protein beef, 2 = high-protein beef, 3 = low-protein cereal, and 4 = high-protein cereal. Forty rats were each randomly assigned to one of these diets (10 rats for each diet). After a certain period of time, the weight gain (in grams) was recorded for each rat.

 

Analysis

For this one-factor experiment, an obvious question is whether the average weight gain is the same for all these diets, or if there are some significant differences between diets, on average. One-factor ANOVA can be used to answer these questions.

 

In the one-factor situation, it’s easy (and important) to first look at the data descriptively, before doing formal significance testing. First create boxplots of the weight gains, separately for each diet type (Graph>Boxplot>One Y With Groups, and select Diet as the grouping variable; if you want horizontal boxplots, click on the Scale button and check the box for “transpose value and category scales”). How would you compare these distributions of weight gains? Does the constant-variance condition seem to be met? Do you think we’ll find significant differences in average weight gains? Also determine numerical summaries for the weight gains, separately for each diet type (Stat>Basic Statistics>Display Descriptive Statistics, and select Diet as the “By variable”). By our rule of thumb (based on sample standard deviations), is the constant-variance condition met?
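The standard-deviation check can also be sketched outside Minitab. The Python sketch below is only an illustration: the weight gains are made-up numbers (not the values in RatWeightGain.MPJ), the helper name sd_ratio is hypothetical, and the cutoff of 2 for the ratio of largest to smallest sample standard deviation is an assumed form of the rule of thumb, since this handout does not restate it.

```python
from statistics import stdev

# Hypothetical weight gains (grams) by diet; NOT the values in RatWeightGain.MPJ.
gains_by_diet = {
    1: [72, 80, 85, 90, 78],
    2: [95, 104, 110, 88, 99],
    3: [70, 84, 79, 91, 76],
    4: [83, 92, 75, 88, 97],
}

def sd_ratio(groups):
    """Return (largest SD / smallest SD, dict of per-group sample SDs)."""
    sds = {name: stdev(values) for name, values in groups.items()}
    return max(sds.values()) / min(sds.values()), sds

ratio, sds = sd_ratio(gains_by_diet)
for diet, sd in sds.items():
    print(f"Diet {diet}: s = {sd:.2f}")

# Assumed rule of thumb: the condition is reasonable if the ratio is below 2.
print(f"largest/smallest SD = {ratio:.2f}; rule met: {ratio < 2}")
```

Compare the ratio you get from Minitab’s descriptive statistics with whatever cutoff we used in class.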

 

Now we’re ready to perform the ANOVA analysis (we’ll check the normality condition in a moment). From the Stat menu select ANOVA>One-Way (our data set-up is “stacked” not “unstacked”). Weight Gain is the response variable and Diet is the factor. Now click on the Graphs button. Notice you can select boxplots of the data (that is, you can do your first exploratory look from within the ANOVA procedure). Recall we use graphs of the residuals to check the normality condition, so select a histogram and normal plot of the residuals. Now click on the Comparisons button. If we have evidence against the overall ANOVA hypothesis (that is, against the average weight gains of all diets being equal), then we will want to make pair-wise comparisons. We can select these comparisons, and then ignore them if we don’t have evidence of a difference in means. Select Tukey’s procedure (with the default family error rate of 5%). Also select Fisher’s procedure—recall this procedure makes no adjustment for multiple comparisons, and should only be used to look descriptively at the results (in order to set up another, more focused, experiment).

 

First check the normality condition: do the histogram and normal plot of residuals indicate that the population of weight gains is plausibly normal? Since this condition is reasonable, we can look at the results. First consider the overall ANOVA test. The p-value is 0.023, which is fairly strong evidence that at least two of the diets differ in average weight gain (the result is significant at the 0.05 level, but not at the more stringent 0.01 level).
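To see what sits behind the F statistic in the one-way ANOVA table, here is a minimal Python sketch of the between-group and within-group sums of squares. The data are made-up numbers chosen to keep the arithmetic simple, and the function name one_way_anova is hypothetical. The sketch stops at the F statistic; Minitab converts F to a p-value using the F distribution with the two degrees of freedom shown.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (SS_between, SS_within, df_between, df_within, F) for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Between-group variation: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variation: observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat

# Hypothetical responses for three groups (not the rat data).
example_groups = [[10, 12, 14], [12, 14, 16], [18, 20, 22]]
ss_b, ss_w, df_b, df_w, f_stat = one_way_anova(example_groups)
print(f"SS between = {ss_b:.1f}, SS within = {ss_w:.1f}, F({df_b}, {df_w}) = {f_stat:.2f}")
```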

 

Because the overall ANOVA null hypothesis is rejected, we can now do pair-wise comparisons. What do Tukey’s 95% simultaneous confidence intervals tell you? There seems to be a significant difference in average weight gains between Diet 1 (low-protein beef) and Diet 2 (high-protein beef). Specifically, rats on Diet 2 seem to gain more weight on average than rats on Diet 1. Note we also need to consider the practical significance.

 

If we simply look at the confidence intervals descriptively, then we can consider the intervals given by Fisher’s method. Note that these intervals additionally show a possible difference in average weight gains between Diet 2 and Diets 3 and 4. Perhaps these three diets (or perhaps just Diet 2 and Diet 4) can be studied again in another experiment (depending on the research questions of interest).
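A rough calculation shows why unadjusted intervals should be read only descriptively. The Python sketch below counts the pair-wise comparisons among four diets and approximates the chance of at least one false positive when each comparison uses a 5% error rate. It assumes, purely for illustration, that the comparisons are independent (they are not exactly independent in practice), and the function name is hypothetical.

```python
from math import comb

def unadjusted_family_error(n_groups, per_comparison_alpha=0.05):
    """Approximate chance of at least one false positive across all pairs,
    assuming (for illustration only) that the comparisons are independent."""
    n_pairs = comb(n_groups, 2)
    return n_pairs, 1 - (1 - per_comparison_alpha) ** n_pairs

n_pairs, family_rate = unadjusted_family_error(4)
print(f"{n_pairs} pairs; approximate family error rate = {family_rate:.3f}")
```

Under this approximation, six unadjusted comparisons carry a family error rate of roughly 26%, far above the 5% family rate that Tukey’s procedure targets.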

 

Description of ReactionTime.MPJ

Suppose an experiment is done where the response variable is the time (in minutes) for a certain chemical reaction to occur and the one factor is temperature (60, 80, or 100 degrees Fahrenheit). The data in this Minitab project are results from this hypothetical experiment.

Analysis

Before running a formal ANOVA analysis, we should first look descriptively at our data. Create boxplots of the reaction times, separately for each temperature setting. Does the constant-variance condition seem to be met? In what way is it systematically not met? (Also look at numerical summaries for reaction times, separately for each temperature setting. Note that our rule of thumb is not met when comparing the largest sample standard deviation to the smallest sample standard deviation.)

When the variability in responses between treatments increases in size as the means increase, a logarithm transformation is often helpful in stabilizing the variance. Title column 3 in your worksheet “Log Reaction Time.”  Then from the Calc menu choose Calculator. Store your result in Log Reaction Time, and create an expression which takes the natural logarithm of the Reaction Time variable.
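The effect of the transformation can be sketched in a few lines of Python. The reaction times below are made up (they are not the values in ReactionTime.MPJ) and were chosen so that the spread grows in proportion to the mean, which is the situation where the natural log helps most. The helper name sd_ratio is hypothetical.

```python
from math import log
from statistics import stdev

# Hypothetical reaction times (minutes); spread grows with the mean.
times_by_temp = {
    60: [8.0, 10.0, 12.0, 14.0],
    80: [4.0, 5.0, 6.0, 7.0],
    100: [2.0, 2.5, 3.0, 3.5],
}

def sd_ratio(groups):
    """Largest sample SD divided by smallest sample SD."""
    sds = [stdev(values) for values in groups.values()]
    return max(sds) / min(sds)

# Same calculation as the Minitab Calculator step: natural log of each response.
log_times_by_temp = {t: [log(x) for x in values] for t, values in times_by_temp.items()}

raw_ratio = sd_ratio(times_by_temp)
log_ratio = sd_ratio(log_times_by_temp)
print(f"SD ratio, raw scale: {raw_ratio:.2f}")
print(f"SD ratio, log scale: {log_ratio:.2f}")
```

Real data will not line up this neatly, so expect the log-scale ratio to shrink rather than drop all the way to 1.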

Now create boxplots of the log reaction times, separately for each temperature setting. Notice that, using the transformed data, the variability is more similar. Also, get numerical summaries for log reaction times, separately for each temperature setting, and note our rule of thumb is now met (just barely).

Use the Stat>ANOVA>One-Way procedure to perform the ANOVA analysis where Log Reaction Time is the response variable (get appropriate plots of the residuals and choose Tukey’s multiple comparisons).

Looking at graphs of the residuals, does the normality condition seem reasonable? Since this condition seems to be met, now consider the overall F-test: is it significant? Since the F-test is significant, we can consider the pair-wise comparisons. The results of Tukey’s procedure indicate a significant difference in average log reaction times between temperatures 60 and 100 degrees.

Note that all the inference is on the average of the log reaction times, not of the reaction times themselves. This makes interpretation more difficult. We can say there is a significant difference in average log reaction times between temperatures 60 and 100 degrees, but because the inference is on the log scale, this result might not be informative. If transforming the data makes interpretation difficult (or meaningless), then non-parametric methods should be employed (which keep the data in their original form).
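This handout does not name a particular non-parametric method. As one possible illustration, the Python sketch below computes the Kruskal-Wallis H statistic, a common rank-based alternative to one-way ANOVA. The reaction times are made up, the function name is hypothetical, and the sketch assumes no tied values. Because it works on ranks, the answer is the same whether or not the data are log-transformed.

```python
from math import exp

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic for samples with no tied values."""
    pooled = sorted(x for g in groups for x in g)
    if len(set(pooled)) != len(pooled):
        raise ValueError("this sketch does not handle ties")
    rank = {value: i + 1 for i, value in enumerate(pooled)}
    n_total = len(pooled)
    # Sum over groups of (rank sum squared) / (group size).
    rank_term = sum(sum(rank[x] for x in g) ** 2 / len(g) for g in groups)
    return 12 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)

# Hypothetical reaction times (minutes), one list per temperature setting.
samples = [
    [9.1, 12.4, 10.8, 14.0],  # 60 degrees
    [6.2, 7.5, 5.9, 8.3],     # 80 degrees
    [3.1, 2.6, 3.8, 2.2],     # 100 degrees
]
h = kruskal_wallis_h(samples)

# With three groups, H is compared to a chi-square distribution with 2 df,
# whose upper-tail probability is exp(-H/2). This is a large-sample
# approximation and is rough for samples this small.
approx_p = exp(-h / 2)
print(f"H = {h:.3f}, approximate p-value = {approx_p:.4f}")
```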

 

 

Description of Cleanliness.MPJ

An experiment was conducted to gauge the effect of both temperature setting and detergent type on the cleanliness of soiled t-shirts put through a washing cycle. Eighteen identically soiled t-shirts are randomly assigned to a treatment (that is, to a combination of temperature setting and detergent). Three wash-cycle temperatures are used: cold, warm, and hot. Two different detergents are considered: Detergent A and Detergent B. The response variable is a cleanliness rating (on a 1–10 scale). Note there are a total of 6 treatments, and since there are 18 t-shirts, we have 3 replications within each treatment. The numerical results from the experiment are shown in this data file.
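The design arithmetic (3 temperatures x 2 detergents = 6 treatments, 18 shirts, 3 replications each) and the random assignment can be sketched in Python. This is only an illustration of one way to randomize: the shirt ID numbers, the function name assign_shirts, and the seed value are all hypothetical and say nothing about how the actual experiment was randomized.

```python
import random
from itertools import product

temperatures = ["cold", "warm", "hot"]
detergents = ["A", "B"]
treatments = list(product(temperatures, detergents))  # 3 x 2 = 6 treatments

def assign_shirts(n_shirts=18, seed=445):
    """Randomly assign shirt IDs 1..n_shirts to treatments in equal numbers."""
    reps, leftover = divmod(n_shirts, len(treatments))
    if leftover:
        raise ValueError("shirts must divide evenly among treatments")
    shirts = list(range(1, n_shirts + 1))
    random.Random(seed).shuffle(shirts)  # seeded only so the example is repeatable
    return {trt: sorted(shirts[i * reps:(i + 1) * reps])
            for i, trt in enumerate(treatments)}

assignment = assign_shirts()
for (temp, det), shirt_ids in assignment.items():
    print(f"{temp:>4} / Detergent {det}: shirts {shirt_ids}")
```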

 

Analysis

We already considered these data in an in-class example, so we know the conclusions, but now we’ll see how to use Minitab to perform the analysis.

 

Because we have two factors, we want to perform a two-way ANOVA analysis. Within Minitab there is a specific two-way ANOVA procedure, but it doesn’t include all the analyses we want (it doesn’t include interaction plots and multiple comparisons). Hence, we need to use the general linear model option (Stat>ANOVA>General Linear Model). This option includes the capability of two-way ANOVA, as well as more complicated models (e.g., including covariates in the model).

 

Choose Cleanliness Score as your response variable. Now we need to specify our “model.” We have replications, so we can test for an interaction (which is always good practice). Hence, our full model includes both factors plus their interaction. Select both individual factors (Temperature and Detergent) to be in the model. Then add an additional term: Temperature*Detergent (Minitab will recognize this as an interaction; the asterisk denotes an interaction). Now click on the Graphs button and select a normal plot of residuals and residuals versus fits (we’ll use these graphs to check the normality and constant-variance conditions). Additionally, click on the Factor Plots button. Notice you can ask for both main effects plots and interaction plots. Choose Temperature and Detergent as the factors for the interaction plot (also choose main effects plots for both factors). Finally, click on the Comparisons button. If the interaction effect is not significant, yet main effects are, then we want to do pair-wise comparisons using Tukey’s method. Select Temperature and Detergent as the terms for the pair-wise comparisons (Tukey’s method is the default choice). Also, uncheck the Test option; we’ll simply use confidence intervals, as we discussed in class.
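An interaction plot is simply a plot of the treatment (cell) means. The Python sketch below computes those means and the Detergent B minus Detergent A gap at each temperature; roughly equal gaps correspond to roughly parallel lines (little interaction). The cleanliness ratings are made up and are not the values in Cleanliness.MPJ.

```python
from statistics import mean

# Hypothetical cleanliness ratings, 3 replications per treatment.
scores = {
    ("cold", "A"): [4, 5, 6],
    ("cold", "B"): [5, 6, 7],
    ("warm", "A"): [6, 7, 8],
    ("warm", "B"): [7, 8, 9],
    ("hot", "A"): [7, 8, 9],
    ("hot", "B"): [6, 7, 8],
}

# Cell means: the points that the interaction plot connects.
cell_means = {cell: mean(values) for cell, values in scores.items()}

# B-minus-A gap at each temperature; unequal gaps suggest an interaction.
gaps = {temp: cell_means[(temp, "B")] - cell_means[(temp, "A")]
        for temp in ("cold", "warm", "hot")}

for temp, gap in gaps.items():
    print(f"{temp:>4}: B - A = {gap:+}")
```

In these made-up data the gap changes sign at the hot setting, which is the kind of pattern the interaction plot makes visible. Whether such a pattern is statistically significant is what the Temperature*Detergent line of the ANOVA table tests.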

 

We’ll discuss all the output (basically repeating our class discussion from yesterday).

 


 

Remember the order of analysis steps: 1) check the conditions (normality and constant-variance) via plots of the residuals; if the conditions appear to be met, then 2) check if the interaction effect is significant; if it is, then interpret the results using the interaction plot (and possibly an interpretation of main effects, if appropriate); if the interaction effect is not significant, then 3) check if the main effects are significant; if either or both main effects are significant, then 4) use Tukey’s method to see specifically where the significant differences are.

 

Important note: In the general linear model procedure, the Adj SS is exactly the same as the sum of squares we’ve talked about for one- and two-way ANOVA (you can ignore the Seq SS).
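For a design with equal replication in every cell, the two-way sums of squares can be computed directly from the marginal and cell means. The Python sketch below shows that decomposition. The data are made up, the function name two_way_ss is hypothetical, and the code assumes a balanced design with every factor combination present; it is not a substitute for Minitab’s general linear model output.

```python
from statistics import mean

def two_way_ss(cells):
    """Sums of squares for a balanced two-factor design.

    cells maps (level_a, level_b) -> list of replicate responses.
    """
    levels_a = sorted({a for a, _ in cells})
    levels_b = sorted({b for _, b in cells})
    reps = len(next(iter(cells.values())))
    if any(len(v) != reps for v in cells.values()):
        raise ValueError("this sketch assumes equal replication in every cell")
    grand = mean(x for v in cells.values() for x in v)
    mean_a = {a: mean(x for b in levels_b for x in cells[(a, b)]) for a in levels_a}
    mean_b = {b: mean(x for a in levels_a for x in cells[(a, b)]) for b in levels_b}
    ss_a = reps * len(levels_b) * sum((m - grand) ** 2 for m in mean_a.values())
    ss_b = reps * len(levels_a) * sum((m - grand) ** 2 for m in mean_b.values())
    # Interaction: what is left of each cell mean after removing both main effects.
    ss_ab = reps * sum(
        (mean(cells[(a, b)]) - mean_a[a] - mean_b[b] + grand) ** 2
        for a in levels_a for b in levels_b
    )
    ss_error = sum((x - mean(v)) ** 2 for v in cells.values() for x in v)
    return {"A": ss_a, "B": ss_b, "A*B": ss_ab, "Error": ss_error}

# Hypothetical ratings: factor A = temperature, factor B = detergent.
ratings = {
    ("cold", "A"): [4, 5, 6], ("cold", "B"): [5, 6, 7],
    ("warm", "A"): [6, 7, 8], ("warm", "B"): [7, 8, 9],
    ("hot", "A"): [7, 8, 9], ("hot", "B"): [6, 7, 8],
}
ss = two_way_ss(ratings)
for term, value in ss.items():
    print(f"{term:>5}: SS = {value:.2f}")
```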

 

 

Description of IronContent.MPJ

Iron-deficiency anemia is the most common form of malnutrition in developing countries, affecting about 50% of children and women and 25% of men. Iron pots for cooking food had traditionally been used in many of these countries, but they have been largely replaced by aluminum pots, which are cheaper and lighter. Some research has suggested that food cooked in iron pots will contain more iron than food cooked in other types of pots. One study designed to investigate this issue compared the iron content of different Ethiopian foods (meat, legumes, or vegetables) cooked in aluminum, clay, and iron pots. The response variable is the iron in the food as measured in milligrams of iron per 100 grams of cooked food. There are four replications in each combination of factor levels. The numerical results of the experiment are shown in this data file.

Analysis

Note it’s possible to first look at boxplots of the iron content for the separate combinations of factors (that is, look descriptively at the data before performing a formal ANOVA analysis). From the Graph menu select Boxplot>One Y With Groups. Select Iron Content as the graph variable and then select both Type of Pot and Type of Food as the categorical variables. What do you notice in these plots? Does the constant-variance condition seem reasonable? (We’ll follow up on this condition via a residual plot.) Note that each of these boxplots is only based on 4 observations. What other observations can you make based on the boxplots?

Perform a two-way ANOVA analysis (using the general linear model procedure in Minitab), including an interaction effect in your model.

Do the conditions seem to be met? Normality seems somewhat plausible, but the constant-variance condition is clearly not met (notice the big changes in variation within the residual plot). But because the constant-variance assumption isn’t violated in a systematic way, it may be difficult to transform the response variable in a way that stabilizes the variance (log and square root transformations don’t work).
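In a two-way model with interaction, the fitted value for each observation is its cell mean, and the residual is the observation minus that mean. The Python sketch below computes fits and residuals for a few cells and compares the spread across cells, which is what the residuals-versus-fits plot shows visually. The iron contents are made up (only three of the nine pot-food combinations, and not the values in IronContent.MPJ), and the function name is hypothetical.

```python
from statistics import mean, stdev

# Hypothetical iron content (mg per 100 g), 4 replications per cell.
iron = {
    ("aluminum", "legumes"): [2.0, 2.4, 2.2, 2.6],
    ("clay", "vegetables"): [1.3, 1.5, 1.4, 1.6],
    ("iron", "meat"): [3.8, 5.6, 4.2, 5.0],
}

def fits_and_residuals(cells):
    """Return a list of (fitted value, residual) pairs; the fit is the cell mean."""
    pairs = []
    for values in cells.values():
        fit = mean(values)
        pairs.extend((fit, x - fit) for x in values)
    return pairs

pairs = fits_and_residuals(iron)

# Spread within each cell; very unequal spreads signal non-constant variance.
cell_sds = {cell: stdev(values) for cell, values in iron.items()}
spread_ratio = max(cell_sds.values()) / min(cell_sds.values())

for cell, sd in cell_sds.items():
    print(f"{cell}: fit = {mean(iron[cell]):.2f}, residual SD = {sd:.3f}")
print(f"largest/smallest SD = {spread_ratio:.1f}")
```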

Because a condition is pretty severely violated, we really shouldn’t use the ANOVA procedures. (There are non-parametric versions of ANOVA, but unfortunately, they don’t offer as much detailed analysis. That said, you can run the non-parametric version, and if the general results are similar to the ANOVA results, such as significant main effects, then you can feel better about the ANOVA results and continue with your analysis.) Simply as an example of interpretation, though, we’ll discuss the rest of the results. In this case, it seems reasonable to interpret both the significant interaction and the significant main effects (why?). What does the interaction plot tell us? Average iron content for food cooked in iron pots was higher than for food cooked in other pots, but particularly so for meat. And what do the main effects (and pair-wise comparisons) tell us?