Double-click on the My Computer icon on the desktop. Then double-click on the campus_share on 'curtis' (U:) drive and then the Class_Share
folder. Finally, double-click on the Math
folder and then the math_445 folder. What you see in this folder are
(among others) the Minitab files we will use in today’s lab: Cleanliness.MPJ, IronContent, RatWeightGain.MPJ and ReactionTime.MPJ. Copy
these files into your account and then double-click on the RatWeightGain.MPJ
file (this will both open Minitab and open that particular file).
Description of RatWeightGain.MPJ
An
experiment was designed and run to measure the effect of diet type on the
weight gain of rats. Four diets were used: 1 = low-protein beef, 2 =
high-protein beef, 3 = low-protein cereal, and 4 = high-protein cereal. Forty
rats were each randomly assigned to one of these diets (10 rats for each diet).
After a certain period of time, the weight gain (in grams) was recorded for
each rat.
Analysis
For
this one-factor experiment, an obvious question is whether the average weight
gain is the same for all these diets, or if there are some significant
differences between diets, on average. One-factor ANOVA can be used to answer
these questions.
In the one-factor situation, it’s easy (and important) to first look at the data
descriptively, before doing formal significance testing. First create boxplots of the
weight gains, separately for each diet type (Graph>Boxplot>One Y With Groups, and select Diet as the grouping variable; if you
want horizontal boxplots, click on the Scale button
and check the box for “transpose value and category scales”). How would you
compare these distributions of weight gains? Does the constant-variance condition
seem to be met? Do you think we’ll find significant differences in average
weight gains? Also determine numerical summaries for the weight gains,
separately for each diet type (Stat>Basic
Statistics>Display Descriptive Statistics, and select Diet as the “By variable”). By our rule
of thumb (based on sample standard deviations), is the constant-variance condition met?
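The rule-of-thumb check can be sketched in a few lines of Python. This is only an illustration: the weight gains below are made-up numbers (not the contents of RatWeightGain.MPJ), and the cutoff of 2 for the ratio of the largest to the smallest sample standard deviation is an assumed version of the rule of thumb; use whatever cutoff was given in class.

```python
from statistics import stdev

# Hypothetical weight gains (grams) by diet; NOT the values in RatWeightGain.MPJ.
gains = {
    1: [80, 86, 92, 98, 104],
    2: [95, 105, 115, 125, 135],
    3: [70, 78, 86, 94, 102],
    4: [75, 84, 93, 102, 111],
}

# Sample standard deviation of the response within each diet.
sds = {diet: stdev(values) for diet, values in gains.items()}

# Ratio of the largest to the smallest sample standard deviation.
ratio = max(sds.values()) / min(sds.values())

# ASSUMPTION: the rule of thumb is "ratio no larger than 2".
ASSUMED_CUTOFF = 2
condition_ok = ratio <= ASSUMED_CUTOFF

for diet, sd in sds.items():
    print(f"Diet {diet}: s = {sd:.2f}")
print(f"largest/smallest = {ratio:.2f}; condition plausible: {condition_ok}")
```

In Minitab the same standard deviations appear in the Display Descriptive Statistics output, so the ratio can also be worked out by hand.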
Now
we’re ready to perform the ANOVA
analysis (we’ll check the normality condition in a moment). From the Stat menu select ANOVA>One-Way (our data set-up is “stacked” not “unstacked”). Weight
Gain is the response variable and Diet
is the factor. Now click on the Graphs
button. Notice you can select boxplots of the data
(that is, you can do your first exploratory look from within the ANOVA
procedure). Recall we use graphs of the residuals to check the normality condition, so select a histogram and normal
plot of the residuals. Now click on the Comparisons
button. If we have evidence against the overall ANOVA hypothesis (that is, against
the average weight gains of all diets being equal), then we will want to make pair-wise comparisons. We can select
these comparisons, and then ignore them if we don’t have evidence of a
difference in means. Select Tukey’s procedure (with
the default family error rate of 5%). Also select Fisher’s procedure—recall
this procedure makes no adjustment for multiple comparisons, and should only be
used to look descriptively at the results (in order to set up another, more
focused, experiment).
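To see why the lack of adjustment in Fisher’s procedure matters, the following Python sketch counts the pair-wise comparisons among four diets and shows how quickly the chance of at least one false positive can grow. The independence calculation is an approximation for illustration only (pair-wise comparisons that share groups are not independent), and the variable names are hypothetical.

```python
from math import comb

diets = 4
alpha = 0.05  # individual error rate for each unadjusted comparison

# Number of distinct pairs of diets that can be compared.
pairs = comb(diets, 2)

# ASSUMPTION: treating the comparisons as independent tests, which is
# only a rough approximation for pair-wise comparisons.
approx_family_rate = 1 - (1 - alpha) ** pairs

# Bonferroni upper bound on the family error rate (holds without independence).
upper_bound = min(1.0, pairs * alpha)

print(f"{pairs} pair-wise comparisons")
print(f"approximate family error rate: {approx_family_rate:.3f}")
print(f"upper bound on family error rate: {upper_bound:.2f}")
```

Tukey’s procedure widens each interval so that the family error rate stays at the chosen 5%, which is why its intervals are the ones to use for formal conclusions.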
First check the normality condition: do the histogram and normal plot of residuals
indicate that the population of weight gains is plausibly normal? Since this condition is reasonable, we can look at the results. First
consider the overall ANOVA test. The p-value is 0.023, which indicates fairly
strong evidence that at least two of the diets differ in average weight gain (the
result is significant at the 0.05 level, but not at the more stringent 0.01 level).
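For reference, the F statistic behind the overall ANOVA test can be computed directly from the group data. The Python sketch below uses tiny made-up groups (not the rat data), and the function name is hypothetical. It stops at the F statistic and its degrees of freedom; turning F into a p-value requires the F distribution, which Minitab handles for you. For the rat experiment (4 diets, 40 rats) the degrees of freedom would be 3 and 36.

```python
from statistics import mean

def one_way_f(groups):
    """Return (F, df_between, df_within) for a list of groups of responses."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)          # number of treatments
    n = len(all_values)      # total number of observations

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical data: three groups of three observations each.
example_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df1, df2 = one_way_f(example_groups)
print(f"F = {f_stat:.2f} on {df1} and {df2} degrees of freedom")
```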
Because
the overall ANOVA null hypothesis is rejected, we can now do pair-wise
comparisons. What do Tukey’s 95% simultaneous
confidence intervals tell you? There seems to be a significant difference in
average weight gains between Diet 1 (low-protein beef) and Diet 2 (high-protein
beef). Specifically, rats on Diet 2 seem to gain more weight on average than
rats on Diet 1. Note we also need to consider the practical significance.
If
we simply look at the confidence intervals descriptively, then we can consider
the intervals given by Fisher’s method. Note that these intervals additionally
show a possible difference in average weight gains between Diet 2 and Diets 3
and 4. Perhaps these three diets (or perhaps just Diet 2 and Diet 4) can be
studied again in another experiment (depending on the research questions of
interest).
Description of ReactionTime.MPJ
Suppose an experiment is done where the
response variable is the time (in minutes) for a certain chemical reaction to
occur and the one factor is temperature (60, 80, or 100 degrees Fahrenheit).
The data in this Minitab project are results from this hypothetical experiment.
Analysis
Before running a formal ANOVA analysis, we
should first look descriptively at our data. Create boxplots
of the reaction times, separately for each temperature setting. Does the
constant-variance condition seem to be met? In what way is it systematically
not met? (Also look at numerical summaries for reaction times, separately for
each temperature setting. Note that our rule of thumb is not met when comparing
the largest sample standard deviation to the smallest sample standard
deviation.)
When the variability of the responses within a treatment increases
as the treatment mean increases, a logarithm transformation is often helpful in stabilizing the
variance. Title column 3 in your worksheet “Log Reaction Time.” Then from the Calc menu choose Calculator.
Store your result in Log Reaction Time,
and create an expression that takes the natural logarithm of the Reaction Time variable.
Now create boxplots
of the log reaction times, separately for each temperature setting. Notice that,
using the transformed data, the variability is more similar. Also, get
numerical summaries for log reaction times, separately for each temperature
setting, and note our rule of thumb is now met (just barely).
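The effect of the log transformation on the spread can be illustrated with a short Python sketch. The reaction times below are invented so that the spread is exactly proportional to the mean (real data, including ReactionTime.MPJ, will be messier), and the cutoff of 2 is an assumed form of the rule of thumb.

```python
from math import log
from statistics import stdev

# Hypothetical reaction times (minutes) by temperature (degrees Fahrenheit).
# Each group is the same pattern scaled up, so spread grows with the mean.
times = {
    60: [32, 36, 40, 44, 50],
    80: [16, 18, 20, 22, 25],
    100: [8, 9, 10, 11, 12.5],
}

def sd_ratio(groups):
    """Largest sample standard deviation divided by the smallest."""
    sds = [stdev(values) for values in groups.values()]
    return max(sds) / min(sds)

# Natural logarithm of every reaction time, group by group.
log_times = {temp: [log(t) for t in values] for temp, values in times.items()}

raw_ratio = sd_ratio(times)
log_ratio = sd_ratio(log_times)

ASSUMED_CUTOFF = 2  # ASSUMPTION: rule of thumb is "ratio no larger than 2"
print(f"raw scale: ratio = {raw_ratio:.2f}, ok = {raw_ratio <= ASSUMED_CUTOFF}")
print(f"log scale: ratio = {log_ratio:.2f}, ok = {log_ratio <= ASSUMED_CUTOFF}")
```

Because multiplying a group by a constant only shifts its logarithms, the log-scale standard deviations in this constructed example come out equal.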
Use the Stat>ANOVA>One-Way
procedure to perform the ANOVA analysis where Log Reaction Time is the response variable (get appropriate plots
of the residuals and choose Tukey’s multiple
comparisons).
Looking at graphs of the residuals, does the
normality condition seem reasonable? Since this condition
seems to be met, now consider the overall F-test: is it significant? Since the
F-test is significant, we can consider the pair-wise comparisons. The results
of Tukey’s procedure indicate a significant
difference in average log reaction times between temperatures 60 and 100
degrees.
Note that all the inference is on the average of the logarithm of the reaction times, not of the
reaction times themselves. This makes
interpretation more difficult. We can say there is a significant difference
in average log reaction times between temperatures 60 and 100 degrees. Because the
inference is in the log scale, this result might not be informative. If
transforming the data makes interpretation difficult (or meaningless), then
non-parametric methods should be employed (which keep the data in their
original form).
Description of Cleanliness.MPJ
An
experiment was conducted to gauge the effect of both temperature setting and
detergent type on the cleanliness of soiled t-shirts put through a washing
cycle. Eighteen identically soiled t-shirts are randomly assigned to a
treatment (that is, to a combination of temperature setting and detergent).
Three wash-cycle temperatures are used: cold, warm, and hot. Two different
detergents are considered: Detergent A and Detergent B. The response variable
is a cleanliness rating (on a 1–10 scale). Note there are a total of 6
treatments, and since there are 18 t-shirts, we have 3 replications within each
treatment. The numerical results from the experiment are shown in this data file.
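The design described above (3 temperatures by 2 detergents, 18 shirts, 3 replications per treatment) can be laid out with a short Python sketch. The shirt numbering and the random seed are arbitrary choices for illustration, not part of the original experiment.

```python
import random
from collections import Counter
from itertools import product

temperatures = ["cold", "warm", "hot"]
detergents = ["A", "B"]

# Every combination of temperature and detergent is one treatment.
treatments = list(product(temperatures, detergents))
replications = 3

# Hypothetical labels 1..18 for the identically soiled t-shirts.
shirts = list(range(1, len(treatments) * replications + 1))

# Shuffle the shirts, then deal them out three at a time to each treatment.
rng = random.Random(445)  # arbitrary seed so the sketch is reproducible
rng.shuffle(shirts)
assignment = {
    shirt: treatments[i // replications] for i, shirt in enumerate(shirts)
}

counts = Counter(assignment.values())
for treatment, count in counts.items():
    print(treatment, count)
```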
Analysis
We
already considered these data in an in-class example, so we know the
conclusions, but now we’ll see how to use Minitab to perform the analysis.
Because
we have two factors, we want to perform a two-way ANOVA analysis. Within Minitab there is a specific two-way
ANOVA procedure, but it doesn’t include all the analyses we want (it doesn’t
include interaction plots and multiple comparisons). Hence, we need to use
the general linear model option (Stat>ANOVA>General Linear Model).
This option includes the capability of two-way ANOVA, as well as more
complicated models (e.g., including covariates in the model).
Choose Cleanliness Score as your response
variable. Now we need to specify our “model.” We
have replications, so we can test for an interaction (which is always good
practice). Hence, our full model includes both factors plus their interaction.
Select both individual factors (Temperature
and Detergent) to be in the model.
Then add an additional term: Temperature*Detergent (Minitab will recognize this
as an interaction—the asterisk denotes an interaction). Now click on the Graphs button and select a normal plot of residuals and residuals
versus fits (we’ll use these graphs to check the normality and
constant-variance conditions). Additionally, click on the Factor Plots button. Notice you can ask for both main effects plots and interaction plots.
Choose Temperature and Detergent as the factors for the
interaction plot (also choose main effects plots for both factors). Finally,
click on the Comparisons button. If
the interaction effect is not significant, yet main effects are, then we want
to do pair-wise comparisons using Tukey’s method.
Select Temperature and Detergent as the terms for the pair-wise comparisons (Tukey’s method is
the default choice). Also, unclick the Test option—we’ll simply use confidence
intervals, as we discussed in class.
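To make the full model (both factors plus their interaction) concrete, here is a Python sketch of how the total variation splits into sums of squares in a balanced two-factor design. The cleanliness scores are invented (not the contents of Cleanliness.MPJ) and were built with no interaction, so the interaction sum of squares comes out to 0; the function name is hypothetical and the formulas assume equal replications in every cell.

```python
from statistics import mean

def two_way_ss(data):
    """Sums of squares for a balanced two-factor design with replications.

    data maps (level of factor A, level of factor B) -> list of responses.
    """
    a_levels = sorted({a for a, _ in data})
    b_levels = sorted({b for _, b in data})
    r = len(next(iter(data.values())))  # replications per cell (assumed equal)

    grand = mean(x for cell in data.values() for x in cell)
    cell_mean = {key: mean(values) for key, values in data.items()}
    a_mean = {a: mean(cell_mean[(a, b)] for b in b_levels) for a in a_levels}
    b_mean = {b: mean(cell_mean[(a, b)] for a in a_levels) for b in b_levels}

    ss_a = r * len(b_levels) * sum((a_mean[a] - grand) ** 2 for a in a_levels)
    ss_b = r * len(a_levels) * sum((b_mean[b] - grand) ** 2 for b in b_levels)
    ss_ab = r * sum(
        (cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
        for a in a_levels for b in b_levels
    )
    ss_error = sum(
        (x - cell_mean[key]) ** 2 for key, values in data.items() for x in values
    )
    return {"A": ss_a, "B": ss_b, "A*B": ss_ab, "Error": ss_error}

# Hypothetical cleanliness scores: A = Temperature, B = Detergent.
scores = {
    ("cold", "A"): [3, 4, 5], ("cold", "B"): [4, 5, 6],
    ("warm", "A"): [5, 6, 7], ("warm", "B"): [6, 7, 8],
    ("hot", "A"): [7, 8, 9], ("hot", "B"): [8, 9, 10],
}
ss = two_way_ss(scores)

# F statistic for Temperature: df = 3 - 1 = 2; error df = 6 cells * (3 - 1) = 12.
f_temperature = (ss["A"] / 2) / (ss["Error"] / 12)
print(ss, f"F for Temperature = {f_temperature:.1f}")
```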
We’ll
discuss all the output (basically repeating our class discussion from
yesterday).
Remember the order of analysis steps:
1) Check the conditions (normality and constant-variance) via plots of the residuals.
2) If the conditions appear to be met, check if the interaction effect is significant. If it is, interpret the results
using the interaction plot (and possibly an interpretation of main effects, if appropriate).
3) If the interaction effect is not significant, check if the main effects are significant.
4) If either or both main effects are significant, use Tukey’s
method to see specifically where the significant differences are.
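The order of steps above can be written as a small decision function. This Python sketch is only a memory aid: the function name, the 0.05 significance level, and the p-values in the example call are all hypothetical, and judging whether the conditions are met still has to be done by eye from the residual plots.

```python
def next_step(conditions_met, p_interaction, p_main, alpha=0.05):
    """Return the next analysis step for a two-way ANOVA.

    conditions_met: True if the residual plots look acceptable (judged by eye).
    p_interaction:  p-value for the interaction effect.
    p_main:         dict mapping each factor name to its main-effect p-value.
    alpha:          ASSUMED significance level.
    """
    # Step 1: the conditions come first.
    if not conditions_met:
        return "conditions not met: consider a transformation or a non-parametric method"
    # Step 2: a significant interaction is interpreted via the interaction plot.
    if p_interaction < alpha:
        return "interaction significant: interpret the interaction plot"
    # Step 3: otherwise look at the main effects.
    significant = sorted(name for name, p in p_main.items() if p < alpha)
    if not significant:
        return "no significant effects"
    # Step 4: follow up significant main effects with Tukey's method.
    return "main effects significant: run Tukey comparisons for " + ", ".join(significant)

# Hypothetical p-values, for illustration only.
print(next_step(True, 0.40, {"Temperature": 0.001, "Detergent": 0.20}))
```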
Important note: In the general linear
model procedure, the Adj SS is exactly the same as
the sum of squares we’ve talked about for one- and two-way ANOVA (you can
ignore the Seq SS).
Description of IronContent
Iron-deficiency anemia is the most common form
of malnutrition in developing countries, affecting about 50% of children and
women and 25% of men. Iron pots for cooking food had traditionally been used in
many of these countries, but they have been largely replaced by aluminum pots,
which are cheaper and lighter. Some research has suggested that food cooked in
iron pots will contain more iron than food cooked in other types of pots. One
study designed to investigate this issue compared the iron content of different
Ethiopian foods (meat, legumes, or vegetables) cooked in aluminum, clay, and
iron pots. The response variable is the iron in the food as measured in
milligrams of iron per 100 grams of cooked food. There are four replications in
each combination of factor levels. The numerical results of the experiment are
shown in this data file.
Analysis
Note it’s possible to first look at boxplots of the
iron content for the separate
combinations of factors (that is, look descriptively at the data before
performing a formal ANOVA analysis). From the Graph menu select Boxplot>One Y With
Groups. Select Iron Content as
the graph variable and then select both Type
of Pot and Type of Food as the
categorical variables. What do you notice in these plots? Does the
constant-variance condition seem reasonable? (We’ll follow up on this condition
via a residual plot.) Note that each of these boxplots
is only based on 4 observations. What other observations can you make based on
the boxplots?
Perform a two-way ANOVA analysis (using the
general linear model procedure in Minitab), including an interaction effect in
your model.
Do the conditions seem to be met? Normality
seems somewhat plausible, but the constant-variance condition is clearly not
met (notice the big changes in variation within the residual plot). But because
the constant-variance assumption isn’t violated in a systematic way, it may be
difficult to transform the response variable in a way that stabilizes the
variance (log and square root transformations don’t work).
Because a
condition is pretty severely violated, we really shouldn’t use the ANOVA
procedures.
(There are non-parametric versions of ANOVA, but unfortunately, they don’t
offer as much detailed analysis. That said, you can run the non-parametric
version and if the general results are similar to the ANOVA results—for example,
significant main effects—then you can feel better about the ANOVA results and
continue with your analysis.) Simply as an example of interpretation, though,
we’ll discuss the rest of the results. In
this case, it seems reasonable to interpret both the significant interaction
and the significant main effects (why?). What does the interaction plot
tell us? Average iron content for food cooked in iron pots was higher than for
food cooked in other pots, but particularly so for meat. And what do the main
effects (and pair-wise comparisons) tell us?