Double click on the My Computer icon on the desktop. Then double click on the campus_share on 'curtis' (U:) drive and then the Class_Share
folder. Finally, double click on the Math folder and then the Math_217 folder. What you see in this folder are
(among others) the Minitab files we will use in today’s lab: Cleanliness.MPJ, IronContent.MPJ, RatWeightGain.MPJ, and ReactionTime.MPJ.
As a class, we cannot all access these shared files at once (only one person can access them at a
time). Thus, you each need to copy the four files to your personal account. You
can do this by highlighting the files (click on the first one, then Ctrl-click
on the others; this should highlight them all). Then press Ctrl-C to copy the
files. Now open the My Documents folder on the desktop (this is the My
Documents folder of your personal account). Once you are in the My
Documents folder, press Ctrl-V to paste the four files into your account.
Now open the Minitab software (from the Start menu select Programs>Class
Programs and then Minitab>Minitab15). From the File menu choose Open Project and select the RatWeightGain.MPJ
file from your My Documents folder.
Description of RatWeightGain.MPJ
An experiment was designed and run to measure the effect of diet type on the
weight gain of rats. Four diets were used: 1 = low-protein beef, 2 =
high-protein beef, 3 = low-protein cereal, and 4 =
high-protein cereal. Forty rats were each randomly assigned to one of these
diets (10 rats for each diet). After a certain period of time, the weight gain
(in grams) was recorded for each rat.
Analysis
For this one-factor experiment, an obvious question is whether the average weight gain
is the same for all four diets, or whether there are significant differences
between diets, on average. One-factor ANOVA can be used to answer
this question.
In the one-factor situation, it’s easy to first look at the data descriptively, before
doing formal significance testing. First create boxplots
of the weight gains, separately for each diet type (Graph>Boxplot>One Y With
Groups, and select Diet as the
grouping variable). How would you compare these distributions of weight gains?
Does the constant-variance assumption seem to be met? Also determine numerical
summaries for the weight gains, separately for each diet type (Stat>Basic Statistics>Display Descriptive Statistics, and
select Diet as the “By variable”). By
our rule of thumb (based on sample standard deviations), is the constant-variance assumption met?
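The rule-of-thumb check can also be reproduced outside Minitab. The sketch below assumes the common version of the rule (the largest sample standard deviation is no more than twice the smallest); use the cutoff given in class if it differs. The data are made up for illustration and are not the values in RatWeightGain.MPJ, and the function name sd_ratio is our own.

```python
from statistics import stdev


def sd_ratio(groups):
    """Return (largest SD / smallest SD, dict of per-group SDs).

    `groups` maps a group label to a list of responses.
    """
    sds = {label: stdev(values) for label, values in groups.items()}
    return max(sds.values()) / min(sds.values()), sds


# Made-up weight gains (grams); NOT the data in RatWeightGain.MPJ.
example = {
    1: [70, 82, 91, 78, 85],
    2: [95, 110, 102, 118, 99],
    3: [74, 88, 80, 96, 83],
}

ratio, sds = sd_ratio(example)
# Assumed cutoff of 2 for the rule of thumb.
rule_met = ratio <= 2
print(f"SD ratio = {ratio:.2f}; rule of thumb met: {rule_met}")
```

A ratio close to 1 means the groups have similar spread; the further the ratio is above the cutoff, the less plausible the constant-variance assumption.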
Now we’re ready to perform the ANOVA analysis (we’ll check the normality assumption in a moment).
From the Stat menu select ANOVA>One-Way (our data set-up is “stacked” not “unstacked”). Weight
Gain is the response variable and Diet
is the factor. Now click on the Graphs
button. Notice you can select boxplots of the data
(that is, you can do your first exploratory look from within the ANOVA
procedure). Recall we use graphs of the residuals to check the normality assumption, so select a histogram and normal
plot of the residuals. Now click on the Comparisons
button. If we have evidence against the overall ANOVA hypothesis (that is,
against the average weight gains of all diets being equal), then we will want
to make pair-wise comparisons. We
can select these comparisons, and then ignore them if we don’t have evidence of
a difference in means. Select Tukey’s procedure (with
the default family error rate of 5%). Also select Fisher’s procedure—recall
this procedure makes no adjustment for multiple comparisons, and should only be
used to look descriptively at the results (in order to set up another, more
focused, experiment).
First check the normality assumption: do the histogram and normal plot of residuals
indicate that the population of weight gains is plausibly normal? Since this
assumption is reasonable, we can look at the results. First consider the
overall ANOVA test. The p-value is 0.023, which indicates fairly strong
evidence that at least two of the diets differ in average weight gain (the result is
significant at the 0.05 level, but not at the more stringent 0.01 level).
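To see where the overall ANOVA test statistic comes from, here is a minimal sketch that computes the one-way F statistic by hand. The function name one_way_anova and the data are our own illustrations (the numbers are not from RatWeightGain.MPJ); the p-value that Minitab reports comes from comparing F to an F distribution with the two degrees of freedom returned here.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples, one per group."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)

    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up data with three groups of three observations.
f_stat, df1, df2 = one_way_anova([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(f"F = {f_stat:.2f} on {df1} and {df2} degrees of freedom")
```

With four diets and 10 rats per diet, the same formulas give 3 and 36 degrees of freedom.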
Because the overall ANOVA null hypothesis is rejected, we can now do pair-wise
comparisons. What do Tukey’s 95% simultaneous
confidence intervals tell you? There seems to be a significant difference in
average weight gains between Diet 1 (low-protein beef) and Diet 2 (high-protein
beef). Specifically, rats on Diet 2 seem to gain more weight on average than
rats on Diet 1. Note we also need to consider the practical significance.
If we simply look at the confidence intervals descriptively, then we can consider the
intervals given by Fisher’s method. Note that these intervals additionally show
a possible difference in average weight gains between Diet 2 and Diets 3 and 4.
Perhaps these three diets (or perhaps just Diet 2 and Diet 4) can be studied
again in another experiment (depending on the research questions of interest).
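The reason Fisher’s intervals flag more differences than Tukey’s can be illustrated with some quick arithmetic. The sketch below counts the pair-wise comparisons among four diets and estimates the chance of at least one false positive when each comparison is made at the 5% level with no adjustment. Treating the comparisons as independent is a simplifying assumption (they share data, so the figure is only approximate); the Bonferroni value is a guaranteed upper bound. The function name is our own.

```python
from math import comb


def family_error_rate(num_groups, alpha=0.05):
    """Return (number of pairs, approximate family error rate, Bonferroni bound)."""
    num_pairs = comb(num_groups, 2)
    # Approximation that treats the comparisons as independent (an assumption).
    approx = 1 - (1 - alpha) ** num_pairs
    # Upper bound that holds without assuming independence.
    bound = min(1.0, num_pairs * alpha)
    return num_pairs, approx, bound


pairs, approx, bound = family_error_rate(4)
print(f"{pairs} comparisons: roughly {approx:.3f}, at most {bound:.2f}")
```

With six unadjusted comparisons the family error rate is well above the 5% that Tukey’s procedure maintains, which is why Fisher’s results should only be read descriptively.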
Description of ReactionTime.MPJ
Suppose an experiment is done where the response variable
is the time (in minutes) for a certain chemical reaction to occur and the one
factor is temperature (60, 80, or 100 degrees Fahrenheit). The data in this
Minitab project are results from this hypothetical experiment.
Analysis
Before running a formal ANOVA analysis, we should first
look descriptively at our data. Create boxplots of
the reaction times, separately for each temperature setting. Does the
constant-variance assumption seem to be met? In what way is it systematically
not met? (Also look at numerical summaries for reaction times, separately for
each temperature setting. Note that our rule of thumb is not met when comparing
the largest sample standard deviation to the smallest sample standard
deviation.)
When the variability of the responses within treatments
increases as the treatment means increase, a logarithm
transformation is often helpful in stabilizing the variance. Title column 3 in your worksheet “Log Reaction Time.”
Then from the Calc menu choose Calculator.
Store your result in Log Reaction Time,
and create an expression that takes the natural logarithm of the Reaction Time variable.
Now create boxplots of the log
reaction times, separately for each temperature setting. Notice that the
variability is now more similar. Also, get numerical summaries for log reaction
times, separately for each temperature setting, and note our rule of thumb is
now met (just barely).
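Why the logarithm helps can be seen in an idealized case. In the sketch below the made-up groups are exact multiples of one another, so the standard deviation grows in proportion to the mean; after taking natural logs the spreads become equal. Real data (including ReactionTime.MPJ) will not be this tidy, and the group labels and helper name are our own.

```python
from math import log
from statistics import stdev


def sd_ratio(groups):
    """Largest sample SD divided by smallest sample SD across groups."""
    sds = [stdev(values) for values in groups.values()]
    return max(sds) / min(sds)


# Made-up data: group B is 2x group A, group C is 4x group A.
raw = {
    "A": [10, 12, 14, 16],
    "B": [20, 24, 28, 32],
    "C": [40, 48, 56, 64],
}
# Natural-log transform of every observation, group by group.
logged = {label: [log(x) for x in values] for label, values in raw.items()}

ratio_raw = sd_ratio(raw)
ratio_log = sd_ratio(logged)
print(f"SD ratio before: {ratio_raw:.2f}, after log: {ratio_log:.2f}")
```

Multiplying a group by a constant multiplies its standard deviation by that constant, but only shifts its logs by a constant, which leaves the standard deviation unchanged.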
Use the Stat>ANOVA>One-Way
procedure to perform the ANOVA analysis where Log Reaction Time is the response variable (get appropriate plots
of the residuals and choose Tukey’s multiple
comparisons).
Looking at graphs of the residuals, does the normality
assumption seem reasonable? Since this assumption seems to be met, now consider
the overall F test: is it significant? Since the F test is significant, we
can consider the pair-wise comparisons. The results of Tukey’s procedure indicate a significant difference
in average log reaction times between temperatures 60 and 100 degrees.
Note that all the inference is on the average of the
log reaction times, not of the reaction times themselves. This makes interpretation more difficult.
We can say there is a significant difference in average log reaction times
between temperatures 60 and 100. Because the inference is on the log scale, this
result might not be informative. If transforming the data makes interpretation
difficult (or meaningless), then non-parametric methods should be employed
(which keep the data in their original form).
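One common way to read a log-scale result, offered here only as a sketch, is to back-transform it: the difference between two average log values, when exponentiated, is the ratio of the two geometric means (not of the ordinary means). Whether that ratio is meaningful depends on the research question. The function name and the numbers are our own.

```python
from math import exp, log
from statistics import mean


def ratio_of_geometric_means(a, b):
    """exp(mean(log a) - mean(log b)), i.e. geometric mean of a over that of b."""
    diff_in_log_means = mean([log(x) for x in a]) - mean([log(x) for x in b])
    return exp(diff_in_log_means)


# Made-up positive responses for two groups.
ratio = ratio_of_geometric_means([2, 8], [1, 4])
print(f"Geometric-mean ratio: {ratio:.2f}")
```

Here the geometric means are 4 and 2, so the back-transformed difference is a ratio of 2.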
Description of Cleanliness.MPJ
An experiment was conducted to gauge the effect of both temperature setting and
detergent type on the cleanliness of soiled t-shirts put through a washing
cycle. Eighteen identically soiled t-shirts are randomly assigned to a
treatment (that is, to a combination of temperature setting and detergent).
Three wash-cycle temperatures are used: cold, warm, and hot. Two different
detergents are considered: Detergent A and Detergent B. The response variable
is a cleanliness rating (on a 1–10 scale). Note there are a total of 6
treatments, and since there are 18 t-shirts, we have 3 replications within each
treatment. The numerical results from the experiment are shown in this data
file.
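The design arithmetic above (3 temperatures times 2 detergents gives 6 treatments, and 18 t-shirts gives 3 replications each) can be sketched as a small randomization routine. This is only an illustration of how such an assignment could be generated; it is not how the experimenters actually randomized, and the function name and seed are our own.

```python
import random
from collections import Counter
from itertools import product

temperatures = ["cold", "warm", "hot"]
detergents = ["A", "B"]

# Every combination of the two factors is one treatment: 3 x 2 = 6.
treatments = list(product(temperatures, detergents))


def assign(num_units, treatments, seed=None):
    """Randomly assign units 1..num_units to treatments with equal replication."""
    reps, leftover = divmod(num_units, len(treatments))
    if leftover:
        raise ValueError("units cannot be split evenly across treatments")
    slots = [t for t in treatments for _ in range(reps)]
    random.Random(seed).shuffle(slots)
    return dict(zip(range(1, num_units + 1), slots))


plan = assign(18, treatments, seed=217)
replications = Counter(plan.values())
print(len(treatments), "treatments;", set(replications.values()), "replications each")
```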
Analysis
We already considered these data in an in-class example, so we know the
conclusions, but now we’ll see how to use Minitab to perform the analysis.
Because we have two factors, we want to perform a two-way ANOVA analysis. Within
Minitab there is a specific two-way ANOVA procedure, but it doesn’t include all
the analyses we want (it doesn’t include interaction plots and multiple
comparisons). Hence, we need to use the general
linear model option (Stat>ANOVA>General
Linear Model). This option includes the capability of two-way ANOVA, as
well as more complicated models (e.g., including covariates in the model).
Choose Cleanliness Score as your response
variable. Now we need to specify our “model.” We
have replications, so we can test for an interaction (which is always good
practice). Hence, our full model includes both factors plus their interaction.
Select both individual factors (Temperature
and Detergent) to be in the model.
Then add an additional term: Temperature*Detergent (Minitab will recognize this
as an interaction). Now click on the Graphs
button and select a normal plot of residuals and residuals versus fits (we’ll
use these graphs to check the normality and constant-variance assumptions).
Additionally, click on the Factor Plots
button. Notice you can ask for both main effects plots and interaction plots.
Choose Temperature and Detergent as the factors for the
interaction plot (you can also choose main effects plots). Finally, click on
the Comparisons button. If the
interaction effect is not significant, yet main effects are, then we want to do
pair-wise comparisons using Tukey’s method. Select
Temperature and Detergent as the terms for the pair-wise comparisons (Tukey’s method is the default choice). Also, uncheck the
Test option; we’ll simply use confidence intervals, as we discussed in class.
Remember the order of analysis
steps: 1) check
the conditions (normality and constant-variance) via plots of the residuals; if
the conditions appear to be met, then 2) check if the interaction effect is
significant; if the effect is significant, then interpret the results using the
interaction plot (and possibly an interpretation of main effects, if
appropriate); if the interaction effect is not significant, then 3) check if
the main effects are significant; if either or both main effects are significant,
then 4) use Tukey’s method to see specifically where
the significant differences are.
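The order of steps above can be written out as a small decision function. This is only a sketch of the logic: the 0.05 cutoff, the function name, and the wording of the returned messages are our own assumptions, and judging whether the conditions are met still requires looking at the residual plots yourself.

```python
def next_step(conditions_met, interaction_p, main_effect_ps, alpha=0.05):
    """Return the next analysis step for a two-way ANOVA.

    conditions_met: your judgment from the residual plots (True/False).
    interaction_p:  p-value for the interaction term.
    main_effect_ps: dict mapping each factor name to its p-value.
    """
    # Step 1: normality and constant variance must look reasonable.
    if not conditions_met:
        return "stop: conditions not met"
    # Step 2: a significant interaction is interpreted via the interaction plot.
    if interaction_p < alpha:
        return "interpret the interaction plot"
    # Step 3: otherwise look at the main effects.
    significant = [name for name, p in main_effect_ps.items() if p < alpha]
    # Step 4: Tukey comparisons for the significant main effects.
    if significant:
        return "Tukey comparisons for: " + ", ".join(significant)
    return "no significant effects"


# Hypothetical p-values, not results from Cleanliness.MPJ.
print(next_step(True, 0.40, {"Temperature": 0.001, "Detergent": 0.30}))
```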
Important note: In the general linear model
procedure, the Adj SS is exactly the same as the sum
of squares we’ve talked about for one- and two-way ANOVA (you can ignore the Seq SS).
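For reference, the two-way sums of squares for a balanced design with replication can be computed directly from the cell, row, column, and grand means. The sketch below assumes every combination of factor levels is present with the same number of replicates; the function name and the tiny data set are made up for illustration and are not the Cleanliness.MPJ values.

```python
from statistics import mean


def two_way_ss(cells):
    """Sums of squares for a balanced two-factor design.

    `cells` maps (level of factor A, level of factor B) to a list of replicates.
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))
    if any(len(v) != n for v in cells.values()):
        raise ValueError("design must be balanced")

    grand = mean([x for v in cells.values() for x in v])
    cell_mean = {key: mean(v) for key, v in cells.items()}
    a_mean = {a: mean([x for (ai, _), v in cells.items() if ai == a for x in v])
              for a in a_levels}
    b_mean = {b: mean([x for (_, bi), v in cells.items() if bi == b for x in v])
              for b in b_levels}

    return {
        "A": n * len(b_levels) * sum((a_mean[a] - grand) ** 2 for a in a_levels),
        "B": n * len(a_levels) * sum((b_mean[b] - grand) ** 2 for b in b_levels),
        "A*B": n * sum((cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
                       for a in a_levels for b in b_levels),
        "Error": sum((x - cell_mean[k]) ** 2 for k, v in cells.items() for x in v),
        "Total": sum((x - grand) ** 2 for v in cells.values() for x in v),
    }


# Made-up 2 x 2 design with 2 replicates per cell.
ss = two_way_ss({
    ("cold", "A"): [1, 3], ("cold", "B"): [3, 5],
    ("hot", "A"): [5, 7], ("hot", "B"): [11, 13],
})
print(ss)
```

In a balanced design the four component sums of squares add up to the total, which is a useful check on any hand calculation.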
Description of IronContent.MPJ
Iron-deficiency anemia is the most common form of
malnutrition in developing countries, affecting about 50% of children and women
and 25% of men. Iron pots for cooking food had traditionally been used in many
of these countries, but they have been largely replaced by aluminum pots, which
are cheaper and lighter. Some research has suggested that food cooked in iron
pots will contain more iron than food cooked in other types of pots. One study
designed to investigate this issue compared the iron content of different
Ethiopian foods cooked in aluminum, clay, and iron pots. The response variable
is the iron in the food as measured in milligrams of iron per 100 grams of
cooked food. There are four replications in each combination of factor levels.
The numerical results of the experiment are shown in this data file.
Analysis
Note it’s possible to first look at boxplots of the iron content for
the separate combinations of factors (that is, look descriptively at the data
before performing a formal ANOVA analysis). From the Graph menu select Boxplot>One Y With Groups. Select Iron
Content as the graph variable and then select both Type of Pot and Type of Food
as the categorical variables. What do you notice in these plots? Does the
constant-variance assumption seem reasonable? (We’ll follow up on this
assumption via a residual plot.) Note that each of these boxplots
is only based on 4 observations.
Perform a two-way ANOVA analysis (using the general
linear model procedure in Minitab), including an interaction effect in your
model.
Do the assumptions seem to be met? Normality seems
somewhat plausible, but the constant-variance assumption is clearly not met
(notice the big changes in variation within the residual plot). But because the
constant-variance assumption isn’t violated in a systematic way, it may be
difficult to transform the response variable in a way that stabilizes the
variance (log and square root transformations don’t work).
Because an assumption is pretty severely violated, we
really shouldn’t use the ANOVA procedures. Simply as an example of interpretation,
though, we’ll discuss the rest of the results. In this case, it seems reasonable to interpret both the significant
interaction and the significant main effects (why?). What does the interaction plot tell us? Average iron
content for food cooked in iron pots was higher than for food cooked in other
pots, but particularly so for meat.
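The pattern an interaction plot shows can also be expressed numerically: for each food, compare the cell mean for iron pots with the average of the cell means for the other pots. If that gap is positive for every food but much larger for one of them, the pot effect depends on the food, which is what an interaction means. The cell means below are invented purely for illustration (they are not the IronContent.MPJ results), "other food" is a placeholder level, and the function name is our own.

```python
def pot_advantage(cell_means, pot="iron"):
    """For each food: mean for `pot` minus the average mean of the other pots."""
    foods = sorted({food for _, food in cell_means})
    others = sorted({p for p, _ in cell_means if p != pot})
    return {
        food: cell_means[(pot, food)]
        - sum(cell_means[(p, food)] for p in others) / len(others)
        for food in foods
    }


# Invented cell means (mg of iron per 100 g of cooked food).
invented_means = {
    ("aluminum", "meat"): 2.0, ("clay", "meat"): 2.0, ("iron", "meat"): 5.0,
    ("aluminum", "other food"): 1.0, ("clay", "other food"): 1.0,
    ("iron", "other food"): 2.0,
}

advantage = pot_advantage(invented_means)
print(advantage)
```

Equal gaps across foods would correspond to roughly parallel lines in the interaction plot; unequal gaps correspond to lines that diverge.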