Recall that Minitab is on the campus network, so you can work through this lab
handout using a computer in any lab on campus. Double-click on the My Computer
icon on the desktop. Then double-click on the campus_share on 'curtis' (U:)
drive and then the Class_Share folder. Finally, double-click on the Math folder
and then the math_445 folder. In this folder are three files you need for this
lab: Disease.MPJ, FrenchTest.MPJ, and HousePrices.MPJ. Copy these files into
your account and then double-click on the FrenchTest.MPJ file (this will both
open Minitab and open that particular file).
Description of FrenchTest.MPJ
The National Endowment for the Humanities sponsors summer institutes to improve the
skills of high school teachers of foreign languages. One such institute hosted
20 French teachers for 4 weeks. At the beginning of the period, the teachers
were given the Modern Language Association’s listening test of understanding
spoken French. After 4 weeks of immersion in French in and out of class, the
listening test was given again. (The actual French spoken in the two tests was
different, so simply taking the first test should not improve the score on the
second test.) The maximum possible score on the test is 36. The pre- and
post-test scores for these 20 teachers are shown in this Minitab project.
Analysis
You want to determine if this immersion program improves test scores on average.
What is the data set-up? These data are obviously paired. If we let $\mu$
be the average improvement in test score
(post-test score minus pre-test score) for the population of French teachers
attending the summer institutes, then we want to test $H_0: \mu = 0$
with a one-sided alternative, $H_a: \mu > 0$.
It’s always a good idea to look at graphical and numerical summaries of data
before performing more complicated analyses (such as significance tests).
Minitab (sometimes) allows you to take this graphical/numerical look as part of
the testing process. From the Stat menu select Basic Statistics>Paired t. Note
that “Paired t evaluates the first sample minus the second sample.” Hence, enter
Post-test Score as your first sample and Pre-test Score as your second sample.
Then click on the Options button and change the alternative to “greater than.”
(Note that Minitab’s default value for the null-hypothesized mean is 0, which is
what you want.) It’s important to check the condition of normality (especially
since the sample size is small). Hence, click on the Graphs button and select
both a histogram of the differences and a boxplot of the differences.
Before looking at the test results, first look at the plots of the differences.
Does the normality condition seem reasonable? (Yes.)
Now look at the output. What is given here? How do you interpret the results?
(Remember, you must use your brain, not just the computer, when selecting the
appropriate analysis, checking conditions, and interpreting the results.) Are
the results statistically significant? (Yes.) Practically significant? (Hmmm.
Perhaps not, but we should ask an expert whether an increase of 1 on this test
is of practical importance.)
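The arithmetic behind the Paired t output can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scores below are hypothetical (they are not the FrenchTest.MPJ data), the function name `paired_t` is made up for this sketch, and the p-value is left out because it needs the t distribution, which Minitab supplies.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Return (mean difference, t statistic, degrees of freedom).

    Differences are taken as first minus second, matching Minitab's
    "first sample minus the second sample" convention.
    """
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    d_bar = mean(diffs)
    std_err = stdev(diffs) / sqrt(n)  # sample SD of the differences over sqrt(n)
    return d_bar, d_bar / std_err, n - 1


# Hypothetical post-test and pre-test scores (NOT the lab data).
post = [30, 28, 33, 25, 31, 27]
pre = [28, 27, 30, 26, 29, 24]

d_bar, t_stat, df = paired_t(post, pre)
print(f"mean difference = {d_bar:.3f}, t = {t_stat:.3f}, df = {df}")
```

A large positive t statistic supports the "greater than" alternative; the mean difference itself is what you would show an expert when asking about practical significance.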
Suppose you wanted to investigate the normality condition additionally through a
normal-quantile plot. Label column 3 “Difference in Scores.” Then from the Calc
menu select Calculator. Store your result in the Difference in Scores variable,
and in the expression box subtract the pre-test score from the post-test score.
Now you have a single column of differences. Go to the Graph menu and select
Probability Plot>Single. Enter Difference in Scores as your variable. Click on
the Distribution button and you can select the distribution to which you want to
compare (note that a normal distribution is the default). Notice the p-value on
the normal-probability plot is big (which indicates there is not strong enough
evidence against normality; that is, the normality condition is plausible for
these data).
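To see what a normal-quantile plot compares, the following Python sketch pairs each sorted difference with the standard normal quantile for its rank. It rests on assumptions: the differences are hypothetical, the helper name `normal_scores` is invented here, and the plotting position (i - 0.375)/(n + 0.25) is one common convention that may not match Minitab's. It does not reproduce the p-value Minitab prints on the plot.

```python
from statistics import NormalDist


def normal_scores(values):
    """Pair each sorted value with a standard normal quantile for its rank."""
    n = len(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    pairs = []
    for i, value in enumerate(sorted(values), start=1):
        p = (i - 0.375) / (n + 0.25)  # assumed plotting-position convention
        pairs.append((std_normal.inv_cdf(p), value))
    return pairs


# Hypothetical differences (post-test minus pre-test), NOT the lab data.
differences = [2, 1, 3, -1, 2]
pairs = normal_scores(differences)
for z, d in pairs:
    print(f"normal score {z:6.3f}   difference {d}")
```

If the points (normal score, difference) fall close to a straight line, the normality condition is plausible.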
Description of Disease.MPJ
A researcher in a large city is currently investigating a rare disease and wants
to compare the proportions of females and males infected with the disease. The
researcher takes a random sample of 500 females and 550 males and tests them for
infection. The data file includes the following variables:
Female Disease Incidence – results for the 500 females (0 – not infected, 1 – infected)
Male Disease Incidence – results for the 550 males (0 – not infected, 1 – infected)
Analysis
The researcher wonders if there is a difference in the population proportions of
males and females who are infected with this disease. Before you carry out a
large-sample inference procedure, you must check to see if the distribution of
the sample proportions is well approximated by a normal curve (i.e., check that
the numbers of infected and not infected individuals are each at least 10 in
both samples). Note that Minitab will carry out the large-sample test whether or
not it is appropriate. It is up to you, the user of Minitab, to check any
conditions of the statistical methods. To check our rule of thumb, go to the
Stat menu and select Tables>Tally Individual Variables. Enter both the Female
Disease Incidence and Male Disease Incidence variables (the default is to show
counts, which is what we want). Note that all sample counts are greater than 10,
so you can proceed with the large-sample test.
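The tally-and-check step can be mimicked in Python as a rough sketch. The 0/1 columns below are hypothetical stand-ins with the handout's sample sizes (500 and 550); the infected counts are made up, and `counts_ok` is a name invented for this illustration.

```python
from collections import Counter


def counts_ok(column, minimum=10):
    """Check that a 0/1 column has at least `minimum` zeros and ones."""
    counts = Counter(column)  # missing keys count as 0
    return counts[0] >= minimum and counts[1] >= minimum


# Hypothetical columns: 1 = infected, 0 = not infected (NOT the lab data).
female = [1] * 12 + [0] * 488
male = [1] * 15 + [0] * 535

for name, column in (("Female", female), ("Male", male)):
    print(name, dict(Counter(column)), "OK" if counts_ok(column) else "too few")
```

If either column fails the check, the large-sample test should not be trusted, even though Minitab will still run it.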
To carry out the test, go to the Stat menu and select Basic Statistics>2
Proportions. When using Minitab to do inference, you must be aware of how your
data are set up in the worksheet. In this case, there are two separate columns.
Hence, click on the “Samples in Different Columns” circle, and then enter Female
Disease Incidence and Male Disease Incidence as your variables. Now click on the
Options button. Here you can select your confidence level (95% is the default),
the value of the difference to test in your null hypothesis (0 is always the
default, which is the only situation we’ve discussed), and the direction of the
alternative hypothesis (not equal is the default, which is what we want).
Furthermore, there’s an option to select “Use pooled estimate of p for test.”
This is the test we discussed in class (which uses a pooled estimate of p in the
standard error), so select this option.
The results are then printed to the Session Window. Minitab provides the
direction of the differencing, hypotheses, confidence interval, test statistic,
and p-value. It is up to you to interpret these results. Is the difference
statistically significant? (No.) Additionally, notice the “Fisher’s Exact Test”
output in the Session Window. Fisher’s Exact Test is a randomization test, and
is not based on the normal distribution. Knowing that both the large-sample test
(condition of normality) and the randomization test (no conditions) agree makes
us feel even better about our analysis.
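For reference, the pooled large-sample test can be written out as a short Python sketch. The sample sizes (500 and 550) come from the handout, but the infected counts (12 and 15) are hypothetical, and `two_proportion_z` is an illustrative name rather than anything Minitab exposes.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Two-sided z test for p1 - p2 = 0 using a pooled estimate of p."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # pooled estimate of the common p
    std_err = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided alternative
    return z, p_value


# 500 females and 550 males as in the handout; infected counts are made up.
z, p_value = two_proportion_z(12, 500, 15, 550)
print(f"z = {z:.3f}, two-sided p-value = {p_value:.3f}")
```

With these made-up counts the p-value is large, so the sketch, like the lab's result, would not declare a statistically significant difference.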
Description of HousePrices.MPJ
This
data file contains the selling prices (in thousands of dollars) for a sample of
homes (some 3-bedroom and some 4-bedroom) in
Analysis
On average, do homes with 4 bedrooms sell for more money than homes with 3
bedrooms? Intuitively, this would seem to be true, but do these data support
this claim? First we should consider the data set-up: this is a two-sample
problem.
Before performing a significance test, compare the house prices visually.
(Again, it’s good practice to do some simple graphical and numerical summaries
of your data before moving to more complex analyses.) Boxplots are a good way to
do this initial comparison (Graph>Boxplot>Multiple Ys>Simple; if you want
horizontal, not vertical, boxplots, click on the Scale button and select the
“Transpose value and category scale” button). Based on these boxplots, do you
think a t-test will discover a significant difference in average selling prices?
(Sometimes we can anticipate the result of a significance test by looking at
simple graphs of the data.)
You can further assess the normality condition of the t-test by creating
histograms and normal-probability plots of the two sets of data (Graph>Histogram
or Graph>Probability Plot). Does the normality condition seem reasonable for
both samples? (Yes.)
Now you can conduct the actual 2-sample t-test (Stat>Basic Statistics>2-Sample
t). Note that the samples are in different columns, you don’t want to assume
equal variances, and you want a one-sided alternative. What do the results tell
you? The small p-value shouldn’t surprise you. What’s valuable from the output
is the confidence bound on the difference in average selling price (which
definitely seems practically important).
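The unequal-variances (Welch) version of the two-sample t statistic, together with its approximate degrees of freedom, can be sketched as follows. The prices are hypothetical values in thousands of dollars (not the HousePrices.MPJ data), `welch_t` is an invented name, and Minitab may round the degrees of freedom differently than this sketch, which leaves them unrounded.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample1, sample2):
    """Return (t statistic, Welch-Satterthwaite df) for mean1 - mean2."""
    n1, n2 = len(sample1), len(sample2)
    v1 = variance(sample1) / n1  # squared standard error of mean 1
    v2 = variance(sample2) / n2  # squared standard error of mean 2
    t = (mean(sample1) - mean(sample2)) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical selling prices in thousands of dollars (NOT the lab data).
four_bed = [260, 245, 300, 275, 290]
three_bed = [210, 230, 195, 240, 220, 205]

t_stat, df = welch_t(four_bed, three_bed)
print(f"t = {t_stat:.3f}, approximate df = {df:.2f}")
```

Ordering the samples as 4-bedroom minus 3-bedroom makes a positive t statistic correspond to the one-sided "greater than" alternative.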
Minitab allows you to perform all the tests we’ve discussed in class, plus many
we haven’t discussed (e.g., nonparametric tests, chi-square tests). It’s
important that you know when to use each test and that you always check
conditions of the test (Minitab doesn’t do this for you). When running a
specific test, it’s important that you understand how your data are listed in
your worksheet (e.g., separate columns, single column). Finally, it’s important
that you understand the output Minitab gives you and that you can provide a
conclusion in layperson’s terms (not using technical statistical language).
Minitab can also perform power calculations (and sample size determination) for
all the tests we’ve discussed (Stat>Power and Sample Size). Choose one of these
procedures from the Power and Sample Size sub-menu. Notice that Minitab lists
three quantities: 1) Sample size, 2) Difference (that you want to be able to
detect), and 3) Power. (The significance level and direction of the alternative
are specified via the Options button.) If you supply two of these values to
Minitab, then it will determine the third. For example, in a two-sample t-test,
if you want 0.90 power of detecting a difference in means of 5, when the
standard deviation is 50, then Minitab determines that a sample size of 2,103 in
each group is needed (based on a two-sided test at the 0.05 significance level).
Verify this example.
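As a rough cross-check on the sample size example, the textbook normal-approximation formula n = 2((z_{alpha/2} + z_{beta}) * sigma / difference)^2 per group can be evaluated in Python. This is a sketch, not Minitab's method: Minitab works with the t distribution, so its answer is expected to be slightly larger than this approximation. The function name `n_per_group` is made up for the illustration.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(difference, sigma, power, alpha=0.05):
    """Normal-approximation sample size per group, two-sided two-sample test."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value, two-sided
    z_beta = std_normal.inv_cdf(power)           # quantile for desired power
    return 2 * ((z_alpha + z_beta) * sigma / difference) ** 2


n = n_per_group(difference=5, sigma=50, power=0.90)
print(f"approximate n per group: {n:.1f}, rounded up: {ceil(n)}")
```

The approximation lands within a couple of observations of the 2,103 that Minitab reports, which is a useful sanity check when you verify the example in Stat>Power and Sample Size.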