Math 445 Computer Lab: Two-Sample and Paired-Data Inference Using Minitab

 

Getting the Needed Files

Recall that Minitab is on the campus network, so you can work through this lab handout on a computer in any lab on campus. Double-click the My Computer icon on the desktop. Then double-click the campus_share on 'curtis' (U:) drive and then the Class_Share folder. Finally, double-click the Math folder and then the math_445 folder. This folder contains the three files you need for this lab: Disease.MPJ, FrenchTest.MPJ, and HousePrices.MPJ. Copy these files into your account and then double-click the FrenchTest.MPJ file (this both opens Minitab and opens that particular file).

 

Description of FrenchTest.MPJ

The National Endowment for the Humanities sponsors summer institutes to improve the skills of high school teachers of foreign languages. One such institute hosted 20 French teachers for 4 weeks. At the beginning of the period, the teachers were given the Modern Language Association’s listening test of understanding spoken French. After 4 weeks of immersion in French in and out of class, the listening test was given again. (The actual French spoken in the two tests was different, so simply taking the first test should not improve the score on the second test.) The maximum possible score on the test is 36. The pre- and post-test scores for these 20 teachers are shown in this Minitab project.

 

Analysis

You want to determine whether this immersion program improves test scores on average. What is the data set-up? These data are paired, because each teacher was tested both before and after the institute. If we let $\mu_d$ be the average improvement in test score (post-test score minus pre-test score) for the population of French teachers attending the summer institutes, then we want to test $H_0\colon \mu_d = 0$ against the one-sided alternative $H_a\colon \mu_d > 0$.

 

It’s always a good idea to look at graphical and numerical summaries of data before performing more complicated analyses (such as significance tests). Minitab (sometimes) allows you to take this graphical/numerical look as part of the testing process. From the Stat menu select Basic Statistics>Paired t. Note that “Paired t evaluates the first sample minus the second sample.” Hence, enter Post-test Score as your first sample and Pre-test Score as your second sample. Then click on the Options button and change the alternative to “greater than.” (Note that Minitab’s default value for the null-hypothesized mean is 0, which is what you want.) It’s important to check the condition of normality (especially since the sample size is small). Hence, click on the Graphs button and select both a histogram of the differences and a boxplot of the differences.
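To see what the Paired t procedure computes, here is a minimal Python sketch, assuming the pre- and post-test scores are available as two equal-length lists. It forms the differences (post minus pre), computes $t = \bar d / (s_d/\sqrt{n})$ with $n - 1$ degrees of freedom, and gets the one-sided p-value by numerically integrating the t density with Simpson's rule, since Python's standard library has no t distribution. The function names and the scores are made up for illustration; they are not the FrenchTest.MPJ data, and Minitab's output is what you should report.

```python
import math
import statistics


def t_upper_tail(t, df, steps=2000):
    """P(T >= t) for Student's t with df degrees of freedom (Simpson's rule)."""
    # Log of the normalizing constant of the t density.
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def density(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    # Integrate the density from 0 to |t|; steps must be even for Simpson's rule.
    a, b = 0.0, abs(t)
    h = (b - a) / steps
    total = density(a) + density(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(a + i * h)
    area = total * h / 3  # P(0 <= T <= |t|)
    # The t density is symmetric about 0, so P(T >= 0) = 0.5.
    return 0.5 - area if t >= 0 else 0.5 + area


def paired_t_greater(post, pre):
    """One-sided paired t-test of H0: mu_d = 0 vs Ha: mu_d > 0, with d = post - pre."""
    diffs = [x - y for x, y in zip(post, pre)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)  # sample SD (n - 1 divisor)
    t = d_bar / (s_d / math.sqrt(n))
    df = n - 1
    return d_bar, t, df, t_upper_tail(t, df)


# Made-up scores for illustration only (not the FrenchTest.MPJ data).
pre = [30, 28, 31, 26, 20, 30, 34, 15, 28, 20]
post = [31, 30, 32, 30, 21, 29, 35, 18, 29, 22]
d_bar, t, df, p = paired_t_greater(post, pre)
print(f"mean difference = {d_bar:.2f}, t = {t:.3f}, df = {df}, p = {p:.4f}")
```

Note the order of the arguments: as with Minitab's "first sample minus second sample," the post-test scores go first so that a positive mean difference is an improvement.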

 

Before looking at the test results, first look at the plots of the differences. Does the normality condition seem reasonable? (Yes.)

 

Now look at the output. What is given here, and how do you interpret the results? (Remember, you must use your brain, not just the computer, when selecting the appropriate analysis, checking conditions, and interpreting the results.) Are the results statistically significant? (Yes.) Are they practically significant? (Hmm. Perhaps not, but we should ask an expert whether an increase of 1 point on this test is of practical importance.)

 

Suppose you also want to investigate the normality condition with a normal-quantile plot. Label column 3 “Difference in Scores.” Then from the Calc menu select Calculator. Store the result in the Difference in Scores variable, and in the expression box subtract the pre-test score from the post-test score. You now have a single column of differences. Go to the Graph menu and select Probability Plot>Single. Enter Difference in Scores as your variable. Click the Distribution button to choose the distribution you want to compare against (the normal distribution is the default). Notice that the p-value on the normal probability plot is large, which indicates there is not strong enough evidence against normality; that is, the normality condition is plausible for these data.
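A normal-quantile plot pairs each sorted data value with the standard normal quantile expected at its rank; points falling close to a straight line are consistent with normality. The Python sketch below builds those pairs using an assumed plotting position, (i - 0.3)/(n + 0.4), which is one common choice and may not be exactly what Minitab uses, and summarizes straightness with the correlation of the plotted points. That correlation is only an informal check; it is not the p-value Minitab prints on its probability plot. The function names and differences are made up.

```python
import math
from statistics import NormalDist, mean


def normal_plot_points(data):
    """Pair each sorted value with a standard normal quantile at its rank."""
    xs = sorted(data)
    n = len(xs)
    # Assumed plotting position (Benard's median-rank approximation).
    probs = [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]
    zs = [NormalDist().inv_cdf(p) for p in probs]
    return list(zip(zs, xs))


def straightness(points):
    """Pearson correlation of the plotted points; near 1 means nearly straight."""
    zs, xs = zip(*points)
    mz, mx = mean(zs), mean(xs)
    szx = sum((z - mz) * (x - mx) for z, x in zip(zs, xs))
    szz = sum((z - mz) ** 2 for z in zs)
    sxx = sum((x - mx) ** 2 for x in xs)
    return szx / math.sqrt(szz * sxx)


diffs = [1, 2, 1, 4, 1, -1, 1, 3, 1, 2]  # made-up differences, not the lab data
pts = normal_plot_points(diffs)
print(f"correlation of plot points = {straightness(pts):.3f}")
```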

 

Description of Disease.MPJ

A researcher in a large city is currently investigating a rare disease and wants to compare the proportion of females and males infected with the disease. The researcher takes a random sample of 500 females and 550 males and tests them for infection. The data file includes the following variables:

Female Disease Incidence – results for the 500 females (0 – not infected, 1 – infected)

Male Disease Incidence – results for the 550 males (0 – not infected, 1 – infected)

 

Analysis

The researcher wonders whether there is a difference in the population proportions of males and females who are infected with this disease. Before you carry out a large-sample inference procedure, you must check whether the sampling distribution of the sample proportions is well approximated by a normal curve (i.e., check that the number infected and the number not infected are each at least 10 in both samples). Note that Minitab will carry out the large-sample test whether or not it is appropriate; it is up to you, the user of Minitab, to check the conditions of any statistical method. To check this rule of thumb, go to the Stat menu and select Tables>Tally Individual Variables. Enter both the Female Disease Incidence and Male Disease Incidence variables (the default is to show counts, which is what we want). All sample counts are greater than 10, so you can proceed with the large-sample test.
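The rule-of-thumb check is simple counting. A minimal Python sketch, assuming each sample is stored as a list of 0/1 values like the worksheet columns; the function names are hypothetical and the data below are made up:

```python
def tally(column):
    """Count the 0s (not infected) and 1s (infected) in a 0/1 column."""
    return {0: column.count(0), 1: column.count(1)}


def large_sample_ok(*columns, minimum=10):
    """Rule of thumb: every sample has at least `minimum` infected and not infected."""
    return all(min(tally(col).values()) >= minimum for col in columns)


# Made-up 0/1 columns for illustration only (not the Disease.MPJ data).
female = [1] * 20 + [0] * 480
male = [1] * 15 + [0] * 535
print(tally(female), tally(male), large_sample_ok(female, male))
```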

To carry out the test, go to the Stat menu and select Basic Statistics>2 Proportions. When using Minitab for inference, you must be aware of how your data are set up in the worksheet. In this case, there are two separate columns, so click the “Samples in Different Columns” circle and then enter Female Disease Incidence and Male Disease Incidence as your variables. Now click the Options button. Here you can select the confidence level (95% is the default), the value of the difference to test in the null hypothesis (the default is 0, which is the only situation we’ve discussed), and the direction of the alternative hypothesis (not equal is the default, which is what we want). There is also an option to “Use pooled estimate of p for test.” This is the test we discussed in class (it uses a pooled estimate of p in the standard error), so select this option.
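For reference, the pooled two-proportion z-test selected above can be sketched in Python with the standard library's NormalDist. The test uses the pooled estimate $\hat p = (x_1 + x_2)/(n_1 + n_2)$ in the standard error; the companion confidence interval uses the usual unpooled standard error. The function names are hypothetical and the infected counts are made up (only the sample sizes 500 and 550 come from the study description), so treat this as a check on the formulas, not a replacement for Minitab's output.

```python
import math
from statistics import NormalDist


def two_prop_z_pooled(x1, n1, x2, n2):
    """Two-sided z-test of H0: p1 - p2 = 0 using the pooled estimate of p."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)  # estimate of the common p under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p1 - p2, z, p_value


def two_prop_ci(x1, n1, x2, n2, level=0.95):
    """Large-sample confidence interval for p1 - p2 (unpooled standard error)."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return p1 - p2 - z_star * se, p1 - p2 + z_star * se


# Made-up infected counts: 20 of 500 females and 15 of 550 males.
diff, z, p = two_prop_z_pooled(20, 500, 15, 550)
lo, hi = two_prop_ci(20, 500, 15, 550)
print(f"difference = {diff:.4f}, z = {z:.3f}, p = {p:.3f}, CI = ({lo:.4f}, {hi:.4f})")
```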

The results are printed to the Session window. Minitab reports the direction of the differencing, the hypotheses, a confidence interval, the test statistic, and the p-value; it is up to you to interpret these results. Is the difference statistically significant? (No.) Also notice the “Fisher’s Exact Test” output in the Session window. Fisher’s Exact Test is a randomization test and is not based on the normal distribution. Because the large-sample test (which relies on the normal approximation) and the randomization test (which does not) agree, we can feel even better about our analysis.
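Fisher's exact test holds the table margins fixed and uses the hypergeometric distribution. A minimal Python sketch for a 2x2 table, assuming the common two-sided convention of summing the probabilities of all tables no more likely than the observed one (other definitions of the two-sided p-value exist). The function name is hypothetical and the counts are made up:

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows are the two groups (e.g., females, males); columns are
    infected and not infected.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability that row 1 has x infected, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    possible = range(max(0, col1 - row2), min(row1, col1) + 1)
    # Add up every table no more likely than the observed one
    # (the small tolerance guards against floating-point ties).
    total = sum(p for p in map(prob, possible) if p <= p_obs * (1 + 1e-7))
    return min(total, 1.0)


# Made-up counts: 20 of 500 females and 15 of 550 males infected.
print(f"Fisher two-sided p = {fisher_exact_two_sided(20, 480, 15, 535):.4f}")
```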

 

Description of HousePrices.MPJ

This data file contains the selling prices (in thousands of dollars) for a sample of homes (some 3-bedroom and some 4-bedroom) in West Lafayette, Indiana, in 2001.

 

Analysis

On average, do homes with 4 bedrooms sell for more money than homes with 3 bedrooms? Intuitively this seems likely, but do these data support the claim? First we should consider the data set-up: this is a two-sample problem, because the 3-bedroom and 4-bedroom homes form two separate, independent samples rather than matched pairs.

 

Before performing a significance test, compare the house prices visually. (Again, it’s good practice to look at simple graphical and numerical summaries of your data before moving to more complex analyses.) Boxplots are a good way to make this initial comparison (Graph>Boxplot>Multiple Ys>Simple; if you want horizontal rather than vertical boxplots, click the Scale button and select the “Transpose value and category scale” option). Based on these boxplots, do you think a t-test will find a significant difference in average selling prices? (Sometimes we can anticipate the result of a significance test from simple graphs of the data.)
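A boxplot is drawn from the five-number summary, with points more than 1.5 IQRs beyond the quartiles typically flagged as outliers. The sketch below computes these with Python's statistics.quantiles, whose default "exclusive" method places quartiles at position (n + 1)p; Minitab's quartile rule may differ slightly for some sample sizes. The function names are hypothetical and the prices are made up, not the HousePrices.MPJ data.

```python
from statistics import quantiles


def five_number_summary(values):
    """Minimum, Q1, median, Q3, maximum: the numbers a boxplot is drawn from."""
    q1, median, q3 = quantiles(values, n=4)  # default method="exclusive"
    return min(values), q1, median, q3, max(values)


def outlier_fences(values):
    """Points outside these limits (1.5 IQR beyond the quartiles) are flagged."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


# Made-up prices in thousands of dollars (not the HousePrices.MPJ data).
three_bed = [118.0, 134.5, 126.9, 149.0, 112.5, 139.9, 121.0]
four_bed = [155.0, 172.5, 189.9, 164.0, 210.0, 199.5]
for label, prices in [("3 bedrooms", three_bed), ("4 bedrooms", four_bed)]:
    print(label, five_number_summary(prices), outlier_fences(prices))
```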

 

You can further assess the normality condition of the t-test by creating histograms and normal-probability plots of the two sets of data (Graph>Histogram or Graph>Probability Plot). Does the normality condition seem reasonable for both samples? (Yes.)

 

Now you can conduct the actual two-sample t-test (Stat>Basic Statistics>2-Sample t). Note that the samples are in different columns, you don’t want to assume equal variances, and you want a one-sided alternative. (Minitab works with the first sample minus the second, so choose the direction of the alternative to match the order in which you enter the samples.) What do the results tell you? The small p-value shouldn’t surprise you. What is most valuable in the output is the confidence bound on the difference in average selling prices; the difference definitely seems practically important.
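Because you don't assume equal variances, this is Welch's two-sample t-test. A minimal Python sketch of its test statistic and Welch–Satterthwaite degrees of freedom follows; the function name and prices are hypothetical. The difference is taken as the first sample minus the second, so listing the 4-bedroom homes first matches a "greater than" alternative. A one-sided lower confidence bound is then the difference minus $t^*$ times the standard error, with $t^*$ taken from a t table at these degrees of freedom.

```python
import math
from statistics import mean, stdev


def welch_t(first, second):
    """Welch two-sample t statistic for mean(first) - mean(second)."""
    n1, n2 = len(first), len(second)
    v1 = stdev(first) ** 2 / n1  # squared standard error of the first mean
    v2 = stdev(second) ** 2 / n2  # squared standard error of the second mean
    diff = mean(first) - mean(second)
    se = math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return diff, se, diff / se, df


# Made-up prices in thousands of dollars (not the HousePrices.MPJ data).
four_bed = [155.0, 172.5, 189.9, 164.0, 210.0, 199.5]
three_bed = [118.0, 134.5, 126.9, 149.0, 112.5, 139.9, 121.0]
diff, se, t, df = welch_t(four_bed, three_bed)
print(f"difference = {diff:.1f}, SE = {se:.2f}, t = {t:.2f}, df = {df:.1f}")
```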

 

 

More on Inference with Minitab

Minitab allows you to perform all the tests we’ve discussed in class, plus many we haven’t discussed (e.g., nonparametric tests, Chi-square tests). It’s important that you know when to use each test and that you always check conditions of the test (Minitab doesn’t do this for you). When running a specific test, it’s important that you understand how your data are listed in your worksheet (e.g., separate columns, single column). Finally, it’s important that you understand the output Minitab gives you and that you can provide a conclusion in layperson’s terms (not using technical statistical language).

Minitab can also perform power calculations (and sample-size determination) for all the tests we’ve discussed (Stat>Power and Sample Size). Choose one of these procedures from the Power and Sample Size submenu. Notice that Minitab lists three quantities: 1) sample size, 2) difference (the difference you want to be able to detect), and 3) power. (The significance level and the direction of the alternative are specified via the Options button.) If you supply two of these values, Minitab will determine the third. For example, in a two-sample t-test, if you want 0.90 power to detect a difference in means of 5 when the standard deviation is 50, Minitab determines that a sample size of 2,103 in each group is needed (based on a two-sided test at the 0.05 significance level). Verify this example.
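To see roughly where a number like 2,103 comes from, here is the standard normal-approximation formula for the per-group sample size in a two-sided two-sample test, $n = 2\,(z_{1-\alpha/2} + z_{1-\beta})^2 \sigma^2/\Delta^2$, sketched in Python with a hypothetical function name. It is an approximation, not Minitab's exact method: because it uses normal rather than t critical values, it gives 2,102 per group for this example, slightly below the 2,103 Minitab reports.

```python
import math
from statistics import NormalDist


def n_per_group_approx(diff, sigma, power=0.90, alpha=0.05):
    """Normal-approximation sample size per group, two-sided two-sample test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)  # quantile matching the desired power
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / diff) ** 2)


print(n_per_group_approx(diff=5, sigma=50))  # prints 2102
```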