Note: In this lab, we will use Minitab
rather than SPSS (since Minitab is better for simulation).
Open Internet
Explorer and type in the following URL (this is the textbook website):
www.whfreeman.com/ips5e
From this web page (under the Student Tools section),
select Statistical Applets, and then select Normal Approximation to
the Binomial. This applet illustrates the normal approximation to binomial
probabilities for different values of n
and p. Read the instructions and use
the applet to explore the normal approximation for different values of n and p. When is the approximation good and when is it bad? Do your
findings agree with our rule of thumb from class?
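If you would like to check what you see in the applet numerically, the following is a minimal Python sketch (Python is not part of this lab, and the function names are made up for illustration). It compares the exact binomial cumulative probabilities with the normal curve that has the same mean and standard deviation, and reports the largest gap. It assumes no continuity correction is used, which may differ from what the applet does.

```python
import math

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def normal_cdf(x, mean, sd):
    """P(Y <= x) for a normal distribution, computed with the error function."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

def approx_error(n, p):
    """Largest gap between the exact binomial CDF and the normal approximation."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return max(abs(binom_cdf(k, n, p) - normal_cdf(k, mean, sd))
               for k in range(n + 1))

# Try one case that fails the rule of thumb and two that satisfy it.
for n, p in [(10, 0.1), (30, 1 / 3), (100, 0.5)]:
    print(f"n={n}, p={p:.3f}, np={n * p:.1f}, n(1-p)={n * (1 - p):.1f}, "
          f"largest gap={approx_error(n, p):.4f}")
```

A smaller gap means a better approximation, so you can compare the printed gaps against whether np and n(1 – p) are both at least 10.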
Open Minitab
and label the first two columns Sample1 and Sample2. Roll the die
30 times and record all the rolls (that is, record the up face) in the first
column. Take the rolling seriously and do not count a roll when the die is
stopped by another object (e.g., when
the die hits the computer and stops). After you have finished the first sample,
roll the die another 30 times and record all the rolls in the second column.
Suppose we
consider a “success” to be rolling either a 1 or a 2. Then we are in the
binomial setting: B – either (1 or 2) or (not 1 or 2); I – die rolls are
independent; N – fixed 30 rolls; S – assuming the die is fair, the probability
of a success is always 1/3.
We are
interested in the sampling distribution of the sample proportion of 1’s and
2’s. In this situation, is it appropriate to use the normal approximation? Yes,
because np = 10 ≥ 10 and n(1 – p) = 20 ≥ 10. If the die
is fair, then the distribution of the sample proportion, p̂, is approximately
normal with mean 0.333 and standard error about 0.086 (verify these values).
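One way to verify these values is sketched below in Python (not part of the lab software; the variable names are our own). It uses the standard formulas for the sample proportion: the mean is p and the standard error is the square root of p(1 – p)/n.

```python
import math

n = 30      # rolls per sample
p = 1 / 3   # probability of rolling a 1 or a 2 with a fair die

mean_p_hat = p
se_p_hat = math.sqrt(p * (1 - p) / n)

# Rule-of-thumb counts: both should be at least 10.
print(round(n * p, 6), round(n * (1 - p), 6))
# Mean and standard error of the sample proportion.
print(round(mean_p_hat, 3), round(se_p_hat, 3))
```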
We can simulate
the sampling distribution of the sample proportion. Each of you has two samples
of size 30. For each of your samples, calculate the sample proportion of 1s and
2s. (In Minitab, from the Stat menu
select Tables>Tally and enter
Sample1 and Sample2 as the variables. Then select “Cumulative percents” from
the Display menu. These percents will appear in the session window. For each
sample, find the cumulative percent in the row for the value 2 (normally the
second row) and divide it by 100. This gives you the
sample proportion of 1s and 2s for each sample.)
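The same calculation can be sketched outside Minitab. In the Python example below, the 30 rolls are made up for illustration (they are not real lab data), and proportion_of_successes is a hypothetical helper name.

```python
# Hypothetical sample of 30 rolls, invented for illustration only.
sample1 = [3, 1, 6, 2, 5, 4, 2, 6, 1, 3,
           5, 5, 2, 4, 6, 1, 3, 3, 2, 6,
           4, 1, 5, 2, 6, 3, 4, 1, 5, 2]

def proportion_of_successes(rolls):
    """Fraction of rolls that show a 1 or a 2 (a "success" in this lab)."""
    return sum(1 for r in rolls if r in (1, 2)) / len(rolls)

p_hat = proportion_of_successes(sample1)
print(len(sample1), round(p_hat, 3))
```

This should match the cumulative percent for the value 2 in the Tally output, divided by 100.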
As a class we’ll
enter the values of all the sample proportions. Furthermore, we’ll graphically
and numerically describe the simulated sampling distribution. Does it look
normal? Are the mean and standard deviation close to the theoretical values?
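If you want to see what the class results might look like with many more samples, here is a rough Python sketch. It assumes a fair die, and the choice of 2000 simulated samples and the fixed seed are arbitrary assumptions for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable; any seed works

def sample_proportion(n=30):
    """Roll a fair die n times and return the proportion of 1s and 2s."""
    rolls = [random.randint(1, 6) for _ in range(n)]
    return sum(1 for r in rolls if r <= 2) / n

# 2000 simulated samples stand in for the pooled class results (assumed count).
p_hats = [sample_proportion() for _ in range(2000)]

print(round(statistics.mean(p_hats), 3))    # compare with the theoretical 0.333
print(round(statistics.stdev(p_hats), 3))   # compare with the theoretical 0.086
```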
Whenever you
get confused about the sampling distribution of p̂, think about this exercise
(p̂ is a statistic that varies from sample to sample).
Consider now a
population that is defined by the following distribution:
| Value of X  | 1   | 2   | 3   | 4   | 5   | 6   |
| Probability | 1/6 | 1/6 | 1/6 | 1/6 | 1/6 | 1/6 |
The mean for
this population is 3.5 and the standard deviation is about 1.708 (verify
these). Then by the Central Limit Theorem (since n = 30 ≥ 30), the
sampling distribution of the sample mean, x̄, is approximately normal with mean 3.5
and standard error about 0.312 (verify these).
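The verification can be sketched in Python as follows (variable names are our own). It assumes a fair die, so each value from 1 to 6 has probability 1/6, and it uses the usual definitions: the mean is the probability-weighted sum of the values, the variance is the probability-weighted sum of squared deviations from the mean, and the standard error of the sample mean is the standard deviation divided by the square root of n.

```python
import math

values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6   # fair die: every face equally likely (assumption)

mu = sum(x * p for x, p in zip(values, probs))
sigma = math.sqrt(sum((x - mu) ** 2 * p for x, p in zip(values, probs)))
se = sigma / math.sqrt(30)   # standard error of the sample mean for n = 30

print(round(mu, 3), round(sigma, 3), round(se, 3))
```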
Assuming the
dice are fair, each of you has two samples from the above distribution, so we
can simulate the sampling distribution of the sample mean. For each of your
samples, calculate the sample mean (from the Stat menu select Basic
Statistics>Display Descriptive Statistics and enter Sample1 and Sample2
as the variables).
As a class we’ll
enter the values of all the sample means. Furthermore, we’ll graphically and
numerically describe the simulated sampling distribution. Does it look normal?
Are the mean and standard deviation close to the theoretical values?
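As with the sample proportion, a larger simulated version of the class exercise can be sketched in Python. The 2000 simulated samples and the seed are arbitrary assumptions, and the die is assumed fair.

```python
import random
import statistics

random.seed(2)  # fixed seed so the run is repeatable; any seed works

def sample_mean(n=30):
    """Roll a fair die n times and return the mean of the up faces."""
    rolls = [random.randint(1, 6) for _ in range(n)]
    return statistics.mean(rolls)

# 2000 simulated samples stand in for the pooled class results (assumed count).
x_bars = [sample_mean() for _ in range(2000)]

print(round(statistics.mean(x_bars), 3))    # compare with the theoretical 3.5
print(round(statistics.stdev(x_bars), 3))   # compare with the theoretical 0.312
```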
Whenever you
get confused about the sampling distribution of x̄, think about this exercise
(x̄ is a statistic that varies from sample to sample).
Generating Sampling Distributions with Minitab
We will
consider the uniform distribution. This is the distribution defined on the
interval [0, 1] whose density curve is simply a square of height 1. We can have
Minitab randomly generate values from this distribution. Open a new worksheet
(from the File menu select New>Minitab Worksheet). Label the
first column “Uniform Distribution Values.” Then go to the Calc menu and select Random
Data>Uniform. Generate 1000 rows of data and store them in the Uniform Distribution Values column.
Graphing the values in this column gives us an estimate of what the uniform
distribution looks like. Create a histogram of the Uniform Distribution Values variable. (From the Graph menu select Histogram>Simple. Choose Uniform
Distribution Values as your “Graph variable.” You can title the graph by
clicking on the Labels button.) What
does it look like? Since these are 1000 individual draws rather than sample
means, the histogram should roughly resemble the population from which they came.
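For comparison, here is a small Python sketch of the same idea: draw 1000 values from the uniform distribution on [0, 1] and print a rough text histogram. The ten equal-width bins and the seed are arbitrary choices for illustration, not something Minitab is known to use.

```python
import random

random.seed(3)  # fixed seed so the run is repeatable; any seed works
draws = [random.random() for _ in range(1000)]   # values in [0, 1)

# Count how many draws fall in each of ten equal-width bins.
counts = [0] * 10
for u in draws:
    counts[min(int(u * 10), 9)] += 1

# One star per 5 observations; the bars should be roughly equal in length.
for i, c in enumerate(counts):
    print(f"{i / 10:.1f}-{(i + 1) / 10:.1f} {'*' * (c // 5)} ({c})")
```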
We will
consider the uniform distribution our population, and we’ll simulate repeated
sampling from this population. Go back to the Calc menu and select Random
Data>Uniform. Again, generate 1000 rows, but now store them in C2-C51.
Now let’s think carefully about the data we have. We have 50 columns, each of
which contains 1000 random draws from the uniform distribution. But we can also
think about the data across rows. That is, we can think of the first row (of
columns C2-C51) as a random sample of 50 draws from the uniform distribution. Then we have 1000 samples of size 50 (since
we have 1000 rows of data).
For each sample
of 50, we can calculate the sample mean. This is done by going to the Calc menu and selecting Row Statistics. Select the mean as the
statistic. Then highlight variables C2-C51 in the left-hand column and select
them to be the “Input variables.” In the “Store result in” box, type “Sample Mean”
(Minitab will then label the next open column, C52, “Sample Mean,” and store
the results in it—be sure you put quotation marks around the column title). In
your worksheet, scroll over to column 52. Each value in this column is a sample
mean based on a sample of size 50 from the uniform distribution. Hence, a graph
of these values will be an estimate of the sampling distribution of the sample
mean. (Recall that this sampling distribution is the distribution of values the
sample mean takes in all possible samples of size 50 from the uniform
distribution.)
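The worksheet layout and the row means can be mimicked in Python, as in the sketch below. Each inner list plays the role of one worksheet row across C2-C51 (the variable names are our own). For comparison, it also prints the theoretical values, using the facts that the uniform distribution on [0, 1] has mean 0.5 and standard deviation equal to the square root of 1/12.

```python
import math
import random
import statistics

random.seed(4)  # fixed seed so the run is repeatable; any seed works
n_rows, n_cols = 1000, 50

# 1000 rows by 50 columns of uniform draws, like columns C2-C51.
worksheet = [[random.random() for _ in range(n_cols)] for _ in range(n_rows)]

# One sample mean per row, like the "Sample Mean" column.
sample_means = [statistics.mean(row) for row in worksheet]

theory_mean = 0.5
theory_se = math.sqrt(1 / 12) / math.sqrt(n_cols)

print(round(statistics.mean(sample_means), 4), round(theory_mean, 4))
print(round(statistics.stdev(sample_means), 4), round(theory_se, 4))
```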
Create a
histogram of the Sample Mean
variable. (From the Graph menu select
Histogram>Simple. Choose Sample Mean as your “Graph variable.” You can
title the graph by clicking on the Labels
button.) What is the shape of this distribution? (Note: You can add a
normal-curve fit to the graph by right-clicking on the histogram window and
selecting Add>Distribution Fit from
the options—a normal-curve fit is the default option.) This simulation
illustrates the Central Limit Theorem. That is, since the sample size is “large,” the sampling distribution of the
sample mean is approximately normal (regardless of the shape of the original
population). What a cool and powerful result!