Math 117 Computer Lab – Sampling Distributions

Note: In this lab, we will use Minitab rather than SPSS (since Minitab is better for simulation).

 

Normal Approximation Applet

Open Internet Explorer and type in the following URL (this is the textbook website): www.whfreeman.com/ips5e. From this web page (under the Student Tools section), select Statistical Applets, and then select Normal Approximation to the Binomial. This applet illustrates the normal approximation to binomial probabilities for different values of n and p. Read the instructions and use the applet to explore the normal approximation for different values of n and p. When is the approximation good and when is it bad? Do your findings agree with our rule of thumb from class?
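If the applet is unavailable, the same comparison can be sketched in Python. This is only an illustration, not a reproduction of the applet: the function names are made up for this example, and the normal approximation is used without a continuity correction, which may differ from what the applet does.

```python
import math

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def normal_cdf(x, mu, sigma):
    """P(Y <= x) for a normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def approx_error(n, p):
    """Largest gap between the exact binomial CDF and its normal approximation.

    Assumption: no continuity correction is applied.
    """
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return max(abs(binom_cdf(k, n, p) - normal_cdf(k, mu, sigma))
               for k in range(n + 1))

# Try a case where np and n(1 - p) are both small, and one where both are large.
for n, p in [(10, 0.1), (100, 0.5)]:
    print(f"n={n}, p={p}: np={n * p:.1f}, n(1-p)={n * (1 - p):.1f}, "
          f"max error={approx_error(n, p):.3f}")
```

The maximum error should be noticeably smaller in the second case, where both np and n(1 – p) are well above 10.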

 

Die Rolling Activity

Open Minitab and label the first two columns Sample1 and Sample2. Roll the die 30 times and record all the rolls (that is, record the up face) in the first column. Take the rolling seriously and do not count a roll when the die is stopped by another object (e.g., when the die hits the computer and stops). After you have finished the first sample, roll the die another 30 times and record all the rolls in the second column.

 

Sampling Distribution of a Sample Proportion

Suppose we consider a “success” to be rolling either a 1 or a 2. Then we are in the binomial setting: B – either (1 or 2) or (not 1 or 2); I – die rolls are independent; N – fixed 30 rolls; S – assuming the die is fair, the probability of a success is always 1/3.

 

We are interested in the sampling distribution of the sample proportion of 1s and 2s. In this situation, is it appropriate to use the normal approximation? Yes, because np = 10 ≥ 10 and n(1 – p) = 20 ≥ 10. If the die is fair, then the distribution of the sample proportion, p̂, is approximately normal with mean 0.333 and standard error about 0.086 (verify these values).
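One way to verify these values is to compute them directly. The short sketch below uses the standard formula for the standard error of a sample proportion, sqrt(p(1 – p)/n); the variable names are chosen only for this example.

```python
import math

n = 30      # rolls per sample
p = 1 / 3   # probability of rolling a 1 or a 2, assuming a fair die

mean_p_hat = p                         # mean of the sample proportion
se_p_hat = math.sqrt(p * (1 - p) / n)  # standard error of the sample proportion

print(f"mean = {mean_p_hat:.3f}")            # mean = 0.333
print(f"standard error = {se_p_hat:.3f}")    # standard error = 0.086
```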

 

We can simulate the sampling distribution of the sample proportion. Each of you has two samples of size 30. For each of your samples, calculate the sample proportion of 1s and 2s. (In Minitab, from the Stat menu select Tables>Tally and enter Sample1 and Sample2 as the variables. Then select “Cumulative percents” from the Display menu. These percents will appear in the session window. For each sample, find the cumulative percent in the row for the value 2, which is normally the second row. Dividing this percent by 100 gives you the sample proportion of 1s and 2s for that sample.)

 

As a class we’ll enter the values of all the sample proportions. Furthermore, we’ll graphically and numerically describe the simulated sampling distribution. Does it look normal? Are the mean and standard deviation close to the theoretical values?
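To preview what the class results might look like, the exercise can be imitated with simulated rolls. This is a rough sketch under stated assumptions: the die is fair, the class size of 20 students (40 samples) is hypothetical, and the seed is arbitrary and fixed only so the run is repeatable.

```python
import random
import statistics

def sample_proportion(rolls):
    """Proportion of rolls that show a 1 or a 2."""
    return sum(1 for r in rolls if r in (1, 2)) / len(rolls)

rng = random.Random(117)   # arbitrary fixed seed for repeatability
n = 30                     # rolls per sample
num_samples = 40           # hypothetical: 20 students x 2 samples each

# One simulated sample proportion per sample of 30 fair-die rolls.
proportions = [
    sample_proportion([rng.randint(1, 6) for _ in range(n)])
    for _ in range(num_samples)
]

print("mean of sample proportions:", round(statistics.mean(proportions), 3))
print("SD of sample proportions:  ", round(statistics.stdev(proportions), 3))
```

With only 40 samples, the mean and standard deviation will be near, but not exactly equal to, the theoretical 0.333 and 0.086.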

 

Whenever you get confused about the sampling distribution of p̂, think about this exercise (p̂ is a statistic that varies from sample to sample).

 

Sampling Distribution of a Sample Mean

Consider now a population that is defined by the following distribution:

Value of X:    1     2     3     4     5     6
Probability:   1/6   1/6   1/6   1/6   1/6   1/6

The mean for this population is 3.5 and the standard deviation is about 1.708 (verify these). Then by the Central Limit Theorem (since n = 30 ≥ 30), the sampling distribution of the sample mean, x̄, is approximately normal with mean 3.5 and standard error about 0.312 (verify these).
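These values can be verified with a short calculation. The sketch below assumes each face has probability 1/6 (a fair die) and uses the usual formulas for the mean and standard deviation of a discrete distribution; the variable names are chosen only for this example.

```python
import math

values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6   # assumption: fair die, each face equally likely
n = 30                # sample size

mu = sum(x * p for x, p in zip(values, probs))                # population mean
variance = sum((x - mu) ** 2 * p for x, p in zip(values, probs))
sigma = math.sqrt(variance)                                   # population SD
se_mean = sigma / math.sqrt(n)                                # SE of the sample mean

print(f"mean = {mu:.3f}")                   # mean = 3.500
print(f"standard deviation = {sigma:.3f}")  # standard deviation = 1.708
print(f"standard error = {se_mean:.3f}")    # standard error = 0.312
```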

 

Assuming the dice are fair, each of you has two samples from the above distribution, so we can simulate the sampling distribution of the sample mean. For each of your samples, calculate the sample mean (from the Stat menu select Basic Statistics>Display Descriptive Statistics and enter Sample1 and Sample2 as the variables).

 

As a class we’ll enter the values of all the sample means. Furthermore, we’ll graphically and numerically describe the simulated sampling distribution. Does it look normal? Are the mean and standard deviation close to the theoretical values?
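A class only produces a few dozen sample means, so a larger simulation can help show the shape more clearly. The sketch below is an illustration under assumptions: the die is fair, the number of simulated samples (1000) and the seed are arbitrary choices, and the "within 2 standard errors" check is just one informal way to judge normality (a normal distribution puts about 95% of its values in that range).

```python
import math
import random
import statistics

rng = random.Random(2024)   # arbitrary fixed seed for repeatability
n = 30                      # rolls per sample
num_samples = 1000          # arbitrary number of simulated samples

mu = 3.5                                      # theoretical mean of one fair-die roll
se = math.sqrt(35 / 12) / math.sqrt(n)        # theoretical standard error, about 0.312

# One sample mean per simulated sample of 30 rolls.
sample_means = [
    statistics.mean([rng.randint(1, 6) for _ in range(n)])
    for _ in range(num_samples)
]

# Fraction of sample means within 2 standard errors of 3.5.
within_2se = sum(1 for m in sample_means if abs(m - mu) <= 2 * se) / num_samples

print("mean of sample means:", round(statistics.mean(sample_means), 3))
print("SD of sample means:  ", round(statistics.stdev(sample_means), 3))
print("fraction within 2 SE:", within_2se)
```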

 

Whenever you get confused about the sampling distribution of x̄, think about this exercise (x̄ is a statistic that varies from sample to sample).

 

Generating Sampling Distributions with Minitab

We will consider the uniform distribution. This is the distribution defined on the interval [0, 1] whose density curve is simply a square of height 1. We can have Minitab randomly generate values from this distribution. Open a new worksheet (from the File menu select New>Minitab Worksheet). Label the first column “Uniform Distribution Values.” Then go to the Calc menu and select Random Data>Uniform. Generate 1000 rows of data and store them in the Uniform Distribution Values column. Graphing the values in this column gives us an estimate of what the uniform distribution looks like. Create a histogram of the Uniform Distribution Values variable. (From the Graph menu select Histogram>Simple. Choose Uniform Distribution Values as your “Graph variable.” You can title the graph by clicking on the Labels button.) What does it look like? Since it’s a single sample, it should look like the population from which it came.

 

We will consider the uniform distribution our population, and we’ll simulate repeated sampling from this population. Go back to the Calc menu and select Random Data>Uniform. Again, generate 1000 rows, but now store them in C2-C51. Now let’s think carefully about the data we have. We have 50 columns, each of which contains 1000 random draws from the uniform distribution. But we can also think about the data across rows. That is, we can think of the first row (of columns C2-C51) as a random sample of 50 draws from the uniform distribution. Then we have 1000 samples of size 50 (since we have 1000 rows of data).

 

For each sample of 50, we can calculate the sample mean. This is done by going to the Calc menu and selecting Row Statistics. Select the mean as the statistic. Then highlight variables C2-C51 in the left-hand column and select them to be the “Input variables.” In the “Store result in” box, type “Sample Mean” (Minitab will then label the next open column, C52, “Sample Mean,” and store the results in it—be sure you put quotation marks around the column title). In your worksheet, scroll over to column 52. Each value in this column is a sample mean based on a sample of size 50 from the uniform distribution. Hence, a graph of these values will be an estimate of the sampling distribution of the sample mean. (Recall that this sampling distribution is the distribution of values the sample mean takes in all possible samples of size 50 from the uniform distribution).
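The row-by-row calculation described above can also be imitated outside Minitab. The following is a minimal sketch, not a replacement for the Minitab steps: the seed is arbitrary, and the theoretical values in the comments (mean 0.5 and standard deviation sqrt(1/12) for the uniform distribution on [0, 1]) are standard facts rather than values given in this handout.

```python
import math
import random
import statistics

rng = random.Random(50)   # arbitrary fixed seed for repeatability
num_rows = 1000           # number of samples (rows)
num_cols = 50             # sample size (columns C2-C51 in the Minitab worksheet)

# Each row is one random sample of 50 draws from the uniform distribution on [0, 1].
rows = [[rng.random() for _ in range(num_cols)] for _ in range(num_rows)]

# Row statistics: one sample mean per row, like the "Sample Mean" column.
sample_means = [statistics.mean(row) for row in rows]

# Theoretical values for comparison.
theoretical_mean = 0.5
theoretical_se = math.sqrt(1 / 12) / math.sqrt(num_cols)   # about 0.041

print("mean of sample means:", round(statistics.mean(sample_means), 4))
print("SD of sample means:  ", round(statistics.stdev(sample_means), 4))
print("theoretical SE:      ", round(theoretical_se, 4))
```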

 

Create a histogram of the Sample Mean variable. (From the Graph menu select Histogram>Simple. Choose Sample Mean as your “Graph variable.” You can title the graph by clicking on the Labels button.) What is the shape of this distribution? (Note: You can add a normal-curve fit to the graph by right-clicking on the histogram window and selecting Add>Distribution Fit from the options; a normal-curve fit is the default option.) This simulation illustrates the Central Limit Theorem. That is, since the sample size is “large,” the sampling distribution of the sample mean is approximately normal (regardless of the shape of the original population). What a cool and powerful result!