Math 207—Summary of Small-Sample Inference

 

In the large-sample setting, we estimate the unknown population standard deviation, $\sigma$, with the sample standard deviation, $s$, and we don’t worry much about the error of estimation (since our estimate is based on a large sample). In the small-sample setting, we can no longer ignore the additional variability that comes from estimating $\sigma$ with $s$. To account for this added variability, we use the t-distribution when determining the multiplier for a confidence interval or the p-value of a significance test. If we inappropriately use the z-distribution when we should use the t-distribution, it’s possible, for example, to declare results statistically significant even though they aren’t.
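
To see how much this added variability matters, here is a minimal sketch (standard-library Python only) that computes the multiplier $t^*$ for 95% confidence at several degrees of freedom and compares it with $z^* \approx 1.96$. It assumes we approximate the t-distribution’s cumulative probabilities by integrating its density with Simpson’s rule and find $t^*$ by bisection; the helper names `t_pdf`, `t_cdf`, and `t_star` are hypothetical, and in this course you would read $t^*$ from a t-table instead.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=1000):
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about 0."""
    a = abs(x)
    h = a / steps                      # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3               # area under the density between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area

def t_star(upper_area, df):
    """Multiplier with the given area in the upper tail, found by bisection."""
    lo, hi = 0.0, 1000.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > upper_area:
            lo = mid                   # too much area beyond mid, so t* is larger
        else:
            hi = mid
    return (lo + hi) / 2

z_star = NormalDist().inv_cdf(0.975)   # z multiplier for 95% confidence
for df in (2, 4, 9, 29):
    print(f"df = {df:2d}   t* = {t_star(0.025, df):.3f}   z* = {z_star:.3f}")
```

With 9 degrees of freedom the sketch returns $t^* \approx 2.262$, the same value a t-table gives; as the degrees of freedom grow, $t^*$ shrinks toward $z^*$. A larger multiplier means wider intervals and larger p-values, which is the protection against declaring results significant too easily.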

 

Small-Sample Inference for a Population Mean

Suppose we have a small random sample, of size $n$, from a normally distributed population with unknown mean $\mu$ and unknown standard deviation $\sigma$.

 

Before using the t-procedures for inference, we must check the condition that the population values follow a normal distribution. We can judge the shape of the population distribution from an appropriate graph of our sample data values (e.g., a histogram, stemplot, or normal quantile plot). If the distribution of our sample looks roughly symmetric and mound-shaped, then the condition has been met, and we can continue with our analysis. If the sample-data distribution deviates slightly from normality, then we can still use the t-procedures (these procedures are “robust” in that the required probability calculations are insensitive to small violations of the conditions). But if the sample-data distribution looks very non-normal (e.g., strongly skewed or with outliers), then the t-procedures should not be used. (Non-parametric inference can be used in these situations; it is a topic not covered in this course, but one you can read about on your own.)

 

Then a level $1-\alpha$ confidence interval for $\mu$ is $\bar{x} \pm t^* \frac{s}{\sqrt{n}}$, where $t^*$ is the t-value corresponding to an area $\alpha/2$ in the upper tail of the t-distribution with $n-1$ degrees of freedom.
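
As a worked example under illustrative assumptions (the ten data values are made up, and $t^* = 2.262$ is the t-table entry for an upper-tail area of 0.025 with 9 degrees of freedom), a 95% interval can be computed as follows.

```python
import math
import statistics

# Hypothetical sample of n = 10 measurements (illustrative data only).
data = [12, 15, 11, 14, 13, 16, 12, 14, 15, 13]
n = len(data)
xbar = statistics.mean(data)      # sample mean
s = statistics.stdev(data)        # sample standard deviation (divides by n - 1)
se = s / math.sqrt(n)             # standard error of the sample mean

t_star = 2.262                    # t-table: upper-tail area 0.025, df = n - 1 = 9
margin = t_star * se
print(f"95% CI for mu: ({xbar - margin:.3f}, {xbar + margin:.3f})")
```

Here $\bar{x} = 13.5$ and $s/\sqrt{n} = 0.5$, so the interval is $13.5 \pm 1.131$.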

Recall that our “confidence” is in the method we use to create this interval, not in our one particular interval. The method is “correct” (i.e., produces an interval that contains the true population mean, $\mu$) $100(1-\alpha)\%$ of the time.
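
A quick simulation makes the “confidence is in the method” idea concrete. The sketch below (standard-library Python; the population mean, standard deviation, sample size, seed, and number of repetitions are arbitrary illustrative choices) draws many samples of size $n = 5$ from a normal population, builds the interval $\bar{x} \pm (\text{multiplier}) \cdot s/\sqrt{n}$ for each, and records how often the interval captures $\mu$. With the t-table multiplier $t^* = 2.776$ (upper-tail area 0.025, 4 degrees of freedom) the capture rate is close to 95%; wrongly using $z^* = 1.96$ captures $\mu$ noticeably less often.

```python
import math
import random

def coverage(multiplier, n=5, reps=20_000, mu=10.0, sigma=2.0, seed=207):
    """Fraction of simulated intervals xbar +/- multiplier * s/sqrt(n) that contain mu."""
    rng = random.Random(seed)          # fixed seed so the run is reproducible
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))  # sample SD
        margin = multiplier * s / math.sqrt(n)
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / reps

cov_t = coverage(2.776)   # t* for df = 4, 95% confidence
cov_z = coverage(1.960)   # z* wrongly used with a small sample
print(f"t-interval coverage: {cov_t:.3f}")
print(f"z-interval coverage: {cov_z:.3f}")
```

Because both runs use the same seed, the two multipliers are applied to the same simulated samples. The narrower z-based intervals miss $\mu$ about 12% of the time instead of the intended 5%.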

 

If we want to test the null hypothesis $H_0: \mu = \mu_0$, we first calculate the test statistic $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$, which tells us the number of standard errors our particular sample average is from the null-hypothesized population mean. Then we use the t-distribution with $n-1$ degrees of freedom to determine the approximate p-value (recall that the p-value depends on the direction of the alternative hypothesis).
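
A minimal sketch of the whole test in standard-library Python follows. The data and the null value $\mu_0 = 12$ are hypothetical, `one_sample_t` is a made-up helper name, and the t-distribution probabilities are approximated by numerically integrating the density (Simpson’s rule) rather than read from a table. The `alternative` argument covers the three possible directions of $H_a$.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=1000):
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about 0."""
    a = abs(x)
    h = a / steps                      # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def one_sample_t(data, mu0, alternative="two-sided"):
    """t statistic, degrees of freedom, and p-value for H0: mu = mu0."""
    n = len(data)
    se = statistics.stdev(data) / math.sqrt(n)       # s / sqrt(n)
    t = (statistics.mean(data) - mu0) / se           # SEs between xbar and mu0
    df = n - 1
    if alternative == "greater":                     # Ha: mu > mu0
        p = 1 - t_cdf(t, df)
    elif alternative == "less":                      # Ha: mu < mu0
        p = t_cdf(t, df)
    else:                                            # Ha: mu != mu0
        p = 2 * (1 - t_cdf(abs(t), df))
    return t, df, p

data = [12, 15, 11, 14, 13, 16, 12, 14, 15, 13]     # hypothetical sample
t, df, p = one_sample_t(data, mu0=12)
print(f"t = {t:.3f}, df = {df}, two-sided p = {p:.4f}")
```

Here $t = 3.0$: the sample mean is three standard errors above 12. With 9 degrees of freedom, a t-table places the two-sided p-value between 0.01 and 0.02, and the sketch’s numerical value falls in that range.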

 

 

Finally, we define and interpret the p-value in the context of the problem and provide a conclusion (which might depend on a given significance level, $\alpha$). If the results are statistically significant, we also consider their practical significance.

 

Relationship between Confidence Interval and Significance Test:

A level $\alpha$, two-sided significance test rejects the hypothesis $H_0: \mu = \mu_0$ exactly when the value $\mu_0$ falls outside the level $1-\alpha$ confidence interval for $\mu$. (Put another way, the significance test does not reject the hypothesis if the value $\mu_0$ falls inside the corresponding confidence interval.)
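
The equivalence holds because $|t| > t^*$ is the same inequality as $|\bar{x} - \mu_0| > t^* \cdot s/\sqrt{n}$. The sketch below checks it numerically for a hypothetical sample and $\alpha = 0.05$ (using the table value $t^* = 2.262$ for 9 degrees of freedom) by scanning a grid of candidate null values; the helper `rejects` is illustrative only.

```python
import math
import statistics

data = [12, 15, 11, 14, 13, 16, 12, 14, 15, 13]   # hypothetical sample, n = 10
n = len(data)
xbar = statistics.mean(data)
se = statistics.stdev(data) / math.sqrt(n)
t_star = 2.262          # t-table: upper-tail area 0.025, df = 9, so alpha = 0.05
lower, upper = xbar - t_star * se, xbar + t_star * se   # 95% CI for mu

def rejects(mu0):
    """Two-sided level-0.05 t-test: reject H0: mu = mu0 when |t| > t*."""
    return abs((xbar - mu0) / se) > t_star

# Scan candidate null values 10.00, 10.01, ..., 17.00.
candidates = [10 + 0.01 * k for k in range(701)]
agree = all(rejects(m) == (m < lower or m > upper) for m in candidates)
print("test and interval agree for every candidate:", agree)
```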

 

 

 

Paired-t (or z) Inference

Suppose we have a matched-pairs experimental design, where each experimental unit receives two treatments (i.e., each unit serves as its own control). Let $\mu_d$ be the mean of the population of differences in responses to the two treatments. To test $H_0: \mu_d = 0$ (always the null hypothesis) or to find a confidence interval for $\mu_d$, we 1) compute the differences for our sample, 2) determine $\bar{d}$ and $s_d$ for the differences, and 3) apply the one-sample t-procedures to the differences if the normality condition is met (or the one-sample z-procedures if the sample size is large).
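
The three steps can be sketched in standard-library Python as follows; the six pairs of responses are hypothetical. Once the differences are computed, everything else is the one-sample t statistic with $n - 1$ degrees of freedom.

```python
import math
import statistics

# Hypothetical matched pairs: each unit's response under treatments A and B.
treatment_a = [10, 12, 9, 14, 11, 13]
treatment_b = [12, 16, 12, 19, 12, 16]

# 1) one difference per experimental unit
diffs = [b - a for a, b in zip(treatment_a, treatment_b)]

# 2) mean and standard deviation of the differences
n = len(diffs)
d_bar = statistics.mean(diffs)
s_d = statistics.stdev(diffs)

# 3) one-sample t statistic for H0: mu_d = 0, with n - 1 degrees of freedom
t = (d_bar - 0) / (s_d / math.sqrt(n))
print(f"d_bar = {d_bar}, s_d = {s_d:.4f}, t = {t:.3f}, df = {n - 1}")
```

Here $\bar{d} = 3$, $s_d = \sqrt{2}$, and $t \approx 5.196$ with 5 degrees of freedom, well beyond the table value $t^* = 2.571$ for an upper-tail area of 0.025.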

 


 

Small-Sample Inference for a Difference in Population Means

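Suppose we have two distinct normally distributed populations with unknown means, $\mu_1$ and $\mu_2$, and unknown standard deviations (the standard deviations are unknown, but we assume they share a common value, $\sigma$). Furthermore, suppose we have small, independent random samples of sizes $n_1$ and $n_2$ from the two populations, with sample means $\bar{x}_1$ and $\bar{x}_2$ and sample standard deviations $s_1$ and $s_2$.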

 

Before using the t-procedures for inference, we must check the conditions that 1) each set of population values follows a normal distribution, and 2) the population variances are the same.

·         We can check the normality condition by looking at graphs of the two sets of sample data (recall the t-procedures are generally “robust,” but if the sample-data distributions look very non-normal, then the t-procedures should not be used).

 

·         As a rule of thumb, if the larger of the two sample standard deviations is more than twice the smaller, i.e., $\max(s_1, s_2)/\min(s_1, s_2) > 2$, then the equal-variances condition is violated and we should not use these procedures. (There is a test that doesn’t have an equal-variances condition; in fact, this test is used most often in practice. It is also based on the t-distribution, but its degrees of freedom are grungy to calculate: easy for a computer to do, but tedious to do by hand.)
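
Both checks can be sketched in a few lines of Python. The summary statistics below are hypothetical. The degrees-of-freedom formula shown is the Welch–Satterthwaite approximation, which is the standard one for the unequal-variances t-test; the helper names `sd_ratio` and `welch_df` are made up for this illustration.

```python
def sd_ratio(s1, s2):
    """Larger sample SD divided by the smaller one (rule of thumb: worry if > 2)."""
    return max(s1, s2) / min(s1, s2)

def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite approximate df for the unequal-variances t-test."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2      # squared standard errors of each mean
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

# Hypothetical summary statistics for two small samples.
s1, n1 = 2.0, 10
s2, n2 = 5.0, 8
print(f"SD ratio = {sd_ratio(s1, s2):.2f}")   # 2.50, so pooling is not advised
print(f"Welch df = {welch_df(s1, n1, s2, n2):.2f}")
```

The approximate degrees of freedom (about 8.79 here) need not be a whole number, and they always lie between $\min(n_1, n_2) - 1$ and $n_1 + n_2 - 2$; with equal sample sizes and equal standard deviations they equal $n_1 + n_2 - 2$, the pooled value.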

 

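Then a level $1-\alpha$ confidence interval for $\mu_1 - \mu_2$ is $(\bar{x}_1 - \bar{x}_2) \pm t^* s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$, where $t^*$ is the t-value corresponding to an area $\alpha/2$ in the upper tail of the t-distribution with $n_1 + n_2 - 2$ degrees of freedom, and $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$ is the “pooled” estimate of the common variance, $\sigma^2$.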
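
A worked sketch with two hypothetical samples of size 5 follows; $t^* = 2.306$ is the t-table entry for an upper-tail area of 0.025 with $5 + 5 - 2 = 8$ degrees of freedom.

```python
import math
import statistics

# Hypothetical independent samples (illustrative data only).
group1 = [5, 7, 6, 8, 9]
group2 = [3, 4, 5, 6, 2]
n1, n2 = len(group1), len(group2)
xbar1, xbar2 = statistics.mean(group1), statistics.mean(group2)
var1, var2 = statistics.variance(group1), statistics.variance(group2)  # s1^2, s2^2

# Pooled estimate of the common variance sigma^2
sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
se = math.sqrt(sp2 * (1 / n1 + 1 / n2))   # s_p * sqrt(1/n1 + 1/n2)

t_star = 2.306    # t-table: upper-tail area 0.025, df = n1 + n2 - 2 = 8
diff = xbar1 - xbar2
print(f"95% CI for mu1 - mu2: ({diff - t_star * se:.3f}, {diff + t_star * se:.3f})")
```

Both sample variances are 2.5, so $s_p^2 = 2.5$, the standard error is 1, and the interval is $3 \pm 2.306$.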

 

Recall that our “confidence” is in the method we use to create this interval, not in our one particular interval. The method is “correct” (i.e., produces an interval that contains the true difference in population means, $\mu_1 - \mu_2$) $100(1-\alpha)\%$ of the time.

 

If we want to test the null hypothesis $H_0: \mu_1 - \mu_2 = 0$ (this will always be the default null hypothesis), we first calculate the test statistic $t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$, which tells us the number of standard errors our particular difference in sample averages is from the null-hypothesized difference in population means. Then we use the t-distribution with $n_1 + n_2 - 2$ degrees of freedom to determine the approximate p-value (recall that the p-value depends on the direction of the alternative hypothesis).
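
The sketch below computes the pooled test statistic for two hypothetical samples and then brackets the p-value the way we would with a printed t-table. The helper name `pooled_t` is made up, and the three critical values are the df = 8 row of a standard t-table, which matches these particular sample sizes only.

```python
import math
import statistics

def pooled_t(group1, group2):
    """Pooled two-sample t statistic for H0: mu1 - mu2 = 0, with its df."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * statistics.variance(group1)
           + (n2 - 1) * statistics.variance(group2)) / df    # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    t = (statistics.mean(group1) - statistics.mean(group2) - 0) / se
    return t, df

t, df = pooled_t([5, 7, 6, 8, 9], [3, 4, 5, 6, 2])   # hypothetical samples
print(f"t = {t:.3f}, df = {df}")

# Upper-tail areas and critical values from the df = 8 row of a t-table.
df8_row = {0.025: 2.306, 0.01: 2.896, 0.005: 3.355}
for area, crit in df8_row.items():
    relation = ">" if abs(t) > crit else "<="
    print(f"|t| {relation} {crit}  (upper-tail area {area})")
```

Since $t = 3.0$ lies between 2.896 and 3.355, the one-sided p-value is between 0.005 and 0.01, and the two-sided p-value is between 0.01 and 0.02.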

 

 

Finally, we define and interpret the p-value in the context of the problem and provide a conclusion (which might depend on a given significance level, $\alpha$). If the results are statistically significant, we also consider their practical significance.

 

Relationship between Confidence Interval and Significance Test:

A level $\alpha$, two-sided significance test rejects the hypothesis $H_0: \mu_1 - \mu_2 = 0$ exactly when the value 0 falls outside the level $1-\alpha$ confidence interval for $\mu_1 - \mu_2$. (Put another way, the significance test does not reject the hypothesis if the value 0 falls inside the corresponding confidence interval.)

 

 

Final Remarks

·         Small-sample inference in the binomial setting can also be done. Instead of using the normal (z) approximation, as in the large-sample setting, we compute probabilities directly from the binomial distribution.
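
As an illustration of the exact approach (the counts are hypothetical, and `binom_upper_p` is a made-up helper name), the one-sided p-value for $H_0: p = p_0$ against $H_a: p > p_0$ is the binomial probability of seeing at least as many successes as we observed, computed assuming $H_0$ is true.

```python
from math import comb

def binom_upper_p(x, n, p0):
    """Exact one-sided p-value P(X >= x) when X ~ Binomial(n, p0)."""
    return sum(comb(n, k) * p0 ** k * (1 - p0) ** (n - k) for k in range(x, n + 1))

# Hypothetical: 9 successes in 10 trials, testing H0: p = 0.5 vs Ha: p > 0.5
print(binom_upper_p(9, 10, 0.5))   # 0.0107421875 (= 11/1024)
```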

 

·         With strong conceptual knowledge of inference, you can easily learn new procedures (e.g., a significance test on the slope of a population regression line). Keep in mind the big picture (e.g., what is confidence? what is a p-value?) and you can apply it to specific situations.