Elementary Statistics – Sampling

 

The population is the entire collection of individuals about which we want information.

The sample is the collection of individuals we actually measure.

 

 

Potential Problems with Sampling/Data Collection

 

·       Voluntary response (a voluntary response sample is often biased because people with strong opinions, especially negative opinions, are more likely to respond) 

 

·       Undercoverage (when some groups in the population are left out of the process of choosing a sample)

 

·       Nonresponse (e.g., person chooses not to answer, person isn’t home)

 

·       Response error (e.g., person lies or remembers incorrectly)

 

·       Wording of the question/Interview process (e.g., leading questions, prompting by interviewer)

 

·       Processing error (e.g., data entry error, misrecording on the form)
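To see how voluntary response can distort an estimate, here is a minimal simulation sketch. Everything in it is assumed for illustration only: the population of satisfaction scores, the response probabilities in RESPONSE_PROB, and the names used are hypothetical and do not come from real data.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 satisfaction scores, 1 (very unhappy) to 5 (very happy).
population = [random.choice([1, 2, 3, 4, 5]) for _ in range(10_000)]

# Assumed behaviour: people with strong opinions, especially negative ones,
# are more likely to respond. These probabilities are made up.
RESPONSE_PROB = {1: 0.60, 2: 0.30, 3: 0.05, 4: 0.10, 5: 0.20}


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Voluntary response sample: each person decides whether to reply.
voluntary = [score for score in population if random.random() < RESPONSE_PROB[score]]

# For comparison: a simple random sample of the same size.
srs = random.sample(population, len(voluntary))

print("population mean:        ", round(mean(population), 2))
print("voluntary response mean:", round(mean(voluntary), 2))
print("simple random mean:     ", round(mean(srs), 2))
```

Under these assumptions the voluntary response mean falls well below the population mean, while the simple random sample of the same size lands close to it.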

 

 

Important Notes

 

·       Undercoverage occurs when some people are left out of the process of choosing a sample (e.g., people without phones are excluded). Nonresponse occurs if someone who is meant to be sampled is not contacted (or refuses contact). That is, undercoverage is a problem with the process of choosing a sample, and nonresponse is a problem with the actual process of data collection.

 

·       Not all problems with sampling will necessarily lead to bias. For example, if undercoverage occurs, but the people left out share nothing in common that affects the response, there may not be a bias. You need to make the case that a sampling problem will lead to bias.
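The point that undercoverage only causes bias when the excluded group differs in a way that affects the response can be sketched with a small simulation. The population, the 80% phone ownership rate, and the 15-point shift below are all hypothetical values chosen for illustration.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def make_population(size, related):
    """Build a hypothetical population of (has_phone, response) pairs.

    If `related` is True, people without phones tend to give lower responses,
    so phone ownership is connected to the variable being measured.
    """
    people = []
    for _ in range(size):
        has_phone = random.random() < 0.8  # assumed: 80% own a phone
        response = random.gauss(50, 10)
        if related and not has_phone:
            response -= 15  # assumed shift for the excluded group
        people.append((has_phone, response))
    return people


def phone_survey_error(people, n=2000):
    """Sample only phone owners; return sample mean minus population mean."""
    frame = [response for has_phone, response in people if has_phone]
    sample = random.sample(frame, n)
    return mean(sample) - mean([response for _, response in people])


unrelated_error = phone_survey_error(make_population(20_000, related=False))
related_error = phone_survey_error(make_population(20_000, related=True))

print("error when phone ownership is unrelated to the response:", round(unrelated_error, 2))
print("error when phone ownership is related to the response:  ", round(related_error, 2))
```

In both runs people without phones are left out, but the estimate is noticeably off only in the run where the excluded group differs on the response. This is why you need to argue that the people left out actually differ before claiming bias.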

 

 

Simple Random Sample

 

A simple random sample of size n is a sample chosen such that all groups of size n have the same chance of being the selected sample. (This can be done using a random number table or statistical software.)
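As a rough illustration of the definition, the sketch below lists every possible group of size n = 2 from a tiny hypothetical population of five people and checks that repeated draws with Python's random.sample select each group about equally often. The names and the number of draws are assumptions made for the example.

```python
import random
from collections import Counter
from itertools import combinations

random.seed(2)  # fixed seed so the sketch is reproducible

population = ["Ana", "Ben", "Cal", "Dee", "Eli"]  # hypothetical individuals
n = 2

# Every possible group of size n: there are C(5, 2) = 10 of them.
all_groups = list(combinations(population, n))

# Draw many simple random samples and count how often each group is chosen.
# Sorting makes ("Ben", "Ana") and ("Ana", "Ben") count as the same group.
draws = 20_000
counts = Counter(
    tuple(sorted(random.sample(population, n))) for _ in range(draws)
)

for group in all_groups:
    print(group, round(counts[group] / draws, 3))
```

With 10 possible groups, each observed proportion should be near 1/10, which is what "all groups of size n have the same chance" means for a simple random sample.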

 

·       This is the gold standard, but it is sometimes difficult to do in practice.

 

·       This avoids undercoverage, provided the sample is drawn from a complete list of the population, but it doesn’t solve the other problems (e.g., even if you select a simple random sample, you’re not guaranteed to get information from everyone in the sample).

 

·       Some situations call for more complex sampling plans. (For example, stratified sampling, which is similar to a block experimental design, selects a separate random sample from each stratum.)
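Stratified sampling can be sketched as follows: split the population into strata, then draw a separate simple random sample within each one. The class-year strata, their sizes, and the helper name stratified_sample are hypothetical, chosen only to make the example concrete.

```python
import random

random.seed(3)  # fixed seed so the sketch is reproducible

# Hypothetical population: (stratum, id) records for students by class year.
population = (
    [("freshman", i) for i in range(400)]
    + [("sophomore", i) for i in range(300)]
    + [("junior", i) for i in range(200)]
    + [("senior", i) for i in range(100)]
)


def stratified_sample(records, per_stratum):
    """Draw a separate simple random sample from each stratum.

    `per_stratum` maps each stratum label to the sample size wanted from it.
    """
    # Group the records by stratum label.
    strata = {}
    for label, ident in records:
        strata.setdefault(label, []).append((label, ident))

    # Sample independently within each stratum.
    sample = []
    for label, members in strata.items():
        sample.extend(random.sample(members, per_stratum[label]))
    return sample


sizes = {"freshman": 40, "sophomore": 30, "junior": 20, "senior": 10}
sample = stratified_sample(population, sizes)

for label in sizes:
    print(label, sum(1 for stratum, _ in sample if stratum == label))
```

Unlike a single simple random sample of 100 students, this plan guarantees that every stratum is represented with exactly the chosen number of individuals.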