Elementary Statistics: Sampling and Data Collection (Things to Think About)
The population is the entire collection of
individuals about which we want information.
The sample is the collection of individuals
we actually measure.
Potential Problems with Sampling/Data Collection
· Voluntary response (people decide for themselves whether to take part; a voluntary response sample is almost always biased because people with strong opinions, especially negative opinions, are more likely to respond)
· Undercoverage (when some groups in the population are left out of the process of choosing a sample)
· Nonresponse (e.g., person chooses not to answer, person isn’t home)
· Response error (e.g., person lies or remembers incorrectly)
· Wording of the question/interview process/ordering of questions (e.g., leading questions, prompting by the interviewer, questions placed in an order meant to prompt a desired response)
· Processing error (e.g., data entry error, misrecording on the form)
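To see why voluntary response tends to produce bias, here is a small Python simulation. Everything in it is hypothetical: the opinion scale from -2 (strongly negative) to +2 (strongly positive), the evenly spread population, and the response probabilities, which are simply assumed to be highest for people with strong negative opinions.

```python
import random

def voluntary_response_estimate(population, respond_prob, seed=0):
    """Average opinion among the people who choose to respond.

    population:   list of opinion scores (hypothetical scale, -2 to +2)
    respond_prob: function giving the chance that a person with a given score responds
    """
    rng = random.Random(seed)
    # Each person independently decides whether to respond.
    responses = [x for x in population if rng.random() < respond_prob(x)]
    return sum(responses) / len(responses)

# Hypothetical population: opinions spread evenly from -2 to +2, so the true mean is 0.
population = [-2, -1, 0, 1, 2] * 2000

# Assumed behavior: strong opinions (especially negative ones) respond most often.
def respond_prob(score):
    return {-2: 0.6, -1: 0.2, 0: 0.05, 1: 0.1, 2: 0.3}[score]

true_mean = sum(population) / len(population)
estimate = voluntary_response_estimate(population, respond_prob)
print(f"true mean: {true_mean:.2f}, voluntary-response estimate: {estimate:.2f}")
```

Because the people who choose to respond are not representative, the estimate lands well below the true mean of 0, and collecting more voluntary responses would not remove that bias.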
Important Notes
· Undercoverage occurs when some people are left out of the process of choosing the sample (e.g., people without phones are excluded). Nonresponse occurs when someone who was selected for the sample is not contacted (or refuses to participate). That is, undercoverage is a problem with the process of choosing a sample, and nonresponse is a problem with the actual process of data collection. [This is illustrated in Example 2 on the “Sampling Examples” handout.]
· Not all problems with sampling necessarily lead to bias. For example, if undercoverage occurs but the people left out share nothing in common that affects the response, there may be no bias. You need to make the case that a sampling problem will lead to bias, that is, that the people affected would likely have answered differently, on average, from those who were measured.
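The point that undercoverage causes bias only when the excluded people differ can be checked with a small Python simulation of a phone survey. The population, the roughly 20% without phones, and the response values are all hypothetical. The same sampling procedure is run twice: once where people without phones answer like everyone else, and once where they answer differently on average.

```python
import random
import statistics

def make_population(no_phone_mean, size=10_000, seed=1):
    """Hypothetical population of (has_phone, response) pairs.

    About 20% of people have no phone. Phone owners' responses average 3.0;
    non-owners' responses average no_phone_mean.
    """
    rng = random.Random(seed)
    people = []
    for _ in range(size):
        has_phone = rng.random() < 0.8
        center = 3.0 if has_phone else no_phone_mean
        people.append((has_phone, rng.gauss(center, 1.0)))
    return people

def phone_survey_error(people, n=500, seed=2):
    """Sample n people from phone owners only; return (estimate - true mean)."""
    true_mean = statistics.mean(r for _, r in people)
    # Undercoverage: people without phones can never be selected.
    frame = [r for has_phone, r in people if has_phone]
    rng = random.Random(seed)
    estimate = statistics.mean(rng.sample(frame, n))
    return estimate - true_mean

error_same = phone_survey_error(make_population(no_phone_mean=3.0))
error_diff = phone_survey_error(make_population(no_phone_mean=5.0))
print(f"excluded group similar:   error = {error_same:+.2f}")
print(f"excluded group different: error = {error_diff:+.2f}")
```

In the first run the estimate stays close to the truth despite the undercoverage; in the second, the same procedure comes out roughly 0.4 too low, because the excluded group would have answered differently.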
Simple Random Sample
A simple random sample of size n is a sample chosen such that every possible group of n individuals from the population has the same chance of being the selected sample. (This can be done using a random number table or statistical software.)
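In Python, the standard library's random.sample draws n distinct items so that every group of size n is equally likely, which matches the definition above. The roster is hypothetical, and the wrapper name simple_random_sample is only for illustration. The second half checks the definition empirically on a five-person population.

```python
import random
from collections import Counter

def simple_random_sample(population, n, rng=random):
    """Draw n distinct individuals; every group of size n is equally likely."""
    return rng.sample(population, n)

# Hypothetical class roster of 30 students; draw a simple random sample of 5.
roster = [f"student_{i:02d}" for i in range(1, 31)]
print(simple_random_sample(roster, 5, random.Random(42)))

# Empirical check: there are C(5, 2) = 10 possible samples of size 2 from
# {A, B, C, D, E}, so each should be selected about 10% of the time.
TRIALS = 100_000
rng = random.Random(0)
counts = Counter(frozenset(simple_random_sample("ABCDE", 2, rng)) for _ in range(TRIALS))
for group in sorted(counts, key=sorted):
    print("".join(sorted(group)), counts[group] / TRIALS)
```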
· This is the gold standard, but it is sometimes difficult to do in practice.
· This solves the problem of undercoverage, provided the list it is drawn from includes the whole population, but it doesn’t solve the other problems (e.g., even if you select a simple random sample, you’re not guaranteed to get information from everyone in the sample).
· Some situations call for more complex sampling plans. (For example, stratified sampling, similar to a block experimental design, divides the population into groups called strata and selects a separate random sample from each stratum.)
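Stratified sampling can be sketched in a few lines of Python. The class-year strata, the population of 1,000 students, and the choice of 10 students per stratum are all hypothetical; in a real study the strata and the sample size within each (equal, or proportional to stratum size, for instance) depend on the design.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, n_per_stratum, seed=0):
    """Draw a separate simple random sample within each stratum.

    population:    list of individuals
    stratum_of:    function giving the stratum label of one individual
    n_per_stratum: dict mapping each stratum label to that stratum's sample size
    """
    rng = random.Random(seed)
    # Split the population into strata.
    strata = defaultdict(list)
    for person in population:
        strata[stratum_of(person)].append(person)
    # Take an independent simple random sample inside every stratum.
    sample = []
    for label, members in strata.items():
        sample.extend(rng.sample(members, n_per_stratum[label]))
    return sample

# Hypothetical population: 1,000 students, 250 in each class year.
YEARS = ["first-year", "sophomore", "junior", "senior"]
students = [(year, i) for year in YEARS for i in range(250)]

# Assumed design: 10 students from each class year.
sample = stratified_sample(students, stratum_of=lambda s: s[0],
                           n_per_stratum={year: 10 for year in YEARS})
print(len(sample))  # 40
```

Sampling within each stratum guarantees that every class year is represented, which a single simple random sample of 40 does not.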