Elementary Statistics – Sampling
The population is the entire collection of individuals about which we
want information.
The sample is the collection of individuals we actually measure.
Potential Problems with Sampling/Data Collection
· Voluntary response (a voluntary response sample is often biased because people with strong opinions, especially negative opinions, are more likely to respond)
· Undercoverage (when some groups in the population are left out of the process of choosing a sample)
· Nonresponse (e.g., person chooses not to answer, person isn’t home)
· Response error (e.g., person lies or remembers incorrectly)
· Wording of the question/interview process (e.g., leading questions, prompting by interviewer)
· Processing error (e.g., data entry error, misrecording on the form)
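To see how voluntary response produces bias, here is a small simulation sketch in Python. The opinion scale (-2 to +2) and the response rates are made-up assumptions chosen only to mimic "strong opinions, especially negative ones, are more likely to respond"; they are not real data.

```python
import random

def voluntary_response_mean(population, respond_prob, rng):
    """Mean opinion among only the people who choose to respond."""
    responses = [score for score in population if rng.random() < respond_prob[score]]
    return sum(responses) / len(responses)

rng = random.Random(1)
# Hypothetical opinions on a -2 (strongly against) to +2 (strongly for) scale,
# spread evenly, so the true population mean is about 0.
population = [rng.choice([-2, -1, 0, 1, 2]) for _ in range(100_000)]
true_mean = sum(population) / len(population)

# Assumed chance that a person holding each opinion bothers to respond:
# strong opinions respond more, strongly negative ones most of all.
respond_prob = {-2: 0.6, -1: 0.1, 0: 0.05, 1: 0.1, 2: 0.3}

estimate = voluntary_response_mean(population, respond_prob, rng)
print(f"true mean {true_mean:+.2f}, voluntary-response mean {estimate:+.2f}")
```

Nobody in this sketch lies or misrecords anything; the estimate is pulled toward the negative end purely by who decides to answer.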
Important Notes
· Undercoverage occurs when some people are left out of the process of choosing the sample (e.g., people without phones are excluded). Nonresponse occurs when someone who was selected for the sample is not contacted (or refuses to participate). That is, undercoverage is a problem with the process of choosing a sample, and nonresponse is a problem with the actual process of data collection.
· Not all problems with sampling will necessarily lead to bias. For example, if undercoverage occurs but the people left out share nothing in common that affects the response, there may be no bias. You need to make the case that a sampling problem will lead to bias.
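A rough simulation sketch of the last point, assuming a hypothetical phone survey whose list of people leaves out everyone without a phone (assumed to be 20% of the population). The yes/no rates are invented. When phone ownership is unrelated to the answer, the sample proportion stays close to the population proportion; when it is related, the same undercoverage shifts the estimate.

```python
import random

def undercoverage_gap(related, rng, n_pop=50_000, n_sample=5_000):
    """Sample proportion of 'yes' minus population proportion when people
    without phones are left out of the sampling frame (hypothetical setup)."""
    population = []
    for _ in range(n_pop):
        has_phone = rng.random() < 0.8  # assumed: 80% own a phone
        if related:
            p_yes = 0.3 if has_phone else 0.7  # assumed: non-owners lean "yes"
        else:
            p_yes = 0.4  # phone ownership has nothing to do with the answer
        population.append((has_phone, rng.random() < p_yes))

    frame = [person for person in population if person[0]]  # undercoverage
    sample = rng.sample(frame, n_sample)  # random sample, but only from the frame

    def prop_yes(people):
        return sum(answer for _, answer in people) / len(people)

    return prop_yes(sample) - prop_yes(population)

rng = random.Random(2)
print(f"unrelated: gap {undercoverage_gap(False, rng):+.3f}")
print(f"related:   gap {undercoverage_gap(True, rng):+.3f}")
```

In the related case the gap is roughly 0.30 - 0.38 = -0.08 under these assumed rates; in the unrelated case it is just sampling noise around zero.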
Simple Random Sample
A simple random sample of size n is a sample chosen so that every group of n individuals in the population has the same chance of being the selected sample. (This can be done using a random number table or statistical software.)
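As a sketch of the "statistical software" route, Python's standard random.sample draws without replacement, which makes every group of size n equally likely. The five-name population and the repeated-draw check are illustrative assumptions: with N = 5 and n = 2 there are C(5, 2) = 10 possible samples, so each should come up in roughly 1/10 of the draws.

```python
import random
from collections import Counter
from itertools import combinations
from math import comb

def simple_random_sample(frame, n, rng):
    """Return n individuals chosen so every group of size n is equally likely."""
    return rng.sample(frame, n)  # draws without replacement

# Tiny hypothetical population so the definition can be checked directly
frame = ["Ana", "Ben", "Cy", "Dee", "Eli"]
n = 2
rng = random.Random(3)

# Draw many samples and count how often each group of 2 is the one selected
trials = 100_000
counts = Counter(frozenset(simple_random_sample(frame, n, rng)) for _ in range(trials))

print(f"{comb(len(frame), n)} possible samples, each with chance 1/{comb(len(frame), n)}")
for group in combinations(frame, n):
    print(sorted(group), round(counts[frozenset(group)] / trials, 3))
```

Each printed frequency should be close to 0.1; no group is favored, which is exactly what the definition requires.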
· This is the gold standard, but it is sometimes difficult to do in practice.
· This solves the problem of undercoverage (as long as the list it is drawn from includes the whole population), but doesn’t solve the other problems (e.g., even if you select a simple random sample, you’re not guaranteed to get information from everyone in the sample).
· Some situations call for more complex sampling plans. (For example, stratified sampling, similar to a block design in an experiment, selects a separate random sample from each stratum.)
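A minimal sketch of stratified sampling in Python, assuming a hypothetical list of students whose class year is the stratum, with sample sizes chosen in proportion to stratum size (one common choice, not the only one). Each stratum gets its own simple random sample, and the pieces are combined.

```python
import random

def stratified_sample(frame, stratum_of, sizes, rng):
    """Take a separate simple random sample within each stratum.

    frame: list of individuals
    stratum_of: function returning an individual's stratum label
    sizes: dict mapping stratum label -> number to sample from that stratum
    """
    strata = {}
    for person in frame:
        strata.setdefault(stratum_of(person), []).append(person)
    sample = []
    for label, n in sizes.items():
        sample.extend(rng.sample(strata[label], n))  # independent SRS per stratum
    return sample

# Hypothetical frame of 1,000 students labeled by class year
years = ["first"] * 400 + ["second"] * 300 + ["third"] * 200 + ["fourth"] * 100
students = [(f"student{i}", year) for i, year in enumerate(years)]

# Assumed proportional allocation: 10% of each class year
sizes = {"first": 40, "second": 30, "third": 20, "fourth": 10}

rng = random.Random(4)
sample = stratified_sample(students, lambda s: s[1], sizes, rng)
print(len(sample), "students sampled")
```

Unlike a single simple random sample of 100, this guarantees every class year appears in the sample in the planned numbers.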