Probability Theory Assignment: Discussion of Journal Articles
For class on Monday (October 26), you must read both “Does Your iPod Really Play Favorites?” and the corresponding “Reviving the Negative Hypergeometric Model.” Both articles are from The American Statistician, a peer-reviewed journal published by the American Statistical Association.
Note: As Yannan mentioned in class, computers cannot generate truly “random” numbers; the numbers they produce come from deterministic algorithms (pseudorandom number generators). For the purposes of our discussion, though, consider the iPod shuffle software “random enough.” The big issues center on misconceptions about randomness and on how randomness can be modeled with probability distributions.
General Discussion Guidelines
When reading articles for class discussion, I expect you to dig in, take notes (which you can easily reference in class), ask questions, and think carefully about the information presented. It’s a good idea to read each article twice: once to get the big-picture idea(s) and again to make sure you understand the details. (Note: you should think hard, on your own or with classmates, about how to answer any questions you have. If you’re still left wondering, your question is a good one to raise in class discussion.)
Specific Guidelines for this Particular Discussion
For this assignment, address the following questions and issues (you may raise other issues, too):
· In what ways do people typically misconceive randomness? (You can bring in additional points not mentioned in the paper.)
· Consider the Miller & Fridell paper:
o Summarize the relationship among the four discrete distributions mentioned in the paper.
o Show how the first equality for P(X = x) on p. 348 simplifies to the second equality.
o Provide a practical example of the negative hypergeometric distribution (either from the paper or from your own creative mind).
· Back to the Froehlich, Duckworth, & Culhane paper:
o Section 3.1
§ Verify that T, the minimum number of songs until the ith artist is played, has a negative hypergeometric distribution.
§ How do probability calculations from this model refute a claim by Levy?
o Section 3.2
§ Why do the probability-model calculations alone not refute the claim by Levy? What do the authors speculate might be an explanation?
o Section 3.3
§ How is Levy’s claim about three songs from the same album appearing in an autofill related to the “birthday problem”?
§ Can you clearly explain the simplified model presented by the authors? How does this simplified model refute Levy’s claim?
o Section 3.4
§ Verify that S, the number of shuffles required until a particular song is played among the first n songs, has a geometric distribution. How does the geometric model refute Levy’s specific claim?
· Here’s a quick explanation of significance testing. There are two hypotheses: the null hypothesis, which is a statement of the “status quo,” and the alternative hypothesis. In this situation, the null hypothesis is that the iPod actually shuffles randomly, and the alternative is that it does not. In a significance test we collect data and see how much evidence they provide against the null hypothesis (that is, against random shuffling). This particular test is a chi-square goodness-of-fit test, which compares the actual observations with what we would expect if the shuffling were indeed random. Large deviations between observed and expected counts would suggest that the iPod is not shuffling randomly. The p-value is the probability, assuming random shuffling, of getting data that deviate from the expected counts at least as much as the observed data do. So large p-values provide little or no evidence against random shuffling, while small p-values provide evidence against random shuffling.
o Now, based on that brief explanation, how do the significance tests in Section 4 refute a claim by Levy? (You might need to lean on the students in class who have already studied significance testing.)
· What lingering questions do you have about the articles? Also, did you find any part of the articles especially interesting?
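For the Section 3.1 question, one way to check a negative hypergeometric claim numerically is to simulate shuffles and compare the results with the formula. The Python sketch below assumes a simplified setup that may differ from the paper’s: a playlist of N songs, K of them by the artist of interest, shuffled uniformly at random, with T taken to be the position of the first song by that artist. That is the first-success case of the negative hypergeometric distribution, P(T = t) = C(N - t, K - 1) / C(N, K) for t = 1, ..., N - K + 1. The function names and the values of N and K are hypothetical.

```python
import random
from math import comb

def first_success_pmf(t, N, K):
    """P(T = t) for T = position of the first of K marked songs in a
    uniformly shuffled playlist of N songs (negative hypergeometric, first success)."""
    if t < 1 or t > N - K + 1:
        return 0.0
    # The K marked positions form a uniformly random K-subset of {1..N};
    # the minimum equals t when t is chosen and the other K - 1 lie above t.
    return comb(N - t, K - 1) / comb(N, K)

def simulate_first_position(N, K, trials=20_000, seed=1):
    """Empirical distribution of T from repeated uniform shuffles."""
    rng = random.Random(seed)
    songs = [1] * K + [0] * (N - K)  # 1 = song by the artist of interest
    counts = {}
    for _ in range(trials):
        rng.shuffle(songs)
        t = songs.index(1) + 1  # 1-based position of the first marked song
        counts[t] = counts.get(t, 0) + 1
    return {t: c / trials for t, c in counts.items()}

if __name__ == "__main__":
    N, K = 20, 4  # hypothetical playlist size and number of songs by the artist
    empirical = simulate_first_position(N, K)
    for t in range(1, 6):
        print(t, round(first_success_pmf(t, N, K), 4), round(empirical.get(t, 0.0), 4))
```

The formula and the simulated frequencies should agree closely; with these values, P(T = 1) works out to K/N = 0.2, as it should for the first song of a random shuffle.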
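For Section 3.3, the classic birthday calculation and a simulated “three-way match” version can be sketched in Python. This sketch assumes a deliberately simple model that is not necessarily the authors’ simplified model: each of k autofill picks lands independently and uniformly on one of d albums (sampling with replacement, all albums equally likely). The hypothetical function prob_some_pair gives the exact two-match birthday probability, 1 - (d/d)((d - 1)/d)...((d - k + 1)/d), and the hypothetical prob_some_m_fold estimates by simulation the chance that some album is hit at least m times.

```python
import random
from collections import Counter

def prob_some_pair(k, d):
    """Exact classic birthday probability: at least two of k independent,
    uniform draws from d categories coincide."""
    p_all_distinct = 1.0
    for i in range(k):
        p_all_distinct *= (d - i) / d
    return 1.0 - p_all_distinct

def prob_some_m_fold(k, d, m, trials=20_000, seed=1):
    """Monte Carlo estimate that some category is hit at least m times
    among k independent, uniform draws from d categories."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        counts = Counter(rng.randrange(d) for _ in range(k))
        if max(counts.values()) >= m:
            hits += 1
    return hits / trials

if __name__ == "__main__":
    # Sanity check against the textbook case: 23 people, 365 days.
    print(prob_some_pair(23, 365))
    print(prob_some_m_fold(23, 365, 2))
```

With your own choices of k and d, prob_some_m_fold(k, d, 3) gives a rough sense of how often three songs from one album would turn up by chance under these assumptions.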
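For Section 3.4, a quick simulation makes the geometric model concrete. The sketch assumes each shuffle is an independent, uniformly random permutation of N songs, so a particular song lands among the first n positions with probability p = n/N on every shuffle, and the number of shuffles S until that first happens should follow P(S = s) = (1 - p)^(s - 1) p, with mean 1/p = N/n. The function names and the values N = 50 and n = 5 are hypothetical choices for illustration.

```python
import random

def shuffles_until_top_n(N, n, rng):
    """Reshuffle N songs until song 0 appears in the first n positions;
    return how many shuffles that took."""
    songs = list(range(N))
    s = 0
    while True:
        s += 1
        rng.shuffle(songs)
        if songs.index(0) < n:  # 0-based index, so positions 1..n
            return s

def geometric_pmf(s, p):
    """P(S = s) for a geometric count of trials up to and including the first success."""
    return (1 - p) ** (s - 1) * p

if __name__ == "__main__":
    N, n, reps = 50, 5, 4000  # hypothetical sizes
    rng = random.Random(1)
    samples = [shuffles_until_top_n(N, n, rng) for _ in range(reps)]
    p = n / N
    print("simulated mean:", sum(samples) / reps, "theory:", 1 / p)
    for s in range(1, 4):
        print(s, round(geometric_pmf(s, p), 4), round(samples.count(s) / reps, 4))
```

The key modeling step is independence across shuffles: each shuffle is a fresh Bernoulli trial with the same success probability n/N, which is exactly what a geometric distribution requires.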
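To connect the significance-testing explanation to an actual computation, here is a minimal Python sketch of a chi-square goodness-of-fit test. Instead of the chi-square distribution table a standard test would use, it approximates the p-value by simulation: it generates many data sets under the null hypothesis and reports the fraction whose statistic is at least as large as the observed one, which mirrors the definition of a p-value directly. The observed counts are made up for illustration and are not data from either article; the function names are hypothetical.

```python
import random

def chi_square_stat(observed, probs):
    """Pearson statistic: sum over categories of (O - E)^2 / E."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))

def monte_carlo_p_value(observed, probs, sims=5_000, seed=1):
    """Estimate P(statistic >= observed statistic), assuming the null hypothesis
    that each observation falls in category j with probability probs[j]."""
    rng = random.Random(seed)
    n, k = sum(observed), len(probs)
    stat_obs = chi_square_stat(observed, probs)
    as_extreme = 0
    for _ in range(sims):
        draws = rng.choices(range(k), weights=probs, k=n)
        counts = [draws.count(j) for j in range(k)]
        # Small tolerance so ties with the observed statistic count as "at least as extreme".
        if chi_square_stat(counts, probs) >= stat_obs - 1e-9:
            as_extreme += 1
    return as_extreme / sims

if __name__ == "__main__":
    observed = [30, 20, 25, 25]  # made-up counts, not data from the articles
    probs = [0.25] * 4           # null hypothesis: four equally likely categories
    print("chi-square statistic:", chi_square_stat(observed, probs))
    print("Monte Carlo p-value:", monte_carlo_p_value(observed, probs))
```

With four equally likely categories and 100 observations, the made-up counts give a statistic of 2.0, a modest deviation from the expected 25 per category. The p-value is therefore large (the chi-square approximation with 3 degrees of freedom puts it near 0.57), so these counts would provide little evidence against the null hypothesis.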