Probability Theory Assignment—Discussion of Journal Articles

For class Monday (October 26), you must read both “Does Your iPod Really Play Favorites?” and the corresponding “Reviving the Negative Hypergeometric Model.” These are articles from The American Statistician, which is a peer-reviewed journal published by the American Statistical Association.

 

Note: As Yannan mentioned in class, computers are incapable of generating truly “random” numbers. For the purposes of our discussion, though, consider the iPod-shuffle software “random enough.” The big issues center on misconceptions of randomness and on our ability to model randomness with probability distributions.

 

General Discussion Guidelines

When reading articles for class discussion, I expect you to dig in, take notes (which you can easily reference in class), ask questions, and think carefully about the information presented. It’s a good idea to read the article twice—once to get the big-picture idea(s) and another time to ensure you understand the details. (Note—you should think hard, on your own or with classmates, about how to answer any questions you have. If you’re still left wondering, then your question is a good one to raise in class discussion.)

 

Specific Guidelines for this Particular Discussion

For this assignment, be prepared to address the following questions and issues (you are welcome to raise others, too):

·         In what ways do people typically misconceive randomness? (You can bring in additional points not mentioned in the paper.)

 

·         Consider the Miller & Fridell paper:

o   Summarize the relationship between the four discrete distributions mentioned in the paper.

o   Show how the first equality (in regard to P(X = x)) on p. 348 simplifies to the second equality.

o   Provide a practical example of the Negative Hypergeometric distribution (either from the paper or from your own creative mind).
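
As a starting point for building your own example, here is a minimal Python sketch of one common parameterization of the negative hypergeometric distribution. It is not taken from the paper, whose notation may differ. It assumes N items, M of which are “successes,” drawn without replacement, with X defined as the draw on which the r-th success occurs. The function name and the numbers (10 songs, 4 by one artist, waiting for the 2nd song by that artist) are hypothetical.

```python
from fractions import Fraction
from math import comb


def neg_hypergeom_pmf(x, N, M, r):
    """P(X = x), where X is the draw on which the r-th success occurs.

    Assumed setup (not the paper's notation): N items, M successes,
    drawing one at a time without replacement.
    """
    if x < r or x > N - M + r:
        return Fraction(0)
    # r-1 successes somewhere in the first x-1 draws, a success on draw x,
    # and the remaining M-r successes somewhere among the last N-x positions.
    return Fraction(comb(x - 1, r - 1) * comb(N - x, M - r), comb(N, M))


# Hypothetical playlist: 10 songs, 4 by one artist; wait for the 2nd such song.
N, M, r = 10, 4, 2
pmf = {x: neg_hypergeom_pmf(x, N, M, r) for x in range(r, N - M + r + 1)}

total = sum(pmf.values())                   # should be exactly 1
mean = sum(x * p for x, p in pmf.items())   # expected number of songs played

print("P(X = 2) =", pmf[2])
print("total probability =", total)
print("E[X] =", mean)
```

Using exact fractions makes it easy to check a hand calculation: for instance, P(X = 2) here should match (4/10)(3/9), the chance that the first two songs are both by that artist.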

 

·         Back to the Froehlich, Duckworth, & Culhane paper:

o   Section 3.1

§  Verify that T, the minimum number of songs until the ith artist is played, has a negative hypergeometric distribution.

§  How do probability calculations from this model refute a claim by Levy?

 

o   Section 3.2

§  Why do the probability-model calculations alone not refute the claim by Levy? What do the authors speculate might be an explanation?

 

o   Section 3.3

§  How is the Levy claim of three-songs-from-the-same-album on the autofill related to the “birthday problem”?

§  Can you clearly explain the simplified model presented by the authors? How does this simplified model refute Levy’s claim?

 

o   Section 3.4

§  Verify that S, the number of shuffles required until a particular song is played in the first n songs, has a geometric distribution. How does the geometric model refute Levy’s specific claim?
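
If you want a numerical sanity check of the Section 3.4 claim (a simulation, not a proof), the following Python sketch may help. It assumes each shuffle is an independent, uniformly random ordering of the library, so the chance that a particular song lands in the first n positions is n divided by the library size. The library size (20), the value of n (5), the seed, and the function name are all hypothetical choices made for illustration, not values from the paper.

```python
import random


def shuffles_until_in_first_n(rng, library_size, n, target=0):
    """Count independent shuffles until `target` appears in the first n songs."""
    songs = list(range(library_size))
    count = 0
    while True:
        count += 1
        rng.shuffle(songs)          # one fresh, uniformly random ordering
        if target in songs[:n]:
            return count


rng = random.Random(2024)           # fixed seed so the run is repeatable
LIBRARY_SIZE, N_FIRST, TRIALS = 20, 5, 20000

results = [shuffles_until_in_first_n(rng, LIBRARY_SIZE, N_FIRST)
           for _ in range(TRIALS)]

p = N_FIRST / LIBRARY_SIZE          # success probability on any one shuffle
sim_mean = sum(results) / TRIALS
sim_p1 = sum(1 for s in results if s == 1) / TRIALS

print("simulated mean of S:", sim_mean, " geometric mean 1/p:", 1 / p)
print("simulated P(S = 1):", sim_p1, " geometric P(S = 1):", p)
```

If the geometric model is right, the simulated mean should sit close to 1/p and the simulated P(S = 1) close to p. Think about why independence between shuffles is the key assumption.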

 

·         Here’s a quick explanation of significance testing. There are two hypotheses: the null hypothesis, which is a statement of the “status quo,” and the alternative hypothesis. In this situation, the null hypothesis is that the iPod actually shuffles randomly. In a significance test we collect data and see how much evidence we have against the null hypothesis (against random shuffling). This particular analysis uses a chi-square goodness-of-fit test, which compares the actual observations with what we would expect if the shuffling is indeed random. Large deviations between observed and expected counts would suggest the iPod is not shuffling randomly. The p-value is the probability, under the assumption of random shuffling, of getting data that deviate from the expected counts at least as much as the observed data do. So large p-values provide little or no evidence against random shuffling, while small p-values provide evidence against random shuffling.

 

o   Now, based on that brief explanation, how do the significance tests in Section 4 refute a claim by Levy? (You might need to lean on the students in class who have already studied significance testing.)
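
For those who have not yet studied significance testing, here is a small Python sketch of the goodness-of-fit idea. The counts are made up for illustration (they are not the paper's data), and instead of looking up the chi-square distribution the sketch estimates the p-value by simulation: it generates many data sets under random shuffling and records how often the simulated statistic is at least as large as the observed one. The function name, seed, and number of simulations are hypothetical choices.

```python
import random


def chi_square_stat(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: how often each of 5 songs was played in 100 plays.
observed = [18, 22, 25, 15, 20]
n_plays = sum(observed)
k = len(observed)
expected = [n_plays / k] * k        # equal counts if shuffling is random

stat = chi_square_stat(observed, expected)

# Estimate the p-value by simulating the null hypothesis (random shuffling).
rng = random.Random(0)
N_SIMS = 5000
at_least_as_extreme = 0
for _ in range(N_SIMS):
    counts = [0] * k
    for _ in range(n_plays):
        counts[rng.randrange(k)] += 1       # each song equally likely
    if chi_square_stat(counts, expected) >= stat - 1e-9:
        at_least_as_extreme += 1

p_value = at_least_as_extreme / N_SIMS
print("chi-square statistic:", round(stat, 3))
print("simulated p-value:", p_value)
```

With these made-up counts the estimated p-value comes out large, so data like these would give little evidence against random shuffling. Keep that logic in mind when you read the tests in Section 4.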

 

·         What lingering questions do you have about the articles? Also, did you find any part of the articles especially interesting?