How random is it?

The Python language includes a number of standard libraries that provide access to useful functions. For example, to perform calculations involving standard math functions such as the square root or the various trigonometric functions, we would import the math module:

import math
x = math.sqrt(2)

Another handy module is the random module, which provides functions for generating pseudorandom number sequences. One such function is random.randint(a,b), which returns a random integer n with a <= n <= b, so both endpoints are included. In this assignment we are going to use this function to generate a seemingly random list of integers in the range [1,10]. We are then going to apply a number of simple statistical tests to determine to what extent the list of numbers appears random and uniformly distributed in the interval [1,10].

To start with, let's generate a list of 10,000 random integers in the range [1,10]:

import random
# Named numbers rather than list, so the built-in list type is not shadowed.
numbers = [random.randint(1, 10) for _ in range(10000)]

Statistical tests

There are a number of simple statistical tests we can perform on our list of integers to determine whether or not the list is random and uniformly distributed. In each case we can make a theoretical prediction of the value the test would return if the list were truly random and uniformly distributed. We can then compare the theoretical prediction with the actual results in each case.

For each of the statistical tests below you will write a function that computes the statistic. After writing each function, add code to your test program that calls the function on your list of random integers and prints the result.

Frequency counts

The first test is to collect frequency counts. For each number in the range from 1 to 10 we can count how many times that number appears in the list. If the numbers in the list are close to being uniformly distributed then each number in the range should appear roughly 10000/10 = 1000 times.

To test this hypothesis, construct a function frequencies(list) that constructs and returns a list of length 10. The item at index n in that list (with n running from 0 to 9) should be a count of how many times the number n+1 appears in the given list.

Call this function in your test program and print the resulting list of frequencies. The frequencies should be roughly similar and should each equal approximately 1000.
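One way the counting step might look is sketched below. It assumes every value in the input lies in the range 1 to 10; the parameter name values, the variable name numbers, and the fixed seed are illustrative choices rather than part of the assignment.

```python
import random

def frequencies(values):
    """Return a list of 10 counts; the entry at index n counts how often n + 1 occurs."""
    counts = [0] * 10
    for v in values:
        counts[v - 1] += 1  # value 1 lands in slot 0, value 10 in slot 9
    return counts

random.seed(0)  # fixed seed only so that repeated runs print the same counts
numbers = [random.randint(1, 10) for _ in range(10000)]
print(frequencies(numbers))
```

Because every value falls in exactly one slot, the ten counts always add up to the length of the input list, which makes a quick sanity check.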

Mean and standard deviation

Theoretically, a list of integers drawn from a discrete uniform distribution on the range [1,10] should have a mean value of (10+1)/2 = 5.5 and a standard deviation of

$$\sigma = \sqrt{\frac{10^2 - 1}{12}} = \sqrt{8.25} \approx 2.87$$

(For a derivation of these two predictions, see this page.)
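For reference, both predictions follow from a short calculation. The block below is a sketch of the standard argument for a discrete uniform distribution on the integers 1 through n, and is not necessarily the derivation given on the linked page; the symbols n, X, and k are introduced here for illustration.

```latex
\begin{align*}
E[X]     &= \frac{1}{n}\sum_{k=1}^{n} k = \frac{n+1}{2} \\
E[X^2]   &= \frac{1}{n}\sum_{k=1}^{n} k^2 = \frac{(n+1)(2n+1)}{6} \\
\sigma^2 &= E[X^2] - E[X]^2
          = \frac{(n+1)(2n+1)}{6} - \frac{(n+1)^2}{4}
          = \frac{n^2 - 1}{12}
\end{align*}
```

With n = 10 this gives a mean of 11/2 = 5.5 and a variance of 99/12 = 8.25, so the standard deviation is approximately 2.87.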

The standard deviation of a sequence of N numbers x_i with mean value μ is

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2}$$

Construct two functions, mean(list) and stddev(list), that can compute these two quantities and then call them on your list of random integers. Print the results: how close are they to the theoretical predictions given above?
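A minimal sketch of the two functions follows. It assumes the population form of the standard deviation (dividing by the number of items rather than by one less), since that is the form that matches the theoretical value for the uniform distribution; the parameter name values and the test list exact are illustrative.

```python
import math

def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)

def stddev(values):
    """Population standard deviation: square root of the mean squared deviation."""
    mu = mean(values)
    return math.sqrt(sum((x - mu) ** 2 for x in values) / len(values))

# Sanity check on a list that holds each of 1..10 exactly once:
# it should reproduce the theoretical mean and standard deviation.
exact = list(range(1, 11))
print(mean(exact))    # 5.5
print(stddev(exact))  # square root of 8.25, about 2.87
```

Checking the functions on a perfectly uniform list first makes it easier to tell a bug in the code apart from ordinary random variation in the 10,000 generated numbers.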

Correlation testing

If the numbers in our list are truly random, then each number in the sequence should be uncorrelated with the number that comes right after it in the list. To test this hypothesis, we can construct a list of 9999 tuples, where tuple n consists of numbers n and n+1 from the original list. We can then compute a correlation coefficient for the tuples in our list.
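The pairing step can be sketched with zip, which stops at the end of the shorter input and therefore yields exactly one fewer tuple than the list has items. The names numbers and pairs and the fixed seed are illustrative assumptions.

```python
import random

random.seed(0)  # fixed seed only so that repeated runs behave the same
numbers = [random.randint(1, 10) for _ in range(10000)]

# Pair each item with its successor: (numbers[0], numbers[1]), (numbers[1], numbers[2]), ...
pairs = list(zip(numbers, numbers[1:]))

print(len(pairs))  # 9999
print(pairs[0] == (numbers[0], numbers[1]))  # True
```

Note that the call to list(...) here relies on the built-in list type, so it will fail if an earlier variable in the same program has been named list.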

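The correlation coefficient for a sequence of N pairs (x_i, y_i), where the first elements have mean μ_X and standard deviation σ_X and the second elements have mean μ_Y and standard deviation σ_Y, is given by

$$\rho = \frac{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu_X)(y_i - \mu_Y)}{\sigma_X \, \sigma_Y}$$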

We can simplify this expression by noting that the means and standard deviations for the first and second numbers in each of our tuples will be approximately equal to the mean μ and standard deviation σ of the original list of integers, so this simplifies to

$$\rho \approx \frac{1}{N \sigma^2}\sum_{i=1}^{N} (x_i - \mu)(y_i - \mu)$$

where (x_i, y_i) is tuple i and N is the number of tuples in your list of tuples.

Construct a function correlation(list,mean,stddev) that can compute the correlation coefficient for your list of tuples. Call this function and print its result. How close does it land to a correlation of 0, which is what we should see if the list of pairs were truly random?
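A hedged sketch of the correlation function is given below. It assumes the simplified form in which both elements of every tuple share the mean and standard deviation of the original list, and it uses the population standard deviation. The first parameter is called pairs here instead of list, and the names numbers, mu, and sigma are likewise illustrative.

```python
import math
import random

def correlation(pairs, mean, stddev):
    """Average product of deviations from the mean, divided by the variance."""
    total = sum((x - mean) * (y - mean) for x, y in pairs)
    return total / (len(pairs) * stddev ** 2)

random.seed(0)  # fixed seed only so that repeated runs print the same value
numbers = [random.randint(1, 10) for _ in range(10000)]

# Mean and population standard deviation of the original list
mu = sum(numbers) / len(numbers)
sigma = math.sqrt(sum((x - mu) ** 2 for x in numbers) / len(numbers))

# Each item paired with the item that follows it
pairs = list(zip(numbers, numbers[1:]))
print(correlation(pairs, mu, sigma))
```

As a check in the other direction, feeding the function perfectly correlated pairs such as (1,1), (2,2), (3,3) together with their own mean and standard deviation should give a result of 1.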

Due date

This assignment is due by the start of class on Tuesday, Oct. 3.