Adding packages to your environment

At the start of the term I recommended that you use the Anaconda Navigator application to set up an environment for use in this course. Over the next few weeks I will be introducing you to a number of Python packages: in many cases we will need to start by adding some additional packages to your environment.

In today's lecture we will be working with matplotlib, which is a package for constructing plots. To use matplotlib you will need to add it to your environment.

Start the Anaconda Navigator application and click the environments tab on the left of the main screen. Select the CMSC210 environment from the list of available environments. In the pane on the right select Not Installed from the menu and then type matplotlib in the search box.

Click the checkbox next to matplotlib in the resulting list, then click apply.

Next, to be able to display plots in Visual Studio Code you will need to install a second package. Follow the procedure outlined above to also install the jupyter package.

Introducing pyplot

One module within the matplotlib library is pyplot, which presents a simple and easy to use interface for constructing plots.

For our first pyplot example I will be building on the linear regression example from one of our earlier lectures. This time around I want to compute the regression line and then use pyplot to plot both the raw data points and the regression line.

Here is the complete source code for that example. Below I will break this down and cover some specific aspects of using pyplot.

import matplotlib.pyplot as plt

def cleanLine(line):
    """Converts a raw line list into an appropriate
       data format."""
    return (int(line[0]), float(line[1]))

def readData(fileName):
    """Generic data reading function. Uses cleanLine to
       format lines of data."""
    data = []
    with open(fileName) as f:
        for line in f.readlines():
            data.append(cleanLine(line.split()))
    return data

def means(pairs):
    xSum = 0
    ySum = 0
    for x,y in pairs:
        xSum += x
        ySum += y
    N = len(pairs)
    return (xSum/N,ySum/N)

def covariance(pairs,means):
    sum = 0
    for x,y in pairs:
      sum += (x-means[0])*(y-means[1])
    return sum

def xVariance(pairs,xMean):
    sum = 0
    for x,y in pairs:
        sum += (x-xMean)*(x-xMean)
    return sum

def regressionCoeffs(pairs):
    """Computes linear regression coefficients (a,b) from a list of (x,y) pairs."""
    m = means(pairs)
    beta = covariance(pairs,m)/xVariance(pairs,m[0])
    alpha = m[1]-beta*m[0]
    return (alpha,beta)

rawData = readData("farm.txt")
pairs = [cleanLine(line) for line in rawData]
a,b = regressionCoeffs(pairs)

# pyplot expects the data to be plotted to be in the form
# of a list of x values and a list of y values.
# Reconfigure the data to suit pyplot.
x = [year for year,pop in rawData]
y = [pop for year,pop in rawData]
s = [1930,1990]
t = [a+b*1930,a+b*1990]
# Now plot the data sequence and the regression line
plt.plot(x,y,'rs',s,t,'b-')
# Set the axis details
plt.axis([1930,1990,0,35])
# Add title and labels
plt.title("Regression Example")
plt.xlabel("Year")
plt.ylabel("Farm Population (in millions)")
plt.show()

To run this program in Visual Studio Code, right-click the file in the files view and select the option "Run file in new interactive window".

Basics of pyplot

To use pyplot we start with an import statement.

import matplotlib.pyplot as plt

matplotlib is part of the Anaconda distribution, so you should already have matplotlib installed.

Important note: Mac users may need to replace this import statement with the alternative shown below:

import matplotlib
matplotlib.use('TkAgg')
import matplotlib.pyplot as plt

The first and most important method in pyplot is the plot() method, which plots a set of data points. plot() expects the data points to be provided as a list of x-values and a separate list of y-values. If your data is not organized in that way you will have to write a little code to reorganize it. In this example the data that we read from the file gets stored as a list of (x,y) tuples, so we have to start with some code to build separate x and y lists from that data:

x = [year for year,pop in rawData]
y = [pop for year,pop in rawData]

To plot the regression line we simply construct a data series consisting of a couple of points, the endpoints of the line we seek to plot:

s = [1930,1990]
t = [a+b*1930,a+b*1990]

The plot() command can plot one or more data series. After each series we also need to provide a format string to specify how we want the series plotted:

plt.plot(x,y,'rs',s,t,'b-')

The format string consists of two characters, a color and a plot style. Here is a table of the color values available and their letter codes:

colorcode
blueb
greeng
redr
cyanc
magentam
yellowy
blackk
whitew

Here is a table of some of the plot styles available:

stylecode
line-
dotted line:
point.
circleo
squares

The plot example we used above will plot the data sequence using red squares and the regression line as a blue line.

After setting up the plot you will usually also want to set up the axes and label them. These commands do that.

plt.axis([1930,1990,0,35])
plt.xlabel("Year")
plt.ylabel("Farm Population (in millions)")

The axis() method takes a list that gives the start and end values for the x axis followed by the start and end values for the y axis.

The last step in plotting is telling pyplot to show the plot.

plt.show()

This will open a graphics window that displays the plot. The graphics window provides controls that will allow you to edit or save the image. Here is the plot generated by this example program.

You can also ask pyplot to save the plot in an image file:

plt.savefig('farm.png')

This will save the image file in the same directory as the Python program's source code file.

Plotting a smooth curve

In our next example we are going to plot a smooth curve of the logistic function

Here is the plot that pyplot will generate for us.

Here is the code for the program that generates this plot.

import matplotlib.pyplot as plt
import math

def f(x):
    return 1/(1+math.exp(-x))


xs = [-5 + 0.1*i for i in range(0,101)]
ys = [f(x) for x in xs]

# Now plot this data as a smooth curve
plt.plot(x, y, 'b-')
# Set the axis details
plt.axis([-5,5,0,1])
# Add title and labels
plt.title("Logistic Function")
plt.xlabel("x")
plt.ylabel("f(x)")
plt.savefig('smooth.png')

The basic approach here is the same as in the last example. We start by making a list of samples along the x axis. When then construct a second list that contains y values made by passing each of the x samples to the function f(x). We then pass those two lists to the pyplot plot() function with a format string of 'b-', which connects the sample points together to make a large number of blue line segments. Since the line segments are individually quite short, the effect of putting them all together is to create what looks like a smooth curve.

To facilitate making the x samples and evaluating them, we use a pair of list comprehesions.

Further information on pyplot and matplotlib

The two examples I showed above demonstrate enough of what pyplot can do to meet our plotting needs for the rest of the course. pyplot and matplotlib are capable of doing much more, and provide facilities for constructing a wide range of different plots. If you are curious, I invite you to visit the official matplotlib web site at matplotlib.org.

Programming Assignment

A first order ordinary differential equation is an equation of the form

along with initial conditions

x(0) = x0

Solving the equation requires that we find an expression for x(t) that satisfies both the equation and the initial condition. Not all differential equations can be solved exactly; however, there are a number of relatively simple and straightforward methods for finding numerical approximations to solutions. In this assignment we will look at three such methods.

When we solve a differential equation exactly, we are trying to determine the function x(t) that satisfies the equation and initial conditions. If we were able to find x(t) we would of course be able to evaluate this solution function at any point in time t. The methods that we are going to see below are all methods that attempt to construct approximations for x(t) at isolated points in time. Specificially, we will try to estimate x(ti) for a collection of isolated points ti = t0 + i h, where h is some step size we select.

Let us think for a minute about how we might go about finding the first of these unknown points, x(t1).

One way to obtain x(t1) would be to start at the known point (t0 , x0) and draw a line with just the right slope to hit the exactly right value for x(t1).

The problem of course is that we don't know the exactly correct slope needed to hit our target. The various methods we will study below are all aimed at doing a better and better job of approximating the correct slope.

The first method uses the only slope we have available to us as a starting guess for the correct slope. Since we know both t0 and x0, we can plug them into the differential equation to compute the slope of x(t) at t = t0:

If we were to use this as our slope and follow the line out to t = t1 we would arrive at a point

x1 = x(t0) + f(t0 , x0) (t1 - t0) = x(t0) + f(t0 , x0) h

which is a fairly bad estimate for x(t1). The picture below illustrates the situation.

From (t1 , x1) we could simply repeat the process:

x2 = x1 + f(t1 , x1) (t2 - t1) = x1 + f(t1 , x1) h

More generally, this method gets us from an estimated point (tn , xn) to a new estimate (tn+1 , xn+1) via the process

tn+1 = tn + h

k1 = f(tn, xn)

xn+1 = xn + k1 h

Again, I want to emphasize that this is a lousy method (as the picture above shows). We can improve the method marginally by decreasing the step size h. With a smaller step size in the t direction, there is less opportunity for the estimated point x1 to depart from the actual value of the solution, x(t1). More importantly, we can improve the method by replacing the slope k1 we used for this method with a better slope that does a better job of getting us closer to the target point.

The Modified Euler Method attempts to improve on the Euler method by using an average of a couple of derivatives as a better approximation for mystery slope we seek. The Modified Euler method starts out by using the Euler method to compute an approximation for x(tn+1) called p1. Plugging tn+1 and the approximate x into the differential equation provides an approximation for the derivative x(tn+1), called k2. The Modified Euler method then takes the average of this derivative and k1 = x(tn), uses that as an approximation for the mystery slope, and then computes a better estimate for xn+1.

Here are the relevant formulas.

tn+1 = tn + h

k1 = f(tn , xn)

p1 = xn + k1 h

k2 = f(tn+1 , p1)

Although the Euler and the Modified Euler methods are both very easy to understand, neither one of these methods does a very good job at approximating the exact solution to a problem. To get better results, we typically have to apply more sophisticated methods. One such method which typically produces pretty good results is the Runge-Kutta method. Like both the Euler and the Modified Euler methods, this method uses previously computed points xn in combination with information from the f(t,x(t)) function to compute subsequent points xn+1. To do this, the Runge-Kutta uses a more sophisticated set of formulas, as shown here:

tn+1 = tn + h

k1 = f(tn , xn)

k2 = f(tn + h/2 , xn + k1 h/2 )

k3 = f(tn + h/2 , xn + k2 h/2 )

k4 = f(tn + h , xn + k3 h)

Here is a concrete problem we will be working with for this assignment:

Write a program that uses the Runge-Kutta method to construct a sequence of approximate solution points (tn , xn) starting from (t0 , x0) = (0 , 1/2) and ending at t = 1.0. Allow the user to enter the number of steps N to take. N steps corresponds to a value of h = 1/N. Using that value of h have your program compute list of estimates for x.

Finally, have your program produce a plot of the estimated points along with a curve for the actual solution. The actual solution for this differential equation is