Python dictionaries

We have worked extensively with Python lists and tuples in this course. Python offers one more data structure for storing data, the dictionary. A dictionary is a collection of key/value pairs in which each value is looked up by its key.

For example, the dictionary countryData below stores facts about a country, such as its country code, its population, and its area.

countryData = {'code':'USA', 'pop':327167434, 'area':9833520}

You can access the entries of a dictionary using a syntax that looks like the indexing syntax for lists or tuples. The only difference is that you use the keys of the dictionary in place of integer indices.

print('The area of this country is '+str(countryData['area']))

You can also use the indexing notation to add new key/value pairs to the dictionary.

countryData['gdp'] = 20891000000
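As a brief illustration of the operations described above, the sketch below reads a value, adds a new key, and loops over the entries. The get() method and the items() loop are not covered in the text above; they are standard dictionary features shown here as optional extras.

```python
# Same dictionary as in the text above.
countryData = {'code': 'USA', 'pop': 327167434, 'area': 9833520}

# Read a value by key.
print('The area of this country is ' + str(countryData['area']))

# Add a new key/value pair.
countryData['gdp'] = 20891000000

# Asking for a missing key with [] raises a KeyError;
# get() returns a default value instead.
capital = countryData.get('capital', 'unknown')
print('Capital: ' + capital)

# Loop over every key/value pair.
for key, value in countryData.items():
    print(key, '->', value)
```

Using get() is a reasonable choice when a missing key is expected and you have a sensible default; plain indexing is better when a missing key indicates a mistake.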

Dictionaries are frequently used in situations where you need to construct bundles of data but also want to preserve some flexibility in deciding what goes into each bundle. For example, suppose we are writing software that collects readings from a network of stations spaced around a city. Each station reading will include a station ID number and a timestamp, but beyond that the data in each reading may vary because different stations are equipped to measure different quantities. One station may measure temperature and rainfall, while another measures air pollution and noise levels. This would lead to measurements that look like

{'ID':45,'time':'14:05','temperature':71,'rain':0.2}
{'ID':52,'time':'14:10','pmi':2.5,'noise':42,'nox':12.1}

Software that collects these readings can check to see if a particular key is present by using this syntax:

if 'pmi' in reading:
    # Do something with reading['pmi']
    print(reading['pmi'])
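To make the idea concrete, here is a small sketch that processes the two sample readings shown above. The function name describeReading is made up for this illustration.

```python
readings = [
    {'ID': 45, 'time': '14:05', 'temperature': 71, 'rain': 0.2},
    {'ID': 52, 'time': '14:10', 'pmi': 2.5, 'noise': 42, 'nox': 12.1},
]

def describeReading(reading):
    # Every reading has an ID and a time.
    parts = ['Station ' + str(reading['ID']) + ' at ' + reading['time']]
    # Other keys are optional, so check before using them.
    if 'temperature' in reading:
        parts.append('temperature ' + str(reading['temperature']))
    if 'pmi' in reading:
        parts.append('pmi ' + str(reading['pmi']))
    return ', '.join(parts)

for reading in readings:
    print(describeReading(reading))
```

Each reading is handled by the same function even though the two dictionaries contain different keys.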

JSON

JSON, or JavaScript Object Notation, is a text format for describing data. It is most commonly used to represent primitive values such as numbers and text, along with structured values such as lists and dictionaries. JSON was originally designed to provide text representations for data in the JavaScript language. Objects in JavaScript align very closely with Python dictionaries, and JavaScript arrays are very close to Python lists, so converting a JSON representation to a Python data structure is quite direct.

For example, here is a JSON expression:

[12,"green",{"option":"yes","amount":3}]

Translating this JSON text into Python produces a list containing a number, a string, and a dictionary.

To translate text from JSON form into a Python data structure you can use the json module from the Python standard library.
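A minimal sketch of that translation uses json.loads(), which parses JSON held in a Python string. Note that JSON itself requires double quotes around strings, so the text is written that way here.

```python
import json

# JSON text held in an ordinary Python string.
text = '[12, "green", {"option": "yes", "amount": 3}]'

# json.loads() parses the string and returns Python data.
data = json.loads(text)

print(type(data))         # the outer value is a list
print(data[1])            # the string 'green'
print(data[2]['amount'])  # a value from the nested dictionary

# json.dumps() goes the other way, from Python data to JSON text.
print(json.dumps(data))
```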

One common scenario is that the JSON data you need is stored in a file. The following example program is designed to work with a file that contains population data about a large number of countries. The file contains a single large array whose typical element looks like

{
 "Country Name": "Mexico",
 "Country Code": "MEX",
 "Year": "1966",
 "Value": "46229966"
}

The program prompts the user to enter a three-letter country code and then computes the population growth rate for that country from 1995 to 1996:

import json

# Load the data into a list.
filename = 'population_data.json'
with open(filename) as f:
    pop_data = json.load(f)


def findPopulation(code, year):
    for entry in pop_data:
        if entry['Country Code'] == code and entry['Year'] == str(year):
            return float(entry['Value'])
    return 0


code = input('Enter a country code: ').strip().upper()
pop95 = findPopulation(code,1995)
pop96 = findPopulation(code,1996)
if pop95 != 0 and pop96 != 0:
    rate = 100*(pop96-pop95)/pop95
    print('The growth rate of '+code+' is '+str(rate))
else:
    print('Missing data.')

The json.load() function provides a simple and convenient way to load JSON data from a file. All we have to do is pass this function an open file object, and it converts the entire file from JSON into Python data.

Since the file in this case translates into a Python list of dictionaries, we use a combination of a loop and some dictionary code to find the entries for the given country in the years 1995 and 1996.
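One possible variation, not part of the program above: findPopulation() scans the whole list on every call. If you need many lookups, you could first build a dictionary keyed by (country code, year) pairs. The sketch below uses a small in-line list as a stand-in for the contents of the file. The 1966 entry comes from the text, the 1967 value is made up for illustration, and buildLookup is a hypothetical helper name.

```python
# Stand-in for the list that json.load() would return.
# The 1966 entry comes from the text; the 1967 value is made up.
pop_data = [
    {"Country Name": "Mexico", "Country Code": "MEX",
     "Year": "1966", "Value": "46229966"},
    {"Country Name": "Mexico", "Country Code": "MEX",
     "Year": "1967", "Value": "47000000"},
]

def buildLookup(entries):
    # Map (code, year) pairs to populations.
    lookup = {}
    for entry in entries:
        key = (entry['Country Code'], int(entry['Year']))
        lookup[key] = float(entry['Value'])
    return lookup

populations = buildLookup(pop_data)

# get() returns None when there is no entry for that pair.
print(populations.get(('MEX', 1966)))
print(populations.get(('MEX', 2050)))
```

Here a tuple serves as the dictionary key, and a result of None takes the place of the 0 that findPopulation() returns for missing data.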

Getting JSON from a server

Another very common scenario in which JSON crops up is information transfer on the Internet. It has become very popular to set up data servers on the Internet that use the REST architecture. In this architecture clients use HTTP, the protocol that web browsers use to communicate with web servers, to send requests for data to data servers. Instead of sending the HTML for a web page, the REST server sends back a JSON representation of the data the client has requested.

Fetching data from REST servers is a very popular activity in Python, so naturally there are several packages designed to make this easy to do. One such package is the requests package. The requests package contains a useful function, requests.get(), that you can use to fetch JSON data from a REST server. Most REST servers use specially constructed URLs to manage requests for information. To interact with such a server, all you have to do is pass the URL to requests.get(). That function returns a response object, which has a json() method that returns the response JSON converted into a Python data structure for you.

The program below illustrates how this process works. In the example program I am going to send a request for weather forecast information to the server at api.darksky.net. To use this server you have to first go to darksky.net/dev in your browser and sign up for a free developer account. After you sign up you will be issued an API key that you will need to insert in place of the <key> placeholder in the URL in the code below. (The Dark Sky API has since been retired, so treat this program as an illustration of the pattern; the same steps apply to other REST servers that return JSON.)

import requests
import time

# Get the time 24 hours from now
timeTomorrow = int(time.time())+60*60*24
# Set up the URL for the REST request
url = 'https://api.darksky.net/forecast/<key>/44.26,-88.39,'+str(timeTomorrow)
# Send the request and translate the response from JSON
response = requests.get(url)
# Stop with an error if the server reports a failure
response.raise_for_status()
data = response.json()
# Print a forecast
print('The weather this time tomorrow will be '+data['currently']['summary'])
print('The temperature will be '+str(data['currently']['temperature']))

The code above assumes some knowledge of the structure of the response that comes back from api.darksky.net. You can either use the debugger to examine the contents of data, or you can use the program to construct the URL and then copy the URL into a browser to see what the server sends back as a response. Either way, you can examine the response to see how it is structured and then write the code that drills down into the data contents to get the information you want.
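If you would rather explore the structure in code, one approach is to print the top-level keys and then pretty-print the part you are interested in. The sketch below uses a small hand-written dictionary as a stand-in for data. Only the 'currently', 'summary', and 'temperature' keys come from the program above; every value, along with the 'latitude' and 'longitude' keys, is made up for illustration.

```python
import json

# Hand-written stand-in for the value returned by response.json().
data = {
    'latitude': 44.26,
    'longitude': -88.39,
    'currently': {'summary': 'Partly Cloudy', 'temperature': 58.3},
}

# See which keys are available at the top level.
print(list(data.keys()))

# Pretty-print one part of the structure to see what it contains.
print(json.dumps(data['currently'], indent=2))

# Use get() with a default when a key might be missing.
summary = data.get('currently', {}).get('summary', 'unavailable')
print('The weather this time tomorrow will be ' + summary)
```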

Programming assignment

Below you will find a link to a zip file containing two large data sets. The first data set is the population data JSON file that I used in the sample program above. The entries there make it possible to look up population data for different years based on standard country codes. The second data set is a large CSV file I downloaded from the World Bank. This data set gives historical data on national GDP for a large number of countries over a span of many years. The second data set uses the same country codes as the first to identify countries, so this should give you a way to aggregate population and GDP data for a large number of countries. You can use pandas to load this CSV file and look up GDP data. To make countries easier to find by their country codes you should use this code to set the country code column of the CSV file to be the index for the data frame:

df = df.set_index('Country Code')

You can then look up individual cells in the data frame by using the syntax

df.loc[code,col]

where code contains a country code for a country you want to look up and col contains the name of the column you want to look in.
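As a small sketch of these two steps, the example below builds a tiny data frame by hand instead of reading the real CSV file. The country codes, the column names '1995' and '2005', and all of the figures are made-up assumptions; check the actual file to see how its columns are named.

```python
import pandas as pd

# Hand-built stand-in for the data frame that pd.read_csv() would return.
df = pd.DataFrame({
    'Country Code': ['AAA', 'BBB'],
    '1995': [100.0, 250.0],
    '2005': [150.0, 300.0],
})

# Make the country code the index so rows can be found by code.
df = df.set_index('Country Code')

# Look up a single cell by row label and column name.
code = 'BBB'
col = '2005'
print(df.loc[code, col])

# Check that a code is present before looking it up.
if 'ZZZ' not in df.index:
    print('No data for ZZZ')
```

Checking df.index first avoids the KeyError that df.loc raises for a country code that does not appear in the file.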

The data sets

Is there a correlation between the rate of a country's population growth and the rate of growth of its economy? In this exercise you will aggregate and plot some relevant data to help answer this question.

To do this analysis you will want to start by collecting population and GDP data for a pair of years, 1995 and 2005. For each country in your analysis compute the percentage change in the country's population over that period and the percentage change in that country's GDP per capita. (Note: to compute GDP per capita you will need to divide the GDP of each country by that country's population.)
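The arithmetic for one country can be sketched as follows. The function name percentChange and all of the numbers are made up for illustration; they are not taken from the data sets.

```python
def percentChange(old, new):
    # Percentage change from old to new.
    return 100 * (new - old) / old

# Made-up figures for a single hypothetical country.
pop95, pop05 = 1000000, 1200000
gdp95, gdp05 = 5000000000, 9000000000

# GDP per capita is GDP divided by population.
perCapita95 = gdp95 / pop95
perCapita05 = gdp05 / pop05

popChange = percentChange(pop95, pop05)
perCapitaChange = percentChange(perCapita95, perCapita05)
print(popChange, perCapitaChange)
```

With these made-up figures the population grows by 20 percent while GDP per capita grows by 50 percent.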

Next, use pyplot to construct a scatter plot of population change vs GDP per capita change, with one point per country. If there is any correlation at all between population growth and GDP per capita growth it should show up in the shape of the data cloud that you plot.