Jupyter NotebookSolar Project

Working with time

The Jupyter notebook linked above contains some code samples to demonstrate the basic mechanics of working with time information in Python.

The Belgian Solar example

The main example today is an example that involves merging time series data that comes from a couple of different sources into a database table and then reading some of that data to construct a plot.

The data set involves power output data published by a utility company in Belgium. This company tracks the output of solar power installations connected to the grid all over Belgium and publishes monthly summaries of power production in the form of a CSV file. We are going to integrate this power production data with weather data downloaded from Dark Sky in a database table. We will then construct a program that uses this data to construct a plot of power production, cloud cover and temperature over a three day period.

The first step in building our database is to use SQLite Studio to construct the database and table. The database file is named 'solar.db' and it contains a single table named 'power'. The power table has the following structure:

ColumnTypeDescription
timeINTUNIX time value
powerREALPower produces in Megawatts
tempREALTemperature in degrees F
cloudREALCloud cover

The process of filling the database with data involves two steps: reading the power production data from a CSV file and fetching the weather data from Dark Sky.

Here is the source code for the first function, which loads the initial data from a CSV file.

import sqlite3
import time

def cleanLine(line):
    """Converts a raw line list into an appropriate
       data format."""
    return (int(time.mktime(time.strptime(line[0],"%d/%m/%Y %H:%M"))), float(line[1]))

def readData(fileName):
    """Generic data reading function. Uses cleanLine to
       format lines of data."""
    data = []
    with open(fileName) as f:
        for line in f.readlines():
            data.append(cleanLine(line.split(',')))
    return data

readings = readData("Belgium.csv")

conn = sqlite3.connect('solar.db')
c = conn.cursor()
print("Opened database successfully")

for reading in readings:
    c.execute("INSERT INTO power(time,power) VALUES (?,?)",reading)

conn.commit()
print("Records created successfully")
conn.close()

Two things are worth noting here. The first is the logic in the cleanLine function. A typical row in the CSV file looks like

01/04/2019 11:30,2135.91

Since we are going to be storing time values as UNIX time numbers, we have to use the time.strptime() function to convert the text time information to a time value. The code

time.mktime(time.strptime(line[0],"%d/%m/%Y %H:%M"))

takes care of this step. Note also that the cleanLine() function converts each line in the CSV file into a tuple with two entries, a time value and a power value. This structure will be necessary when we go to insert the data into the database.

After reading the data from the CSV file we will end up with a long list of tuples. To automate the process of inserting all of these tuples into the database we use a variant of the cursor's execute method. This variant makes use of an SQL string containing placeholder ? characters. The second parameter to execute is a tuple containing values to be substituted for the placeholders.

for reading in readings:
    c.execute("INSERT INTO power(time,power) VALUES (?,?)",reading)

The second step in preparing the database is to load weather data. We will download this weather data from Dark Sky and insert the temperature and cloud cover data into the database via a loop that uses an SQL update command.

import sqlite3
import requests
import time

# Set up the database connection
conn = sqlite3.connect('solar.db')
c = conn.cursor()
print("Opened database successfully")

# Set up the start and end times.
startTime = int(time.mktime(time.strptime("01/04/2019 00:00","%d/%m/%Y %H:%M")))
endTime = int(time.mktime(time.strptime("01/06/2019 00:00","%d/%m/%Y %H:%M")))

currTime = startTime
while currTime <= endTime:
    # Set up the URL for the REST request
    url = 'https://api.darksky.net/forecast/<key>/50.7,4.5,' + str(currTime)
    # Send the request and translate the response from JSON
    response = requests.get(url)
    data = response.json()
    for hr in data['hourly']['data']:
        time = hr['time']
        if 'temperature' in hr.keys():
            temp = hr['temperature']
        if 'cloudCover' in hr.keys():
            cloud = hr['cloudCover']
        c.execute("UPDATE power SET temp=?,cloud=? WHERE time >= ? AND time < ?", (temp,cloud,time,time+60*60))
    currTime += 24*60*60

conn.commit()
print("Records updated successfully")
conn.close()

Once the data is in the database we are ready to extract data and construct our plot. The final program prompts the user to enter a day in the month. The program will pull three days worth of data from the database and construct a plot showing power production, temperature, and cloud cover over a 72 hour time window.

import sqlite3
import datetime
import time
import matplotlib.pyplot as plt

start = int(input("Enter a start day:"))
start_date = datetime.datetime(2019,4,start,0,0,0)
start_time = int(time.mktime(start_date.timetuple()))
end_time = start_time + 3*24*60*60

# Set up the database connection
conn = sqlite3.connect('solar.db')
c = conn.cursor()
print("Opened database successfully")
# Run the query
sql = "SELECT time, power, temp, cloud FROM power WHERE time>=? AND time <=?"
c.execute(sql,(start_time,end_time))

# Process the results
t = []
p = []
tp = []
cl = []
for time, power, temp, cloud in c.fetchall():
    t.append((time-start_time)/(60*60))
    p.append(power/33.7)
    tp.append(temp)
    cl.append(cloud*100)

plt.plot(t,p,'r-',t,tp,'b-',t,cl,'y-')
plt.title('Starting day '+str(start))
plt.savefig('plot.png')

If you would like to run this code, you can click the button at the top of these lecture notes to download the project files for this project. Note that I have intentionally removed my key from the Dark Sky URL in the code. You will need to obtain your own API key from Dark Sky and put it in place of the <key> in that URL for the code to work correctly.

Postscript

In the project folder I linked to above I have also included a Jupyter notebook. The code in this notebook demonstrates how you can use pandas alone to gather and organize the data for plotting. The example code there shows some ways that pandas supports time series data and doing things like merging two data sources that report values at differing times.