Classifying road signs

On the course web site at the URLs

    http://www.lawrence.edu/fast/greggj/CMSC490/Signs.zip
    http://www.lawrence.edu/fast/greggj/CMSC490/Signs_Test.zip

you will find two archives containing the data set for this assignment. The Signs.zip archive contains the training data you will use, while Signs_Test.zip contains test data you can use to form a validation set and a test set.

In both archives you will find a large number of images of road signs, divided into distinct classes. In this assignment you will construct a simple CNN that can classify an image of a road sign into the proper class.

Preprocessing the data

Just as in the cats and dogs example, we will use a Keras dataset to read the image files and pass the images to our network. To prepare for this, you will have to copy data from the original test folder into two new folders, one for your validation set and one for your test set. Each of these folders will contain subfolders, named with the class numbers, that hold the images you have copied from the original test folder.

To give the validation and test folders the structure that the Keras datasets expect, you will need to run some Python code. You can base your code on the code that the author used for the cats and dogs example in chapter eight:

import os, shutil, pathlib

original_dir = pathlib.Path("train")
new_base_dir = pathlib.Path("cats_vs_dogs_small")

def make_subset(subset_name, start_index, end_index):
    for category in ("cat", "dog"):
        dir = new_base_dir / subset_name / category
        os.makedirs(dir)
        fnames = [f"{category}.{i}.jpg" for i in range(start_index, end_index)]
        for fname in fnames:
            shutil.copyfile(src=original_dir / fname,
                            dst=dir / fname)

make_subset("train", start_index=0, end_index=1000)
make_subset("validation", start_index=1000, end_index=1500)
make_subset("test", start_index=1500, end_index=2500)

To construct the correct code for this project you may find it useful to use the os.listdir(<dir>) function, which returns a list of the names of all of the files in the directory <dir>. You should also use the random.shuffle(<list>) function to shuffle the list of file names before copying half of them to your validation folder and the other half to your test folder.
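
The steps just described can be sketched roughly as follows. This is only one possible arrangement, not the required solution. It assumes that the unpacked test archive contains one subfolder per class; the function name split_into_validation_and_test and its parameters are made up for illustration. It shuffles through a random.Random object, which behaves like random.shuffle but lets you pass an optional seed to make the split repeatable.

```python
import os
import pathlib
import random
import shutil


def split_into_validation_and_test(source_dir, validation_dir, test_dir, seed=None):
    """Copy half of each class's images to validation_dir and the rest to test_dir.

    Assumes source_dir holds one subfolder per class (an assumption about
    how the archive unpacks, so check it against your own copy).
    """
    source_dir = pathlib.Path(source_dir)
    rng = random.Random(seed)  # seed=None gives a different split on each run

    for class_name in sorted(os.listdir(source_dir)):
        class_dir = source_dir / class_name
        if not class_dir.is_dir():
            continue  # skip any stray files that are not class folders

        fnames = sorted(os.listdir(class_dir))
        rng.shuffle(fnames)  # shuffle before splitting, as suggested above
        half = len(fnames) // 2

        # First half goes to validation, the remainder goes to test.
        for target_dir, subset in ((validation_dir, fnames[:half]),
                                   (test_dir, fnames[half:])):
            out_dir = pathlib.Path(target_dir) / class_name
            os.makedirs(out_dir, exist_ok=True)
            for fname in subset:
                shutil.copyfile(src=class_dir / fname, dst=out_dir / fname)
```

A call might look like this, where the folder names are placeholders for wherever you unpacked the data:

    split_into_validation_and_test("Signs_Test", "validation", "test", seed=42)

Splitting each class separately keeps every class represented in both the validation set and the test set.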

Since this code is plain Python and does not make any use of TensorFlow, you can run it on your laptop first to debug it and make sure that it works correctly before you try to run it on Colab or one of our remote servers.

Building a network

You should be able to build a relatively simple CNN classifier to solve this problem. Your goal will be to construct and train a network that can achieve an accuracy of 95% or greater on your test set.
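
As a possible starting point, here is a minimal sketch of a small CNN in the style of the cats and dogs example. The function name build_sign_classifier, the layer sizes, and the default image size of 32 by 32 pixels are all assumptions made for illustration; the number of classes is left as a parameter because it depends on the data set. The loss assumes integer class labels. This sketch makes no claim about reaching the 95% target, so expect to experiment with the architecture and training settings.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_sign_classifier(num_classes, image_size=(32, 32)):
    """Build and compile a small CNN for RGB images of the given size.

    num_classes and image_size are assumptions to be set from your data.
    """
    inputs = keras.Input(shape=(image_size[0], image_size[1], 3))
    x = layers.Rescaling(1.0 / 255)(inputs)  # map pixel values to [0, 1]
    x = layers.Conv2D(32, 3, activation="relu")(x)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Conv2D(64, 3, activation="relu")(x)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Conv2D(128, 3, activation="relu")(x)
    x = layers.Flatten()(x)
    x = layers.Dropout(0.5)(x)  # mild regularization before the classifier
    outputs = layers.Dense(num_classes, activation="softmax")(x)

    model = keras.Model(inputs, outputs)
    # sparse_categorical_crossentropy expects integer class labels.
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

The softmax output has one unit per class, so num_classes must match the number of class subfolders in your training data.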