First Assignment

A data set

In this first assignment we are going to build a simple neural network model to solve a classification task. The starting point for any neural network project is a data set, and for this project we are going to use a data set on diabetes classification. The data set contains health information on a set of patients, including whether or not each patient has been diagnosed with diabetes. Your model will predict whether or not a patient has diabetes from the other health information available for that patient.

I have placed a copy of the CSV file containing the data for this project on the course web site at

http://www.lawrence.edu/fast/greggj/CMSC490/diabetes.csv
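Here is a minimal sketch of loading the file with pandas, assuming pandas is installed in your notebook environment. `pd.read_csv` accepts either a URL or a local path. The helper name `load_diabetes` is made up for illustration and is not part of any library.

```python
import pandas as pd

# URL from the assignment; pd.read_csv accepts a URL or a local file path.
DATA_URL = "http://www.lawrence.edu/fast/greggj/CMSC490/diabetes.csv"


def load_diabetes(source=DATA_URL):
    """Read the diabetes CSV into a DataFrame (hypothetical helper)."""
    df = pd.read_csv(source)
    # Strip stray whitespace from header names so column lookups are predictable.
    df.columns = [str(c).strip() for c in df.columns]
    return df


# Usage (needs network access to the course site):
# df = load_diabetes()
# print(df.head())
```

If you download the file first, pass its local path instead, for example `load_diabetes("diabetes.csv")`.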

Preprocessing the data

If you open the CSV file you will see that most of the columns hold true/false values encoded as 0s and 1s. These columns can be fed to a neural network model as they are.
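A quick way to confirm which columns are already 0/1 and which still need work is to look at each column's distinct values. The helper below is a hypothetical sketch (the name `split_binary_columns` is made up), and the column names in the usage example are placeholders rather than the actual headers in diabetes.csv.

```python
import pandas as pd


def split_binary_columns(df):
    """Return (binary, other): columns whose non-missing values are all 0 or 1, and the rest."""
    binary, other = [], []
    for col in df.columns:
        values = set(df[col].dropna().unique())
        # NumPy integers, floats, and booleans compare and hash like 0/1, so they all count.
        if values and values <= {0, 1}:
            binary.append(col)
        else:
            other.append(col)
    return binary, other


# Usage with made-up column names:
# sample = pd.DataFrame({"Age": [40, 55], "Gender": ["Male", "Female"], "Polyuria": [0, 1]})
# split_binary_columns(sample)  # (["Polyuria"], ["Age", "Gender"])
```

If the file matches the description above, `other` should contain only the age and gender columns.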

Two of the columns require some further processing. The age column contains integer values that are much larger than 0 or 1, so the best way to handle it is to normalize it, rescaling the ages into a range comparable to the other columns. The gender column is a categorical column with two categories, Male and Female. Here is a link to a Stack Overflow discussion that shows how to transform a simple categorical column with two categories into 0/1 encoding in pandas.
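The sketch below applies both transformations with pandas. It uses min-max scaling, which maps ages into the interval [0, 1]; standardizing to zero mean and unit variance is a reasonable alternative. For the gender column it uses `Series.map`, which may differ from the approach in the Stack Overflow discussion. The column names `Age` and `Gender` are assumptions, so check them against the CSV header.

```python
import pandas as pd


def preprocess(df, age_col="Age", gender_col="Gender"):
    """Min-max normalize the age column and encode gender as 0/1.

    age_col and gender_col are assumed names; match them to the CSV header.
    """
    out = df.copy()  # leave the original DataFrame untouched
    age = out[age_col].astype(float)
    # Min-max scaling: the youngest patient becomes 0.0 and the oldest 1.0.
    out[age_col] = (age - age.min()) / (age.max() - age.min())
    # Map the two categories to 0/1; any other spelling becomes NaN so it is easy to spot.
    out[gender_col] = out[gender_col].map({"Female": 0, "Male": 1})
    return out
```

Values other than exactly `Male` or `Female` (extra spaces, different capitalization) become NaN, which you can detect with `out["Gender"].isna().sum()`. Min-max scaling would divide by zero only if every patient had the same age.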

Building a classifier

Your job in this assignment is to construct a simple neural network to do classification on this data set. In setting up the neural network you will need to make decisions about:

How many hidden layers to use
How many units to use in each hidden layer
What activation function to use for the output unit
What loss function to use
What optimizer to use
How many epochs to train the network
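To make those choices concrete, here is a small network written directly in NumPy. This is an illustrative sketch, not the required approach for the course: it fixes one hidden layer of ReLU units, a single sigmoid output unit, binary cross-entropy loss, and plain full-batch gradient descent as the optimizer, while `hidden_units`, `epochs`, and `learning_rate` are parameters you can vary. The function names `train_mlp` and `predict_proba` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_mlp(X, y, hidden_units=8, epochs=500, learning_rate=0.1, seed=0):
    """One hidden ReLU layer, one sigmoid output unit, binary cross-entropy,
    full-batch gradient descent. Returns (params, per-epoch losses)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    n_samples, n_features = X.shape
    rng = np.random.default_rng(seed)
    # Small random weights break the symmetry between hidden units.
    W1 = rng.normal(0.0, np.sqrt(2.0 / n_features), size=(n_features, hidden_units))
    b1 = np.zeros((1, hidden_units))
    W2 = rng.normal(0.0, np.sqrt(1.0 / hidden_units), size=(hidden_units, 1))
    b2 = np.zeros((1, 1))
    eps = 1e-12  # keeps log() away from zero
    losses = []
    for _ in range(epochs):
        # Forward pass.
        z1 = X @ W1 + b1
        h = np.maximum(z1, 0.0)   # ReLU hidden layer
        p = sigmoid(h @ W2 + b2)  # predicted probability of diabetes
        loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
        losses.append(float(loss))
        # Backward pass: for sigmoid + cross-entropy, dLoss/dlogit = (p - y) / n.
        d_logit = (p - y) / n_samples
        dW2 = h.T @ d_logit
        db2 = d_logit.sum(axis=0, keepdims=True)
        d_z1 = (d_logit @ W2.T) * (z1 > 0)
        dW1 = X.T @ d_z1
        db1 = d_z1.sum(axis=0, keepdims=True)
        # Gradient descent update.
        W1 -= learning_rate * dW1
        b1 -= learning_rate * db1
        W2 -= learning_rate * dW2
        b2 -= learning_rate * db2
    return (W1, b1, W2, b2), losses


def predict_proba(params, X):
    """Probability of the positive class for each row of X."""
    W1, b1, W2, b2 = params
    h = np.maximum(np.asarray(X, dtype=float) @ W1 + b1, 0.0)
    return sigmoid(h @ W2 + b2).ravel()


# Hypothetical usage on the preprocessed data ("class" is an assumed label column name):
# X = df.drop(columns=["class"]).to_numpy(dtype=float)
# y = df["class"].to_numpy()
# params, losses = train_mlp(X, y, hidden_units=16, epochs=300)
```

In a framework such as Keras the same decisions appear as the sizes and `activation` arguments of the `Dense` layers, the `loss` and `optimizer` arguments to `compile`, and the `epochs` argument to `fit`.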

Experiment with different values for these choices to produce the best network you can construct.
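Comparing configurations fairly needs a yardstick. One common choice, assumed here rather than stated in the assignment, is accuracy on a validation set of patients the network never trained on. The sketch below holds out a random 20% of the rows and scores every configuration on the same split. `fit_predict` is a placeholder for your own training code, and all names are hypothetical.

```python
import numpy as np


def train_val_split(X, y, val_fraction=0.2, seed=0):
    """Shuffle the rows, then hold out val_fraction of them for validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_val = int(round(len(X) * val_fraction))
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    return X[train_idx], X[val_idx], y[train_idx], y[val_idx]


def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of patients whose thresholded prediction matches the label."""
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    return float(np.mean(y_pred == np.asarray(y_true)))


def compare_configs(configs, fit_predict, X, y, seed=0):
    """Score each named config on one shared held-out split.

    fit_predict(config, X_train, y_train, X_val) stands in for your own
    training code and must return predicted probabilities for X_val.
    """
    X_tr, X_val, y_tr, y_val = train_val_split(X, y, seed=seed)
    scores = {}
    for name, config in configs.items():
        probs = fit_predict(config, X_tr, y_tr, X_val)
        scores[name] = accuracy(y_val, probs)
    return scores
```

Keeping the split fixed means differences in score come from the configurations rather than from which patients happened to land in the validation set.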

Turning in your work

To submit your work for grading, send me your Jupyter notebook as an attachment to an email message.