Lab 7: Neural Networks

Lab 7: Neural Networks
Due March 27 by midnight

Part 1: Neural Network Implementation

Your task is to implement a small neural network with sigmoid activation functions, trained by backpropagation. This will require implementing three functions for sigmoid neurons:

compute_activation: weighted sum of inputs passed through a sigmoid function
compute_output_deltas: (output)(1 - output)(target - output)
compute_hidden_deltas: (output)(1 - output)(weighted sum of next-layer deltas)
update_weights: weight += (learning_rate)(delta)(input activation)

The first three functions store their results in the node's self.activation and self.delta fields. update_weights changes the weights of the edges stored in self.in_edges.

The Network class is already set up to initialize a neural network with input, bias, and sigmoid nodes. You must implement the following functions for the neural network

predict: computes the network's output for a given input.
backpropagation: updates the network's weights using the error between the most recent output and the target.
train: calls predict then backpropagation on each example in the (shuffled) training set until convergence or max epochs.
test: computes the fraction of inputs for which the output matches the target.

Much more detail on all of these functions can be found in the comments in neural_net.py. The main function sets up a network with a single two-neuron hidden layer and trains it to represent the function XOR. Unit tests for many of the functions and an additional example data set will be available soon.

Part 2: Keras for MNIST

For this part of the lab, you will be classifying images from the MNIST handwritten digit data set. Each input is a 28x28 array of integers between 0 and 255 representing grayscale images. The corresponding labels are digits 0–9.

To run the Jupyter notebooks, you first need to activate the virual environment:

. ~bryce/public/cs63/cs63s17/bin/activate

You can deactivate the virtual environment at any time with the command deactivate. To start up Jupyter, you should first navigate to your lab directory, then run:

jupyter notebook --browser=firefox

The starting point notebooks demonstrate setting up the inputs for training a neural network using the Keras library. In the first, the network is set up with parameters that should be familiar from lecture and part 1:

sigmoid activation functions
a single densely-connected hidden layer
minimizing squared error
updating weights with stochastic gradient descent

With these parameters, the network achieves only ~60% accuracy on the test set. The second notebook demonstrates a number of possible variations on these parameters:

ReLU and softmax activation functions
multiple convolutional hidden layers
categorical crossentropy loss function
RMSprop optimizer

With these settings, accuracy of almost 90% should already be possible. Your task is to find parameters that result in better than 97% test set accuracy on the MNIST data set. You should consider varying some or all of the following parameters:

activation functions
alternative layer structures, especially convolutional and dropout
optimizers, including varying parameters like momentum
loss functions, especially categorical ones
You can also vary the number and size of layers, and for convolutional layers, you have many options for their layout.
You should not rely purely on training for additional epochs. Increasing training time is acceptable, but should not be the main thing you vary.

If trial-and-error doesn't get you to 97% accuracy, you should try searching for what others have done. Training neural networks on the MNIST data set is common enough that you should be able to find lots of relevant google results. In addition, Wednesday's reading mentions three different papers that achieved over 98% accuracy on MNIST, and Friday's reading links to another.

Writeup

You have been provided with a latex document called nn.tex, where you have two sections to complete. In the first section, you should fill in the weights found by your neural network from part 1, using a non-zero random_seed of your choosing (for which learning converges). You should then explain the weights that the neural network learned and how the network is computing XOR. In particular, you should be able to recognize a specific Boolean function (a.k.a. logic gate) corresponding to each hidden/output node.

In the second section, you will describe the parameters you found that achieved over 97% accuracy on the MNIST handwritten digit classification task. You should describe how you came up with them or cite where you found them. Then you should explain what features of the parameters you found make them more successful than the defaults that we started with. Be sure that your descriptions are sufficiently detailed that we can re-create your network and reproduce your results.

You can edit the nn.tex file with any text editor, such as gvim. There's lots of great help available on line for latex; just google "latex topic" to find lots of tutorials and stack exchange posts. To compile your latex document into a pdf, run the following command:

pdflatex nn.tex

You can then open the pdf from the command line with gnome-open or, of course, by double clicking its icon. Feel free to use services like sharelatex to edit your latex file.

Submitting

You should submit the files neural_net.py and nn.tex. As usual, use git add, git commit and git push before the deadline to submit your work.