Run
update63 in a terminal window to get a copy of this week's files.
Introduction
This lab gives you the chance to formulate a machine learning
problem and use a neural network to solve it. You will focus on a
data set containing images of people's faces.
First you will explore using neural networks to solve some of the
simpler problems we discussed in class such as the logic problems AND,
OR, and XOR, the auto-encoder problem where the network develops a
binary-like representation in its hidden layer, and the handwritten
digit recognizer.
Experimenting with neural networks
In all of the examples below the network will be called
n. In the Python interpreter, you can use the following
methods:
- n.showPerformance() will display how the network responds to
each of the training patterns. Initially it will get every
pattern wrong since the weights are randomly initialized and no
learning has taken place.
- n.printWeights(layer1, layer2) will display the network's
current weights between layer1 and layer2. For
simple two-layer networks, the layers are called 'input'
and 'output'.
- n.train() will repeatedly train the network on the set of
training patterns. Each pass through all of the patterns is called an
epoch. When the network is successfully learning, the total amount of
error should decrease over time.
- At the unix prompt do: python -i and-net.py.
Before training, test the AND network's performance and look at its
weights. Then train the network and re-test its performance and check
out how the weights have changed. Do the weights make sense to you?
- At the unix prompt do: python -i or-net.py and try all of the
same commands as before. Convince yourself that the weights make sense.
- Next run the file xor-net.py in the same way. When you train
this network it will be unable to learn, because XOR is not linearly
separable and no two-layer network can solve it.
- Run the file xor-3layer.py. In this case the network has
three layers (input, hidden, and output) instead of just two (input,
output). To see all of the weights for this network requires
two commands:
- n.printWeights('input','hidden')
- n.printWeights('hidden','output')
After training this three-layer network, draw the network with all of
the trained weights and biases and figure out how it is solving this
problem.
- Next try the file 8bit-net.py that learns to take eight-bit
patterns and reproduce them on the output layer after re-coding them
in a three-unit hidden layer. After training, use
the showPerformance() method. Write down each of the hidden
layer representations created to re-code each input pattern. Has
the network re-coded them using a binary-like representation?
- Finally try the file digit-recognizer.py that learns to
categorize handwritten digits. When you run this file it will open up
a number of additional windows. Several windows show the activations
of particular layers, and several windows show the weights between the
input and hidden layers. Move the windows around so that you can see
all of them. Train the network, then test all of the patterns. Look at
the hidden layer weights in the displays. How has the network learned
to recognize each type of handwritten digit?
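To get a feel for how the hidden layer makes XOR solvable, here is a minimal sketch of a 2-2-1 sigmoid network with hand-chosen weights. This is purely illustrative, not the course's BackpropNetwork code, and a trained network will generally find a different solution:

```python
import math

def sigmoid(x):
    """Standard logistic activation used by backprop networks."""
    return 1.0 / (1.0 + math.exp(-x))

# One hand-chosen solution for XOR: hidden unit h1 acts like OR,
# hidden unit h2 acts like AND, and the output computes h1 AND NOT h2.
W_hidden = [[10.0, 10.0],   # weights into h1 (OR-like)
            [10.0, 10.0]]   # weights into h2 (AND-like)
b_hidden = [-5.0, -15.0]    # h1 fires if either input is on; h2 only if both
W_out    = [10.0, -10.0]    # h1 excites the output, h2 inhibits it
b_out    = -5.0

def forward(x1, x2):
    h = [sigmoid(W_hidden[i][0]*x1 + W_hidden[i][1]*x2 + b_hidden[i])
         for i in range(2)]
    return sigmoid(W_out[0]*h[0] + W_out[1]*h[1] + b_out)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, round(forward(x1, x2)))   # prints 0, 1, 1, 0 respectively
```

Compare this hand-wired solution with the weights your trained xor-3layer.py network actually finds.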
Classifying images
We will be using the same types of images that were discussed in this
week's reading. Re-read section 4.7 from Tom
Mitchell's chapter on neural networks (pages
32-36 of the pdf document).
The directory /home/meeden/public/cs63/faces_4/ contains 624
images stored in PGM format. You can view one of these images using
the xv command. Each file is named according to the
following convention:
userid_pose_expression_eyes_scale.pgm
- userid is the user id of the person in the image. This
field has 20 values: ani2, at33, boland, bpm, ch4f, cheyer, choon,
danieln, glickman, karyadi, kawamura, kk49, megak, mitchell, night,
phoebe, saavik, steffi, sz24, and tammo.
- pose is the head position of the person, and this field has
4 values: straight, left, right, and up.
- expression is the facial expression of the person, and this
field has 4 values: neutral, happy, sad, and angry.
- eyes is the eye state of the person, and this field has 2
values: open and sunglasses.
- scale is the scale of the image. All of the images are of
type 4, which indicates a quarter-resolution image (32 x 30).
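When you choose your own task later, you will need to pull fields out of these filenames. A small helper like the following (a hypothetical sketch, not part of the provided code) shows the idea, assuming the userids themselves contain no underscores, as with the twenty ids listed above:

```python
def parse_face_filename(name):
    """Split a faces_4 filename into its five descriptive fields."""
    base = name[:-len(".pgm")] if name.endswith(".pgm") else name
    userid, pose, expression, eyes, scale = base.split("_")
    return {"userid": userid, "pose": pose, "expression": expression,
            "eyes": eyes, "scale": scale}

print(parse_face_filename("glickman_straight_happy_sunglasses_4.pgm"))
```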
Using a neural network to learn a classification task involves the
following steps: determining a task, gathering appropriate data,
creating the training set of inputs and targets, creating a network
with the appropriate topology and parameter settings, training the
network, and finally analyzing the results. Each of these steps is
explained in more detail below. After you have tried these steps on
the sunglasses example, you will repeat this process on a task of
your choosing.
- Choose a task
The first step is deciding what classification task you would
like to learn. There are a number of possibilities such as
presence of sunglasses, head position, emotion, or identifying
particular individuals. Some of these classification tasks will be
easier to learn than others. As an example, let's focus on the
relatively easy task of determining whether or not a person is
wearing sunglasses.
- Gather data
Next we need to select which of the images to use in our
training set. As a starting point, let's focus on images where the
person is looking straight ahead and has a neutral expression. We
need to gather together all of the filenames that meet these
criteria. We can use ls with the wildcard * to
select the images we want, and then use the greater than sign to
save those image file names into a file in our own directory:
cd /home/meeden/public/cs63/faces_4
ls *straight*neutral* > ~/cs63/labs/5/sunglassesfiles
Using the unix command wc ~/cs63/labs/5/sunglassesfiles we
can see that this file has 40 lines in it. We may want a slightly
larger training set. So let's add in all the images where a person is
looking straight ahead and has a happy expression. We can do this by
again using ls to select these images and using two greater
than signs to append these additional names to the end of the same file:
ls *straight*happy* >> ~/cs63/labs/5/sunglassesfiles
Using the wc command again we see that we now have 79 images.
- Create training set
Once we have selected a good set of images, we need to convert
them into a format that is appropriate for the neural network.
The PGM files have pixel values between 0 and 255; we need to
normalize these values between 0 and 1. In addition, we need to
create input files where each normalized image is written one per
line. I have written some python functions in the
file processingFaces.py to help you prepare these image
files:
- The function getImages takes a directory name where the
images are stored, a filename containing the names of image files
in that directory, and a filename to put the normalized image
values.
- The function getTargets takes a filename containing the
names of image files, and a filename to put the target values for
the task. It uses some aspect of the filename to determine an
appropriate target value. In the sunglasses example, if it finds
the word 'sunglasses' in the filename, then the target
value should be 1. Otherwise the target value should be 0.
Executing this file by doing python processingFaces.py
will create two files called glasses-inputs.dat
and glasses-targets.dat.
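The two core ideas in processingFaces.py can be sketched as follows. The function names here are illustrative stand-ins, not the provided getImages and getTargets:

```python
def normalize_pixels(pixels):
    """Scale raw PGM gray values (0-255) into the 0-1 range the network expects."""
    return [p / 255.0 for p in pixels]

def target_for(filename):
    """Target is 1 if the person is wearing sunglasses, else 0,
    read straight from the filename convention."""
    return 1 if "sunglasses" in filename else 0

# Each image becomes one line of space-separated values in the inputs file.
pixels = [0, 128, 255]                       # tiny stand-in for a 960-pixel image
line = " ".join("%.4f" % v for v in normalize_pixels(pixels))
print(line)                                  # 0.0000 0.5020 1.0000
print(target_for("at33_straight_neutral_sunglasses_4.pgm"))  # 1
print(target_for("at33_straight_happy_open_4.pgm"))          # 0
```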
- Set up network
Once the training data has been prepared we can create the
neural network. I have provided you with example code in the
file sunglasses-recognizer.py to do this. This creates a
new class called SunglassesRecognizer that inherits from
the class BackpropNetwork. It adds two additional
methods classify and evaluate. At the bottom of
the file, it sets up the network:
- First it creates an instance of the class. Then it creates a
three-layer neural network with 960 inputs (to represent the 32 by 30
pixel values in each image), one hidden unit, and one output unit. For
harder tasks you will likely need more hidden units. For tasks
with more categories, you will also need an output unit for each
category.
- Then it sets the learning parameters (epsilon, momentum, and
tolerance).
- Next it creates windows to show the activations of all of the
layers in the network, as well as windows to display all of the weights
from the input to the hidden layer.
- Finally, it randomly splits the data into a training set and a
test set. Typically you will want to use about 75-85 percent of the
data for training, and the remainder for testing.
Execute this file by doing python -i
sunglasses-recognizer.py. The -i will leave you in
the python interpreter after setting up the network. It will create
a number of windows. Move them around so that you can see them all.
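The random train/test split in the last step can be sketched like this (a hypothetical helper, not the actual code in sunglasses-recognizer.py):

```python
import random

def split_data(patterns, train_fraction=0.75):
    """Randomly split patterns into a training set and a test set.
    Using 75-85 percent of the data for training is a reasonable default."""
    shuffled = patterns[:]             # don't disturb the caller's list
    random.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

random.seed(42)                        # reproducible split for this example
data = list(range(79))                 # stand-ins for the 79 image patterns
train, test = split_data(data)
print(len(train), len(test))           # 59 20
```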
- Train network
One of the advantages of using neural
networks is their generalization ability. We want to train the
network using the training set, but we don't want the network to
memorize the training set. We want the trained network to be able to
respond appropriately to novel input. We need to be careful not to
overtrain the network. Thus the goal is not to achieve 100 percent
correctness. We will train the network for a number of epochs and then
test the network on novel data to monitor how well it is
generalizing. We will repeat this process until the network is no
longer improving its performance on the novel data. For example, try
the following commands in the python interpreter after you have
set up the network:
>>> n.train(5)
Epoch # 1 | TSS Error: 15.3588 | Correct: 0.0000
Epoch # 2 | TSS Error: 15.2270 | Correct: 0.0000
Epoch # 3 | TSS Error: 15.2524 | Correct: 0.0000
Epoch # 4 | TSS Error: 15.1156 | Correct: 0.0000
Epoch # 5 | TSS Error: 14.8360 | Correct: 0.0000
Reset limit reached; ending without reaching goal
----------------------------------------------------
Final # 5 | TSS Error: 14.8360 | Correct: 0.0000
----------------------------------------------------
You can see that error is dropping, but so far the network has not
learned to respond correctly to any of the training patterns. Let's
check on how well the network is doing on the test patterns.
>>> n.swapData()
Swapping training and testing sets...
19 training patterns, 60 test patterns
>>> n.evaluate()
network classified image #0 (sunglasses) as ???
network classified image #1 (sunglasses) as ???
network classified image #2 (eyes) as ???
network classified image #3 (sunglasses) as ???
network classified image #4 (eyes) as ???
network classified image #5 (eyes) as ???
network classified image #6 (eyes) as ???
network classified image #7 (eyes) as ???
network classified image #8 (eyes) as ???
network classified image #9 (eyes) as ???
network classified image #10 (sunglasses) as ???
network classified image #11 (sunglasses) as ???
network classified image #12 (eyes) as ???
network classified image #13 (eyes) as ???
network classified image #14 (eyes) as ???
network classified image #15 (eyes) as ???
network classified image #16 (eyes) as ???
network classified image #17 (sunglasses) as ???
network classified image #18 (eyes) as ???
19 patterns: 0 correct (0.0%), 19 wrong (100.0%)
As expected, it is not responding correctly to these either. Let's
reset the data, continue training, and then re-test.
>>> n.swapData()
Swapping training and testing sets...
60 training patterns, 19 test patterns
>>> n.train(5)
Epoch # 1 | TSS Error: 14.2245 | Correct: 0.0000
Epoch # 2 | TSS Error: 12.9923 | Correct: 0.0000
Epoch # 3 | TSS Error: 10.3263 | Correct: 0.0000
Epoch # 4 | TSS Error: 7.8044 | Correct: 0.0833
Epoch # 5 | TSS Error: 5.8430 | Correct: 0.3500
Reset limit reached; ending without reaching goal
----------------------------------------------------
Final # 5 | TSS Error: 5.8430 | Correct: 0.3500
----------------------------------------------------
>>> n.swapData()
Swapping training and testing sets...
19 training patterns, 60 test patterns
>>> n.evaluate()
network classified image #2 (eyes) as ???
network classified image #4 (eyes) as ???
network classified image #5 (eyes) as ???
network classified image #6 (eyes) as ???
network classified image #7 (eyes) as ???
network classified image #8 (eyes) as ???
network classified image #9 (eyes) as ???
network classified image #10 (sunglasses) as ???
network classified image #12 (eyes) as ???
network classified image #13 (eyes) as ???
network classified image #14 (eyes) as ???
network classified image #15 (eyes) as ???
network classified image #16 (eyes) as ???
network classified image #18 (eyes) as ???
19 patterns: 5 correct (26.3%), 14 wrong (73.7%)
Clearly the performance of the network is improving. After several
more iterations of this process of resetting the data and additional
training, the network will start performing well on both the training
set and the testing set and learning can be stopped. Remember the
goal is to achieve good performance on the training set while still
being able to respond appropriately to the novel data in the testing
set.
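The stop-when-the-test-error-plateaus idea above can be sketched as a generic loop. Here train_step and test_error are stand-ins for the train/swapData/evaluate cycle, not methods of the provided network:

```python
def train_until_overfitting(train_step, test_error, patience=3):
    """Alternate short bursts of training with checks on the held-out set,
    stopping once test error has failed to improve `patience` checks in a
    row. Returns the best test error seen."""
    best, stale = float("inf"), 0
    while stale < patience:
        train_step()                   # e.g. a few epochs of n.train()
        err = test_error()             # e.g. error from n.evaluate() on test data
        if err < best - 1e-6:
            best, stale = err, 0       # still improving; keep going
        else:
            stale += 1                 # no improvement this round
    return best

# Demo with a canned error curve that improves, then flattens out.
errors = iter([10.0, 6.0, 4.0, 3.5, 3.6, 3.6, 3.7])
print(train_until_overfitting(lambda: None, lambda: next(errors)))  # 3.5
```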
- Analyze results
Once the training process is complete, we
can look more closely at the activations and weights to try to
understand how the network has solved the task. In the python
interpreter do n.showPerformance() and observe which hidden
units are activated for particular images. How are the hidden units
coding for the different categories of images? For the sunglasses task
there is only one hidden unit, so this is pretty
straightforward. Next look at the hidden weights. What parts of the image are
the hidden units focusing on? Why are these locations of the image
important for the given task?
After you have successfully tried this process on the sunglasses
example, repeat this process on an image-based task of your choosing.
In the file
description.txt explain in detail each of the six
steps given above. You may want to take a screen shot of your
resulting hidden weights so that you can refer to them in your
description.
Optional Extension
Apply neural network learning to some other problem of interest to
you. Be sure to explain in a README file what the task is and how the
data is represented.
Submit
Once you are satisfied with your classification experiments, hand them
in by typing
handin63 in a terminal window.