Run
update63 in a terminal window to get a copy of this week's files.
Introduction
This lab gives you the chance to formulate a machine learning
problem and use a neural network to solve it. You will focus on a
data set containing images of people's faces.
First you will explore using neural networks to solve some of the
simpler problems we discussed in class such as the logic problems AND,
OR, and XOR, the auto-encoder problem where the network develops a
binary-like representation in its hidden layer, and the handwritten
digit recognizer.
Experimenting with neural networks
In all of the examples below the network will be called
n. In the Python interpreter, you can use the following
methods:
- n.showPerformance() will display how the network responds to
each of the training patterns. Initially it will get every
pattern wrong since the weights are randomly initialized and no
learning has taken place.
- n.printWeights(layer1, layer2) will display the network's
current weights between layer1 and layer2. For
simple two-layer networks, the layers are called 'input'
and 'output'.
- n.train() will repeatedly train the network on the set of
training patterns. Each pass through all of the patterns is called an
epoch. When the network is successfully learning, the total amount of
error should decrease over time.
- At the unix prompt do: python -i and-net.py.
Before training, test the AND network's performance and look at its
weights. Then train the network and re-test its performance and check
out how the weights have changed. Do the weights make sense to you?
- At the unix prompt do: python -i or-net.py and try all of the
same commands as before. Convince yourself that the weights make sense.
- Next run the file xor-net.py in the same way. When you train
this network it will be unable to learn, because XOR is not linearly
separable and no two-layer network can solve it.
- Run the file xor-3layer.py. In this case the network has
three layers (input, hidden, and output) instead of just two (input,
output). To see all of the weights for this network requires
two commands:
- n.printWeights('input','hidden')
- n.printWeights('hidden','output')
After training this three-layer network, draw the network with all of
the trained weights and biases and figure out how it is solving this
problem.
- Next try the file 8bit-net.py that learns to take eight-bit
patterns and reproduce them on the output layer after re-coding them
in a three-unit hidden layer. After training, use
the showPerformance() method. Write down each of the hidden
layer representations created to re-code each input pattern. Has
the network re-coded them using a binary-like representation?
- Finally try the file digit-recognizer.py that learns to
categorize handwritten digits. When you run this file it will open up
a number of additional windows. Several windows show the activations
of particular layers, and several windows show the weights between the
input and hidden layers. Move the windows around so that you can see
all of them. Train the network, then test all of the patterns. Look at
the hidden layer weights in the displays. How has the network learned
to recognize each type of handwritten digit?
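To get a feel for how the hidden layer makes XOR solvable, here is a minimal sketch of a 2-2-1 sigmoid network with hand-chosen weights. This is purely illustrative, not the course's BackpropNetwork code, and a trained network will generally find a different solution:

```python
import math

def sigmoid(x):
    """Standard logistic activation used by backprop networks."""
    return 1.0 / (1.0 + math.exp(-x))

# One hand-chosen solution for XOR: hidden unit h1 acts like OR,
# hidden unit h2 acts like AND, and the output computes h1 AND NOT h2.
W_hidden = [[10.0, 10.0],   # weights into h1 (OR-like)
            [10.0, 10.0]]   # weights into h2 (AND-like)
b_hidden = [-5.0, -15.0]    # h1 fires if either input is on; h2 only if both
W_out    = [10.0, -10.0]    # h1 excites the output, h2 inhibits it
b_out    = -5.0

def forward(x1, x2):
    h = [sigmoid(W_hidden[i][0]*x1 + W_hidden[i][1]*x2 + b_hidden[i])
         for i in range(2)]
    return sigmoid(W_out[0]*h[0] + W_out[1]*h[1] + b_out)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, round(forward(x1, x2)))   # prints 0, 1, 1, 0 respectively
```

Compare this hand-wired solution with the weights your trained xor-3layer.py network actually finds.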
Classifying images
We will be using the same types of images that were discussed in this
week's reading. Re-read section 4.7 from Tom
Mitchell's chapter on neural networks (pages
32-36 of the pdf document).
The directory /home/meeden/public/cs63/faces_4/ contains 624
images stored in PGM format. You can view one of these images using
the xv command. Each file is named according to the
following convention:
userid_pose_expression_eyes_scale.pgm
- userid is the user id of the person in the image. This
field has 20 values: ani2, at33, boland, bpm, ch4f, cheyer, choon,
danieln, glickman, karyadi, kawamura, kk49, megak, mitchell, night,
phoebe, saavik, steffi, sz24, and tammo.
- pose is the head position of the person, and this field has
4 values: straight, left, right, and up.
- expression is the facial expression of the person, and this
field has 4 values: neutral, happy, sad, and angry.
- eyes is the eye state of the person, and this field has 2
values: open and sunglasses.
- scale is the scale of the image. All of the images are of
type 4, which indicates a quarter-resolution image (32 x 30).
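When you choose your own task later, you will need to pull fields out of these filenames. A small helper like the following (a hypothetical sketch, not part of the provided code) shows the idea, assuming the userids themselves contain no underscores, as with the twenty ids listed above:

```python
def parse_face_filename(name):
    """Split a faces_4 filename into its five descriptive fields."""
    base = name[:-len(".pgm")] if name.endswith(".pgm") else name
    userid, pose, expression, eyes, scale = base.split("_")
    return {"userid": userid, "pose": pose, "expression": expression,
            "eyes": eyes, "scale": scale}

print(parse_face_filename("glickman_straight_happy_sunglasses_4.pgm"))
```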
Using a neural network to learn a classification task involves the
following steps: determining a task, gathering appropriate data,
creating the training set of inputs and targets, creating a network
with the appropriate topology and parameter settings, training the
network, and finally analyzing the results. Each of these steps is
explained in more detail below. After you have tried these steps on
the sunglasses example, you will repeat this process on a task of
your choosing.
- Choose a task
The first step is deciding what classification task you would
like to learn. There are a number of possibilities such as
presence of sunglasses, head position, emotion, or identifying
particular individuals. Some of these classification tasks will be
easier to learn than others. As an example, let's focus on the
relatively easy task of determining whether or not a person is
wearing sunglasses.
- Gather data
Next we need to select which of the images to use in our
training set. As a starting point, let's focus on images where the
person is looking straight ahead and has a neutral expression. We
need to gather together all of the filenames that meet these
criteria. We can use ls with the wildcard * to
select the images we want, and then use the greater than sign to
save those image file names into a file in our own directory:
cd /home/meeden/public/cs63/faces_4
ls *straight*neutral* > ~/cs63/labs/5/sunglassesfiles
Using the unix command wc ~/cs63/labs/5/sunglassesfiles we
can see that this file has 40 lines in it. We may want a slightly
larger training set. So let's add in all the images where a person is
looking straight ahead and has a happy expression. We can do this by
again using ls to select these images and using two greater
than signs to append these additional names to the end of the same file:
ls *straight*happy* >> ~/cs63/labs/5/sunglassesfiles
Using the wc command again we see that we now have 79 images.
- Create training set
Once we have selected a good set of images, we need to convert
them into a format that is appropriate for the neural network.
The PGM files have pixel values between 0 and 255; we need to
normalize these values between 0 and 1. In addition, we need to
create input files where each normalized image is written one per
line. I have written some python functions in the
file processingFaces.py to help you prepare these image
files:
- The function getImages takes a directory name where the
images are stored, a filename containing the names of image files
in that directory, and a filename to put the normalized image
values.
- The function getTargets takes a filename containing the
names of image files, and a filename to put the target values for
the task. It uses some aspect of the filename to determine an
appropriate target value. In the sunglasses example, if it finds
the word 'sunglasses' in the filename, then the target
value should be 1. Otherwise the target value should be 0.
Executing this file by doing python processingFaces.py
will create two files called glasses-inputs.dat
and glasses-targets.dat.
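The two core ideas in processingFaces.py can be sketched as follows. The function names here are illustrative stand-ins, not the provided getImages and getTargets:

```python
def normalize_pixels(pixels):
    """Scale raw PGM gray values (0-255) into the 0-1 range the network expects."""
    return [p / 255.0 for p in pixels]

def target_for(filename):
    """Target is 1 if the person is wearing sunglasses, else 0,
    read straight from the filename convention."""
    return 1 if "sunglasses" in filename else 0

# Each image becomes one line of space-separated values in the inputs file.
pixels = [0, 128, 255]                       # tiny stand-in for a 960-pixel image
line = " ".join("%.4f" % v for v in normalize_pixels(pixels))
print(line)                                  # 0.0000 0.5020 1.0000
print(target_for("at33_straight_neutral_sunglasses_4.pgm"))  # 1
print(target_for("at33_straight_happy_open_4.pgm"))          # 0
```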
- Set up network
Once the training data has been prepared we can create the
neural network. I have provided you with example code in the
file sunglasses-recognizer.py to do this. This creates a
new class called SunglassesRecognizer that inherits from
the class BackpropNetwork. It adds two additional
methods classify and evaluate. At the bottom of
the file, it sets up the network:
- First it creates an instance of the class. Then it creates a
three-layer neural network with 960 inputs (to represent the 32 by 30
pixel values in each image), one hidden unit, and one output unit. For
harder tasks you will likely need more hidden units. For tasks
with more categories, you will also need an output unit for each
category.
- Then it sets the learning parameters (epsilon, momentum, and
tolerance).
- Next it creates windows to show the activations of all of the
layers in the network, as well as windows to display all of the weights
from the input to the hidden layer.
- Finally, it randomly splits the data into a training set and a
test set. Typically you will want to use about 75-85 percent of the
data for training, and the remainder for testing.
Execute this file by doing python -i
sunglasses-recognizer.py. The -i will leave you in
the python interpreter after setting up the network. It will create
a number of windows. Move them around so that you can see them all.
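The random train/test split in the last step can be sketched like this (a hypothetical helper, not the actual code in sunglasses-recognizer.py):

```python
import random

def split_data(patterns, train_fraction=0.75):
    """Randomly split patterns into a training set and a test set.
    Using 75-85 percent of the data for training is a reasonable default."""
    shuffled = patterns[:]             # don't disturb the caller's list
    random.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

random.seed(42)                        # reproducible split for this example
data = list(range(79))                 # stand-ins for the 79 image patterns
train, test = split_data(data)
print(len(train), len(test))           # 59 20
```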
- Train network
One of the advantages of using neural
networks is their generalization ability. We want to train the
network using the training set, but we don't want the network to
memorize the training set. We want the trained network to be able to
respond appropriately to novel input. We need to be careful not to
overtrain the network. Thus the goal is not to achieve 100 percent
correctness. We will train the network for a number of epochs and then
test the network on novel data to monitor how well it is
generalizing. We will repeat this process until the network is no
longer improving its performance on the novel data. For example, try
the following commands in the python interpreter after you have
set up the network:
>>> n.train(5)
Epoch # 1 | TSS Error: 15.3588 | Correct: 0.0000
Epoch # 2 | TSS Error: 15.2270 | Correct: 0.0000
Epoch # 3 | TSS Error: 15.2524 | Correct: 0.0000
Epoch # 4 | TSS Error: 15.1156 | Correct: 0.0000
Epoch # 5 | TSS Error: 14.8360 | Correct: 0.0000
Reset limit reached; ending without reaching goal
----------------------------------------------------
Final # 5 | TSS Error: 14.8360 | Correct: 0.0000
----------------------------------------------------
You can see that error is dropping, but so far the network has not
learned to respond correctly to any of the training patterns. Let's
check on how well the network is doing on the test patterns.
>>> n.swapData()
Swapping training and testing sets...
19 training patterns, 60 test patterns
>>> n.evaluate()
network classified image #0 (sunglasses) as ???
network classified image #1 (sunglasses) as ???
network classified image #2 (eyes) as ???
network classified image #3 (sunglasses) as ???
network classified image #4 (eyes) as ???
network classified image #5 (eyes) as ???
network classified image #6 (eyes) as ???
network classified image #7 (eyes) as ???
network classified image #8 (eyes) as ???
network classified image #9 (eyes) as ???
network classified image #10 (sunglasses) as ???
network classified image #11 (sunglasses) as ???
network classified image #12 (eyes) as ???
network classified image #13 (eyes) as ???
network classified image #14 (eyes) as ???
network classified image #15 (eyes) as ???
network classified image #16 (eyes) as ???
network classified image #17 (sunglasses) as ???
network classified image #18 (eyes) as ???
19 patterns: 0 correct (0.0%), 19 wrong (100.0%)
As expected, it is not responding correctly to these either. Let's
reset the data, continue training, and then re-test.
>>> n.swapData()
Swapping training and testing sets...
60 training patterns, 19 test patterns
>>> n.train(5)
Epoch # 1 | TSS Error: 14.2245 | Correct: 0.0000
Epoch # 2 | TSS Error: 12.9923 | Correct: 0.0000
Epoch # 3 | TSS Error: 10.3263 | Correct: 0.0000
Epoch # 4 | TSS Error: 7.8044 | Correct: 0.0833
Epoch # 5 | TSS Error: 5.8430 | Correct: 0.3500
Reset limit reached; ending without reaching goal
----------------------------------------------------
Final # 5 | TSS Error: 5.8430 | Correct: 0.3500
----------------------------------------------------
>>> n.swapData()
Swapping training and testing sets...
19 training patterns, 60 test patterns
>>> n.evaluate()
network classified image #2 (eyes) as ???
network classified image #4 (eyes) as ???
network classified image #5 (eyes) as ???
network classified image #6 (eyes) as ???
network classified image #7 (eyes) as ???
network classified image #8 (eyes) as ???
network classified image #9 (eyes) as ???
network classified image #10 (sunglasses) as ???
network classified image #12 (eyes) as ???
network classified image #13 (eyes) as ???
network classified image #14 (eyes) as ???
network classified image #15 (eyes) as ???
network classified image #16 (eyes) as ???
network classified image #18 (eyes) as ???
19 patterns: 5 correct (26.3%), 14 wrong (73.7%)
Clearly the performance of the network is improving. After several
more iterations of this process of resetting the data and additional
training, the network will start performing well on both the training
set and the testing set and learning can be stopped. Remember the
goal is to achieve good performance on the training set while still
being able to respond appropriately to the novel data in the testing
set.
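The stop-when-the-test-error-plateaus idea above can be sketched as a generic loop. Here train_step and test_error are stand-ins for the train/swapData/evaluate cycle, not methods of the provided network:

```python
def train_until_overfitting(train_step, test_error, patience=3):
    """Alternate short bursts of training with checks on the held-out set,
    stopping once test error has failed to improve `patience` checks in a
    row. Returns the best test error seen."""
    best, stale = float("inf"), 0
    while stale < patience:
        train_step()                   # e.g. a few epochs of n.train()
        err = test_error()             # e.g. error from n.evaluate() on test data
        if err < best - 1e-6:
            best, stale = err, 0       # still improving; keep going
        else:
            stale += 1                 # no improvement this round
    return best

# Demo with a canned error curve that improves, then flattens out.
errors = iter([10.0, 6.0, 4.0, 3.5, 3.6, 3.6, 3.7])
print(train_until_overfitting(lambda: None, lambda: next(errors)))  # 3.5
```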
- Analyze results
Once the training process is complete, we
can look more closely at the activations and weights to try to
understand how the network has solved the task. In the python
interpreter do n.showPerformance() and observe which hidden
units are activated for particular images. How are the hidden units
coding for the different categories of images? For the sunglasses task
there is only one hidden unit, so this is pretty
straightforward. Next look at the hidden weights. What parts of the image are
the hidden units focusing on? Why are these locations of the image
important for the given task?
After you have successfully tried this process on the sunglasses
example, repeat this process on an image-based task of your choosing.
In the file
description.txt explain in detail each of the six
steps given above. You may want to take a screen shot of your
resulting hidden weights so that you can refer to them in your
description.
Optional Extension
Apply neural network learning to some other problem of interest to
you. Be sure to explain in a README file what the task is and how the
data is represented.
Submit
Once you are satisfied with your classification experiments, hand them
in by typing
handin63 in a terminal window.