Computational Models of Language
Spring 2001
Homework 2: Experimenting with Backpropagation and Multi-Layer Networks
Due: Wednesday, February 14 by midnight
GETTING STARTED
Before starting, work through the XOR exercise described in Chapter 5
of the modeling text.
DESCRIPTION
For this homework we will experiment with the learning rate, momentum,
and hidden layer size to get a feel for how these parameters affect
backpropagation results.
- The learning rate determines what proportion of the error
derivative is used to make changes to the weights in the network.
Very small learning rates make learning slow and may leave
backpropagation stuck in local minima in the error surface. Very
large learning rates, however, may cause backpropagation to jump over
good areas of the error surface.
- The momentum parameter determines what proportion of the weight
change from the previous weight update is added to the current
weight change. This has the effect of maintaining weight change in
the network when the error surface flattens out and speeding up
weight changes when the error surface slopes consistently in one
direction.
- You have already seen that hidden nodes are crucial for solving
some problems, such as XOR. However, it is not clear in advance how
many hidden nodes a given problem needs.
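
The first two parameters above can be illustrated with a minimal sketch of a single weight update. This is the textbook gradient-descent-with-momentum rule, not necessarily the exact computation Tlearn performs; the function names and the toy error surface E(w) = w**2 are assumptions made only for illustration.

```python
def momentum_step(weight, gradient, prev_delta, learning_rate, momentum):
    """Return (new_weight, delta) for one gradient-descent update with momentum."""
    # The learning rate scales the error derivative; the momentum term
    # carries over a proportion of the previous weight change.
    delta = -learning_rate * gradient + momentum * prev_delta
    return weight + delta, delta


def descend(learning_rate, momentum, steps=20, start=1.0):
    """Follow the toy error surface E(w) = w**2 (derivative 2*w) for a few steps."""
    w, delta = start, 0.0
    for _ in range(steps):
        w, delta = momentum_step(w, 2.0 * w, delta, learning_rate, momentum)
    return w
```

Calling `descend` with different learning rate and momentum values shows how quickly, or whether, the weight approaches the minimum at 0 on this toy surface.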
INSTRUCTIONS
- Open the XOR project from the Chapter 5 folder for the Tlearn
software. Under training options, set the following parameters:
number of sweeps to 50000, seed randomly, train randomly, and halt if
RMS error falls below 0.1.
We will vary the learning rate and momentum to try to find the best
combination. For each combination, run three separate training trials
and record the number of sweeps needed to learn the task. Then
calculate the average for each combination and compare. For some
combinations, backpropagation may fail to converge on an answer; in
these cases, simply record the maximum number of sweeps (50000). Feel
free to split up the experiments with other students and share your
results.
Which combinations seem to be the best? Why do you think this is
the case?
| Learning Rate | Momentum | Trial 1 Sweeps | Trial 2 Sweeps | Trial 3 Sweeps | Avg Sweeps |
|---------------|----------|----------------|----------------|----------------|------------|
| 0.1           | 0        |                |                |                |            |
| 0.1           | 0.1      |                |                |                |            |
| 0.1           | 0.25     |                |                |                |            |
| 0.1           | 0.5      |                |                |                |            |
| 0.1           | 0.75     |                |                |                |            |
| 0.1           | 1.0      |                |                |                |            |
| 0.25          | 0        |                |                |                |            |
| 0.25          | 0.1      |                |                |                |            |
| 0.25          | 0.25     |                |                |                |            |
| 0.25          | 0.5      |                |                |                |            |
| 0.25          | 0.75     |                |                |                |            |
| 0.25          | 1.0      |                |                |                |            |
| 0.5           | 0        |                |                |                |            |
| 0.5           | 0.1      |                |                |                |            |
| 0.5           | 0.25     |                |                |                |            |
| 0.5           | 0.5      |                |                |                |            |
| 0.5           | 0.75     |                |                |                |            |
| 0.5           | 1.0      |                |                |                |            |
| 0.75          | 0        |                |                |                |            |
| 0.75          | 0.1      |                |                |                |            |
| 0.75          | 0.25     |                |                |                |            |
| 0.75          | 0.5      |                |                |                |            |
| 0.75          | 0.75     |                |                |                |            |
| 0.75          | 1.0      |                |                |                |            |
| 1.0           | 0        |                |                |                |            |
| 1.0           | 0.1      |                |                |                |            |
| 1.0           | 0.25     |                |                |                |            |
| 1.0           | 0.5      |                |                |                |            |
| 1.0           | 0.75     |                |                |                |            |
| 1.0           | 1.0      |                |                |                |            |
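
The recording procedure for one row of the table can be sketched in code. The following is a minimal pure-Python stand-in for a backpropagation network on XOR, not a reimplementation of Tlearn: the weight initialization range, the per-sweep RMS check over all four patterns, and every function name are assumptions. One sweep here means one randomly chosen pattern presentation.

```python
import math
import random

PATTERNS = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
            ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]


def sigmoid(x):
    x = max(-60.0, min(60.0, x))  # clamp to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-x))


def forward(net, inputs):
    # Each weight list starts with the bias weight.
    hidden = [sigmoid(w[0] + w[1] * inputs[0] + w[2] * inputs[1])
              for w in net["hidden"]]
    out_w = net["output"]
    output = sigmoid(out_w[0] + sum(v * h for v, h in zip(out_w[1:], hidden)))
    return hidden, output


def rms_error(net):
    total = sum((t - forward(net, x)[1]) ** 2 for x, t in PATTERNS)
    return math.sqrt(total / len(PATTERNS))


def sweeps_to_learn(learning_rate, momentum, n_hidden=2, max_sweeps=50000,
                    criterion=0.1, rng=None):
    """Train once; return the sweep count, or max_sweeps if it never converges."""
    rng = rng or random.Random()
    net = {"hidden": [[rng.uniform(-1, 1) for _ in range(3)]
                      for _ in range(n_hidden)],
           "output": [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]}
    prev_h = [[0.0] * 3 for _ in range(n_hidden)]
    prev_o = [0.0] * (n_hidden + 1)
    for sweep in range(1, max_sweeps + 1):
        inputs, target = rng.choice(PATTERNS)  # "train randomly"
        hidden, output = forward(net, inputs)
        delta_out = (target - output) * output * (1.0 - output)
        # Hidden deltas use the output weights as they were before this update.
        delta_hid = [delta_out * net["output"][j + 1] * h * (1.0 - h)
                     for j, h in enumerate(hidden)]
        for k, act in enumerate([1.0] + hidden):
            prev_o[k] = learning_rate * delta_out * act + momentum * prev_o[k]
            net["output"][k] += prev_o[k]
        for j in range(n_hidden):
            for k, act in enumerate((1.0,) + tuple(inputs)):
                prev_h[j][k] = (learning_rate * delta_hid[j] * act
                                + momentum * prev_h[j][k])
                net["hidden"][j][k] += prev_h[j][k]
        if rms_error(net) < criterion:
            return sweep
    return max_sweeps


def average_sweeps(learning_rate, momentum, trials=3, **kwargs):
    """Run several trials; return the per-trial sweep counts and their average."""
    results = [sweeps_to_learn(learning_rate, momentum, **kwargs)
               for _ in range(trials)]
    return results, sum(results) / len(results)
```

For example, `average_sweeps(0.5, 0.75)` returns the three trial counts and their average for one table row. The `n_hidden` argument can be varied in the same way for a hidden-unit experiment. Sweep counts from this sketch should not be expected to match Tlearn's.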
- For this set of experiments, use the best combination of learning
rate and momentum that you found above. This time we will vary the
number of hidden units and keep all of the other parameters fixed.
You will need to modify the configuration file to change the number of
hidden units.
| Hidden Units | Trial 1 Sweeps | Trial 2 Sweeps | Trial 3 Sweeps | Avg Sweeps |
|--------------|----------------|----------------|----------------|------------|
| 1            |                |                |                |            |
| 2            |                |                |                |            |
| 3            |                |                |                |            |
| 4            |                |                |                |            |
| 5            |                |                |                |            |
Consider the table of results. Does increasing the number of hidden
units seem to help the network solve the task? Now look closely
at the connection weights of several of the solutions that use 5
hidden units. Describe the kinds of roles that individual hidden
units take on to solve this task. Are these roles significantly
different from those we saw earlier in a network with only 2 hidden
units?
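
One way to read off a hidden unit's role is to compute its activation for each of the four XOR input patterns. The sketch below does this for a network with 2 hidden units. The weights are hypothetical, hand-picked values chosen so that one unit behaves like OR and the other like AND; they are not taken from a Tlearn run, and trained networks may settle on other solutions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical hand-picked weights: (bias, weight from input 1, weight from input 2).
HIDDEN = {
    "h1 (acts like OR)": (-5.0, 10.0, 10.0),
    "h2 (acts like AND)": (-15.0, 10.0, 10.0),
}
# Output unit: (bias, weight from h1, weight from h2). It fires for "OR but not AND".
OUTPUT = (-5.0, 10.0, -10.0)


def hidden_activations(x1, x2):
    return [sigmoid(b + w1 * x1 + w2 * x2) for b, w1, w2 in HIDDEN.values()]


def network_output(x1, x2):
    h1, h2 = hidden_activations(x1, x2)
    bias, v1, v2 = OUTPUT
    return sigmoid(bias + v1 * h1 + v2 * h2)


# Print one line per input pattern: hidden activations, then the output.
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    acts = ", ".join(f"{a:.2f}" for a in hidden_activations(x1, x2))
    print(f"input {x1}{x2}: hidden [{acts}] -> output {network_output(x1, x2):.2f}")
```

To apply the same idea to a trained network, replace the hand-picked values with the weights read from your own solution and add one entry to `HIDDEN` (and one weight to `OUTPUT`, with matching changes in `network_output`) per extra hidden unit.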
TURNING IN YOUR ANSWERS
Email your answers to both ekako1 and meeden@cs by
the due date.