### CS63 Fall 2005 Lab 9: Learning to Predict using Simple Recurrent Networks Due: Monday, November 21 by 11am

#### INTRODUCTION

For this lab you will reproduce one of the experiments described in Elman's paper *Finding Structure in Time*. You may do any of the following tasks from the paper: structure in letter sequences, discovering the notion of word, or discovering lexical class from word order. I will outline how to solve the first task. If you choose to do one of the other two tasks, you should approach it in a similar manner.

You may work with a partner on this lab.

#### REPRODUCING AN EXPERIMENT

1. Create an instance of the SRN class from conx and initialize it appropriately.

For the letter sequence experiment (using guu, dii, ba), you'll need to create a recurrent network with 6 inputs, 20 hiddens, and 6 outputs. The goal of this network is to predict the next input based on the current input and its context memory of the recent past.

```
from pyrobot.brain.conx import *

n = SRN()
n.addSRNLayers(6, 20, 6)   # 6 inputs, 20 hidden units, 6 outputs (plus a context layer)
n.predict('input', 'output')
```
The method addSRNLayers creates and connects up the input, hidden, output, and context layers for you. The method predict automatically creates the target from the next input value.

2. Create a similar training set to the one Elman used.

For the letter sequence experiment, Elman created one continuous sequence that contained 1000 subsequences. He randomly chose a consonant (either g, d, or b) and then followed the consonant with the appropriate vowels (three u's for g, two i's for d, and one a for b). Each letter was then represented by a unique 6-bit pattern. I suggest you do this in two steps.

First, write a function called generateLetterSequence(n) that takes an integer n and with equal probability randomly selects either "guuu", "dii", or "ba" to append to a string n times. It then returns this string.
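A minimal sketch of this function, assuming Python's standard random module (the three subsequence strings come straight from the lab description):

```
import random

def generateLetterSequence(n):
    """Build a string by appending n subsequences, each chosen
    uniformly at random from "guuu", "dii", and "ba"."""
    result = ""
    for i in range(n):
        result += random.choice(["guuu", "dii", "ba"])
    return result
```

For the full experiment you would call generateLetterSequence(1000) to match Elman's 1000 subsequences.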

Next, write a function called generateNetworkSequence(letters) that takes a string of the form produced by the previous function and returns a list of bits in which each letter is represented by the appropriate list of 6 bits (see Table 1 in Elman's paper for details). This should be one long continuous list of bits. You can use the list method extend to append each sublist onto the final list.
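Here is one possible shape for this function. The 6-bit codes in the dictionary follow Elman's six-feature encoding (consonant, vowel, interrupted, high, back, voiced) but are only placeholders written from memory; verify every pattern against Table 1 of the paper before using them:

```
# Placeholder 6-bit feature vectors -- CHECK each one against
# Table 1 in Elman's paper before trusting it.
letterBits = {'b': [1, 0, 1, 0, 0, 1],
              'd': [1, 0, 1, 1, 0, 1],
              'g': [1, 0, 1, 0, 1, 1],
              'a': [0, 1, 0, 0, 1, 1],
              'i': [0, 1, 0, 1, 0, 1],
              'u': [0, 1, 0, 1, 1, 1]}

def generateNetworkSequence(letters):
    """Flatten a letter string into one continuous list of bits,
    6 bits per letter, using the list method extend."""
    listOfBits = []
    for letter in letters:
        listOfBits.extend(letterBits[letter])
    return listOfBits
```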

Once this list is created, you can add it to the network's inputs as shown below. Notice that the list, called listOfBits, has been inserted inside another enclosing list. This is because Elman-style networks expect the input to be a set of sequences; since we have created one single, very long sequence, we need the additional enclosing list.

```
n.setInputs([listOfBits])
n.setSequenceType('random-continuous')
```
The method setSequenceType must be called whenever you are using an SRN. The first word must be either random or ordered, indicating whether the set of input sequences is shuffled on each pass (random) or presented in the same order each time (ordered). The second word must be either continuous or segmented, indicating whether the context layer carries over from one sequence to the next (continuous) or is reset between sequences (segmented). Since we are using one long sequence, these settings are not really relevant here, but they are still required.

3. Reproduce the training conditions as closely as possible.

For the letter sequence experiment, use an epsilon of 0.25 and a momentum of 0.1, and train your network on this data for 20 epochs. Elman states that he used 200 epochs, but after only 20 epochs the network should already be performing quite well. Save the weights to a file so that, as you work on the testing phase, you don't have to re-run the learning phase each time.

```
n.setResetEpoch(20)
n.setResetLimit(1)
n.setReportRate(1)
n.setEpsilon(0.25)
n.setMomentum(0.1)
n.setTolerance(0.15)
n.train()
n.saveWeightsToFile("elmanExp.wts")
```

The method setResetEpoch designates a cutoff point for the learning process; in the above case, learning will end after 20 epochs. The method setResetLimit designates how many times to restart training after hitting that cutoff; in our case we want to run the learning only once. The method setReportRate determines how often you'd like to see an update on the learning progress. The method setTolerance determines how close an output must be to its target to be considered correct.

4. Create another program to run the testing phase of the experiment.

For the letter sequence experiment this testing program should re-create the network, load the saved weights, set up prediction, set the inputs to do a short test sequence (something like "diiguuubadii" would be fine), set learning off, and set interactive on. Finally it should sweep through the test sequence one time while displaying the results. Remember to convert the test sequence into a list of bits just as we did before.

```
from pyrobot.brain.conx import *

n = SRN()
n.addSRNLayers(6, 20, 6)
n.predict('input', 'output')
n.loadWeightsFromFile("elmanExp.wts")
n.setInputs([testNetworkSeq])
n.setSequenceType("random-continuous")
n.setLearning(0)
n.setInteractive(1)
n.sweep()
```

5. Reproduce the analyses performed by Elman.

For the letter sequence experiment we'd like to create graphs like those in Figures 4 and 5 of Elman's article. To do this we'll need to calculate the overall error for each pattern as well as the individual error on particular output units. Unfortunately the interactive mode with the sweep method does not report error. So let's create our own enhancement to the SRN class and add some code to do this.

In your testing program, create a new class, as shown below, which is a specialization of the SRN class. It is identical to the original SRN class except that it overrides a method called postStep. For every step of a sequence, the sweep method calls the step method, and postStep is a hook you can override to add your own processing at the end of each step. In our case, we'd like to save the error on particular units of the output layer as well as the total error over the entire output layer.

```
class MySRN(SRN):
    def postStep(self):
        '''
        Assumes that three files have already been opened for writing:
        totalError, unit0Error, and unit3Error.
        '''
        unitErrors = map(abs, self.getLayer('output').error)
        sumError = 0
        for index in range(len(unitErrors)):
            sumError += unitErrors[index]
            if index == 0:
                unit0Error.write("%s %s\n" % (self.count, unitErrors[index]))
            if index == 3:
                unit3Error.write("%s %s\n" % (self.count, unitErrors[index]))
        totalError.write("%s %s\n" % (self.count, sumError))
```
The instance variable self.count counts how many times the forward propagate method has been called. So in our case it represents a step count.

Prior to defining the MySRN class, you should open the files that the postStep method assumes will be available. At the end of your test program you should also close them.

```
totalError = open("toterr", "w")
unit0Error = open("unit0", "w")
unit3Error = open("unit3", "w")
totalError.write('"toterr"\n')
unit0Error.write('"unit0"\n')
unit3Error.write('"unit3"\n')
```
In your test program, create an instance of MySRN rather than the standard SRN class. Also comment out the line that sets interactive on. Now when you run your test program it should generate three files that can be viewed using xgraph as shown below.
```
xgraph -P toterr unit0 unit3
```
These graphs should demonstrate that the network is recognizing the predictable portions of the sequence, as well as the predictable bits within the individual letter representations. In particular, bits 0, 1, 2, and 5 should be predictable, while bits 3 and 4 are not.
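If xgraph is unavailable, you can make the same comparison numerically. This sketch (assuming the file names used above, where the first line of each file is a quoted xgraph title) averages the error column of one output file:

```
def meanError(filename):
    """Average the error column of an xgraph-style data file:
    a quoted title on the first line, then "step error" pairs."""
    f = open(filename)
    lines = f.readlines()[1:]   # skip the quoted title line
    f.close()
    return sum(float(line.split()[1]) for line in lines) / len(lines)
```

On a well-trained network, meanError("unit0") should come out noticeably lower than meanError("unit3"), confirming that bit 0 is predictable while bit 3 is not.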
6. (EXTRA CREDIT) Once you have successfully reproduced the original experiment, you have the option to try a new variation.

#### HAND IN

Use cs63handin to turn in a tar file of an entire directory that should contain all of the code needed to run your experiments, as well as examples of your post-training analysis, and a text file.

In the text file you should discuss whether the reproduction of the experiment was successful. If not, you should try to account for the differences. If you tried any additional experiments, describe them in detail.

To tar up your directory do the following:

`tar -cf srn.tar directoryName `