CS81 Lab5: Unsupervised categorization

Due by noon next Thursday

To begin, run update81 to copy the starting point files into your home directory (cs81/labs/5/).

You'll also need to make a change to your path so that you will have access to a principal components analysis program. Edit your .bashrc file and add the line:

export PATH=$PATH:/usr/local/pyrobot/tools/cluster
Then at the unix prompt do: source .bashrc. This will only update the current terminal window, so you may want to log out and back in again.

Introduction

Growing Neural Gas (GNG) was developed by Bernd Fritzke and described in his 1995 paper A growing neural gas learns topologies. GNG is an unsupervised learning method for dimensionality reduction. Given some high dimensional distribution of data, such as the sensorimotor data from a robot, a GNG will find a topological structure that closely matches the given distribution.

The GNG consists of a network of units and edges that are used to characterize the topological space in which the input vectors reside. Each unit contains a model vector that summarizes a portion of the overall distribution. The number of units and edges is not fixed in advance. The GNG is able to expand and contract as necessary to better fit the data by adding or deleting units and edges.

A GNG begins with two units that are assigned random initial model vectors. To process an input vector, the GNG finds the nearest and next nearest unit model vectors based on Euclidean distance. This distance is also used as a measure of error, which the GNG stores in the nearest unit. All units connected to the nearest unit are shifted toward the input by a fraction of the error. If the nearest and next nearest units are not connected, an edge is added between them.

In the original GNG, a new unit is added after a fixed number of time steps determined by the user. This unit's model vector is placed between the unit with the greatest accumulated error and its neighbor with the greatest accumulated error. In an alternative implementation of GNG, called an Equilibrium GNG, units are only added when the average error across all of the GNG's units exceeds a given threshold. This approach makes it possible to grow the GNG in response to new data that doesn't fit the current topology of the network, but prevents unnecessary units when incoming data is similar to existing model vectors.

Part 1. Observing the GNG on various distributions of data

The file gngdists.py contains the definitions for a number of different distributions that can be used to test the GNG:


Part 2. Observing the GNG on sensorimotor data from a robot

The file categorize.py contains a brain for a simulated pioneer robot that is wall following on the right. The file twoRoomWorld.py contains the definition of a pyrobot environment containing two rooms, where there is a short hallway between the rooms and one room is much larger than the other. On each time step the robot will check its sonar sensors and determine a motor action. It will also summarize its sensorimotor data into a 5-dimensional vector containing the minimum left, front, and right sonars as well as its translate and rotate motor values. All values in this vector are scaled to the range [0,1]. This vector will be input into a GNG. After approximately one circuit around the environment, the robot will stop and a GNG movie will be generated.

Answer the following questions:

  1. Describe each model vector in words based on the sonar and motor values. Each GNG model vector consists of five values. The first three values are scaled sonars, where high values indicate there are distant obstacles and low values indicate there are close obstacles. The last two values are scaled translate and rotate commands. To unscale these values, multiply by 2 and subtract 1. For example, a scaled translate value of 0.75, when unscaled is 0.5, which equates to moving forward at half speed. Similarly a scaled translate value of 0.25, when unscaled is -0.5, which equates to moving backward at half speed.
  2. Based on your descriptions from above, do the model vectors seem to equate with reasonable categories for this environment and task?

Part 3. Experimenting with Equilibrium GNG

Modify the file gng.py so that instead of inserting a unit on a fixed schedule it will only insert a unit when the average error in the GNG exceeds a threshold. Be sure to add a print statement to indicate the time step and error in the network when a unit is added.

Answer the following questions:

  1. Experiment with threshold values of 0.5, 0.75, and 1.0. Feel free to try others in addition to these. Record the total number of units created in each case.
  2. Re-run the GNG on the planes distribution. How do the results differ from the first run?
  3. Re-run the GNG on the wall following robot. Where is the robot in the environment when units are added? Do the unit insertions seem to coincide with key locations in the world? Do the model vectors created by the equilibrium GNG seem more category-like than those created by the standard GNG?

Submit

Write up your observations in a file called summary.txt and run handin81 to submit it.