CS81 Lab6: Intrinsic Motivation

Due next Friday by noon

To begin, run update81 to copy the starting point files into your home directory (cs81/labs/6/).

Introduction

In this lab we will try two intrinsic motivation systems, IAC and CBIM, on a simple experiment and compare the kinds of categories that are formed and the developmental stages that result.

Intelligent Adaptive Curiosity (IAC)

IAC was developed with the goal of providing a robot with an intrinsic motivation system that pushes it to focus on situations that maximize its learning progress. IAC contains a memory that is sub-divided into sensorimotor regions. Each region contains:

In the original description, IAC's memory is organized hierarchically as a tree. Initially all exemplars are members of a single region. When the region grows too large it is split into two new regions. The split is made based on a feature of the SM(t) component of the exemplars such that the sum of the variances of the S(t+1) component, weighted by the number of exemplars in that new region, is minimized. This process continues recursively on any region that grows too large.
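The split criterion can be sketched as follows. This is only an illustration under stated assumptions, not the code in iac.py: the names variance and best_split are hypothetical, exemplars are assumed to be stored as (sm, s_next) pairs with a scalar S(t+1), and candidate cut points are assumed to be the midpoints between consecutive feature values.

```python
def variance(values):
    """Population variance of a list of numbers (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / float(len(values))
    return sum((v - mean) ** 2 for v in values) / float(len(values))


def best_split(exemplars):
    """Return the (feature index, cut point) with the lowest weighted variance.

    exemplars: list of (sm, s_next) pairs, where sm is a tuple of SM(t)
    features and s_next is the scalar S(t+1) value (an assumed layout).
    Returns None if no feature has two distinct values to split on.
    """
    best = None  # (cost, feature index, cut point)
    num_features = len(exemplars[0][0])
    for f in range(num_features):
        values = sorted(set(sm[f] for sm, _ in exemplars))
        # Candidate cut points: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            cut = (lo + hi) / 2.0
            left = [s for sm, s in exemplars if sm[f] < cut]
            right = [s for sm, s in exemplars if sm[f] >= cut]
            # Each side's variance is weighted by its number of exemplars.
            cost = len(left) * variance(left) + len(right) * variance(right)
            if best is None or cost < best[0]:
                best = (cost, f, cut)
    if best is None:
        return None
    return best[1], best[2]
```

When S(t+1) has more than one component, the cost for each side would be the sum of the per-component variances rather than a single variance.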

The main processing loop of IAC works as follows:

Category-Based Intrinsic Motivation (CBIM)

CBIM uses much of the framework of IAC but replaces the hierarchical memory with a Growing Neural Gas (GNG) that only forms new categories when the existing ones are sufficiently different from the current data. Unlike IAC, the growth of CBIM's memory is bounded by the complexity of the robot's sensory and motor capabilities.

Another difference is that in the original IAC model, the experts are k-nearest neighbors. In CBIM, the experts are feed-forward neural networks.
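To make the idea of a k-nearest-neighbors expert concrete, here is a minimal sketch. It is not taken from iac.py: the function name knn_predict, the (sm, s_next) exemplar layout, the use of Euclidean distance, and the default k are all assumptions made for illustration.

```python
import math


def knn_predict(exemplars, query, k=3):
    """Predict S(t+1) for a query SM(t) by averaging the k nearest exemplars.

    exemplars: non-empty list of (sm, s_next) pairs (an assumed layout);
    sm is a tuple of numbers and s_next is a scalar.
    """
    def distance(a, b):
        # Euclidean distance between two SM(t) vectors.
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    nearest = sorted(exemplars, key=lambda ex: distance(ex[0], query))[:k]
    return sum(s for _, s in nearest) / float(len(nearest))
```

The prediction error of an expert on a new exemplar would then be the difference between this prediction and the S(t+1) that was actually observed.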


A simple experiment

In their paper "Intrinsic motivation systems for autonomous mental development," Oudeyer, Kaplan, and Hafner describe an experiment that is designed to test the effectiveness of IAC's goal of maximizing learning progress. One part of the domain is easy to learn, one part of the domain is more complex, and another part is unlearnable. We should expect to see both IAC and CBIM first focus on the easy portion, then focus on the more complex portion, while ignoring the unlearnable portion.

We will do a similar experiment to the one they described. The robot is placed in an environment with a smart toy. The robot can translate forward or backward and also emit a frequency that makes the toy move. When the translation amount is:

When the frequency is:

The robot then senses its distance from the toy (with some added noise).

The toy starts at a fixed position in front of the robot. If the robot emits the first frequency, then the distance to the toy is unpredictable. If the robot emits the second frequency and moves forward, then it will decrease the distance from the toy. If the robot emits the second frequency and moves backward, then it will increase the distance from the toy. Thus the second frequency is predictable, but the resulting distance depends on the direction of motion. Finally, if the robot emits the third frequency, then the distance to the toy will be close to 0. This is the most predictable case.

In this setup, SM(t) consists of three items:

  1. currentDistance
  2. translation
  3. frequency
and S(t+1) consists of one item:
  1. nextDistance
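
The mapping from SM(t) to S(t+1) in this environment can be sketched roughly as below. This is a hypothetical stand-in for the simulator used by the lab code, not a copy of it: the function name next_distance, the labels 1, 2, and 3 for the three frequencies, the sign convention for translation, the uniform range for the unpredictable case, and the noise level are all assumptions.

```python
import random


def next_distance(current, translation, frequency, noise=0.01, rng=random):
    """Return the sensed distance to the toy after one step.

    frequency is 1, 2, or 3 for the first, second, and third frequency.
    translation is positive for forward motion and negative for backward.
    All numeric ranges here are illustrative assumptions.
    """
    if frequency == 1:
        # Unpredictable: the toy ends up at a random distance.
        distance = rng.uniform(0.0, 1.0)
    elif frequency == 2:
        # Predictable, but depends on motion: forward decreases the distance.
        distance = current - translation
    elif frequency == 3:
        # Most predictable: the toy ends up next to the robot.
        distance = 0.0
    else:
        raise ValueError("frequency must be 1, 2, or 3")
    # Sensor reading with a little Gaussian noise, clamped to be non-negative.
    return max(0.0, distance + rng.gauss(0.0, noise))
```

An expert trained on exemplars from the third case sees almost the same nextDistance every time, which is why that case should be the easiest to learn.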

The paper describes an experiment that lasts for 5000 steps. In the first 250 steps, the robot emits all three frequencies equally. This is when the initial region first splits, and the robot begins focusing on the third frequency. From then until about time step 3000, the robot emits the third frequency (the easiest one to predict) about 90% of the time. From that point on it focuses on emitting the second frequency (the more complex one to predict) about 85% of the time. It never spends more than 10% of the time on the first frequency, which is unlearnable.

Running IAC

Let's begin with a short experiment that will last only 500 steps. To start the experiment do:

python iac.py

Near the end of the iac.py file, an IACBrain is constructed and passed a number of arguments including the maximum region size, the motor vector size, the sensor vector size, the maximum number of steps in the experiment, and the probability of a random action. This is where you can modify the parameter settings for additional experiments.

Running iac.py will generate a number of data files recording the percentage of time the robot emits each of the three frequencies during every 50 time steps. These data files are updated continuously and can be viewed while the experiment is running. The following will produce a graph similar to Figure 4 on page 274 of the IAC paper:

xgraph -ly 0,1 -P *.data
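
For reference, the per-window values plotted above could be computed along the following lines. This is a sketch and not the code in iac.py: the function name frequency_fractions, the labels 1, 2, and 3, and the representation of the run as a list with one emitted frequency per time step are assumptions. Values are kept as fractions between 0 and 1 to match the y-axis range given to xgraph.

```python
def frequency_fractions(emitted, window=50, labels=(1, 2, 3)):
    """Fraction of steps spent on each frequency, per window of steps.

    emitted: list with one frequency label per time step (assumed layout).
    Returns a dict mapping each label to a list of fractions in [0, 1],
    one entry per complete window.
    """
    fractions = {label: [] for label in labels}
    for start in range(0, len(emitted) - window + 1, window):
        chunk = emitted[start:start + window]
        for label in labels:
            fractions[label].append(chunk.count(label) / float(window))
    return fractions
```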

After the experiment is completed, additional data files will be written. One called log will contain a description of all of the regions formed during the experiment. Others with a .err extension summarize the mean error of each region over time. The following will produce a graph similar to Figure 5 on page 274 of the IAC paper:

xgraph -P *.err

Based on the information in the log file, draw a tree representation of the regions formed by IAC. Include the cut points at each branching point. When IAC makes a split, exemplars containing values less than the cut point will be placed in the left branch, and exemplars containing values greater than the cut point will be placed in the right branch. Based on these cut points, which regions should be predictable?
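
The left/right rule for cut points can be sketched as a small lookup. The tree layout, the name find_region, and the example tree are hypothetical; the cut points shown are made up for illustration and are not taken from any actual log file.

```python
def find_region(node, sm):
    """Follow cut points from the root down to a leaf region name.

    node: either a leaf (a string naming the region) or a tuple
    (feature index, cut point, left subtree, right subtree).
    sm: the SM(t) vector (currentDistance, translation, frequency).
    """
    while not isinstance(node, str):
        feature, cut, left, right = node
        # Values below the cut point go left; all others go right.
        node = left if sm[feature] < cut else right
    return node


# Hypothetical tree: split on feature 2 first, then on feature 1.
EXAMPLE_TREE = (2, 0.5, "A", (1, 0.0, "B", "C"))
```

With this made-up tree, find_region(EXAMPLE_TREE, (0.3, -0.2, 0.9)) follows the right branch at the root and then the left branch, ending in region "B".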

Now look at the error graphs. Is the error dropping for predictable regions? What do the error graphs look like for the unpredictable regions?

Now look at the graphs showing how often the robot emits each frequency. Do these graphs show trends similar to those in the original experiment?

Running CBIM

Again let's begin with a short experiment that only lasts 500 steps. To start the experiment do:

python cbim.py

Near the end of the cbim.py file, a CBIMBrain is constructed and passed a number of arguments including the motor vector size, the sensor vector size, the error threshold for when new GNG units will be added, the probability of a random action, and the maximum number of steps in the experiment. This is where you can modify the parameter settings for additional experiments.

CBIM will also generate three .data files summarizing how often the robot emits the three different frequencies. It will create a log file summarizing each step. These files can be viewed while the experiment is running.

After the experiment is completed, it will generate .err files summarizing the mean error of each region over time. The file gngfinal.txt will contain a summary of the GNG units formed, including their model vector, error, visits, and some exemplars.

Based on the information in the file gngfinal.txt, describe each model vector and determine which represent predictable situations and which do not. Look carefully at the sample exemplars for each model vector. Are there cases where exemplars from two different frequencies are being categorized together?

Now look at the error graphs. Is the error dropping for predictable regions? What do the error graphs look like for the unpredictable regions?

Now look at the graphs showing how often the robot emits each frequency. How soon does the robot begin focusing on the most predictable frequency?

Comparing IAC and CBIM

Now we will run some longer experiments and compare the results obtained using both approaches. Do three experiments of at least 2000 steps for both IAC and CBIM.

Analyze the results. For each experiment report on:

Contrast the two approaches. What are some of the advantages and disadvantages of each?

Submit

Email your write-up to meeden@cs.swarthmore.edu. Be sure to include all partner names.