CS63 Spring 2004
Project 3: Robot learning
Due: Monday May 3, by noon

Contents


Introduction

For this project you will use some form of reinforcement learning to teach a robot to perform a task of your choice. As a first step you must determine an appropriate task given the available tools. You should assume you will be using the Player/Stage simulator to control a Pioneer robot through Pyro. Here are some of the capabilities that you could incorporate into your task:

If you plan on using one of the services, then experiment with it to be sure you understand its capabilities. There are examples in the Pyro Modules on the wiki.

Formulating a task

Create a detailed description of your task. How will you represent the input state? How will you represent the motor output? Since we will be using a neural network as the learning mechanism, you should think about how to scale the inputs and outputs to values between 0 and 1, or between -1 and 1 (the latter range requires a different activation function than the standard sigmoid, such as tanh).
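As a concrete illustration, here is a minimal linear-scaling helper you might use when preparing network inputs and outputs. The function name and the sensor/motor ranges used in the examples are hypothetical; substitute the actual ranges of your robot's sensors and motor commands.

```python
def scale(value, lo, hi, out_lo=0.0, out_hi=1.0):
    """Linearly map value from [lo, hi] to [out_lo, out_hi], clamping out-of-range readings."""
    value = max(lo, min(hi, value))
    fraction = (value - lo) / float(hi - lo)
    return out_lo + fraction * (out_hi - out_lo)

# A sonar reading of 2.0 meters, assuming an 8-meter maximum range, scaled to [0, 1]:
sonar_input = scale(2.0, 0.0, 8.0)            # -> 0.25
# A network output in [0, 1] mapped back to a translation speed in [-0.2, 0.2]:
speed = scale(0.75, 0.0, 1.0, -0.2, 0.2)      # -> 0.1
```

The same function works in both directions: scaling raw sensor values down for the network, and scaling network outputs back up into motor commands.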

In order to use a reinforcement learning method, you will need to create a reinforcement procedure. Typically this procedure takes two states: the state prior to executing an action and the state that resulted from executing that action. It then returns a reinforcement value: negative for punishment, zero for none, or positive for reward. If you plan on using a Genetic Algorithm to evolve the weights of a neural network, then the reinforcement values must always be positive. If you plan on using Q-learning with a neural network to learn expected values, then the reinforcement values must be between -1 and 1.
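The procedure described above might be sketched as follows. The state representation and the `lightReading` key are assumptions for illustration only; in a light-seeking task, a higher reading would mean the robot is closer to the goal.

```python
def reinforce(prev_state, new_state):
    """Return a reinforcement value in [-1, 1], suitable for Q-learning.

    Assumes each state is a dictionary with a hypothetical 'lightReading'
    entry, where a larger value means closer to the light source.
    """
    delta = new_state["lightReading"] - prev_state["lightReading"]
    if delta > 0:
        return 1.0    # moved toward the light: reward
    elif delta < 0:
        return -1.0   # moved away from the light: punishment
    return 0.0        # no change: no reinforcement
```

For a Genetic Algorithm, where values must stay positive, you could instead accumulate a fitness score and reward only improvement (e.g. add 1 when `delta > 0` and 0 otherwise).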

The frequency with which you provide a non-zero reinforcement value will determine how difficult the task is to learn. Delayed tasks, where reinforcement is only given at the time of goal achievement, are the hardest. Immediate tasks, where reinforcement is given at every time step, are the easiest. Intermediate tasks, where reinforcement is sporadic, are also possible. If you are using a Genetic Algorithm, then every task is essentially delayed because feedback from the fitness function is only given at the end of a task.
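To make the contrast concrete, a delayed version of a reinforcement procedure gives feedback only when the goal is achieved and zero everywhere else. The state representation, the `lightReading` key, and the threshold value below are assumptions for illustration.

```python
GOAL_THRESHOLD = 0.9  # assumed light level that counts as "at the goal"

def delayed_reinforce(prev_state, new_state):
    """Delayed reinforcement: reward only at the moment of goal achievement."""
    if new_state["lightReading"] >= GOAL_THRESHOLD:
        return 1.0   # goal reached
    return 0.0       # no feedback on every other time step
```

Because almost every step returns zero, the learner must solve the credit-assignment problem of deciding which earlier actions led to the eventual reward, which is what makes delayed tasks the hardest to learn.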


Reinforcement learning options


Pyro tips


Paper guidelines

You will turn in a 4-6 page paper describing your project. Your paper should include the following:

Your grade will not be based on whether your experiment succeeds or fails. Negative results are as important as positive results. Your grade will be based solely on the thoroughness and readability of your paper.


Handing in your paper and programs