CS 10, Spring 1998 -- Lab 8.2

Lab 8.2 Artificial Intelligence -- Computer Learning

Assignment for Thursday

Study for the final exam

Lab 8.2 Instructions

Copy nimgame5 from the "module8" folder on the "Classes" file server to your disk.

The goal of this lab is to run experiments that show which conditions make the computer learn to win at nimgame fastest. To do this, you'll need to try different values for the learning parameters fwingain, flosegive, swingain, and swingive, which are set in the script of the "Reset Everything" button.
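To get a feel for what a learning parameter does, here is a minimal Python sketch of one possible update rule. It is not the nimgame5 script, and the real script may work differently: the function name reinforce, its arguments, and the rule itself (after a win, move the chosen move's probability a fraction `rate` of the way toward 1; after a loss, give up a fraction `rate` of it) are all assumptions made for illustration.

```python
def reinforce(probs, chosen, rate, won):
    """Return updated move probabilities for one pile size (hypothetical rule).

    probs  -- probabilities of each possible move; they sum to 1
    chosen -- index of the move that was actually played
    rate   -- learning parameter between 0 and 1
    won    -- True if the game was won, False if it was lost
    """
    if len(probs) == 1:
        return list(probs)  # only one legal move, so nothing to learn

    new = list(probs)
    if won:
        # Gain: move the chosen entry a fraction `rate` of the way toward 1.
        new[chosen] += rate * (1.0 - new[chosen])
    else:
        # Give: surrender a fraction `rate` of the chosen entry.
        new[chosen] -= rate * new[chosen]

    # Rescale the other entries so that the row still sums to 1.
    others = sum(p for i, p in enumerate(probs) if i != chosen)
    remaining = 1.0 - new[chosen]
    for i in range(len(new)):
        if i != chosen:
            if others > 0:
                new[i] = probs[i] / others * remaining
            else:
                new[i] = remaining / (len(new) - 1)
    return new


print(reinforce([0.5, 0.5], 0, 0.0, True))   # rate 0: nothing changes
print(reinforce([0.5, 0.5], 0, 0.5, False))  # loss: the chosen move becomes less likely
print(reinforce([0.5, 0.5], 0, 1.0, True))   # rate 1: the chosen move becomes certain
```

Under this assumed rule, a rate of 0 switches learning off entirely and a rate of 1 commits fully to the outcome of a single game.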

During lab today, choose one or more of the questions below to investigate. When you've finished, call one of us over to discuss your results. When evaluating how well the computer has learned, consider not only the number of games it has won, but also how well its move matrix represents the correct strategy for winning every time.
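One way to get a reference strategy to compare a move matrix against is to compute it. The Python sketch below rests on two assumptions that you should check against nimgame5: the player who picks up the last stick loses, and a player may pick up at most max_take sticks per turn. The names best_moves and max_take are hypothetical and do not come from the stack.

```python
def best_moves(pile_size, max_take=2):
    """Map each pile size to a winning number of sticks to take.

    An entry is None when every legal move leaves the opponent in a
    winning position. Assumes that taking the last stick loses.
    """
    # can_win[n] is True if the player about to move with n sticks left
    # can force a win. With 0 sticks left the opponent has just taken
    # the last stick, so the player to move has already won.
    can_win = [True]
    best = {}
    for n in range(1, pile_size + 1):
        best[n] = None
        for take in range(1, min(max_take, n) + 1):
            if not can_win[n - take]:
                best[n] = take  # this move leaves the opponent losing
                break
        can_win.append(best[n] is not None)
    return best


print(best_moves(6, max_take=2))
# {1: None, 2: 1, 3: 2, 4: None, 5: 1, 6: 2}
```

A well-trained move matrix should strongly favor the listed move for every pile size whose entry is not None.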

Does the computer learn faster against an opponent whose moves are determined completely at random, or against a smart opponent?
- To see how quickly player 1 learns against a random opponent, set player 2's learning parameters (flosegive and fwingain) to 0.
- To see how player 1 fares against a smart opponent, set player 2's learning parameters to 0, and set the values of player 2's move matrix so that they represent the best move to make in each case.
Does the computer learn better from both "positive" and "negative reinforcement" (learning both when it wins and when it loses), or from only "positive reinforcement" (learning only when it wins)? Setting the learning parameter flosegive to 0 will cause the computer to learn only from wins.
Does the computer learn better when the changes it makes to its strategy are more gradual (this happens when the learning parameters are closer to 0) or more drastic (this happens when the learning parameters are closer to 1)?
What happens when the computer learns against a pathologically stupid player 2? Will it learn the correct strategy for winning every game, or will it learn to beat only a player that makes all the wrong moves? To test this, set player 2's learning parameters to 0, and set the entries in its move matrix so that player 2 will always make the worst possible move. (Example: when there are two sticks left, player 2 will pick up both sticks.)
What happens when two learners play against each other? How quickly does player 1 learn when it plays against a learner? To test this, compare what happens when player 2's learning parameters are both set to 0 (so player 2 isn't learning at all) with what happens when they are set to the same values as player 1's learning parameters.
How much longer does it take the computer to learn when it plays with 10 matchsticks instead of 6? To test this, change the value of BegPileSize from 6 to 10 in the script of "Reset Everything."
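If you want to rehearse an experiment like the ones above outside the stack, the Python sketch below plays a learner against a random opponent that never learns. Everything in it is an assumption for illustration and not a copy of nimgame5: a player takes 1 or 2 sticks per turn, the player who takes the last stick loses, the move matrix stores the probability of taking 1 stick for each pile size, and the names run_experiment, win_gain, and lose_give are hypothetical stand-ins for the real learning parameters.

```python
import random


def new_matrix(pile_size):
    # matrix[n] = probability of taking 1 stick when n sticks remain (n >= 2)
    return {n: 0.5 for n in range(2, pile_size + 1)}


def play_game(matrix1, matrix2, pile_size, rng):
    """Play one game. Returns (winner, moves): winner is 0 or 1, and
    moves[p] lists the (pile, take) choices made by player p."""
    matrices = (matrix1, matrix2)
    moves = ([], [])
    pile, player = pile_size, 0
    while True:
        if pile == 1:
            take = 1  # forced move, nothing to record
        else:
            take = 1 if rng.random() < matrices[player][pile] else 2
            moves[player].append((pile, take))
        pile -= take
        if pile == 0:
            return 1 - player, moves  # taking the last stick loses (assumed)
        player = 1 - player


def learn(matrix, moves, rate, won):
    """Make each move played more likely after a win, less likely after a loss."""
    for pile, take in moves:
        p = matrix[pile]
        favour_one = (take == 1) == won  # True when P(take 1) should rise
        matrix[pile] = p + rate * (1 - p) if favour_one else p - rate * p


def run_experiment(games, win_gain, lose_give, pile_size=6, seed=0):
    """Return (games won by the learner, the learner's final move matrix)."""
    rng = random.Random(seed)
    learner = new_matrix(pile_size)
    opponent = new_matrix(pile_size)  # random player: never updated
    wins = 0
    for _ in range(games):
        winner, moves = play_game(learner, opponent, pile_size, rng)
        if winner == 0:
            wins += 1
            learn(learner, moves[0], win_gain, True)
        else:
            learn(learner, moves[0], lose_give, False)
    return wins, learner


wins, matrix = run_experiment(games=200, win_gain=0.1, lose_give=0.1)
print(wins, matrix)
```

Calling run_experiment with lose_give=0 mimics learning from wins only, and pile_size=10 mimics the larger game. Results depend on the seed, so repeat each setting with several seeds before drawing conclusions.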