Lab 4: General Game Playing
Due March 6 by midnight

 - - - - - - - -
\ · · · · · · · · \
 \ · · · · · · · · \
  \ · · · · · · · · \
   \ · · · · · · · · \
    \ · · · · · · · · \
     \ · · · · · · · · \
      \ · · · · · · · · \
       \ · · · · · · · · \
          - - - - - - - -


[Two more hex boards, showing positions from a game in progress]

General game players are systems able to accept descriptions of arbitrary games at runtime and able to use such descriptions to play those games effectively without human intervention. In other words, they do not know the rules until the games start. Unlike specialized game players, general game players cannot rely on algorithms designed in advance for specific games.

Starting point code

As in the previous lab, use Teammaker to form your team. You can log in to that site to indicate your partner preference. Once you and your partner have specified each other, a GitHub repository will be created for your team.

Introduction

Before you begin, copy over your MinMaxPlayers.py, Heuristics.py, and (if you implemented it) TournamentPlayers.py from last week. If you did not implement TournamentPlayers.py, a basic version has been provided for you; you'll need to edit it to make it viable for game play. This will allow you to play your MCTS agent (once you have implemented it) against last week's alpha/beta pruning agents on the games Mancala and Breakthrough.
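
Once your MCTS agent exists, a matchup against one of last week's agents might be launched with something like the line below; the player names PlayGame.py actually accepts may differ, so check ./PlayGame.py -h first.

./PlayGame.py mancala mcts minmax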

In this lab you will use Monte Carlo tree search to implement a general game playing agent. The primary Python file you will be modifying is MonteCarloTreeSearch.py. You have also been provided several files that should look familiar from lab 3:

There are also two new Python programs:

And finally, you should have one "compiled" Python program:

Just like last week, all games are played using PlayGame.py. The interface has been updated slightly; you can use the -h option for more information.

To get started, try playing hex against your partner. This game has a large branching factor, so you'll likely have to scroll up to see the game board between turns. The game is played on a grid where (0,0) is the top-left corner and (7,7) is the bottom-right corner.

./PlayGame.py hex human human

Implementation Tasks

  1. Node class: You will need to implement the Node.UCBWeight() method used in node selection. The UCB weight is calculated according to the formula in the node-selection section of mcts.ai. This class will also need to implement the Node.updateValue() method used for value backpropagation. It is up to you how to define the value of a node, but it should roughly correspond to a winning percentage. The only strict requirements are that: You may also wish to store some extra information in the Node class to facilitate these computations, such as the number of wins and a list of untried moves. Make sure you create a COPY of the available moves when initializing the untried moves. (A sketch of one possible design follows this list.)

  2. MCTS: Your primary implementation task is to code the Monte Carlo tree search algorithm. The MCTS() function takes a node from which to start the search, and a number of rollouts to perform. Each rollout: When all rollouts are complete, the function deterministically returns the best move available from its start node. Pseudocode for MCTS is provided in the next section. You will certainly want to break this task down using several helper functions.
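
For reference, the UCB formula in the node-selection section of mcts.ai combines an exploitation term with an exploration bonus:

UCB(child) = value(child) + C * sqrt(ln(parent_visits) / child_visits)

where C is an exploration constant (sqrt(2) in the original UCB1 formulation). The sketch below shows one possible Node design. It is only a sketch: the attribute names, the availableMoves interface, and the +1/-1 player and outcome encodings are illustrative assumptions, not requirements of the lab.

from math import sqrt, log

class Node:
    def __init__(self, state, player):
        self.state = state
        self.player = player      # player to move, assumed encoded +1 / -1
        self.children = {}        # maps a move to the resulting child Node
        self.visits = 0           # times this node appeared on a rollout path
        self.wins = 0.0           # rollouts won, from the parent's perspective
        # COPY the available moves; don't alias the state's own list
        self.untried_moves = list(state.availableMoves)

    @property
    def value(self):
        # roughly a winning percentage; 0.5 before any visits
        return self.wins / self.visits if self.visits else 0.5

    def UCBWeight(self, parent_visits, C=1.0):
        # UCB formula from mcts.ai; by the time this is called during
        # selection, every child has been visited at least once, so
        # self.visits is never zero here
        return self.value + C * sqrt(log(parent_visits) / self.visits)

    def updateValue(self, outcome):
        # outcome: +1 / -1 for a first-/second-player win, 0 for a draw
        self.visits += 1
        if outcome == 0:
            self.wins += 0.5
        elif outcome == -self.player:  # the parent, who chose this node, won
            self.wins += 1.0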

Monte Carlo Tree Search

Each time it is your agent's turn to move, your agent's getMove() function should call MCTS(), passing in a node from which to start the search. Your MCTS() function should implement the following pseudocode.
function MCTS(root_node, rollouts)
  loop for number of rollouts
    node = root_node
    initialize path with node
    # selection
    while all children of node are expanded and node not terminal
       node = choose child with max UCB value
              (based on parent player's perspective)
       add node to path
    # expansion
    if node not terminal
       pick random unexplored move
       apply move to the node's state to get a next_state
       if str(next_state) in tree's nodes
          find next_node in tree's nodes using key str(next_state) 
       else
          create a next_node for the next_state
          add next_node to tree's nodes using key str(next_state)
       add the next_node to the node's children using move as key
       add the next_node to the path
       # simulation
       outcome = random_playout(next_node's state)
    else
       outcome = node's state winner
    # backpropagation
    update values for all nodes in path based on outcome
  return best move from root_node (e.g. the child with the highest value)

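To make the control flow concrete, here is one way the pseudocode could be rendered in Python. This is a sketch, not a reference solution: the game interface (availableMoves, makeMove, isTerminal, winner), the +1/-1/0 winner encoding, and passing the transposition table in as a nodes dictionary all carry over the assumptions from the Node sketch above; adapt them to your own design.

import random

def MCTS(root_node, rollouts, nodes):
    # nodes is assumed to map str(state) -> Node so the tree persists
    # between moves; in your code it might live on your player object
    for _ in range(rollouts):
        node = root_node
        path = [node]
        # selection: descend while fully expanded and not terminal
        while not node.untried_moves and not node.state.isTerminal:
            parent_visits = node.visits
            node = max(node.children.values(),
                       key=lambda child: child.UCBWeight(parent_visits))
            path.append(node)
        # expansion
        if not node.state.isTerminal:
            move = random.choice(node.untried_moves)
            node.untried_moves.remove(move)
            next_state = node.state.makeMove(move)
            key = str(next_state)
            if key in nodes:
                next_node = nodes[key]
            else:
                next_node = Node(next_state, -node.player)
                nodes[key] = next_node
            node.children[move] = next_node
            path.append(next_node)
            # simulation
            outcome = random_playout(next_state)
        else:
            outcome = node.state.winner
        # backpropagation
        for visited in path:
            visited.updateValue(outcome)
    # deterministically return the best move from the root; the child
    # with the highest value (or the most visits) are both common picks
    return max(root_node.children,
               key=lambda move: root_node.children[move].value)

def random_playout(state):
    # play uniformly random moves to the end of the game and return
    # the winner (assumed encoding: +1, -1, or 0 for a draw)
    while not state.isTerminal:
        state = state.makeMove(random.choice(state.availableMoves))
    return state.winner

Your getMove() method would then look up (or create and register) the node for the current state, call MCTS() on it, and return the move MCTS() selects.
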
Extensions

When you have completed the above implementation tasks, you are encouraged to try some of the following extensions:

For any extension that you try, describe what you did and the outcome in a file called extensions.txt.

Submitting your code

Before submitting, ensure that you have added your name(s) to the top-level comments. To submit your code, you need to use git to add, commit, and push the files that you modified.