Lab 4: General Game Playing
Due October 15th by midnight

 - - - - - - - -
\ · · · · · · · · \
 \ · · · · · · · · \
  \ · · · · · · · · \
   \ · · · · · · · · \
    \ · · · · · · · · \
     \ · · · · · · · · \
      \ · · · · · · · · \
       \ · · · · · · · · \
          - - - - - - - -


 - - - - - - - -
\ · · · · · ·   \
 \ · · · · · ·  ● \
  \ · · · · · ·  · \
   \ · · · · ·  · · \
    \ · · · · · · · · \
     \ ·  · · · · · · \
      \ · · ·  · · · · \
       \ · · · · · · · · \
          - - - - - - - -


 - - - - - - - -
\ · · · · · ·   \
 \ · · · · · ·   \
  \ · · · · · ·   \
   \ · · · · ·    \
    \ · · ·     · \
     \      ·  · \
      \     · · · · \
       \ · · · · · · · · \
          - - - - - - - -

General game players are systems able to accept descriptions of arbitrary games at runtime and able to use such descriptions to play those games effectively without human intervention. In other words, they do not know the rules until the games start. Unlike specialized game players, general game players cannot rely on algorithms designed in advance for specific games.

Starting point code

As in the previous lab, use Teammaker to form your team. You can log in to that site to indicate your partner preference. Once you and your partner have specified each other and the lab has been released, a GitHub repository will be created for your team.

Introduction

The objective of this lab is to use Monte Carlo Tree Search to implement a general game playing agent.

The primary Python file you will be modifying is MonteCarloTreeSearch.py. You have also been provided several files that should look familiar from lab 3:

There are also several new Python programs:

And finally, you should have one "compiled" Python program:

Just like last week, all games are played using PlayGame.py. The interface has been updated slightly; you can use the -h option for more information.

To get started, try playing hex against your partner. This game has a large branching factor, so you'll likely have to scroll up to see the game board between turns. The game is played on a grid where (0,0) is the top left corner and (7,7) is the bottom right corner.

./PlayGame.py hex human human

Implementation Tasks

To begin, open the file MonteCarloTreeSearch.py and complete the implementations of the two classes provided.

  1. Node class:

    1. You will need to implement the method UCBWeight(UCB_constant, parent_visits, parent_turn) used in node selection. The UCB weight is calculated according to the formula in the node selection section of mcts.ai.
    2. This class will also need to implement the method updateValue(outcome) used for value backpropagation. The outcome will be either +1, -1, or 0, representing a win for the maximizer, a win for the minimizer, or a draw, respectively. Recall from class that we will be calculating value according to this formula:
        value = 1 + (wins-losses)/visits
      
      The benefits of this formula are that:
      • wins are valued better than draws and draws are valued better than losses
      • value will always be positive, in the range [0,2], and positive values are necessary for the UCBWeight method

  2. MCTSPlayer class:

    1. You will need to implement the method getMove(game_state) which is called by the PlayGame.py program to determine the player's next move. It should:
      • check whether a node already exists for the given game state, and if not create one
      • call MCTS on the node
      • determine the best move from the node, taking into account the current player at the node
      • return the best move

      Here's pseudocode that fleshes out these steps:

      getMove(game_state)
         # Find or create node for game_state
         key = str(game_state)
         if key in tree
            curr_node = get node from tree using key
         else
            curr_node = create new node with game_state
            add curr_node to tree using key 
         # Perform Monte Carlo Tree Search from that node
         MCTS(curr_node)
         # Determine the best move from that node
         bestValue = -float("inf"); bestMove = None
         for move, child_node in curr_node's children
            if curr_node's player is +1
               value = child_node.value
            else
               value = 2 - child_node.value
            if value > bestValue
               update bestValue and bestMove
         return bestMove
      

    2. Debugging MCTS can be challenging due to the randomness inherent in the rollouts. Implement the status(node) method so that you can easily view the contents of a particular node within the tree. For example, let's play a game of Nim starting with 7 pieces, where we do 1000 rollouts per turn:
      ./PlayGame.py nim mcts random
        
      Here's an example status that might be printed for the root node after the first turn:
       node wins 988, losses  12, visits 1000, value 1.98
      child wins   0, losses   2, visits    2, value 0.00, move 3
      child wins   7, losses   4, visits   11, value 1.27, move 1
      child wins 981, losses   7, visits  988, value 1.99, move 2
        
      Notice that the best move based on the rollouts is to take 2, which puts our opponent at 5 pieces. We saw in class that, with optimal strategy, playing from 5 pieces is a guaranteed loss. MCTS has also discovered this via the rollouts.
    3. Lastly, you must complete the MCTS(node, num_rollouts) method. This method takes a node from which to start the search, and the number of rollouts to perform.

      Each rollout:

      • selection: navigates explored nodes using the UCB weight to select the best option until it reaches the frontier (traverse tree by always choosing the child with highest UCB value)
      • expansion: expands one new node (choose an untried move, find the next-state resulting from that move)
      • simulation: performs a random playout to a terminal state (play random v. random until endgame)
      • backpropagation: propagates the outcome back along the path built during selection and expansion (update the value and visit count of every node on the path, from the root through the newly expanded node)

      Pseudocode for MCTS is provided below:

       MCTS(current_node, num_rollouts)
          repeat num_rollouts times
             path = selection(current_node)
             selected_node = final node in path
             if selected_node is terminal
                outcome = winner of selected_node's state
             else
                next_node = expansion(selected_node) 
                add next_node to end of path
                outcome = simulation(next_node's state) 
             backpropagation(path, outcome)
          status(current_node) # use for debugging
      
      You will certainly want to break this task down using several helper methods, at least one for each phase of the algorithm.
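The four phases above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the lab's reference solution: NimState is a hypothetical stand-in for the provided game classes (its method names, and the rule that taking the last piece loses, are assumptions), children are stored per node rather than in a shared tree keyed by str(game_state), and best_move is a hypothetical helper mirroring the final loop of getMove. Treating the minimizer's view of a child as 2 - value inside UCBWeight is also an assumption, borrowed from the getMove pseudocode.

```python
import math
import random


class NimState:
    """Hypothetical stand-in for the lab's game state (not its real API)."""

    def __init__(self, pieces, turn=1):
        self.pieces, self.turn = pieces, turn  # turn: +1 maximizer, -1 minimizer

    def moves(self):
        return [m for m in (1, 2, 3) if m <= self.pieces]

    def next(self, move):
        return NimState(self.pieces - move, -self.turn)

    def is_terminal(self):
        return self.pieces == 0

    def winner(self):
        # Assumed rule: taking the last piece loses, so the player to move wins.
        return self.turn


class Node:
    def __init__(self, state):
        self.state = state
        self.children = {}  # move -> child Node
        self.wins = self.losses = self.visits = 0
        self.value = 1.0

    def updateValue(self, outcome):
        self.visits += 1
        if outcome == 1:
            self.wins += 1
        elif outcome == -1:
            self.losses += 1
        self.value = 1 + (self.wins - self.losses) / self.visits

    def UCBWeight(self, UCB_constant, parent_visits, parent_turn):
        # Assumption: the minimizer sees the mirrored value 2 - value.
        exploit = self.value if parent_turn == 1 else 2 - self.value
        explore = math.sqrt(math.log(parent_visits) / self.visits)
        return exploit + UCB_constant * explore


def selection(node, UCB_constant):
    """Follow the highest UCB weight until reaching a frontier or terminal node."""
    path = [node]
    while (not node.state.is_terminal()
           and len(node.children) == len(node.state.moves())):
        visits, turn = node.visits, node.state.turn
        node = max(node.children.values(),
                   key=lambda c: c.UCBWeight(UCB_constant, visits, turn))
        path.append(node)
    return path


def expansion(node, rng):
    """Attach one child for a randomly chosen untried move."""
    untried = [m for m in node.state.moves() if m not in node.children]
    move = rng.choice(untried)
    node.children[move] = Node(node.state.next(move))
    return node.children[move]


def simulation(state, rng):
    """Play random moves to a terminal state and return the winner."""
    while not state.is_terminal():
        state = state.next(rng.choice(state.moves()))
    return state.winner()


def backpropagation(path, outcome):
    for node in path:
        node.updateValue(outcome)


def MCTS(current_node, num_rollouts, UCB_constant=1.0, seed=0):
    rng = random.Random(seed)
    for _ in range(num_rollouts):
        path = selection(current_node, UCB_constant)
        selected_node = path[-1]
        if selected_node.state.is_terminal():
            outcome = selected_node.state.winner()
        else:
            next_node = expansion(selected_node, rng)
            path.append(next_node)
            outcome = simulation(next_node.state, rng)
        backpropagation(path, outcome)


def best_move(node):
    """Pick the child with the best value from the current player's view."""
    turn = node.state.turn
    return max(node.children.items(),
               key=lambda mc: mc[1].value if turn == 1 else 2 - mc[1].value)[0]


if __name__ == "__main__":
    root = Node(NimState(7))
    MCTS(root, 1000)
    print("suggested move:", best_move(root))
```

Keeping each phase in its own function makes it possible to test them one at a time, for example by checking that simulation always returns +1 or -1 in Nim, or that every rollout adds exactly one visit to the root. The seeded random.Random instance is a convenience for reproducible debugging runs.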

As always, be sure to follow good development practices; this includes things like incremental testing, keeping your code readable (good comments, good names, etc.), making use of modularity (using helper functions, polymorphism, etc.), and so forth.
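One possible way to test incrementally is to check the value formula against the sample Nim status output before wiring up the full search. The sketch below is only an illustration: value and status_line are hypothetical helper names that take raw counts, whereas the lab's status(node) method takes a node, and the column widths are inferred from the sample output.

```python
def value(wins, losses, visits):
    """Value formula from the lab: 1 + (wins - losses) / visits."""
    return 1 + (wins - losses) / visits


def status_line(label, wins, losses, visits, move=None):
    """Format one status row; widths are guessed from the sample output."""
    line = (f"{label:>5} wins {wins:3d}, losses {losses:3d}, "
            f"visits {visits:4d}, value {value(wins, losses, visits):.2f}")
    if move is not None:
        line += f", move {move}"
    return line


if __name__ == "__main__":
    print(status_line("node", 988, 12, 1000))
    for wins, losses, visits, move in [(0, 2, 2, 3), (7, 4, 11, 1), (981, 7, 988, 2)]:
        print(status_line("child", wins, losses, visits, move))
```

Checks like this are deterministic, so they remain useful even though the rollouts themselves are random.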

Extensions

When you have completed the above implementation tasks, you are encouraged to try at least one of the following extensions:

For any extension that you try, describe what you did and the outcome in the file called extensions.md. To get full credit for the assignment, you'll need to do at least one extension; doing more than one can get you bonus points. Note that the description in extensions.md is required to get credit.

Submitting your code

Use git to add, commit, and push your code.