Project Option 2: AlphaHex

 - - - - - - - -
 \ · · · · · · · · \
  \ · · · · · · · · \
   \ · · · · · · · · \
    \ · · · · · · · · \
     \ · · · · · · · · \
      \ · · · · · · · · \
       \ · · · · · · · · \
        \ · · · · · · · · \
         - - - - - - - -

AlphaGo

In 2016, Google stunned the AI and game-playing world when AlphaGo defeated former world champion Lee Sedol in a five-game go match. Prior to this match, most experts believed that computer programs capable of human-level go play were at least a decade away. Since the match, Google has published multiple papers on AlphaGo and its improved variants, and AlphaGo's innovations have spurred many advances in AI and machine learning.

This semester, we have learned about each of the foundational pieces that AlphaGo is built on: deep neural networks, reinforcement learning, Monte Carlo tree search, and learning from self-play.

In this project you will have the chance to put them all together to train an agent that plays 8x8 hex.
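The piece that ties these together is a tree search whose move selection is guided by the network's outputs. As a concrete illustration, here is a minimal sketch of the PUCT selection rule that the AlphaGo family uses inside MCTS; the node attributes (children, visits, prior, q_value) are assumptions about how you might structure your tree, not part of the starter code:

import math

def select_child(node, c_puct=1.5):
    # PUCT: choose the child maximizing Q(s,a) + U(s,a), where
    # U(s,a) = c_puct * P(s,a) * sqrt(N(s)) / (1 + N(s,a)).
    # node.children, child.visits, child.prior, and child.q_value are
    # assumed attributes -- adapt this to your own MCTS implementation.
    total_visits = sum(child.visits for child in node.children)
    def puct(child):
        exploration = c_puct * child.prior * math.sqrt(total_visits) / (1 + child.visits)
        return child.q_value + exploration
    return max(node.children, key=puct)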

Getting Started

Your first step in working on this project should be to grab the starting point code from my public directory:
cd /your/project/directory
cp -r /home/bryce/public/hex_project .
git add hex_project
Most of the files in this directory resemble ones you have seen before. Hex.py implements the game of hex and allows you to play it. You should copy over your MonteCarloTreeSearch.py agent from lab 4 so that you can play against it (you will also have to uncomment lines 120 and 125 in Hex.py, which import it). hex_data.npz has the same format as the similarly named file from lab 9, but now contains over 24,000 examples from HexPlayerBryce self-play games. AlphaHex.py is currently empty except for a stub that allows Hex.py to import it.
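To get a feel for the training data, you can load the archive with numpy and inspect what it contains. Since the exact array names come from the lab 9 format, this sketch just lists them rather than assuming any:

import numpy as np

# Inspect the self-play data. The array names stored in the archive follow
# the lab 9 format, so list them instead of hard-coding any.
data = np.load("hex_data.npz")
print(data.files)
for name in data.files:
    print(name, data[name].shape, data[name].dtype)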

Reference Material

You are encouraged to read as much as you can about AlphaGo. Google's papers on AlphaGo are all available online:

You should also refer back to the readings I've assigned on AlphaGo:

Also, check out the other AlphaGo-related links from the resources section of the course page:

AlphaGo versus AlphaZero

There are multiple variants of AlphaGo, and different references describe different attributes of each. You are welcome to replicate any one version, piece together parts from different versions, or design your own solutions to the problems you encounter, even if they differ from what AlphaGo implements. The key requirement is that you use deep reinforcement learning, Monte Carlo tree search, and self-play data to train your agent.
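Whichever variant you choose, the overall training cycle has the same shape. The skeleton below is one possible way to organize it; every helper name here (self_play_game, net.train_step) is hypothetical and stands in for code you will write:

import random

def training_loop(net, iterations=50, games_per_iteration=100,
                  buffer_limit=50000, batches_per_iteration=200, batch_size=64):
    # Repeatedly: (1) generate self-play games with MCTS guided by the current
    # network, (2) collect (state, search policy, outcome) examples, and
    # (3) train the network on random minibatches of recent examples.
    buffer = []
    for _ in range(iterations):
        for _ in range(games_per_iteration):
            buffer.extend(self_play_game(net))  # hypothetical: one game's examples
        buffer = buffer[-buffer_limit:]         # keep only the most recent examples
        for _ in range(batches_per_iteration):
            batch = random.sample(buffer, min(batch_size, len(buffer)))
            net.train_step(batch)               # hypothetical: policy + value update
    return net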

As you try variations and as you train, be sure to save intermediate versions of your agent so that you can test them against your final agent. This will be crucial to the write-up, both for demonstrating your agent's improvement and for experimentally justifying your design choices.
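One lightweight way to do this is to snapshot your network's weights at regular intervals and later play the snapshots against the final agent. The helpers here (net.get_weights, play_match) are again hypothetical placeholders for your own interfaces:

import numpy as np

def save_checkpoint(net, iteration):
    # Snapshot the weights so earlier versions can be replayed later.
    np.savez(f"checkpoint_{iteration:04d}.npz", *net.get_weights())  # hypothetical accessor

def win_rate(agent_a, agent_b, n_games=50):
    # Estimate relative strength by head-to-head play; play_match is a
    # hypothetical helper that returns 1 if agent_a wins, else 0.
    wins = sum(play_match(agent_a, agent_b) for _ in range(n_games))
    return wins / n_games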

Submitting

Before the deadline, you need to submit the following things through git:

In addition, you must turn in a hard copy of the write-up PDF outside my office.

In the LaTeX file, project.tex, you will describe your project. This file already contains a basic structure that you should follow. Feel free to change the section headings or to add additional sections. Recall that you run pdflatex project.tex to convert the LaTeX source into a PDF.

As your project develops and you create more files, be sure to use git to add, commit, and push them. Run git status to check that all of the necessary files are being tracked in your git repo. Don't forget to update the README so that I can test your code!