CS87: Lab 4

For this program, you will implement Conway's Game of Life using OpenMP. You will implement two versions of your program, each one using a different assignment of grid elements to threads. Once you have your two implementations, you will run scalability experiments comparing your two solutions varying the number of threads and the size of the grid. You will present your results in a written report.

Contents:
Project Details
Starting Point Code and Tips for Getting Started
Useful Functions and Links to more Resources
Designing and Running Experiments
Written Report
Submission and Demo

Project Details

Conway's Game of Life

Conway's Game of Life is an example of discrete event simulation. Here you are simulating a world where entities live, die, or are born based on their surroundings. Each time step simulates another round of living or dying.

Your world is represented by a 2-D array of values (0 or 1). If a grid cell's value is 1, it represents a live object, if it is 0 a dead object. At each discrete time step, every cell in the 2-D grid gets a new value based on the current value of its eight neighbors:

A live cell with zero or one live neighbors dies from loneliness.
A live cell with four or more live neighbors dies due to overpopulation.
A dead cell with exactly three live neighbors becomes alive.

Your 2-D world should be a TORUS; every cell in the grid has exactly eight neighbors. In the torus world, cells on the edge of the grid have neighbors that wrap around to the opposite edge. For example, the grid locations marked with an 'x' are the eight neighbors of the grid cell whose value is shown as 1.

x  1  x  0  0  0  0
x  x  x  0  0  0  0 
0  0  0  0  0  0  0
0  0  0  0  0  0  0
0  0  0  0  0  0  0
0  0  0  0  0  0  0
x  x  x  0  0  0  0

Conway's Game of Life description from Wikipedia, shows some example patterns you can use to test the correctness of your solution (like Blinker, Toad or Beacon).

You should implement two different solutions to the game of life based on how threads are assigned to grid cells:

Row based assignment.
Column based assignment.

For example, for a 8x8 grid with 4 threads, here are the row and column based assignments showing which thread (1:4) is assigned to which grid element:

row                                  column
---                                  ------
1 1 1 1 1 1 1 1                      1 1 2 2 3 3 4 4 
1 1 1 1 1 1 1 1                      1 1 2 2 3 3 4 4 
2 2 2 2 2 2 2 2                      1 1 2 2 3 3 4 4 
2 2 2 2 2 2 2 2                      1 1 2 2 3 3 4 4 
3 3 3 3 3 3 3 3                      1 1 2 2 3 3 4 4 
3 3 3 3 3 3 3 3                      1 1 2 2 3 3 4 4 
4 4 4 4 4 4 4 4                      1 1 2 2 3 3 4 4 
4 4 4 4 4 4 4 4                      1 1 2 2 3 3 4 4

OpenMP

You do not need to learn an enormous amount of OpenMP to solve this problem. You will, of course, need to use the #pragma omp parallel to fork a set of threads to do something in parallel, and you will likely need a parallel for loop and some synchronization.

You should be careful to stick with the fork-join, fork-join, fork-join model of OpenMP; don't do things in the parallel parts that are really not parallel or you will get some weird/unexpected behavior. Do not try to "optimize" your code by reducing fork-join blocks. You should, however, think about minimizing other parallel overheads as you design a solution; your solution should be designed so that there is a performance improvement from parallelization. If your 1 thread execution wins out over the multi-thread ones, think about how you can remove some parallel overhead (think of space/time trade-offs, think about synchronization costs, ...).

Starting Point Code and Tips for Getting Started

Start by grabbing my starting point code:

cp ~newhall/public/cs87/lab4/* .

Take a look at at the code and try running it with different command line argument values for the number of threads:

./gameoflife                  # this will use default # tids based on num cpus
./gameoflife  2               # run with 2 threads

Once you understand what this code is doing, I'd recommend first trying modify the program to get a program that "assigns" threads row-wise to the the 2-D grid of values by having each thread set its part of the grid to its tid, and then test assigning tids column-wise:

./gameoflife 4

row-wise:

  0  0  0  0  0  0  0  0                    
  0  0  0  0  0  0  0  0
  1  1  1  1  1  1  1  1
  1  1  1  1  1  1  1  1
  2  2  2  2  2  2  2  2
  2  2  2  2  2  2  2  2
  3  3  3  3  3  3  3  3
  3  3  3  3  3  3  3  3

column-wise:

  0  0  1  1  2  2  3  3
  0  0  1  1  2  2  3  3
  0  0  1  1  2  2  3  3
  0  0  1  1  2  2  3  3
  0  0  1  1  2  2  3  3
  0  0  1  1  2  2  3  3
  0  0  1  1  2  2  3  3
  0  0  1  1  2  2  3  3

Remember: in C, function prototypes that have 2-D array parameters, must specify the column dimension to use array[i][j] syntax for accessing bucket values in the passed array:

void printarray(int array[][GRID_DIM]);

Once you have verified that you know how to assign grid cells to threads, implement the game of life program and test it out on a small grid (16x16) for some of the test patterns shown on the Wikipedia page.

Once you have verified correctness of your solution, comment out all debug output, and add some sequential code to initialize your the grid to some random pattern of 1's and 0's (don't do a 50-50 assignment, use maybe ~20% 1's).

Add timing code to only the discrete event simulation part (not the initialization of the grid or printfs), and start running experiements.

Experimentation

You want to run experiments that will answer questions about the scalability and "goodness" of your two solutions, as the problem size and number of threads increases.

You should run experiments of your two solutions varying:

The grid size (try powers of two: 64, 128, 256, 1024, ...). You do not have to try every power of 2 between 64 and max, but run a few intermediate results between your smallest-sized and largest-sized worlds. Compile in the different sized grids. You could create differently named executables, one for each grid size, if you want to run a large number of experiements using a script.
The number of threads (1, 2, 4, 8).
The number of discrete time steps (try a couple different values that show differences across runs). You want your program to run for long enough that you have measurable differences across runs.

In addition to comparing the scalability of your solution, you should test whether or not the row-wise vs column-wise assignment has any effect on performance.

You should run multiple runs of each experiement (e.g. don't just do a single times run of a 64x64 grid, with 4 threads, and 10000 time steps).

You should be careful to control the experimental runs as much as possible. If other people using the machine you are running experiements on, their programs can interfere with your results. You can see what is running on a machine:

who to see who else is on a machine you are running on, and try another machine if you are not alone.
top to see the system load on the machine.

Also, make sure that your grid size is not too large to cause swapping. When you first try some sample runs with different grid sizes, run:

watch -n 1 cat /proc/swaps
Filename                                Type            Size    Used    Priority
/dev/sda5                               partition       2072344 0       -1

If you notice the Used value going above 0, your grid size is too big to fit into RAM (or there are too many people running on this machine, and you need to find an idle machine).

Machine Info

The Lab Machine specs page contains information about most of the lab machines, including which are dual-core and which are quad-core.

Pick a quad-core machine to run most of your experiements. As much as possible, it is good to run all your experiements on the same machine, or at least identical machines.

In addition, the department has two 8-core machines, carrot and mushroom. You should run a limited number of experiments on these machines, as you will have to share access to them. note: mushroom is running a 64-bit version of Linux, so you will need to re-compile your program on mushroom before running it.

8-core machine reservations: I've added a lab 4, 8-core machine reservation page to the wiki. Please use this to reserve times to use these machines. If you want to run during a time someone else reserved, you can try logging in and running who to see if the machine is really in use. If it is, you will have to wait. The idea is to use this to notify each other of times you would like to use one of these machines for experiements, and to monitor your own usage by having a record of who has had access to these machines, so that we can implement some type of "fair access policy".

I suggest that you run a large round of experiements on quad-core, and then pick a few experiements from those to re-run on the 8-core, along with some additional runs with 8 threads.

Written Report

You should write a short report (no more than 3 pages) that describes the results of your experimentation. It should include the following sections:

A brief description of how you implemented the row-wise and column-wise thread assignment.
A description of the experiements you ran. What did you vary, what machine(s) did you run on, how many runs of each experiment. Also, briefly describe what you thought the expected outcome would be and why? It is fine if your expected outcome was different than what your experimental results show.
Experimental results: present your results AND describe what they show. You can use tables or graphs to present the data. Choose quality over quantity in the data you present. A couple tables with data that show scalability results in terms of number of threads and problem size is fine. It is also okay to present and discuss negative results..."we thought the X experiment would be better because...,but as shown in table 2, the Y experiment performed better. This is (or we think that this is) because ...". There is, however, a difference between negative results (well designed experiments that produce unexpected results) and bad results (results from poorly designed experiments).
Conclusions: what did you learn from your experiements? what do they say about the scalability of your solution? did they match your expectations? if not, do you have an idea of why not? did the row-wise and column-wise versions perform differently? explain why do you think they did or did not?

Useful Functions and Resources

OpenMPlinks The tutorial is a good place to start.
gettimeofday use to time sections of code. Don't include in timed sections any output stmts (printfs, cout, file I/O). The starting point code has an example of using gettimeofday.
top, who, xload get system usage info
/proc/ more useful system info.
rand C's random number generator (man rand)
CS Project Etiquette Tools and Guidelines for using shared CS resources to run long-running, intensive applications.

Tools for running experiments and collecting performance measurements

latex links. latex is Unix's document writing software. You do not have to use latex for writing your report, but it has very good support for some things that are common in scientific writing such as representing mathematical expressions, so you may want to give it a try. I have some example latex documents that you can grab to use as a starting point in ~newhall/public/latex_examples/.

Submission and Demo

Create a tar file containing:

All source, include, makefiles necessary to build and run your programs.
A README file with: (1) you and your partner's names; (2) the number of late days you used on this assignment; (3) the number of late days you used so far.
if you used scripts to run experiements, an example script.
Your written report (as a pdf)

Submit it by running cs87handin.

Demo

You and your partner will sign up for a 15 minute demo slot to demo your solution. I'd like to see examples of some different sized runs and different tid to grid cell assignments.

CS87 Lab4: OpenMP Game of Life

Conway's Game of Life

OpenMP

Machine Info

Demo