Homework 2: MPI programming assignment
Due: Sunday Feb 19 before 4pm

For this assignment you will write a simple MPI program using the lam implementation of MPI, and run and test your program on machines in the CS lab. Once your program works, run some scalability tests of your solutions and write up a report of your results.

Start by following the instruction below on how to run lam on the machines in the CS lab, and then try to run some of the example MPI programs. After you have some practice setting up lam and running example mpi applications, you should implement the Game of Life program as defined below.

Running LAM, Example MPI applications to try out, The Assignment, Links to on-line MPI help


Running LAM on CS Lab machines

You need to set up ssh so that you can ssh into lab machines without having to enter your password:
    Here's what to do:

     On some lab machine:

     % ssh-keygen -t dsa
       (accept the defaults, and just hit RETURN when asked for a passphrase)

     % cd ~/.ssh
     % cat id_dsa.pub  >> authorized_keys2    

     now try it out: ssh somemachinename 
Next, set your LAMRSH environment variable:
    # if you use bash:  add to your .bashrc or .bash_profile file:
    export LAMRSH=ssh

    # if you use another shell (you likely don't) add to your .cshrc  file:
    setenv LAMRSH ssh
The main steps that you will need to take to run MPI programs are the following:
  1. Boot lam
    To boot lam, you need to have a lamhosts file in your lam subdirectory, and run recon to see if lam can be started on all nodes in your hostfile:
     % recon -v lamhosts   # see the example below for this file's format 
         
    Then, to boot lam on the hosts in your host file (this has to be once at the beginning of each session you run mpi programs):
     % lamboot -v lamhosts 
  2. Run MPI applications
    • Compile your application using mpicc for C code, and mpic+= for C++ code (mpif77 is the mpi compiler for Fortran):
       % mpicc -o myprog myprog.c 
    • Create an appfile describing which host will run which executable(s) In some cases you may have more than one executable in your mpi application (see the mandelbrot example below where there are separate master and slave executables). Use mpirun and the appfile to run your MPI application:
       
           % cat appfile
             # 18 a.outs across 6 nodes (will spawn 3 a.outs per node)
             n0-5 /home/newhall/lam/test/a.out 
             n0-5 /home/newhall/lam/test/a.out 
             n0-5 /home/newhall/lam/test/a.out 
           % mpirun -v appfile	 
           
      or use the -np # command line option to mpirun to specify the number of processes to start on the nodes. For more options, see the mpirun man page.
            % mpirun -np 25 /home/newhall/lam/test/a.out
           

    • After each individual run of an MPI application, it is good to run lamclean to clean up any residual state from the last run before starting a new run:
       % lamclean -v 
  3. mpitask
    As your MPI application runs, you can look at the state of the tasks by running mpitask. To continuously run this program every X seconds, use watch:
           # every 1 second, run mpitask
           %  watch -n 1  mpitask
          
  4. Shutdown lam
    After you are done running your MPI applications, you should shut down lam by running lamhalt, or if lamhalt doesn't completely clean things up, then try lamwipe:
     
         % lamhalt
         % lamwipe -v lamhosts 

An Example MPI application to try

Before you start coding, try copying over and running some of the sample MPI programs in /usr/share/doc/lam4-dev/examples/main/. For example, here is what you need to do to run the mandelbrot code (following instructions from the
lam tutorial):
  %  mkdir lam
  %  cd lam
  %  vi lamhosts	# add some hosts 
     
  %  recon -v lamhosts 
  %  lamboot -v lamhosts 
  %  tping -c1 N
  %  cp -r /usr/share/doc/lam3-dev/examples/main/mandelbrot .
  %  cd mandelbrot/
  %  gunzip *.gz
  %  mpicc -o master master.c 
  %  mpicc -o slave slave.c 
  %  vi appfile			# create appfile schema file: it describes
				# the application: each node and program
				# the node names come from 'recon -v lamhosts'
  %  cat appfile
     # 1 master, 5 slaves
     n0 master 
     n1-5 slave 

  %  mpirun -v appfile 		# run the MPI application
  %  xv mandel.out 		

  %  lamclean -v		# clean up lam stuff from this run
  %  mpirun -v appfile 		# re-run the mandelbrot app 
  %  lamhalt -v			# halt lam...you are done running lam apps
  %  lamwipe -v ../lamhosts     (if halt didn't work)
Try running this application with different numbers of slaves by modifying the appfile, and login to a node running slave programs, run top, and watch as slave program(s) are spawned and start running.

The Assignment

You will implement a solution to the Game of Life Problem in MPI, and you will test your solution for different problem sizes and different numbers of nodes.

The Game of Life is a simple example of a class of problems that can be modeled using cellular automata; some physical and biological systems that evolve over time can be modeled using cellular automata. The problem space is divided into a number of cells each of which is a finite state machine. Each cell's evolution is computed in steps of time; each cell makes one state change, then each cell makes the next state change, and so on.

In the Game of Life, each cell, represented by an (i,j) element in an MxN matrix, contains an organism that is either dead or alive (represented by a 0 or 1). At each step in the game, a cell's state changes based on its current state and the state of its neighbors. The following are rules for a cell's state transition at each step:

  1. A live cell with zero or one live neighbors dies from loneliness.
  2. A live cell with four or more live neighbors dies due to overpopulation.
  3. A dead cell with two or three live neighbors becomes alive.
  4. Otherwise, a cell's state stays unchanged.

Using the lam implementation of MPI, write a message passing version of the game of life that takes three command line arguments: two dimension values (M and N) for the matrix; and K the number of iterations as input. The easiest way to parallize this program is to have one process for each cell in the MxN matrix. You are welcome to try another way of dividing up the matrix if you'd like (either way is fine). No matter how you decide to distribute the matrix, you should choose one process to be the master (0 is a good choice), and the others to be the slaves. The master should participates as a cell in the MxN matrix (or as a set of cells), but is also responsible for sending slaves their original values and receiving the final result from the slaves and printing it to stdout. I'd recommend using the SPMD model for your program, rather than having separate master and slave executables.

You should also have a debug mode where each slave sends the master its value at the end of a round, and the master after receiving values from all slaves, prints out the MxN matrix of round i values. This will help you debug your solution.

When your program is not running in debug mode, slaves should not send the master their intermediate results.

At each step, each cell needs to exchange information with its neighbors. Keep in mind that a cell in the middle has eight neighbors, and a cell in a corner has only three neighbors, and a cell on an edge has five neighbors. Make sure that at each iteration i, each process gets values from each of its neighbors corresponding to their ith state, and that you solution is deadlock free.

Here is some sample output from a run of my program with a 6x4 matrix for two rounds, with debugging turned on (I just use a compile-time flag, #define DEBUG 1, to turn it on/off):

Start:
------
0 0 0 1 
0 0 0 1 
0 1 0 0 
1 1 1 0 
0 1 0 0 
0 1 0 0 

Round 0:
-------
0 0 1 0 
0 0 1 0 
1 1 0 1 
1 0 1 0 
0 0 0 0 
1 0 1 0 

Round 1:
-------
0 1 0 1 
1 0 1 1 
1 0 0 1 
1 0 1 1 
1 0 1 1 
0 1 0 0 

You will likely want to use buffering to ensure that sends do not block until a corresponding receive has been posted (otherwise you will have to be very very careful about avoiding deadlock). Look at MPI_Buffer_attach and MPI_Buffer_detach routines in section 3.6 of the MPI standard for more information.

Once your program works, you should evaluate its performance when run on different numbers of host machines (for example, 2, 4, 8, 16), for different sized matrices (two or three different sized MxN matrices should be fine). There seems to be a limit to the number of processes lam will spawn per machine, so depending upon how you allocate processes to cells you may not be able to test all permutations of number of hosts and problem sizes.

Run each game-of-life program for a large number of iterations (make sure your program runs for long enough that you are not just measuring the MPI start-up costs). You can use the time command (% time mpirun -v appfile), to measure total execution time of each run. Also, I suggest writing a script to run all your tests so that you don't have to sit there and run one after the other by hand.

Measure total execution times for each combination, summarize the results, and write a 2-3 paged report describing

  1. how you divided cells up per processes,
  2. your tests,
  3. your results,
  4. and what, if any, conclusions you can draw from your results.
You may want to include one or two tables or graphs that summarize your important results (I don't want to see all your numbers, just a summary of your interesting results).

Look at ~newhall/public/latex_examples, for some Latex document starting points, if you want to use Latex to write up your report. gnuplot and xgraph are graph tools available on our lab machines.

You should run each experiment multiple times (why?). To avoid interference with other uses of the machines, run your experiments when the machines have a low load (i.e. early in the morning, friday or saturday night, ...). To do this you can create a script to execute some large number of timed runs and use cron to initiate execution your script sometime early in the morning. Look at the man page for cron on how to set up jobs to run at a particular time of day (crontab -e to edit a crontab file).

Hand in

Create a tar file with:
  1. your solution source code and appfile(s) and Makefile
  2. and debug output showing the rounds of a run of a 3x4 matrix for 4 iterations:
    % gameoflife 3 4 4    # or 'gameoflife 4 3 4' depending on how you interpret the command line args
    

Send me email with your tar file solution as the attachment.

Either, submit a hardcopy of your report to me (slide it under my office door before the due date), or email me a pdf version of it before the due date.

MPI Links

MPI Tutorial Links under "Another MPI Tutorial", and "Getting Started with LAM/MIP" are pretty helpful.
MPI 1 Standard. Other MPI documentation is availble here.
List of MPI functions from the MPI Standard