Start by following the instruction below on how to run lam on the machines in the CS lab, and then try to run some of the example MPI programs. After you have some practice setting up lam and running example mpi applications, you should implement the Game of Life program as defined below.
Running LAM, Example MPI applications to try out, The Assignment, Links to on-line MPI help
Here's what to do:
On some lab machine:
% ssh-keygen -t dsa
(accept the defaults, and just hit RETURN when asked for a passphrase)
% cd ~/.ssh
% cat id_dsa.pub >> authorized_keys2
now try it out: ssh somemachinename
Next, set your LAMRSH environment variable:
# if you use bash: add to your .bashrc or .bash_profile file:
export LAMRSH=ssh
# if you use another shell (you likely don't) add to your .cshrc file:
setenv LAMRSH ssh
The main steps that you will need to take to run MPI programs are the
following:
% recon -v lamhosts # see the example below for this file's format
Then, to boot lam on the hosts in your host file (this has to be
once at the beginning of each session you run mpi programs):
% lamboot -v lamhosts
% mpicc -o myprog myprog.c
% cat appfile
# 18 a.outs across 6 nodes (will spawn 3 a.outs per node)
n0-5 /home/newhall/lam/test/a.out
n0-5 /home/newhall/lam/test/a.out
n0-5 /home/newhall/lam/test/a.out
% mpirun -v appfile
or use the -np # command line option to mpirun
to specify the number of processes to start on the nodes. For more
options, see the mpirun man page.
% mpirun -np 25 /home/newhall/lam/test/a.out
% lamclean -v
# every 1 second, run mpitask
% watch -n 1 mpitask
% lamhalt
% lamwipe -v lamhosts
% mkdir lam
% cd lam
% vi lamhosts # add some hosts
% recon -v lamhosts
% lamboot -v lamhosts
% tping -c1 N
% cp -r /usr/share/doc/lam3-dev/examples/main/mandelbrot .
% cd mandelbrot/
% gunzip *.gz
% mpicc -o master master.c
% mpicc -o slave slave.c
% vi appfile # create appfile schema file: it describes
# the application: each node and program
# the node names come from 'recon -v lamhosts'
% cat appfile
# 1 master, 5 slaves
n0 master
n1-5 slave
% mpirun -v appfile # run the MPI application
% xv mandel.out
% lamclean -v # clean up lam stuff from this run
% mpirun -v appfile # re-run the mandelbrot app
% lamhalt -v # halt lam...you are done running lam apps
% lamwipe -v ../lamhosts (if halt didn't work)
Try running this application with different numbers of slaves by modifying
the appfile, and login to a node running slave programs, run top, and watch
as slave program(s) are spawned and start running.
The Game of Life is a simple example of a class of problems that can be modeled using cellular automata; some physical and biological systems that evolve over time can be modeled using cellular automata. The problem space is divided into a number of cells each of which is a finite state machine. Each cell's evolution is computed in steps of time; each cell makes one state change, then each cell makes the next state change, and so on.
In the Game of Life, each cell, represented by an (i,j) element in an MxN matrix, contains an organism that is either dead or alive (represented by a 0 or 1). At each step in the game, a cell's state changes based on its current state and the state of its neighbors. The following are rules for a cell's state transition at each step:
Using the lam implementation of MPI, write a message passing version of the game of life that takes three command line arguments: two dimension values (M and N) for the matrix; and K the number of iterations as input. The easiest way to parallize this program is to have one process for each cell in the MxN matrix. You are welcome to try another way of dividing up the matrix if you'd like (either way is fine). No matter how you decide to distribute the matrix, you should choose one process to be the master (0 is a good choice), and the others to be the slaves. The master should participates as a cell in the MxN matrix (or as a set of cells), but is also responsible for sending slaves their original values and receiving the final result from the slaves and printing it to stdout. I'd recommend using the SPMD model for your program, rather than having separate master and slave executables.
You should also have a debug mode where each slave sends the master its value at the end of a round, and the master after receiving values from all slaves, prints out the MxN matrix of round i values. This will help you debug your solution.
When your program is not running in debug mode, slaves should not send the master their intermediate results.
At each step, each cell needs to exchange information with its neighbors. Keep in mind that a cell in the middle has eight neighbors, and a cell in a corner has only three neighbors, and a cell on an edge has five neighbors. Make sure that at each iteration i, each process gets values from each of its neighbors corresponding to their ith state, and that you solution is deadlock free.
Here is some sample output from a run of my program with a 6x4 matrix for two rounds, with debugging turned on (I just use a compile-time flag, #define DEBUG 1, to turn it on/off):
Start: ------ 0 0 0 1 0 0 0 1 0 1 0 0 1 1 1 0 0 1 0 0 0 1 0 0 Round 0: ------- 0 0 1 0 0 0 1 0 1 1 0 1 1 0 1 0 0 0 0 0 1 0 1 0 Round 1: ------- 0 1 0 1 1 0 1 1 1 0 0 1 1 0 1 1 1 0 1 1 0 1 0 0
You will likely want to use buffering to ensure that sends do not block until a corresponding receive has been posted (otherwise you will have to be very very careful about avoiding deadlock). Look at MPI_Buffer_attach and MPI_Buffer_detach routines in section 3.6 of the MPI standard for more information.
Once your program works, you should evaluate its performance when run on different numbers of host machines (for example, 2, 4, 8, 16), for different sized matrices (two or three different sized MxN matrices should be fine). There seems to be a limit to the number of processes lam will spawn per machine, so depending upon how you allocate processes to cells you may not be able to test all permutations of number of hosts and problem sizes.
Run each game-of-life program for a large number of
iterations (make sure your program runs for long enough that you are
not just measuring the MPI start-up costs). You can use
the time command (% time mpirun -v appfile),
to measure total execution time of each run. Also, I suggest writing a
script to run all your tests so that you don't have to sit there and run
one after the other by hand.
Measure total execution times for each combination, summarize the results, and write a 2-3 paged report describing
Look at ~newhall/public/latex_examples, for some Latex document starting points, if you want to use Latex to write up your report. gnuplot and xgraph are graph tools available on our lab machines.
You should run each experiment multiple times (why?). To avoid interference with other uses of the machines, run your experiments when the machines have a low load (i.e. early in the morning, friday or saturday night, ...). To do this you can create a script to execute some large number of timed runs and use cron to initiate execution your script sometime early in the morning. Look at the man page for cron on how to set up jobs to run at a particular time of day (crontab -e to edit a crontab file).
% gameoflife 3 4 4 # or 'gameoflife 4 3 4' depending on how you interpret the command line args
Send me email with your tar file solution as the attachment.
Either, submit a hardcopy of your report to me (slide it under my office door
before the due date), or email me a pdf version of it before the due date.
MPI Links
MPI Tutorial
Links under "Another MPI Tutorial", and "Getting Started with LAM/MIP"
are pretty helpful.
MPI 1 Standard. Other MPI documentation is availble
here.
List of MPI functions from the MPI Standard