Running LAM on CS Lab machines
First, set up ssh so that you can ssh into lab machines without
having to enter your password. Here's what to do on any lab machine:
% ssh-keygen -t dsa
(accept the defaults, and just hit RETURN when asked for a passphrase)
% cd ~/.ssh
% cat id_dsa.pub >> authorized_keys2
Now try it out:
% ssh somemachinename
Next, set your LAMRSH environment variable:
# if you use bash: add to your .bashrc or .bash_profile file:
export LAMRSH=ssh
# if you use another shell (you likely don't) add to your .cshrc file:
setenv LAMRSH ssh
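Before moving on, it can help to confirm both pieces of this setup. The following is a minimal sketch rather than part of the original instructions: the function names lamrsh_ok and ssh_ok are made up for illustration, and somemachinename is a placeholder for a real lab machine. It assumes a bash-like shell and relies on ssh's BatchMode option, which makes ssh fail instead of prompting for a password.

```shell
#!/bin/bash
# Hypothetical helpers (the names are illustrative, not part of LAM).

# Succeeds only if LAMRSH is set to ssh in the current environment.
lamrsh_ok() {
    [ "${LAMRSH:-}" = "ssh" ]
}

# Succeeds only if ssh can log in to host $1 without prompting.
# BatchMode=yes makes ssh fail instead of asking for a password.
ssh_ok() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

if lamrsh_ok; then
    echo "LAMRSH is set to ssh"
else
    echo "LAMRSH is not set to ssh; check your .bashrc"
fi

# Example (replace somemachinename with a real lab machine):
#   ssh_ok somemachinename && echo "passwordless ssh works"
```

If ssh_ok fails for a machine, re-check that id_dsa.pub was appended to authorized_keys2 as described above.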
The main steps that you will need to take to run MPI programs are the
following:
- Boot lam
To boot lam, you need a lamhosts file in your lam subdirectory. First
run recon to check that lam can be started on all nodes in your host file:
% recon -v lamhosts # see the example below for this file's format
Then boot lam on the hosts in your host file (this has to be done
once at the beginning of each session in which you run MPI programs):
% lamboot -v lamhosts
- Run MPI applications
- Compile your application using mpicc for C code, and mpic++ for
C++ code (mpif77 is the MPI compiler for Fortran):
% mpicc -o myprog myprog.c
- Create an appfile describing which host will run which executable(s)
In some cases you may have more than one executable in your mpi
application (see the mandelbrot example below where there are separate
master and slave executables). Use mpirun and the appfile to run your
MPI application:
% cat appfile
# 18 a.outs across 6 nodes (will spawn 3 a.outs per node)
n0-5 /home/newhall/lam/test/a.out
n0-5 /home/newhall/lam/test/a.out
n0-5 /home/newhall/lam/test/a.out
% mpirun -v appfile
Alternatively, use mpirun's -np command-line option to specify the
number of processes to start on the nodes. For more options, see the
mpirun man page.
% mpirun -np 25 /home/newhall/lam/test/a.out
- After each individual run of an MPI application, it is good to run
lamclean to clean up any residual state from the last run before
starting a new run:
% lamclean -v
- mpitask
As your MPI application runs, you can look at the state of the tasks
by running mpitask. To continuously run this program every X seconds,
use watch:
# every 1 second, run mpitask
% watch -n 1 mpitask
- Shut down lam
After you are done running your MPI applications, you should shut
down lam by running lamhalt, or if lamhalt doesn't completely
clean things up, then try lamwipe:
% lamhalt
% lamwipe -v lamhosts
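The steps above can be collected into one short script. This is a sketch under stated assumptions, not a definitive recipe: the host names machine1 through machine3 are placeholders, lam_session is a hypothetical wrapper name, and the script is meant to be run from your lam subdirectory. The lamhosts file it writes uses the usual boot schema layout of one host name per line, with lines starting with # treated as comments.

```shell
#!/bin/bash
# Write a boot schema (lamhosts): one host name per line; lines that
# start with # are comments. The host names here are placeholders.
cat > lamhosts <<'EOF'
# my lam hosts
machine1
machine2
machine3
EOF

# Hypothetical wrapper: boot lam, run one program, clean up, halt.
# $1 = number of processes, $2 = full path to the executable.
lam_session() {
    recon -v lamhosts   || return 1   # can lam start on every host?
    lamboot -v lamhosts || return 1   # boot lam once per session
    mpirun -np "$1" "$2"              # run the MPI application
    lamclean -v                       # clear residual state from the run
    lamhalt                           # shut lam down when finished
}

# Example: lam_session 6 /home/newhall/lam/test/a.out
```

The function is only defined here, not called, so sourcing the script creates lamhosts without starting lam. When running several programs in one session, call lamboot once and repeat only the mpirun and lamclean steps.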
An Example MPI application to try
Before you start coding, try copying over and running some of the
sample MPI programs in /usr/share/doc/lam4-dev/examples/main/. For
example, here is what you need to do to run the mandelbrot code
(following the instructions in the LAM tutorial):
% mkdir lam
% cd lam
% vi lamhosts # add some hosts
% recon -v lamhosts
% lamboot -v lamhosts
% tping -c1 N
% cp -r /usr/share/doc/lam4-dev/examples/main/mandelbrot .
% cd mandelbrot/
% gunzip *.gz
% mpicc -o master master.c
% mpicc -o slave slave.c
% vi appfile # create the appfile (application schema): it
             # lists each node and the program to run on it;
             # the node names come from 'recon -v lamhosts'
% cat appfile
# 1 master, 5 slaves
n0 master
n1-5 slave
% mpirun -v appfile # run the MPI application
% xv mandel.out
% lamclean -v # clean up lam stuff from this run
% mpirun -v appfile # re-run the mandelbrot app
% lamhalt -v # halt lam...you are done running lam apps
% lamwipe -v ../lamhosts # only if lamhalt didn't work
Try running this application with different numbers of slaves by modifying
the appfile. While it runs, log in to a node running slave programs, run top,
and watch as the slave program(s) are spawned and start running.
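To make it easier to vary the number of slaves, the appfile can be generated instead of edited by hand. The helper below is a hypothetical sketch (make_appfile is not a LAM command); it assumes the master runs on n0 and that at least one more node than the requested number of slaves has been booted.

```shell
#!/bin/bash
# Hypothetical helper: write an appfile with 1 master on n0 and
# $1 slaves on nodes n1..n$1 (needs at least $1 + 1 booted nodes).
make_appfile() {
    nslaves=$1
    {
        echo "# 1 master, $nslaves slaves"
        echo "n0 master"
        if [ "$nslaves" -eq 1 ]; then
            echo "n1 slave"            # a single node, not a range
        else
            echo "n1-$nslaves slave"   # node range n1 through n$nslaves
        fi
    } > appfile
}

make_appfile 3   # overwrites ./appfile
cat appfile
```

After regenerating the appfile, run lamclean -v and then mpirun -v appfile again, as in the session above.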
MPI Links
MPI Tutorial
The links under "Another MPI Tutorial" and "Getting Started with LAM/MPI"
are pretty helpful.
MPI 1 Standard. Other MPI documentation is available here.
List of MPI functions from the MPI Standard