Running MPI applications with OpenMPI on our system


Contents:

  Compiling your program
  Example programs
  Running an OpenMPI program
  Creating hostfiles
  Finding out information about our machines
  Using autoMPIgen to generate hostfiles
  Debugging MPI programs with gdb and valgrind
  mpirun troubleshooting and clean-up
  Removing mpirun warnings
  MPI Links


Compiling your program
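
OpenMPI provides compiler wrapper scripts for building MPI programs. A minimal sketch, assuming a C source file named helloworld.c (a placeholder name):

# compile an MPI C program with the OpenMPI C compiler wrapper
# (-g includes debug information, which is useful with gdb and valgrind later)
mpicc -g -o helloworld helloworld.c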

Example programs:

Some example MPI programs to try out are available here:
/home/newhall/public/openMPI_examples/
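
To try them out, you might copy them into your own directory first, for example:

cp -r /home/newhall/public/openMPI_examples/ .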

Running an OpenMPI program

ssh without giving password

You want to run MPI in an environment that is set up so that you do not need to enter your password for every machine on which the MPI processes will run. To do this on our system you need to: (1) have your public RSA key in a file named ~/.ssh/authorized_keys2; and (2) usually run ssh-agent and ssh-add in the terminal from which you will run mpirun (often needed when starting from a machine you are remotely logged into).

The first step is to set up ssh'ing without passwords, so that spawning processes on remote hosts doesn't require a password.
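
A sketch of that set-up on our system (assuming home directories are shared across the lab machines, and using the ~/.ssh/authorized_keys2 file name mentioned above):

# generate an RSA key pair (pick a passphrase when prompted)
ssh-keygen -t rsa

# append your public key to the file that sshd checks on our machines
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys2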

After setting up your ssh keys, use ssh-agent so that you do not have to type in your passphrase every time you ssh into a machine (and every time MPI ssh's into the machines in your host file to start your MPI processes). Note: you only need to run ssh-agent and ssh-add once in a terminal (bash session) before you start one or more mpirun commands from that terminal:
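# start a shell managed by ssh-agent, then add your key to the agent
# (you type your passphrase once here)
ssh-agent bash
ssh-add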

mpirun

mpirun is used to run mpi applications. It takes command line arguments that specify the number of processes to spawn, the set of machines on which to run the application processes (or you can specify a hostfile containing the machine names), and the command to run. For example:
To run a simple command:
------------------------
  mpirun -np 2 --host robin,loon  ./executable

example:
  # using mpirun to run the uptime command on two hosts
  mpirun -np 2 --host robin,loon  uptime 
  # using mpirun to run an MPI hello world application on two hosts
  mpirun -np 2 --host robin,loon  ./helloworld

Using a hostfile

Often, mpirun is run with a --hostfile command line argument that names a file listing the set of hosts on which to spawn MPI processes.
To run a command using a hostfile
----------------------------------
% cat myhosts
  # a line starting with # is a comment
  # use slots=X, for machines with X processors
  basil
  rosemary
  nutmeg
  cinnamon

% mpirun -np 4 --hostfile myhosts ./helloworld

  Hello, world (from basil), I am 0 of 4
  Hello, world (from nutmeg), I am 2 of 4
  Hello, world (from rosemary), I am 1 of 4
  Hello, world (from cinnamon), I am 3 of 4
Typically, all MPI processes run the same executable file. However, this is not required. Some programs may be written in a boss-worker style, where one process acts as the boss, handing out work and coordinating results from the other processes, the workers, who perform the parallel tasks. Other programs may have separate types of tasks that subsets of processes perform. In these cases, a programmer may have a separate executable file for each type of process. To run MPI programs like this, you need to specify how many processes to spawn for each executable file using multiple -np command line options, one per executable file.
To run a boss/worker program 
(one process runs the boss executable, others run the worker executable)
------------------------------------------------------------------------
% cat myapp
 # boss is the name of the boss executable, worker the name of the worker executable
 -np 1 ./boss
 -np 6 ./worker

% mpirun --hostfile myhosts --app myapp
boss: allocating block (0, 0) - (19, 19) to process 1
boss: allocating block (20, 0) - (39, 19) to process 2
boss: allocating block (40, 0) - (59, 19) to process 3
...
boss: allocating block (500, 500) - (511, 511) to process 2
boss: done.
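
OpenMPI also accepts this kind of multi-executable specification directly on the mpirun command line, with a colon separating the per-executable parts; the app file above corresponds to something like:

% mpirun --hostfile myhosts -np 1 ./boss : -np 6 ./worker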

Creating hostfiles

Hostfiles list machines, one per line. Entries without the full domain name (e.g. loon vs. loon.cs.swarthmore.edu) are assumed to be in the same domain as the machine running mpirun. Hostfiles can additionally specify the number of slots per machine. A slot is an allocation unit on a host, indicating the number of MPI processes that can be spawned on that host. If no slots are specified, MPI uses the number of cores on a host to determine the number of processes it can spawn there. By adding slots=X to hostfile entries, you can change how mpirun distributes processes over hosts. The slot count can be any positive number; it is not constrained by the actual number of cores on a machine, but performance may be. Here are two example hostfiles:
cat hostfile   
robin 
sparrow 
lark 
loon 

cat hostfile1  
robin slots=1
sparrow slots=1
lark slots=1
loon slots=1
If robin has 4 cores, then:
# spawns 4 processes on robin
mpirun -np 4 --hostfile hostfile ./helloworld

# spawns 4 processes, 1 each on robin, sparrow, lark and loon:
mpirun -np 4 --hostfile hostfile1 ./helloworld

finding out information about our machines

When you run MPI applications, you may want to spawn more MPI processes on machines with more processors. You can use the "slots=n" clause in a hostfile to do this. To find out system information on a given machine:
cat /proc/cpuinfo    # information on every core on machine
cat /proc/meminfo    # information about total RAM size (and current use)
lscpu                # summary information about processor
lsmem                # summary information about memory
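
If you just want the number of cores (for example, to pick a slots=X value), nproc prints it directly:
nproc                # number of processing units on this machine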
Also, the following page of the CS help pages lists summary information about machines in our system: CS lab machine info

In /usr/swat/db are files that list all the host names for machines in different labs. You can use these to create your MPI host files. For example:

cp /usr/swat/db/hosts.bookstore  hostfile
Then edit hostfile to remove any hosts you don't want to include.
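
If you just want the first few machines from one of these files, something like head also works (here taking 8 hosts as an arbitrary example):

head -n 8 /usr/swat/db/hosts.bookstore > hostfile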

The individual files listing machines in our 4 labs are:

/usr/swat/db/hosts.256  
/usr/swat/db/hosts.bookstore  
/usr/swat/db/hosts.mainlab  
/usr/swat/db/hosts.overflow  

using autoMPIgen to generate hostfiles

On our system, you can run autoMPIgen to automatically generate MPI hostfiles of good machines, where "good" can be defined in different ways.

For example:

# to generate a hostfile of 10 good machines on our network:
autoMPIgen -n 10 hostfile

# to generate a hostfile of 10 good machines and include slots=4 with each entry:
autoMPIgen -n 10 -s 4 hostfile

# just list information about the top 40 machines (this doesn't fill hostfile)
autoMPIgen -n 40 -v -i  hostfile
# you can also use smarterSSH to list this same information:
smarterSSH -n 40 -v -i  

Run autoMPIgen -h to see its command line arguments for further configuration options.

There is also more information about autoMPIgen (and smarterSSH) linked off the PeerMon page.

Debugging MPI programs with gdb and valgrind
You can use both gdb and valgrind to debug MPI programs. Remember to compile with the -g flag to ensure that the compiler includes debug information in the MPI executable file; otherwise, for example, you will not get source code line numbers associated with your executable file.

As a general practice when debugging parallel programs, debug runs of your program with the smallest number of processes possible (2, if you can).

To use valgrind, run a command like the following:

mpirun -np 2  --hostfile hostfile valgrind ./mpiprog
This example spawns two MPI processes, each running mpiprog under valgrind, which means both processes will print valgrind errors to the terminal.
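
With both processes writing valgrind errors to the same terminal, the output can interleave; one way to separate it is valgrind's --log-file option (%p expands to each process's PID), for example:

mpirun -np 2 --hostfile hostfile valgrind --log-file=valgrind.%p.out ./mpiprog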

To use gdb, first create a hostfile that includes only the machine on which you are logged in (otherwise gdb in MPI will create xterm windows that would need to be X-forwarded from the remote machines). Then, run a command like the following:

mpirun -np 2  --hostfile hostfile xterm -e gdb ./mpiprog
This spawns two xterm windows, each running a gdb session for one of the two MPI processes in this run. In each gdb session, you can then set breakpoints, use the run command to start that process running, and then use other gdb commands to examine the runtime state of these MPI processes.
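
For example, in each gdb window you might start with something like (my_variable is just a placeholder for a variable in your program):

  (gdb) break main
  (gdb) run
  (gdb) next
  (gdb) print my_variable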

If gdb's output font highlighting is hard to read with your xterm settings, you can change your default xterm settings in your .Xdefaults file and then apply your changes by running xrdb -merge ~/.Xdefaults. For example:

# open your .Xdefaults file in an editor (vim, for example):
vim ~/.Xdefaults

  ! change background and foreground settings in this file to:
  xterm*background: black
  xterm*foreground: white

# then run this command to apply them:
xrdb -merge ~/.Xdefaults

Here are some links to my gdb guide and valgrind guide. Also, Chapter 3 of Dive into Systems contains a more verbose version of much of this content.

mpirun troubleshooting and clean-up
Sometimes mpirun fails because a host in a hostfile is unreachable. If this happens, check that the hosts are reachable, and remove any unreachable hosts from the hostfile before trying again. It is helpful to use a script to test this; an example checkup.sh script that checks whether the hosts in a hostfile are up might look like this:
#!/bin/bash
  
if [ "$#" -ne 1 ]
then
  echo "usage ./checkhosts.sh hostfilename"
  exit 1
fi

for i in $(cat "$1")
do
  echo "checking $i"
  ssh $i uptime
done
Then run it on a hostfile to check whether its hosts are reachable:
# first make sure the script file is executable
ls -l checkup.sh
# and set permissions if not
chmod 700 checkup.sh

# run to check if hosts in a hostfile are reachable
./checkup.sh hostfile
You do not normally have to do this, but if all the nodes in a hostfile are reachable and you are still having trouble re-running mpirun, you can try running orte-clean to clean up any processes and files left over from a previous run that could be interfering with subsequent runs.
# try this first: 
# make sure all your hosts in hostfile are reachable
# and if not, take them out of your host file
./checkup.sh hostfile

# try this second: 
# clean up MPI job state on node from which you ran mpirun
orte-clean --verbose

# try this last: 
# clean up MPI job state on all nodes in a hostfile 
mpirun --hostfile hostfile orte-clean --verbose

Removing mpirun warnings
When you run on our system, you may see a warning like this:
------------------------------------------------------
A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: 

Another transport will be used instead, although this may result in
lower performance.

NOTE: You can disable this warning by setting the MCA parameter
btl_base_warn_component_unused to 0.
------------------------------------------------------
You can ignore this warning: it is mpirun looking for InfiniBand NICs, not finding them on our system, and falling back to Ethernet instead. You can also get rid of the warning by setting the MCA parameter btl_base_warn_component_unused to 0. One way to do this is on the mpirun command line:
  mpirun --mca btl_base_warn_component_unused 0 -np 4 --hostfile ~/hostfile ./mesh_w_buffers
Another way is to set a shell environment variable with this setting, and then run mpirun without this command line option:
  export OMPI_MCA_btl_base_warn_component_unused=0
  mpirun -np 4 --hostfile hostfile ./mesh_w_buffers
Finally, you could add this environment variable to your ~/.bashrc file so that it is always set in your bash environment, and then you can run mpirun without seeing this warning. Add this to your ~/.bashrc file:
  export OMPI_MCA_btl_base_warn_component_unused=0
Once you set this in your .bashrc, you can run mpirun in any new bash shell and no longer see this warning:
  mpirun -np 4 --hostfile hostfile ./mesh_w_buffers
In an older shell, one started before your change, you can run source ~/.bashrc to update its environment variables from the new .bashrc.

See the MPI FAQ linked to from here for more info about MCA settings: Links to MPI references and tutorials

MPI Links
A bunch of links to more information about MPI: Links to MPI references and tutorials