For this lab assignment you will implement a parallel sorting algorithm
in MPI, and run and test it out on both our system and on the SDSC
Comet cluster.
This week you will focus on running your MPI application on our system,
and setting up your XSEDE and Comet account.
Starting next week and after break you will try out some large runs
on Comet.
You will work on this lab with your
lab 5 partner
Lab 5 Goals
 Learn MPI programming.
 Run an MPI application on our system using openMPI.
 Implement a parallel sorting algorithm
 Answer complexity questions about your implementation.
 Learn how to use an XSEDE resource.
 Test your MPI implementation on our system and on a large XSEDE system.
 Answer some questions about the complexity of a parallel algorithm.
Contents
Getting Started
I suggest looking at the following information:
Running OpenMPI on our system , and try out running the simple example code:
cp r ~newhall/public/openMPI_examples .
Lab 5 Starting Point Repo
 Get your LabO5 sshURL from the GitHub server for our class:
CS87S18
 On the CS system, cd into your cs87/labs subdirectory
 Clone a local copy of your shared repo in your private
cs87/labs subdirectory:
git clone [your_Lab05_URL]
Then cd into your Lab05youpartner subdirectory.
If all was successful, you should see the following files when you run ls:
Makefile README hostfile oddevensort.c run.sh
If this didn't work, or for more detailed instructions on git see:
the
Using Git page (follow the instructions for repos on Swarthmore's GitHub Enterprise server).
Starting Point files
 Makefile: builds an openMPI parallel version of oddevensort
 questions.tex: your answers to the complexity questions
 README: see notes about #defines and sizes
 hostfile: an example hostfile
 oddevensort.c: the starting point file for your implementation.
 run.sh: an example run script
Project Details
For this lab you will implement parallel OddEven Sort in MPI. OddEven is
similar to Bubble Sort in that it compares and swaps adjacent elements
However, it does the comparisons and swaps in an order that is more easily
parallelizable than Bubble Sort.
Sequential Algorithm
The sequential OddEven Sort algorithm works as follows:
 Swapping adjacent elements is done in N rounds.
 In each round, pairs of elements are compared and swapped if not
in sorted order. The pairs compared and swapped depend on if it is
an odd or and even round:
 If round i % 2 == 0 compare even/odd pair of elements
 If round i % 2 == 1 compare odd/even pair of elements
Here is an example sorting N=6 values: 3 8 9 5 6 2
round
0: 3 8 9 5 6 2 (even/odd)
^^ ^^ ^^
1: 3 8 5 9 2 6 (odd/even)
^^ ^^
2: 3 5 8 2 9 6 (even/odd)
^^ ^^ ^^
3: 3 5 2 8 6 9 (odd/even)
^^ ^^
4: 3 2 5 6 8 9 (even/odd)
^^ ^^ ^^
5: 2 3 5 6 8 9 (odd/even)
^^ ^^
Note: the end elements are not involved in compares and swaps in every round.
Parallel Algorithm
In the parallel version, the assumption is that P (the number of processes)
is much smaller than N. The parallelization is done in terms of P not N.
Assume Pi is the id of the ith process, and that i=0P1, represents that rank
of the processes participating the th parallelization.
 Each Pi is allocated a contiguous portion of the list.
 Some number of sorting rounds occur where Pi exchanges its sorted
list of values with one of its neighbors.
At each round, process Pi does the following:
 Pi sorts its portion of the list locally using any sorting algorithm.
 If round i%2 == 0, and even/odd exchange happens
 If Pi is even, it sends its sorted portion to Pi+1
Pi keeps the smallest half of the items for the next round, Pi+1 the
largest half for the next round.
 If Pi is odd, it sends its sorted portion to Pi1
Pi keeps the largest half of the items for the next round,
Pi1 the smallest half.
 If round i%2 == 1, and odd/even exchange happens
 If Pi is even, it sends its sorted portion to Pi1
Pi keeps the largest half of the items for the next round,
Pi1 keeps the smallest half for the next round.
 If Pi is odd, it sends its sorted portion to Pi+1
Pi keeps the smallest half of the items for the next round,
Pi+1 the largest half.
Remember that the end Pis may not participate in every round.
Implementing OddEven Sort in MPI
You will implement parallel OddEven Sort in MPI. Your implementation will ignore two issues often associated with sorting:
 How to distribute the N elements over the P processes.
 How to merge the P sorted parts back into a single sorted array.
In your implementation:
 Each process will generate a set of SIZE random int values
as its portion of the N items to sort.
 Processes will compute all rounds of OddEven sort so that at the
end, process 0 has the smallest SIZE elements in sorted order, process 1
the next smallest SIZE elements in sorted order, ..., process P1 has
the largest SIZE elements in sorted order.
The sorted portions DO NOT need
to be assembled together into a single sorted array on one of the nodes.
In other words, your MPI program can exit with the sorted list distributed
in SIZE chunks over the P processes such that PO has the smallest SIZE
values in sorted order, P1 then next smallest sorted SIZE values, and so on.
Starting point code
In the starting point oddevensort.c file is a definition for SIZE.
When you run your mpi program you will specify the number of process
to spawn (e.g.
np 8 spawns 8 MPI processes for a run).
The number of values a particular run is sorting (N) is implicitly specified
by the values of SIZE and the number of processes.
The starting point code has an optional command line argument that specifies
a different value for the size of elements each process should sort:
# sort 8*SIZE values such that each of the 8 processes gets SIZE elements
mpirun np 8 hostfile hostfile ./oddevensort
# run with optional command line argument:
# sort 800 values such that each of the 8 processes gets 100 elements
mpirun np 8 hostfile hostfile ./oddevensort 100
As you first develop your solution, uncomment the DEBUG definition in
the starting point file, and try running with small SIZE values and
smallish different number of processes to test correctness. With DEBUG defined,
each MPI process will print out its portion of the sorted array before
performing oddeven sort and then after where you should see process 0
has smallest SIZE values in sorted order, process 1 the next, and so on.
Try different sized SIZE and number of process values to make sure your
sort is correct (correct number of iterations, correct Pi's exchanging at
each step).
You can also try oddeven sorting in the other direction after
doing it one way first (i.e. try a descending sort after you do the ascending
sort). Sorting an array in the opposite starting sorted order is often a
good way to see if you are missing catching an edge case. To do this you
may want to functionize all comparisons so that it is easy to
change your code to sort in either direction.
Sample Output
Here is
output from my program.
It shows a few runs with different number of processes and sizes run
in debug mode where each process prints out its portion of the unsorted
array before sorting and prints out its portion after.
It is good to add this debug printing functionality to yours to help
you verify correctness of your implementation.
Additional Requirements
In addition to a correct implementation, your solution should be well designed, well commented, with good
error detection and handling, and free of memory access errors. Use
constants, meaningful variable names, and good modular design.
See my C Style guide and other Other C programming documentation.
You should also not use global variables in your solution.
Complexity Questions: Due Friday March 9 before 5pm
In addition to implementing and testing OddEven Sort in MPI, you should
answer the following questions about OddEven Sort. Writeup your answers
in the
questions.tex file, and push the .tex
file (not the .pdf) to your git repo before the due date. In the .tex file are
some examples of using latex math mode and verbatim, which may be useful for your
writeup (you are not required to use either).
 What is the bigO complexity of the Sequential OddEven Sort Algorithm on
a list of N items?
Show how, give a detailed explain of how, you got your answer.
 Given P processors, what is the bigO runtime complexity of the
Parallel OddEven Sorting Algorithm?
Show how, give a detailed explain of how, you got your answer.
 Given P processors, how much space is needed to perform the parallel sort of N values? Explain your answer.
Your answer will likely depend on how you do the exchange step, so explain
how you did it in your answer.
 IS OddEven sort a good algorithm for sorting on a GPU? Why or Why not?
 Is OddEven sort a good algorithm for sorting on a cluster using MPI?
Why or Why not?
Feel free to list pluses and minuses comparing GPU and MPI .
XSEDE Experiments on SDSC Comet
The details of this part of the Lab 5 assignment will be assigned next week.
This Week do the following:

Make sure to set up your XSEDE and your Comet
account this week (follow all the directions under
"XSEDE and Comet Account Setup"):
XSEDE and Comet accounts.

Try ssh'ing into Comet, and try out scp'ing over my mpi example
and running in on Comet (follow the directions under "Using
Comet and submitting jobs").
Useful Functions and Resources
 srand, rand: C random number functions:
# srand: seed the random number generator using current time:
# (only seed one time)
srand(time(NULL));
# then call rand to get the next value in the random sequence
# (returns value from 0MAXINT, use % to get smaller range):
next_val = rand() % MAXVAL;
 Running OpenMPI on our system . This includes information about
my simple MPI example programs you can copy over and try out on our system:
cp r ~newhall/public/openMPI_examples .
 Using Xsede to run on Comet. my directions and examples for running MPI programs on Comet using your xsede account.
 The Xsede portal: https://portal.xsede.org/
 Professor Danner's page and resources on MPI and CUDA on Stampede (we are not using stampede, but there may be some useful directions here)
 Lab Machine specs
page contains information about most of the lab machines, including
number of cores.
 More Xsede links
 Links to other MPI references and tutorials
Submit
Before the Due Date,
one of you or your partner should push
your solution to github from one of your local repos to the GitHub
remote repo. (it doesn't hurt if you both push, but the last
pushed version before the due date is the one I will grade, so be
careful that you are pushing the version you want to submit for grading):
From one of your local repos (in your ~you/cs87/labs/Lab05partner1partner2 subdirectory)
git add questions.tex
git add oddevensort.c
git commit
git push
If you have git problems, take a look at the "Troubleshooting" section of the
Using git
page.