CS87 Mini Lab7: MPI and XSEDE

Due: Saturday March 19 before noon
(~24 hours, with some extra time to gather XSEDE result files and hand them in)
A Mini Lab is one that I anticipate you can complete in a few hours and finish, or be close to finishing, by the end of a Thursday lab session. The purpose of a Mini Lab is to give you some practice with a parallel or distributed programming language/utility without having you solve a larger problem using it.

I give you only about 24 hours to complete a Mini Lab because I want you to stop working on it and get back to focusing your effort on the regular lab assignment.

If you don't get a mini lab fully working, just submit what you tried. Mini Labs do not count very much towards your final grade, and not nearly as much as regular Labs--they are mini.

Lab Details
For this Mini Lab, I want you to try some large runs of your Lab 5 solution on stampede.

Getting Started: try running on stampede

See my Using XSEDE page for information about how to submit jobs on stampede.

Once logged into stampede, scp over my XSEDE_MPI.tar file, look at my submission script (hello.mpi), and then try it out on stampede. The directions are on my Using XSEDE page.

Once you have figured out slurm and how to submit jobs, scp over your Lab05 solution, build it on stampede, write a submission script (use hello.mpi as an example), and try submitting it to slurm.
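
The build-and-submit sequence above might look roughly like the sketch below. The script name lab5_small.slurm and the presence of a Makefile are assumptions, not names from the course pages, so substitute your own. The run helper is only an illustration: it prints each command unless DRY_RUN=0, which lets you check the sequence before running it for real on a stampede login node.

```shell
#!/bin/bash
# Build, submit, and monitor a first small run on stampede.
# DRY_RUN=1 (the default here) only prints each command; set DRY_RUN=0
# on a stampede login node to actually execute them.
DRY_RUN="${DRY_RUN:-1}"
# Hypothetical name for your edited copy of hello.mpi (an assumption).
SCRIPT="${SCRIPT:-lab5_small.slurm}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

run make                      # assumes your Lab05 directory has a Makefile
run sbatch "$SCRIPT"          # submit the job; slurm prints the job id
run squeue -u "$(whoami)"     # list your pending and running jobs
# To cancel a job submitted by mistake: scancel <jobid>
```

sbatch, squeue, and scancel are standard slurm commands; the Using XSEDE page remains the reference for anything specific to stampede.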

Assignment: try some long runs of your Lab 5 solution on stampede

Try out at least two large runs of your Lab 5 solution on stampede.
  1. scp your Lab 5 solution over to stampede and compile it.

  2. First, try a few small runs with debug printing to make sure it runs on stampede. You can modify the hello slurm script to run your sorting program and submit it.

  3. Next, remove all debug output from your program (comment out #define DEBUG), but you can keep printing timing information.

  4. Write a couple of slurm submission scripts for long runs (large sizes) and submit them. In your slurm script, you will want to modify at least these lines (and you may want to try the normal queue instead of the development queue too):
    #SBATCH -N 2                  # Total number of nodes requested (16 cores/node)
    #SBATCH -n 32                 # Total number of mpi tasks requested
    #SBATCH -t 00:30:00           # Run time (hh:mm:ss) - 30 mins
    #SBATCH -p development        # queue (partition) -- normal, development, etc.

    You should choose way more than 2 nodes and 32 mpi tasks in your runs. You can also try large arrays for each process to sort, set via command line args to your executable. You may need to adjust the estimated runtime (30 mins in this example). If your estimate is too small and your program runs longer than your estimate, it will be killed before it completes. If your estimate is too long, your job will wait in the job queue for much longer than it should.

  5. Hand in your slurm scripts and stampede output files (scp them over to your cs account to git push them to your Lab05 repo).

You should submit output from your two largest runs on stampede.
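
One way to prepare the larger submission scripts from step 4 is a small generator like the sketch below. It sets the task count to nodes times 16, matching the "16 cores/node" comment in the example lines. Several things here are assumptions rather than part of the assignment: the executable name ./your_sort_program, its single size argument, the job and output file names, and the ibrun launch line (the usual MPI launcher on TACC systems). Copy the launch line from hello.mpi if it differs.

```shell
#!/bin/bash
# Generate a slurm submission script for one large run.
# Assumed, not from the assignment: the executable name, its size
# argument, the job/output names, and ibrun as the MPI launcher.
NODES="${NODES:-8}"                   # number of nodes to request
CORES_PER_NODE=16                     # from the "16 cores/node" comment
TASKS=$((NODES * CORES_PER_NODE))     # one MPI task per core
SCRIPT="stampede_run_${NODES}nodes.slurm"

# Unquoted EOF so ${NODES} and ${TASKS} are expanded in the output file.
cat > "$SCRIPT" <<EOF
#!/bin/bash
#SBATCH -J lab5_${NODES}n             # Job name (placeholder)
#SBATCH -o stampede_run_${NODES}nodes.o%j   # Output file (%j expands to the job id)
#SBATCH -N ${NODES}               # Total number of nodes requested (16 cores/node)
#SBATCH -n ${TASKS}               # Total number of mpi tasks requested
#SBATCH -t 00:30:00           # Run time (hh:mm:ss) - 30 mins
#SBATCH -p development        # queue (partition) -- normal, development, etc.

# Placeholder launch line: replace with your executable and its arguments.
ibrun ./your_sort_program 1000000
EOF

echo "wrote $SCRIPT: $NODES nodes, $TASKS MPI tasks"
```

For example, if you save the generator under a name of your choosing such as make_run.sh, then NODES=16 bash make_run.sh writes stampede_run_16nodes.slurm, which you can check and submit with sbatch. Both the script and its output file start with stampede_run_, so the git add pattern in the Submit section picks them up.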

stampede user's guide

Submit
Before the due date, you or your partner should push your XSEDE output files to your Lab05 git repo.

From one of your local repos (in your ~you/cs87/labs/Lab05-partner1-partner2 subdirectory):

git add stampede_run_*  
git commit
git push

If you have git problems, take a look at the "Troubleshooting" section of the Using git page.