CS87 Mini Lab7 : MPI and XSEDE

Example XSEDE output

Due: Saturday March 31 before noon

A Mini Lab is one that I anticipate you can complete in a couple of hours; you should finish, or be close to finishing, by the end of a Thursday lab session. The purpose of Mini Labs is to introduce you to a parallel or distributed programming language/utility without having you solve a larger problem using the language/utility.

I give you only about 24 hours to complete a mini-lab because I want you to stop working on it and get back to focusing your effort on the regular lab assignment.

If you don't get a mini lab fully working, just submit what you tried. If you don't submit a solution, it is not a big deal. Mini Labs do not count very much towards your final grade, and not nearly as much as regular Labs--they are mini.

For this lab assignment you, with your Lab 5 partner, will run some large runs of your MPI odd-even sort on Comet and submit some output results. Do not do any runs that have debug printf statements: your run should have no output other than timing and, possibly, the initial size and the number of Pis (processes).

XSEDE Experiments on SDSC Comet

For this mini lab, I want you to try running some large runs of your Lab 5 solution on Comet.

Practice First

  1. First, make sure to set up your XSEDE and Comet accounts this week (follow all the directions under "XSEDE and Comet Account Set-up"): XSEDE and Comet accounts.
  2. Next, try ssh'ing into Comet, then try scp'ing over my MPI example and running it on Comet (follow the directions under "Using Comet and submitting jobs").
  3. Once you have figured out Slurm and how to submit jobs, scp over your Lab05 solution and build it on Comet. You may need to make changes to the Makefile (see the Makefile for my XSEDE examples).
  4. Then, write a submission script for some small runs and try running it. Try a few small runs with debug printing to make sure your program runs on Comet. You can modify the hello Slurm script to run your sorting program and submit it.
  5. Finally, try a small run with printing disabled in preparation for larger runs: remove all debug output from your program (comment out #define DEBUG), but you can keep printing timing information.
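The practice steps above can be sketched as a short sequence of commands. This is only an illustration: the username, the login host name, the directory name, and the script name are placeholders I made up, so substitute the values from the course directions. The function prints the commands rather than running them, so it is safe to try anywhere.

```shell
#!/bin/bash
# Sketch of the practice workflow. Every value below is an assumption/placeholder.
COMET_USER="your_xsede_username"     # hypothetical username
COMET_HOST="comet.sdsc.xsede.org"    # assumed login host; check the course directions
LAB_DIR="Lab05"                      # hypothetical directory holding your solution
JOB_SCRIPT="small_run.sb"            # hypothetical script adapted from the hello example

# Print the commands in the order you would type them (dry run only).
practice_steps() {
    echo "scp -r ${LAB_DIR} ${COMET_USER}@${COMET_HOST}:"   # copy your solution to Comet
    echo "ssh ${COMET_USER}@${COMET_HOST}"                   # log in to Comet
    echo "cd ${LAB_DIR} && make"                             # build on Comet
    echo "sbatch ${JOB_SCRIPT}"                              # submit the small run
    echo "squeue -u ${COMET_USER}"                           # check the job's status
}

practice_steps
```

Once the job finishes, its output file should contain your debug printing for the small runs, and only timing information after you disable DEBUG.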

Assignment: try some long runs of your Lab 5 solution on Comet

Try out at least two large runs of your Lab 5 solution on Comet.
  1. Make sure to disable or remove all debug output from your program (comment out #define DEBUG).

  2. Write a couple of Slurm submission scripts for long runs (large sizes) and submit them. In your Slurm script, you will want to modify at least these four lines (try the compute queue instead of the debug queue):
    #SBATCH --partition=debug      # which queue
    #SBATCH --nodes=2              # Total number of nodes
    #SBATCH --ntasks-per-node=24   # Number of MPI tasks per node
    #SBATCH -t 00:30:00            # Run time (hh:mm:ss) - 30 mins
    
    You should choose many more than 2 nodes and 24 MPI tasks in your runs. You can also try large array sizes for each process to sort, via command line arguments to your executable. You may need to adjust the estimated runtime (30 minutes in this example). If your estimate is too small and your program runs longer than the estimate, the job will be killed before it completes. If your estimate is too large, the job will wait in the job queue for much longer than it should. You should also submit to one of the regular job queues for the big runs (don't use the debug queue).
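As a rough sketch of how the four lines fit into a complete script for a large run, the helper below prints a Slurm script for a given node count, tasks per node, time limit, and array size. Several things here are assumptions, not part of the lab: the function name make_job, the executable name ./oddevensort and its single size argument, the job and output file names, and the use of ibrun as the MPI launcher (use whatever launcher the hello Slurm script uses).

```shell
#!/bin/bash
# make_job NODES TASKS_PER_NODE WALLTIME ARRAY_SIZE
# Prints a Slurm batch script to stdout.
# Assumptions: the executable is ./oddevensort and takes the per-process
# array size as its only argument; ibrun launches the MPI processes.
make_job() {
    local nodes=$1 per_node=$2 walltime=$3 size=$4
    cat <<EOF
#!/bin/bash
#SBATCH --job-name=oddeven
#SBATCH --output=oddeven.%j.out
#SBATCH --partition=compute
#SBATCH --nodes=${nodes}
#SBATCH --ntasks-per-node=${per_node}
#SBATCH -t ${walltime}

# $((nodes * per_node)) MPI processes in total
ibrun ./oddevensort ${size}
EOF
}

# Example: 16 nodes x 24 tasks per node, 1 hour limit, 1000000 elements per process
make_job 16 24 01:00:00 1000000
```

To use it, redirect the output to a file (for example, make_job 16 24 01:00:00 1000000 > big_run.sb) and submit that file with sbatch. The total number of MPI processes is the node count times the tasks per node, so raising either value increases the size of the run.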

Submit
Before the due date, either you or your partner should push Comet output files from two large runs to your Lab05 repo (scp them over to your CS account, then git push them to your Lab05 repo). If you have git problems, take a look at the "Troubleshooting" section of the Using git page.
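One possible sequence for the submission step is sketched below. The CS username, the CS host name, the repo path, and the output file names are all hypothetical placeholders. The function only prints the commands: the scp would be run on Comet, and the git commands on your CS account.

```shell
#!/bin/bash
# Sketch of the submission steps. Every value below is an assumption/placeholder.
CS_USER="your_cs_username"      # hypothetical CS account name
CS_HOST="cs.example.edu"        # hypothetical CS host; use your department's machine
REPO_DIR="Lab05"                # hypothetical path to your Lab05 repo on the CS machine

# submit_steps FILE1 FILE2: print the commands for copying and pushing two output files.
submit_steps() {
    local out1=$1 out2=$2
    echo "scp ${out1} ${out2} ${CS_USER}@${CS_HOST}:${REPO_DIR}/"   # run on Comet
    echo "cd ${REPO_DIR}"                                            # run on the CS machine
    echo "git add ${out1} ${out2}"
    echo "git commit -m 'Comet output from two large runs'"
    echo "git push"
}

submit_steps run1.out run2.out
```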