For the Lab 4 assignment you will be using some XSEDE resources. In order to use these, you need to first create an account on XSEDE. It can take up to a week to enable this account, so please follow the "Get an XSEDE Account" steps today:

1. Get an XSEDE Account

To set up your XSEDE Account:

  1. Go to https://portal.xsede.org/

  2. Click the "Create Account" button to request a new account, and fill in:

      Organization: Swarthmore
      Department: Computer Science
      Registration Key:  the first 6 characters of your Swarthmore user name

  3. When you are asked to select a user name, pick your Swarthmore user name (ex. tnewhal1 is mine). If your Swat user name is already in use, pick a different one and let me know what you picked.

It will take a few days, and up to 1 week, for your account to be activated.

2. Setting up login to the XSEDE portal and the Comet system

  • Go to https://portal.xsede.org/ and log in with your user name (request a password first if you do not have one).

  • Select My XSEDE→Profile

  • Choose "Enroll in Duo" in the upper right, then "Add a device". If you have a phone you carry in your pocket, use that. If not, contact Andrew Ruether in ITS and he will give you a device to use and help you register it.

3. Log into Comet

3.1. The first time you log into comet, you need to do so through xsede:

ssh newhall@login.xsede.org
# enter passcode
gsissh comet

Once logged into comet, upload a public key from your cs account to comet (your public key is in ~/.ssh/id_rsa.pub). On comet:

mkdir -p ~/.ssh && chmod 700 ~/.ssh
vi ~/.ssh/authorized_keys     # copy in your pub key from your cs account
chmod 600 ~/.ssh/authorized_keys

Our git guide has some information about generating keys on our system (see its "generating keys" section).
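If you would rather not paste the key into an editor, the same step can be sketched as a small shell function. This is only an illustration: `add_pubkey` is a hypothetical helper (not part of ssh), and it assumes the public key file holds a single line. It creates the directory, appends the key only if it is not already there, and sets the permissions sshd normally expects.

```shell
# add_pubkey PUBKEY_FILE [SSH_DIR]
# Hypothetical helper: appends one public key to SSH_DIR/authorized_keys.
# Assumes PUBKEY_FILE contains a single key on one line.
add_pubkey() {
    pubkey_file="$1"
    ssh_dir="${2:-$HOME/.ssh}"          # defaults to ~/.ssh
    mkdir -p "$ssh_dir"
    chmod 700 "$ssh_dir"                # only you may enter the directory
    touch "$ssh_dir/authorized_keys"
    key=$(cat "$pubkey_file")
    # append the key only if this exact line is not already present
    if ! grep -qxF -- "$key" "$ssh_dir/authorized_keys"; then
        printf '%s\n' "$key" >> "$ssh_dir/authorized_keys"
    fi
    chmod 600 "$ssh_dir/authorized_keys"  # only you may read or write the file
}
```

For example, on comet you could first copy the key over with scp (`scp yourusername@cs.swarthmore.edu:.ssh/id_rsa.pub cs_key.pub`) and then run `add_pubkey cs_key.pub`. Running it twice does no harm, since the key is only added once.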

3.2. Logging directly into Comet subsequent times

After adding your public key(s), you can directly ssh or scp to comet from a cs machine:

ssh newhall@comet.sdsc.edu

4. Using Comet and submitting jobs

4.1. Copying files

You can use scp to copy files between your cs account and comet. For example, from comet I can copy over a file or a whole subdirectory (and just change the source and destination to copy from comet to CS):

# copy over foo.c
scp newhall@cs.swarthmore.edu:/home/newhall/public/foo.c .

# WARNING: this does a recursive copy of all contents under the specified
# directory (my mpi_examples directory in this example):
scp -r newhall@cs.swarthmore.edu:/home/newhall/public/mpi_examples .

You can also create a single tar file of a set of files to easily copy over and then untar it once copied over. Here is my documentation about using tar. You can also look at the tar man page for more information.
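The tar round trip looks roughly like the sketch below. The directory and file names are made up for illustration, and everything happens in a temporary directory so it can be tried safely on one machine; in real use the scp step would sit between creating and extracting the tar file.

```shell
# Set up a throwaway source directory and destination (illustrative names).
workdir=$(mktemp -d)
mkdir -p "$workdir/src/mpi_examples" "$workdir/dest"
echo 'int main(void) { return 0; }' > "$workdir/src/mpi_examples/foo.c"

# c = create, v = verbose, f = name of the tar file.
# -C changes into the given directory before adding mpi_examples.
tar cvf "$workdir/mpi_examples.tar" -C "$workdir/src" mpi_examples

# ... this is where you would scp mpi_examples.tar to the other machine ...

# t = list the contents without extracting (a quick sanity check)
tar tf "$workdir/mpi_examples.tar"

# x = extract; the mpi_examples directory is recreated under dest
tar xvf "$workdir/mpi_examples.tar" -C "$workdir/dest"
```

Copying one tar file is usually less error-prone than a recursive scp, because you can list its contents first and see exactly what will be unpacked.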

4.2. Running Jobs

Comet uses the SLURM resource manager; you submit and manage jobs with its commands:

squeue            # list all jobs in the job queue (there will be lots)
sbatch  job.mpi   # to submit a batch job from a slurm script
                  # this will give you a jobId  for your job
squeue -u yourusername # list all jobs in the job queue that you own
scancel jobId     # to kill a queued job

man slurm         # the man page for slurm
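When scripting these commands it helps to capture the job id instead of copying it by hand. The sketch below assumes sbatch prints its usual "Submitted batch job NNN" line; `job_id_from_sbatch` is a hypothetical helper, and job.mpi stands for whatever slurm script you are submitting. The submit step is skipped when sbatch is not installed, so the snippet is harmless on a non-SLURM machine.

```shell
# Hypothetical helper: reads sbatch's output on stdin and prints the job id,
# assuming the usual "Submitted batch job 12345" format.
job_id_from_sbatch() {
    awk '/^Submitted batch job/ { print $NF }'
}

# Only try to submit if sbatch exists and the script is present.
if command -v sbatch >/dev/null 2>&1 && [ -f job.mpi ]; then
    jobid=$(sbatch job.mpi | job_id_from_sbatch)
    echo "submitted job $jobid"
    squeue -j "$jobid"      # show just this job
    # scancel "$jobid"      # uncomment to kill it
fi
```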

4.2.1. queues

For MPI, use the debug queue when debugging and the compute queue for larger, longer test runs. If you use Comet for your course project, it has GPU queues too.

4.2.2. slurm job script

The slurm job script specifies information about the job you are submitting. This includes the number of nodes, the number of MPI processes, and an estimate of your job’s runtime.

/share/apps/examples/  # example job scripts  (see mpi examples)
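To make the pieces of a job script concrete, here is a minimal sketch that writes one to a file. The `#SBATCH` options shown are standard SLURM options, but the values (2 nodes, 4 processes per node, 5 minutes), the file name job_sketch.mpi, and the executable name helloworld are placeholders. The launch line is an assumption: it uses `ibrun`, and the right launcher and arguments should be copied from a working example in /share/apps/examples/ rather than from this sketch.

```shell
# Write a minimal job script to job_sketch.mpi.
# The quoted 'EOF' means nothing inside is expanded by the shell now.
cat > job_sketch.mpi <<'EOF'
#!/bin/bash
#SBATCH --job-name=helloworld
#SBATCH --output=helloworld%j.out
#SBATCH --partition=debug
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --time=00:05:00

# ASSUMPTION: ibrun is the MPI launcher on this system; check the
# example scripts and replace this line with what they use.
ibrun ./helloworld
EOF
```

In `--output`, `%j` is replaced by the job id, so each run gets its own output file. `--nodes` times `--ntasks-per-node` gives the total number of MPI processes (8 here), and `--time` is the runtime estimate in hours:minutes:seconds.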

4.2.3. Lustre file system

If you use Comet for data-intensive computing that requires large input or output file storage, use the Lustre file system on Comet to store files.

The Comet User’s Guide has a lot more information and examples.

5. Hello World Example to try out

I have a very simple example MPI program and slurm run script for submitting to the debug queue on Comet. You can try it out by doing the following:

# from comet, copy over my hello world example, untar and make it
scp yourusername@cs.swarthmore.edu:/home/newhall/public/XSEDE_MPI.tar .
tar xvf XSEDE_MPI.tar
cd XSEDE_MPI
make

vi hello.mpi  # change the path name to helloworld to your path
# (in the mpirun_rsh command line, change newhall to your user name)

# submit your job
sbatch hello.mpi

# check its status
squeue -u yourusername

# after it has run its output is in a file (vi, cat, less, ... to view)
less helloworldJOBID.out

hello.mpi is an example slurm run script that you can use as a starting point for other MPI applications you run. It submits the job to the debug queue, which is the one you want to use for testing before submitting longer experiment runs to the compute queue.

6. Useful Functions and Resources

  • Try out my Hello World Example on Comet.

  • See the Comet User’s Guide for lots more information and examples. In particular:

    • Computing Environment: lists the modules available (Comet uses the MVAPICH implementation of MPI, which has a different way of specifying hosts than the OpenMPI version on the CS system)

    • Application Development: compilers and tools available, example compilation commands

      module avail           # lists software available on comet
    • Running Jobs on Comet: information about the different queues (use debug for testing, and compute for longer MPI runs) and example slurm job scripts

  • The XSEDE portal: https://portal.xsede.org/

  • You can copy over my MPI example programs in /home/newhall/public/openMPI_examples/ to try out. There are a few more MPI examples and some run scripts for using openMPI on our system. See the README file in that directory for more information.

    cp -r /home/newhall/public/openMPI_examples .
  • See the Lab Machine specs for running on CS machines. The CS department machines have openMPI installed (Comet uses MVAPICH), so the run scripts are different than on Comet.