Running long jobs / project etiquette

When running long experiments on the CS lab machines, it is important that you do not hog all of the resources. Here are some things to be aware of and a few rules to follow:

Do not run your job on allspice

Allspice is our email and file server. If you put a heavy load on allspice, email and file access slow down for everyone. Besides, the lab machines have faster CPUs, so your job should run faster on a lab machine (especially if you use the /local directories – see below).

See our Machine Specs Page for info on each lab machine.

View our status dashboards to see which machines are currently being used.

Use /local and /scratch/username for large data files

Your home directory is really on allspice, so a job on a lab machine that writes data to your home directory is sending that data over the network to allspice. Your home directory also has a disk quota, which limits how much data you can store there.

Everyone has a /scratch/username directory (e.g., /scratch/csmajor1), and anyone can make a directory in /local (e.g., mkdir /local/yourusername). /scratch is accessible from any lab machine, so use /scratch if you need to access the same file(s) from more than one machine. Use a /local directory if you can always run on the same machine: each machine has its own /local, so /local on lime is not the same as /local on lemon.

If your program writes tons of data to files, it will be faster to use a /local directory, because /local is on the machine's own disk and the writes don't have to go over the network.
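If your job is driven by a Python script, the choice above can be made in code. The sketch below is a hypothetical helper (pick_data_dir is not an existing tool): it returns /scratch/<username> when the data must be reachable from several machines, and otherwise creates /local/<username> the same way mkdir /local/yourusername would. It assumes, as described above, that /scratch/<username> already exists for you and that /local is writable.

```python
import getpass
import os


def pick_data_dir(same_machine, username=None,
                  local_base="/local", scratch_base="/scratch"):
    """Hypothetical helper: choose where a long job should write its data.

    same_machine=True  -> /local/<user> (fast, but only visible on this machine)
    same_machine=False -> /scratch/<user> (visible from every lab machine)
    """
    username = username or getpass.getuser()
    if same_machine:
        path = os.path.join(local_base, username)
        # /local/<user> is not created for you, so make it if it is missing
        os.makedirs(path, exist_ok=True)
    else:
        # /scratch/<user> already exists for every user, so just return it
        path = os.path.join(scratch_base, username)
    return path
```

For example, pick_data_dir(True) run on lime would return /local/yourusername on lime's own disk, which a run on lemon cannot see.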

See our “use the /scratch and /local directories” page for more info.

Be nice

All programs that will run for an extended amount of time (more than 15 minutes) should be “niced” to a lesser priority. If you are about to start a long program, try nice -n 19 ./a.out. If you have already started your program, use renice -n 19 -p pid, where pid is the Process ID of your program (found by using the ps or the top command). If nobody is using the computer your job is running on, it will still get 100% of the CPU. If someone is using the computer your job is running on, your job will run at a lesser priority so the console won’t be slow.
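If a Python script starts your long-running programs, it can lower their priority itself instead of relying on you to remember nice or renice. The sketch below is an assumption about how you might structure that, not a required tool: run_niced and renice are hypothetical names, while os.nice, os.setpriority, and subprocess.Popen's preexec_fn are standard Python on Unix.

```python
import os
import subprocess


def run_niced(cmd, niceness=19, **popen_kwargs):
    """Start cmd at lowered priority, similar to `nice -n 19 cmd`.

    preexec_fn runs in the child just before it executes cmd,
    so only the child is niced; the calling script keeps its priority.
    """
    return subprocess.Popen(cmd, preexec_fn=lambda: os.nice(niceness),
                            **popen_kwargs)


def renice(pid, niceness=19):
    """Set the nice value of an already-running process, similar to
    `renice -n 19 -p pid`. Ordinary users may raise (but not lower)
    the nice value of their own processes."""
    os.setpriority(os.PRIO_PROCESS, pid, niceness)
```

For example, run_niced(["./a.out"]) starts your program at nice 19, and renice(pid) lowers the priority of a job you already started, using the pid you found with ps or top.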

If possible, use screen or tmux

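If you don’t need the graphics console, try running your long simulation in a screen or tmux session. They both allow you to detach from a session and then re-attach later. For example, you might start your program in the lab, detach and log out (your program keeps running!), and then re-attach to the same session from your dorm room. Here’s how to do it with screen: run screen to start a session, then start your program inside it as usual. To detach, press Ctrl-a and then d; your program keeps running, and you can log out. Later, log back in to the same lab machine (e.g., with ssh from your dorm room), run screen -ls to list your sessions, and run screen -r to re-attach (add the session name if you have more than one). With tmux, the equivalents are tmux to start a session, Ctrl-b then d to detach, tmux ls to list sessions, and tmux attach to re-attach.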


Write your program to allow restarts

This seems like common sense, but many programs don’t do it. If you’re going to run a job that takes 24 hours or more, what happens if the power goes out (or Jeff has to reboot the machines) after 23 hours? Ideally, your program writes its state to disk every N timesteps and can be restarted from any of these saved files, so a crash costs you at most N timesteps of work instead of the whole run. :)
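Here is a minimal sketch of that idea in Python, assuming your simulation state fits in a small JSON-serializable dict. The names run, save_checkpoint, and latest_checkpoint, the step_XXXXXXXX.json file naming, and the toy “total” computation are all hypothetical stand-ins for your real program. Each checkpoint is written to a temporary file and then renamed into place, so a crash in the middle of a write never leaves a half-written checkpoint behind.

```python
import glob
import json
import os


def save_checkpoint(ckpt_dir, state):
    """Write state to ckpt_dir/step_<step>.json without ever exposing a partial file."""
    path = os.path.join(ckpt_dir, f"step_{state['step']:08d}.json")
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes reach the disk
    os.replace(tmp, path)  # atomic rename on the same filesystem


def latest_checkpoint(ckpt_dir):
    """Return the newest checkpoint path, or None if there are none yet.
    Zero-padded step numbers make alphabetical order match step order."""
    files = sorted(glob.glob(os.path.join(ckpt_dir, "step_*.json")))
    return files[-1] if files else None


def run(ckpt_dir, total_steps, every_n=100):
    """Run (or resume) a toy simulation, checkpointing every every_n steps."""
    os.makedirs(ckpt_dir, exist_ok=True)
    path = latest_checkpoint(ckpt_dir)
    if path is None:
        state = {"step": 0, "total": 0}  # fresh start
    else:
        with open(path) as f:
            state = json.load(f)  # resume where the last checkpoint left off
    while state["step"] < total_steps:
        state["total"] += state["step"]  # stand-in for one real timestep
        state["step"] += 1
        if state["step"] % every_n == 0:
            save_checkpoint(ckpt_dir, state)
    return state
```

If the job is killed partway through, running run() again with the same directory picks up from the last saved step instead of step 0. Keeping one file per checkpoint (rather than overwriting a single file) also lets you restart from an earlier point by deleting the newer files.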

Use xscreensaver and don’t leave signs all over the lab

If you’re running your long job on a lab machine, it is unrealistic to expect nobody else to use that machine. Don’t put signs on the machines in the lab (e.g., “I’m using this machine – please don’t use it – my final project is due tomorrow…”). Just use xscreensaver to lock your login session; others can still log in if needed, and your job keeps running.

See our Running Long Jobs page for more info.
