CS87, Week 2 Thurs Lab

Resources for Running Lab 1 Experiments

I'm going to show you some tools that may be useful for running Lab 1 experiments.

Look at the Useful Utilities and Resources part of the Lab 01 page for links to more information about some of these.

Also, my Help Pages documentation has information about useful tools and utilities.

The Machine Specs page, reachable from the "cs lab help" link off the main CS page, lists specs for all the CS machines. Sort the Machines Table by #ofCores to find 8 and 16 core machines.

Results

You should do timed runs of different experiments, with multiple runs of each experiment. Present results as the average over the runs, and note standard deviations. See the Lab 1 assignment for more details.

Here are some measures that you may want to use to present results:

Total average run times.

Speed-up.

Speed up =  (Sequential Time) / (Parallel Time)

Efficiency.

Efficiency = (Speed-up) / P, where P is the number of cores or threads
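As a quick illustration of these measures, here is a minimal Python sketch. The timings are made up for illustration, the function names are my own, and I'm assuming the sample standard deviation (statistics.stdev) is the one you want to report:

```python
import statistics

def summarize(times):
    """Return (mean, sample standard deviation) of run times in seconds."""
    return statistics.mean(times), statistics.stdev(times)

def speedup(seq_time, par_time):
    """Speed-up = (Sequential Time) / (Parallel Time)."""
    return seq_time / par_time

def efficiency(seq_time, par_time, p):
    """Efficiency = Speed-up / P, where P is the number of cores or threads."""
    return speedup(seq_time, par_time) / p

# Made-up timings in seconds, for illustration only.
seq_runs = [8.0, 8.2, 7.8]   # sequential runs
par_runs = [2.1, 1.9, 2.0]   # hypothetical runs with P = 4 threads
P = 4

seq_mean, seq_sd = summarize(seq_runs)
par_mean, par_sd = summarize(par_runs)

print(f"sequential: {seq_mean:.2f}s (std dev {seq_sd:.2f})")
print(f"parallel:   {par_mean:.2f}s (std dev {par_sd:.2f})")
print(f"speed-up:   {speedup(seq_mean, par_mean):.2f}")
print(f"efficiency: {efficiency(seq_mean, par_mean, P):.2f}")
```

Note that speed-up and efficiency are computed from the averaged times, not from individual runs.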

Example of using some tools for running experiments

Example Code to try out

Here is some example code you can try out with these tools (either cd into this directory to try out or you could make a copy in your cs87 subdir):

cd cs87
cp -r ~newhall/public/cs87/experiment_tools .

You copied over some example scripts that may be useful in helping you to write similar scripts for running experiments:

run.sh: example bash script for running a bunch of experiments with different input parameters. To run:
```
  ./run.sh
```

run_outfile.sh: example bash script for running a bunch of experiments with different parameters and capturing all output to a file. Note the bash command syntax for running 'time matrixmult' and redirecting its output to a file (&>> appends both stdout and stderr to the specified output file). To run:

  # output will go to file named "myoutputfile" in your home directory
  ./run_outfile.sh  ~/myoutputfile

  # or to default file name "output" in the current directory
  # you can't write into my directory so it will only work if you copied
  # this script over into your subdirectory
  ./run_outfile.sh

killmytests.sh: an example script to kill all your experiments. Always:
1. first pkill -9 the run script (run_outfile.sh in this example)
2. then pkill -9 your program executable (matrixmult in this example)
This script is very useful if you want to stop all your experiments from running. In particular, if you want to stop them in the middle of the night, you can schedule a cron job to run this script.

Tools/Utilities

see if a machine is idle

See the Lab 1 page for details. Running who to see who is logged in, and top -H to see what is running on a machine, are good ways to guess whether it is available for you to use. Let top run for a minute or so to be more confident that the machine is idle.
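If you want a rough scripted check as well, here is a small Python sketch. It is only a heuristic and not a replacement for who and top -H: I'm assuming that a low 1-minute load average per core suggests an idle machine, and the function name and the 0.1 threshold are hypothetical choices of mine:

```python
import os

def probably_idle(load_1min, num_cores, threshold=0.1):
    """Guess idleness: 1-minute load per core below a (hypothetical) threshold."""
    return (load_1min / num_cores) < threshold

# os.getloadavg() is only available on Unix-like systems.
if hasattr(os, "getloadavg"):
    load_1min, _, _ = os.getloadavg()
    cores = os.cpu_count() or 1
    status = "probably idle" if probably_idle(load_1min, cores) else "busy"
    print(f"1-min load {load_1min:.2f} on {cores} cores -> {status}")
```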

screen

Useful for logging in, starting something running in a screen session, and then logging out (whatever you are running in the screen session stays running).

log in and run screen to start a screen session: screen
start the script you plan to run in this session. I suggest running a bash script of experiments inside a script session (script is described below), or running a bash script that redirects the output of each run to a file.
detach from the screen session: Ctrl-A d
then log out of the computer if you'd like

To reattach to a screen session:

login to the computer
run screen -r

You can attach to and detach from the same screen session as many times as you'd like.

script and dos2unix

script captures a terminal session to a file. dos2unix cleans up the resulting file after you exit script. See more details here: script and dos2unix

Python is a nice language to use to process the resulting typescript file: pull out timing results for related runs, compute the average and std dev, and print the results in a nice form.
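Here is a rough sketch of what such a post-processing script might look like. It assumes each run is preceded by an echo line starting with "gol " (as in the bash script example in this handout) and that timings appear in bash's default time format ("real 0m1.234s"); the sample text and the function name are made up for illustration, and your own output may need a different pattern:

```python
import re
import statistics
from collections import defaultdict

# Matches bash's default `time` output, e.g. "real    0m1.234s"
REAL_RE = re.compile(r"^real\s+(\d+)m([\d.]+)s")

def parse_runs(lines, label_prefix="gol "):
    """Group 'real' times (in seconds) under the most recent echoed label line."""
    times = defaultdict(list)
    label = None
    for line in lines:
        line = line.strip()
        if line.startswith(label_prefix):
            label = line
            continue
        m = REAL_RE.match(line)
        if m and label is not None:
            times[label].append(int(m.group(1)) * 60 + float(m.group(2)))
    return times

# Made-up sample of captured output, for illustration only.
sample = """\
gol -t 1 -n 256 -m 256  -k 1000

real\t0m2.000s
user\t0m1.900s
sys\t0m0.010s
gol -t 1 -n 256 -m 256  -k 1000

real\t0m2.500s
user\t0m2.400s
sys\t0m0.010s
gol -t 2 -n 256 -m 256  -k 1000

real\t1m1.500s
"""

results = parse_runs(sample.splitlines())
for label, secs in results.items():
    sd = statistics.stdev(secs) if len(secs) > 1 else 0.0
    print(f"{label}: mean {statistics.mean(secs):.3f}s, std dev {sd:.3f}, runs {len(secs)}")
```

To use it on a real capture, you would read the lines from your typescript file (after running dos2unix on it) instead of from the sample string.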

bash script

Write a bash script to fire off a bunch of experiments. Then just run the bash script and come back later when it is done. It's good to have some echo commands in your bash script to print out information about particular runs: this will help your post-processing scripts find timing results and compute averages and std dev. The lab01 starting point code includes one example bash script; try it out to see what it does. I also have links to bash programming off my help pages: bash

When you create a bash script, make sure the file is executable to run it:

vim runexper.sh  # or emacs 
chmod u+x runexper.sh  # make it executable (for you only)
ls -l
./runexper.sh

Also, try running your bash script a few times before starting it up in screen and coming back later: make sure it is doing what you think it is. You can always comment out the call to the gol program in the script to see if it is doing what you want (# is the bash single-line comment):

#!/bin/bash

for((n=256; n <= 2049; n=n*2))
do
for ((t=1; t <= 32; t=t*2))
  do
     echo ""
     echo "gol -t $t -n $n -m $n  -k 1000"
#    time ./gol -t $t -n $n -m $n  -k 1000 -x
  done
done

If I run the above bash script, the echo calls print out each parameter configuration, and I can check that they are what I expect. Then uncomment the time ./gol line and run it for real.

In your bash script make sure you run time ./gol ... to collect runtimes.

cron

You can add a cron job to run your script at a particular date and time by editing the crontab file on the machine where you are running your experiments (e.g., on chervil):

  $ ssh chervil
  $ crontab -e

Then add a line like this to run the killmytests.sh script at a specific time and date. The fields are minute, hour, day of month, month, and day of week; this one runs at 8pm (20:00) on January 31:

                
  0 20 31 1 * /home/newhall/public/cs87/experiment_tools/killmytests.sh

Similarly, you can add a cron job to run your experiments at a specific time (here I'm starting them at 4:05 am on February 3):

  5 4 3 2 * /home/newhall/public/cs87/experiment_tools/run_outfile.sh ./mytests

NOTE: after your cron jobs run, please run crontab -e again to remove them from the crontab file (otherwise cron will run them every year on this date at this time until we remove your account).
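The crontab field order (minute, hour, day of month, month, day of week) is easy to get backwards. As a sanity check, here is a small Python sketch that builds a crontab line from a date and time. The helper name is my own, and the year passed to datetime is arbitrary since these crontab lines have no year field:

```python
from datetime import datetime, timedelta

def cron_line(when, command):
    """Build a crontab line: minute hour day-of-month month day-of-week command."""
    return f"{when.minute} {when.hour} {when.day} {when.month} * {command}"

tools = "/home/newhall/public/cs87/experiment_tools"

# 8pm (20:00) on January 31
print(cron_line(datetime(2024, 1, 31, 20, 0), f"{tools}/killmytests.sh"))

# Start a run 2 minutes from a given "now", and kill it one minute after that.
now = datetime(2024, 2, 1, 13, 30)
start = now + timedelta(minutes=2)
kill = start + timedelta(minutes=1)
print(cron_line(start, f"{tools}/run_outfile.sh ~/mytests"))
print(cron_line(kill, f"{tools}/killmytests.sh"))
```

You would still paste the printed lines into crontab -e yourself; this only helps with getting the fields in the right order.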

Let's Try some stuff out

Let's try some of these steps together in the example you copied over.

First, let's try out screen and script:

ssh into a machine, see if idle
start screen
cd to directory containing gol and bash script
start script
start bash script to run experiments
hit return and type exit (to terminate script...good practice)
detach from screen (Ctrl-A d)
run top -H just to see if program is running
log out of machine

Then later, ssh back in the machine and re-attach to screen session.

On a different machine, create a cron job to run a test script and another to kill my test script and running test programs.

run date to get the current time

run crontab -e and let's start run_outfile.sh in 2 minutes and kill it one minute later. In this example, let's say it is Feb. 1st at 1:30pm right now:

$ crontab -e
# start run_outfile.sh (with output file mytests in your home directory) 
# at 1:32pm on Feb. 1  (minute:32, hour:13, day:1, month:2)
32 13 1 2 * /home/newhall/public/cs87/experiment_tools/run_outfile.sh ~/mytests
# run killmytests.sh at 1:33pm on Feb. 1
33 13 1 2 * /home/newhall/public/cs87/experiment_tools/killmytests.sh

Now, let's run top -H and see what happens.