CS87: Lab 1

This lab should be done with your lab 1 partner

Lab 1 Goals:

Refresher on pthread programming and thread synchronization
Refresher on C programming, using libraries, and C programming tools
- 2D arrays and file I/O in C
- pointers, dynamic memory allocation, and pass by reference
- debugging threaded programs (gdb and valgrind)
- using the getopt library
Designing and running experiments for scalability analysis.
Evaluating experimental results.
Writing a report describing your experimental design and results.

Developing hypotheses about the scalability of pthreads GOL.
Designing experiments to test your hypotheses about scalability.
Running initial experiments and analyzing the results to see if you need to re-design your experiments.
Collecting and evaluating experimental results.
Presenting your experimental study in a written report.

Lab 1 Starting Point Repo

Both you and your partner should:
1. Create a cs87/labs subdirectory on the CS system:
```
mkdir cs87
mkdir cs87/labs
cd cs87/labs
```
2. Get your Lab01 ssh-URL from the GitHub server for our class: CS87-s16
3. On the CS system, cd into your cs87/labs subdirectory
4. Clone a local copy of your shared repo in your private cs87/labs subdirectory:
```
git clone [your_Lab01_URL]
```
  Then cd into your Lab01-you-partner subdirectory.
If all was successful, you should see the following files when you run ls:
```
Makefile run.sh testedges.txt 
```
If this didn't work, or for more detailed instructions on git see: the Using Git page (follow the instructions for repos on Swarthmore's GitHub Enterprise server).
Next, decide whose pthread GOL program you want to use as a starting point. Read over the pthread code requirements for this lab (particularly (4)), run your solutions, look at the code, run in valgrind, and then together decide whose solution you will use as your starting point. Leave a top-level comment in the .c and .h file(s) indicating the authors of your starting point code.
Whichever solution you start with (or if you want to start from scratch together that is fine too), copy over the pthreads gol.c (and any other .h or .c files) into your cs87 Lab01 repo to use as the starting point for this lab. Make sure the copied version compiles and runs (and fix the Makefile if not), then add the files you copied over to your git repo, commit, and push:
```
git add gol.c
git add Makefile
git add mytestfile.txt
...
git commit
git push
```
Your partner should now be able to do git pull to grab what you pushed to your shared repo.
You may also want to add to your repo config files that you may have from CS31 for initializing a GOL board (you can easily create new ones too, and you may want to create new ones for large scale experiments).

Pthread GOL Program Requirements

Your pthreads GOL solution should meet the requirements of the CS31 pthreads GOL lab assignment, with the following changes:

Turn your torus world into a 2D world. In the 2D version of GOL, edge cells have only 5 neighbors and corner cells have only 3. For example, the x's mark the set of neighbors for the three grid cells indicated with a 1 in this world (note: 3, 5 or 8 neighbors):
```
1  x  0  0  0  0  0  0  0
x  x  0  0  0  0  0  0  0
0  0  0  0  0  0  0  0  0
0  0  x  x  x  0  0  0  0
0  0  x  1  x  0  0  0  0
0  0  x  x  x  0  0  0  0
0  0  0  0  0  0  0  0  0
0  0  0  x  x  x  0  0  0
0  0  0  x  1  x  0  0  0
```
You can test correctness with the testedges.txt file (patterns on edges should not wrap around the world across iterations).
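
A minimal sketch of a neighbor count for the non-wrapping world. It assumes the board is stored as one flat, row-major int array where 1 means live; the function name and layout are illustrative and may not match your starting-point code:

```c
#include <assert.h>

/* Count live neighbors of cell (r, c) on a rows x cols board with no
 * wrap-around. The board is assumed to be a flat, row-major int array
 * in which 1 means live and 0 means dead (a hypothetical layout). */
int count_neighbors(const int *board, int rows, int cols, int r, int c) {
    assert(r >= 0 && r < rows && c >= 0 && c < cols);
    int count = 0;
    for (int dr = -1; dr <= 1; dr++) {
        for (int dc = -1; dc <= 1; dc++) {
            if (dr == 0 && dc == 0) continue;   /* skip the cell itself */
            int nr = r + dr, nc = c + dc;
            /* positions off the board are simply not neighbors */
            if (nr < 0 || nr >= rows || nc < 0 || nc >= cols) continue;
            count += board[nr * cols + nc];
        }
    }
    return count;
}
```

On a board where every cell is live, this visits 3 neighbors for a corner, 5 for an edge cell, and 8 for an interior cell, matching the picture above.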

Your code should compile with the -Werror=vla gcc flag, like in these examples (and add this flag to your Makefile):
```
# compile with -g for testing and debugging:
gcc -g -Wall -Werror=vla -o gol gol.c -lpthread

# with optimizations on (-O2) and -g off for experiments:
gcc -O2 -Wall -Werror=vla -o gol gol.c -lpthread
```
If it doesn't compile, you have one or more functions in your program that use run-time variable-sized array allocation on the stack. This is bad. Instead, space should be dynamically allocated on the heap (via malloc) and then passed into functions that need to use it. You should fix this.
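
As a rough sketch of the heap-based alternative, the board can be allocated once as a contiguous block and handed to functions along with its dimensions. The names alloc_board and count_live are hypothetical, and calloc is used here only as a zero-initializing variant of malloc:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Allocate a rows x cols board on the heap as one contiguous,
 * zero-initialized block. Returns NULL on failure; the caller is
 * responsible for calling free() on the result. */
int *alloc_board(int rows, int cols) {
    assert(rows > 0 && cols > 0);
    int *board = calloc((size_t)rows * (size_t)cols, sizeof(int));
    if (board == NULL) {
        perror("calloc");
    }
    return board;
}

/* Functions take the board and its dimensions as parameters instead of
 * declaring a variable-length array such as int tmp[rows][cols]. */
int count_live(const int *board, int rows, int cols) {
    int live = 0;
    for (int i = 0; i < rows * cols; i++) {
        live += board[i];
    }
    return live;
}
```

Cell (r, c) is then addressed as board[r * cols + c].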

Support the following command line options, using this syntax (options in [ ] are optional), and use the getopt library to define command line options and parse argv command lines:

   ./gol -t t { -n n -m m -k k [-s] | -f infile} [-x] [-c] 
       -t  t:      number of threads
       -n  n:      number of rows in the game board
       -m  m:      number of columns in the game board
       -k  k:      number of iterations
       -s:         initialize to oscillator pattern (default is random)
       -f infile:  read in board config info from an input file
       -x:         don't print board after every iteration (default is to print)
       -c:         do column partitioning (default is row partitioning)
       -h:         print out this help message

This syntax for command line arguments supports command line arguments in any order and optional command line arguments. You will use getopt to parse command line arguments (more details below).

Review your CS31 solution and fix any bugs (including running through valgrind), and fix any inefficiencies in its parallelization. Your solution should only malloc up space once for the shared game board(s), and malloc up this space before any pthreads are created. All functions that need to access the board(s) should be passed the board(s) as parameters. There should be no mallocs in the function(s) that solves a single iteration of GOL. Make sure that no calls to usleep are made when the program is run without printing out the game board at each iteration. And make sure it is implemented using good modular design and is well commented. Fix it if not.
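
A sketch of how the once-allocated boards might be handed to threads: each thread receives a pointer to the same boards plus its own slice of rows (or columns with -c). The struct thread_arg and partition names are hypothetical, and the even-split rule shown is just one reasonable choice:

```c
#include <assert.h>

/* Hypothetical per-thread argument: every thread points at the same
 * heap-allocated boards and owns the half-open slice [start, end). */
struct thread_arg {
    int *cur, *next;     /* shared boards, allocated once in main */
    int rows, cols;
    int start, end;      /* this thread's rows (or columns with -c) */
};

/* Split total rows or columns among nthreads threads as evenly as
 * possible: the first (total % nthreads) threads get one extra. */
void partition(int total, int nthreads, int tid, int *start, int *end) {
    assert(nthreads > 0 && tid >= 0 && tid < nthreads);
    int base = total / nthreads;
    int extra = total % nthreads;
    *start = tid * base + (tid < extra ? tid : extra);
    *end = *start + base + (tid < extra ? 1 : 0);
}
```

With 10 rows and 4 threads, the slices are [0,3), [3,6), [6,8), and [8,10), so no thread has more than one row beyond any other.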

Command Line Options

The command line options specify how to initialize the game and how to run the simulation.

There are two ways to run gol and initialize the game board:

The -f command line option means that the GOL problem size and initial state are read in from a file:
```
./gol -t 16 -f infile  # init to values read in from the file "infile"
```

or the -n -m -k [-s] command line options are used to specify the dimensions, number of iterations, and initial configuration of the world:

./gol -t 4 -n 20 -m 30 -k 100 -s   # init a 20x30 board to oscillator pattern
./gol -t 4 -n 20 -m 30 -k 100   # init a 20x30 board to random pattern
                                # about 25% of randomly selected cells 
                                # to 1 works okay, but you decide
                                # (row partitioning across threads, since -c is not given)
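
A small sketch of the random initialization, assuming a flat int board and a caller-chosen percentage (25 matches the suggestion above). The function name init_random is illustrative only:

```c
#include <assert.h>
#include <stdlib.h>

/* Set each cell to 1 with probability of roughly percent/100, else 0.
 * Call srand(time(NULL)) once in main before using this so that
 * different runs start from different boards. */
void init_random(int *board, int rows, int cols, int percent) {
    assert(board != NULL && percent >= 0 && percent <= 100);
    for (int i = 0; i < rows * cols; i++) {
        board[i] = (rand() % 100 < percent) ? 1 : 0;
    }
}
```

For repeatable timing experiments you may prefer a fixed seed, so that every run of a given size starts from the same board.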

These two modes are mutually exclusive: you cannot have a command line that combines -f with any of the -n, -m, -k, or -s options.

The -t option is required and works with both initialization modes.

The -x and -c options are optional and work with either initialization mode. The -x option is useful for running timing tests. Here are some example command lines:

# initialize 20x30 board to an oscillator pattern, run with 8 threads
# print out board to stdout after each iteration:
./gol -t 8 -n 20 -m 30 -k 100 -s  

# initialize program from data read in from "infile", run with 16 threads,
# column partition, do not print any output to stdout for this run
./gol -t 16 -f infile -c -x 

# run with 4 threads, init board from file, row partition, do not print to stdout
./gol -t 4 -f infile -x

Your program should handle badly formed command lines (e.g. print out an error message and exit instead of using incompatible or incomplete command line options).

File Format

The input file format is identical to that of the sequential (and pthreads) GOL labs from CS31: GOL lab.

Additional Code Requirements

In addition to the requirements listed above, your GOL solution should:

Have timing code around the GOL main computation (don't include grid initialization or outputting the result to a file). Use gettimeofday.
Make calls to usleep and system("clear"), which implement animation of the game board, only when the program is run in board printing mode; runs with the -x command line option should not trigger any calls to usleep.
Use perror to report errors from system calls.
Detect and handle all errors. Your program can call exit(1) if the error is unrecoverable (after printing out an error message, of course). This means that any time you call a function that returns a value, there should be a check for error return values and they should be handled: do not just assume that every function call and every system call is always successful.
Be correct, robust, and free of valgrind errors.
Be well commented and use good modular design. See my C documentation for C references and a C code style guide.
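
The timing requirement could look roughly like the sketch below. The helper names elapsed_secs and now_or_die are invented for this example; only gettimeofday, perror, and exit come from the standard library:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

/* Elapsed wall-clock seconds between two gettimeofday readings. */
double elapsed_secs(struct timeval start, struct timeval stop) {
    assert(stop.tv_sec >= start.tv_sec);   /* stop should not precede start */
    return (double)(stop.tv_sec - start.tv_sec)
         + (double)(stop.tv_usec - start.tv_usec) / 1e6;
}

/* Read the clock, reporting the error and exiting if the call fails. */
struct timeval now_or_die(void) {
    struct timeval tv;
    if (gettimeofday(&tv, NULL) != 0) {
        perror("gettimeofday");
        exit(1);
    }
    return tv;
}
```

Take one reading just before the threads start the GOL iterations and one just after they are all joined, so that board initialization and file output stay outside the timed region.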

Scalability Analysis, Running Experiments, and Report

You will design and run experiments that will answer questions about the scalability of your solution. Design experiments that answer questions about scalability in different ways. Some parameters to consider varying in your experiments:

The grid sizes. Try different powers of two (64, 128, 256, 512, 1024, 2048, ...). You do not have to try every power of two between your min and max sizes, but run several intermediate sizes between your smallest-sized and largest-sized boards.
The number of threads. Again, increase by powers of two: (1, 2, 4, 8, 16, 32, ...), and again you do not need to include every power of two between your min and max.
The number of iterations (try a couple different values that show differences across runs). You should run each experiment for at least 2 iterations so that there is some synchronization in your implementation, but I encourage you to try runs with more iterations as well to see if, and how, added synchronization steps affect scalability.
Compare row-wise vs. column-wise board partitioning.

When running scalability studies you need to make sure that you have problem sizes that are large enough to result in fairly long run times for at least some numbers of threads. For example, if the single threaded run takes 1.3 seconds and the 16 threaded run takes 1.2, it is pretty difficult to draw any conclusions about scalability by comparing these two small runtimes. Instead, you want some runs that take many seconds to many minutes.

As you run experiments, make sure you are doing so in a way that doesn't interfere with others (see the Hints and Resources section below for some tips about how to ensure this). Also, remember to remove all output statements from the code you time (run without printing the board or anything else).

You should run multiple runs of each experiment. For example, don't just do a single timed run of 16 threads for 512x512 and 10 iterations, but run this same experiment multiple times (5 or 10 times each). The purpose of multiple runs is to determine how consistent your times are (look at the standard deviation across runs), and to also see if you have some odd outliers (which may indicate that something else was running on the computer that interfered with a run, and that you should discard this result).
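
To summarize repeated runs, the mean and spread can be computed with a few lines of C. This is only a sketch with illustrative function names; it returns the sample variance, and the standard deviation is its square root (for example sqrt from math.h, linked with -lm):

```c
#include <assert.h>

/* Arithmetic mean of n run times (in seconds). */
double mean(const double *t, int n) {
    assert(n > 0);
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        sum += t[i];
    }
    return sum / n;
}

/* Sample variance of n run times (divides by n - 1). The standard
 * deviation across runs is the square root of this value. */
double sample_variance(const double *t, int n) {
    assert(n > 1);
    double m = mean(t, n);
    double ss = 0.0;
    for (int i = 0; i < n; i++) {
        ss += (t[i] - m) * (t[i] - m);
    }
    return ss / (n - 1);
}
```

A run whose time sits far from the mean relative to the standard deviation is a candidate outlier worth re-running on an idle machine.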

Be careful to control experimental runs as much as possible. If other people are using the machine you are using to run experiments, their use can interfere with your results. Here are some resources to see what is going on on machines:

who to see who else is on a machine you are running on, and try another machine if you are not alone.
top to see the system load on the machine.
top -H to see individual threads running on the machine (you can see your 4, 8, 16, etc running).

Also, make sure that the grid sizes are not too large to fit in RAM. I don't think this really will be a problem, but double check a run with your largest sizes before running experiments to see if the system is swapping. To do this as you run, in another window run:

watch -n 1 cat /proc/swaps
Filename                                Type            Size    Used    Priority
/dev/sda2                               partition       2072344 0       -1

If you notice the Used value going above 0, your grid sizes are too big to fit into RAM (or there are too many other memory intensive applications running on this machine), and you need to find an idle machine.

Machine Info

The Lab Machine specs page contains information about most of the lab machines, including the number of CPUs (click on the Processors link). We have machines with 4, 8, and 16 cpus. I suggest picking one with 16 cores (x16) for your final experiments, but use others during development to leave these machines free for other groups to use.

As much as possible, it is good to run all your experiments on the same machine, or at least identical machines.

Written Report

Write a 2-4 page report (2 is min, 4 is max) that describes the results of your experimentation. You may use any software you like for writing the report. I have latex examples you can use if you want to try latex (see my links below). Use 11pt font and a single column layout with reasonable margins. (If you use my latex starting point, remove twocolumn at the very top.)

Your report should include the following sections:

A description of the experiments you ran. For each experiment presented:
1. What is the hypothesis the experiment is designed to test?
2. What was the experiment?
3. Why does this experiment test this hypothesis?
4. What did you vary, what machine(s) did you run on, how many runs of each experiment.
5. Also, briefly describe what you thought the expected outcome would be and why? It is fine if your expected outcome was different than what your experimental results show.
Experimental results: present your results AND describe what they show. You can use tables or graphs to present the data. Choose quality over quantity in the data you present. A couple graphs and/or tables with data that show scalability results in terms of number of threads and problem size is fine. It is also okay to present and discuss negative results..."we thought the X experiment would be better because...,but as shown in table 2, the Y experiment performed better. This is (or we think that this is) because ...".
There is, however, a difference between negative results (well designed experiments that produce unexpected results) and bad results (results from poorly designed experiments).
Also, here is a nice description of strong and weak scalability. You do not need to present your result data in terms of strong and weak scalability functions. Speed-up (Sequential Time/Parallel Time), or even average run times (with stddev), over different axes of change may be more useful for this assignment. However, it may be useful to know the difference between these two when presenting some of your experimental results in your report. I don't want you to get too focused on strong vs. weak though.
Conclusions: What did you learn from your experiments? What do they say about the scalability of your solution? Did they match your expectations? If not, do you have an idea of why not? Did the row-wise and column-wise versions perform differently? Explain why you think they did or did not.
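
The speed-up ratio mentioned above is simple to compute from averaged run times. In the sketch below, speedup follows the definition given in this section, while efficiency (speed-up divided by thread count) is an extra, commonly used derived metric that the lab does not require; both function names are illustrative:

```c
#include <assert.h>

/* Speed-up: sequential time divided by parallel time. */
double speedup(double seq_secs, double par_secs) {
    assert(seq_secs > 0.0 && par_secs > 0.0);
    return seq_secs / par_secs;
}

/* Parallel efficiency: speed-up per thread (1.0 would be ideal).
 * This metric is an addition for illustration, not a lab requirement. */
double efficiency(double seq_secs, double par_secs, int nthreads) {
    assert(nthreads > 0);
    return speedup(seq_secs, par_secs) / nthreads;
}
```

For instance, a 40 second single-threaded run that takes 10 seconds with 8 threads has a speed-up of 4 and an efficiency of 0.5.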

Useful Utilities

For the pthread GOL implementation:

The CS31 pthreads GOL lab and the CS31 sequential GOL lab assignment have hints and resources for implementing the GOL and pthreads GOL programs.
The man pages for pthread functions are very helpful. In addition, I have links to pthreads documentation and tutorials.
A guide for using gdb to debug pthread programs
getopt command line parsing.
I also have an example program using getopt that you can copy over and try out:
```
cp -r ~newhall/public/getopts_example/* .
```
man pages for C library functions and system calls, and be careful about who is responsible for allocating and freeing memory space passed to (and returned by) these routines.
rand and srand: C library random number generator and seeding function (pass srand time(NULL) to seed with current time).
Some C programming references.
valgrind and gdb guides.
atoi converts a string to an integer. For example, int x = atoi("1234"), assigns x the int value 1234. See the man page for more info and similar functions: (man 3 atoi)

Use perror to print out error messages from failed system calls:

FILE *in;
in = fopen(input_file, "r");
if (in == NULL) {
  /* perror appends ": " and the errno message, so pass no newline */
  perror("fopen failed");
  exit(1);
}

gettimeofday: to add timing code around just the GOL game playing part of your code (you should already have this in your CS31 starting point; see the CS31 GOL assignment page for more details if not).
File I/O in C (you should not need to modify the file I/O part from your CS31 solution)

For Running Experiments and Writing the Report:

Machines for experiments

The Lab Machine specs page contains information about most of the CS lab machines, including the number of CPUs (click on the processors link). We have machines with 4, 6, 8, and 16 cpus. Pick one with 8 or 16 cores (x16) for your final experiments, but try out others during development.

Please do not use the CS teaching lab nor overflow lab machines for scalability testing during classes.

As much as possible, it is good to run all your experiments on the same machine, or at least on identical machines.

Finally, be aware of other groups wanting to run experiments on the 8 and 16 core machines. So, please don't run experiments on a machine for hours and hours or days and days. And, log out when you are not using a machine to run experiments. If you run who and someone is logged in, run top to see if they are actually running on the machine before deciding it is okay for you to use.

For experiments and report:

See my Tools for examining system and runtime state for some tools you can use:
- top, who, xload to get system usage info
  top -H: a dynamic display of information about system resource usage of the current threads that are the biggest consumers of CPUs. You should see your GOL threads rise to the top. You can also use this to see if someone else is using a machine for experiments (in which case you should pick another one).
- /proc/ more useful system info. You can cat out files in here to find system-wide or per-process information: meminfo prints information about system memory use; cpuinfo prints information about the cpus, including L1 cache and cache line sizes.
```
$ cat /proc/cpuinfo
$ cat /proc/meminfo
```
time ./a.out: show the total runtime, and user and system times associated with executing a.out.
You can write and use shell scripts for running a set of experiments. To run a shell script, first make sure that the shell script file has permissions set to executable (e.g. chmod u+x run.sh), and then just run it from the Unix prompt (./run.sh). Here is an example script for running a set of row-wise and column-wise experiments on several input files, which specify the grid size and number of iterations (I've given my files names that include the size and the number of iterations so that the script output helps me easily identify the different results). This is just one example that demonstrates how to iterate over a set of values for number of threads and over a set of GOL init files (you likely will want to make multiple run scripts that test certain things, and you may find the non-file init command line args more useful for run scripts):
```
#!/bin/bash

for f in infile_100_100.txt infile_500_500.txt infile_1000_1000.txt
do
  for ((t=1; t <= 64; t*=2))
  do
    echo ""
    echo "$f  $t row"
    for((i=0; i < 5; i+=1)) 
    do
      time ./gol -t $t -f $f -x
    done

    echo ""
    echo "$f  $t column"
    for((i=0; i < 5; i+=1)) 
    do
      time ./gol -t $t -f $f -x -c
    done
  done
done
```
See bash shell programming links for more information.
I'd run this bash script inside a script session to capture all its output to a file, and if it is long running, I may also run within a screen session (then I can start it late at night, wake up the next morning and see what happened).
See my Tools for running experiments and collecting performance measurements. screen, script, and bash scripts may be particularly useful. I suggest running experiments in script, which will save all terminal output in a file (the default name is typescript). Just run script outfile, start your experiments on the command line, and then run exit to exit out of script. All the terminal output will be saved in a file named outfile. You can clean up the output file a bit by running dos2unix on it:
```
$ script results
Script started, file is results
% ./run
  ...
% exit
Script done, file is results
$ dos2unix -f results
```
latex links. LaTeX is a document preparation system commonly used on Unix systems. You do not have to use latex for writing your report, but it has very good support for some things that are common in scientific writing, such as representing mathematical expressions, so you may want to give it a try. I have some example latex documents that you can grab to use as a starting point in ~newhall/public/latex_examples/.

Submit

Before the Due Date, you or your partner should push your solution from one of your local repos to the GitHub remote repo. (It doesn't hurt if you both push, but the last pushed version before the due date is the one I will grade, so be careful that you are pushing the version you want to submit for grading.)

From one of your local repos (in your ~you/cs87/labs/Lab01-partner1-partner2 subdirectory)

If that doesn't work, take a look at the "Troubleshooting" section of the Using git page.

CS87 Lab 1:

Pthreads and Scalability Analysis