CS 31 Lab 9

This lab has two parts:

Checkpoint: Thursday, December 7 push to git repo before 11:59pm.

Complete lab: Due by 11:59 pm, Thursday, December 14, 2023

By the checkpoint deadline, your program should implement functionality up to and including step 3(d) in Section 7.2.

Check the Lab 9 Partners link for your lab partner.

CS31 Partner Etiquette and Expectations.

1. Lab Goals

Parallelize a shared-memory application using the pthreads library.
Protect threaded code with synchronization primitives.
Revisit 2D arrays and file I/O in C.
Gain more expertise with C pointers and dynamic memory allocation.
Design input files to test the correctness of your solution.
Practice debugging threaded programs.

2. Lab Overview

This lab is designed to give you some practice writing and debugging multi-threaded pthreads programs, using synchronization primitives, and to give you some experience designing and running some parallel performance experiments.

You’ll implement a parallel version of Conway’s Game of Life using your Lab 06 sequential solution as a starting point for this lab. See the Lab 06 assignment to remind yourself about the sequential version of the program and the rules of the game.

Your Lab 9 solution will use the same method as Lab 06 for initializing a dynamically allocated board from an input file given as a command line argument. In addition to supporting the same command line options as Lab 06, this version of GOL will add three new command line options related to the parallelization. These command line options are outlined in detail below.

You must begin Lab 9 by first fixing any errors in your Lab 06 sequential solution. You can start on this lab if the only problem with your previous solution is that the wrapping at the edge doesn’t work properly: as long as your Lab 06 implementation for all the middle (non-edge) cells works, that’s enough to move on to starting to parallelize your implementation. You can go back and fix the wrapping at the edge at the end.

During the lab meeting time, we may help you analyze your Lab 06 solution for any design choices that may make parallelization difficult, so that you can fix those before parallelizing.

After reading through the lab details and requirements, see Section 7 for tips on how to get started.

3. Lab Starting Point Code

3.1. Getting Your Lab Repo

Both you and your partner should clone your Lab repo into your cs31/labs subdirectory:

Get your lab ssh-URL from the CS31 GitHub organization. The repository to clone is named Lab9-userID1-userID2, where user1 and user2 are the user names of you and your lab partner.
cd into your cs31/labs subdirectory:
```
$ cd ~/cs31/labs
$ pwd
```

Clone your repo

$ git clone [your Lab9-userID1-userID2 url]
$ cd Lab9-userID1-userID2
$ ls
big.txt  colors.h  design_worksheet.txt  Makefile  square100.txt  testedges.txt

3.2. Starting Point Files

The files included in your repo are:

Makefile: builds the gol executable file
colors.h: contains an array of some different color3 values for use with ParaVis (you can optionally use this in your solution)
design_worksheet.txt: a worksheet to help you think through some of design challenges in this lab.
big.txt, testedges.txt, square100.txt: example input files. You can copy more test files from your Lab 06 repo into this one, and you should create new input files too (particularly big ones to test the performance of your parallel solution).

One thing to note, is that there is no gol.c file in the starting point code! You’ll add this in from the Lab 06 solution that you or your partner previously built.

3.3. Add gol.c to your repo and modify it

You need to copy over your gol.c from the sequential game of life lab into this repo as the starting point for this lab:

$ cd ~/cs31/labs/Lab6-userID1-userID2/
$ git pull
$ cd ~/cs31/labs/Lab9-userID1-userID2/
$ cp ~/cs31/labs/Lab6-userID1-userID2/gol.c .
$ git add gol.c

3.3.1. modifications to sequential `gol.c` before parallelizing

The first thing to do is to make sure that your gol.c file has the following includes (you may have already added colors.h in Lab 6):

#include <pthreadGridVisi.h>
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/time.h>
#include <time.h>
#include <string.h>
#include <pthread.h>
#include "colors.h"

Next, you will need to change your play_gol function prototype to match that of the thread function argument passed to pthread_create:

void *play_gol(void *args)

This means you will have to modify it to return a value (returning NULL is fine), and you need to explicitly re-cast the void * parameter to a struct gol_data * inside this function in order to access field values of the passed argument. I recommend doing this by changing your function parameter name to a name not used inside your function (args for example) and then adding a local variable with your old parameter name (data for example) that you initialize to the re-cast void * parameter. This way your function code that refers to data will work un-changed. For example:

// change sequential play_gol function prototype from this:
void play_gol(struct gol_data *data);
// to this:
void *play_gol(void *args);  // be sure to change the return type too!


// change name of parameter in function definition and local variable
// of old paramter name and types that is assigned re-cast void* param
void *play_gol(void *args) {
  struct gol_data *data;  // add local variable of the type we know is passed in
    ...
  data = (struct gol_data *)args;  // initialize it to the re-cast args value
  // function code can use data to reference fields in passed struct
    ...
  return NULL; // this function now needs to return some value!!
}

For the OUTPUT_VISI mode in the sequential version, we hid a call to pthread_create that was needed by the ParaVis library. This code was hidden in a wrapper function named connect_animation that was called from main. Now that your code is going to create its own threads, you will not be able to use connect_animation in this lab.

As part of completing the lab, you will call pthread_create on your own in a loop, like we did in class and the in-lab example code. Those calls to pthread_create will do the same job as connect_animation, so you no longer need that function in your program.

In other words:
```
// this line that currently exists main
connect_animation(play_gol, &data);
// will eventually be replaced by something like this, once you get
// around to creating threads yourself:
for (...) {
   pthread_create(..., NULL, play_gol, ...); // some details for you to figure out
}
```
At this point in the lab, we’re not ready to make the changes we need to make the VISI code work seemlessly, so for now, just comment out the calls to connect_animation and run_animation. This means your VISI mode is broken, but we will talk about how restore that functionality when we get to Section 4.7.

After making these changes, compile and run your program, and test that this change to your sequential version of gol still works as before (with the exception of VISI mode being broken).

4. Lab Details

4.1. Command line arguments

Your program will support the same command line arguments as the Lab 06 version, and will add three new command line arguments:

An integer to specify the number of threads (the degree of parallelism).
A 0/1 flag to specify how to parallelize the GOL program (0: row-wise grid cell allocation, 1: column-wise grid cell allocation). More details below.
A 0/1 flag to specify if the per-thread board allocation should be printed out or not at start up (0:don’t print configuration information, 1: print allocation information).

$ ./gol
usage: ./gol infile.txt output_mode[0,1,2] num_threads partition[0,1] print_partition[0,1]

Here are some example command lines:

# run with config values read from file1.txt, run in run mode OUTPUT_NONE (0)
# create 8 threads, use row-wise partitioning, print per-thread partitioning details:
$ ./gol file1.txt  0  8  0  1

# run with config file file2.txt, run in run mode OUTPUT_ASCII (1),
# create 15 threads, use column-wise partitioning, no partitioning details printed:
$ ./gol file2.txt  1  15  1  0

The print per-thread board allocation command line option will help you debug the grid cell allocation scheme to ensure that you are correctly allocating rows or columns across the worker threads. We suggest you try running with print partitioning when using run mode 0 (OUTPUT_NONE), as OUTPUT_ASCII will clear the terminal too quickly to view the partitioning information. See more details below about partitioning and the print partitioning option.

Your program should handle badly formed command lines and check for bad values entered (e.g., a negative number of threads). It should print out an error message and exit for badly formed command lines and handle most bad input values similarly.

4.2. Structure of the parallel game of life

The main thread should begin by initializing the game board by reading the input file in the same way as the sequential version. The input files format has not changed. Your timing code should not include this step but should include all other parts of the execution.
After initializing the game board, the main thread spawns off worker threads that will play multiple rounds of the game, with each thread computing just its portion of cells of the new board each round (based on the partitioning information passed as a command line option). Grid cells are allocated to threads using either a row-wise or column-wise partitioning specified by a command line argument. If a command line is entered with more threads then the number of rows/columns to partition by, your program can print out an error message and exit. In other words, you don’t need to handle a case in which there are threads that have no rows/columns assigned to them. The total number of live cells each round should be calculated in parallel with each thread contributing to its portion of live cells to the total value. Note: If the print_partitioning option is specified, threads should print out their partition information before running rounds of gol.
If running in OUTPUT_ASCII mode, you should designate one thread to be the printing thread, and only that thread should call print_board to print the entire board after each round. Having threads print out just their part of the board will result in crazy interleaved output: a race condition! (Synchronizing parallel board printing so that the board is printed in order is beyond what you need to support, and it would be less efficient than if just a single thread prints the entire board.)
If running in OUTPUT_VISI mode, each thread should update its part of the color3 image buffer in parallel and then make a call to draw_ready when it is done updating its part.
At the end of the simulation’s execution, the main thread should print out the elapsed time (if run in mode OUTPUT_NONE or OUTPUT_ASCII). You will not be graded on absolute performance, but your code should run faster when run in parallel. As a sanity check, try running a large board instance (e.g., 1500x1500) for many rounds (e.g., 100+). You should see a substantial improvement between running with one thread and a handful of threads (2-8). It’s that sort of relative improvement that we’ll be looking for. Note: for comparison timing, run in mode OUTPUT_NONE, and make sure it does not include any extra synchronization or calls to usleep that may be part of the other run modes.

4.3. Synchronization

You will need to think about where and how to synchronize thread actions to remove race conditions. For example, you need to ensure that no thread starts the next round of play until all threads are done with the current round, otherwise threads will not read consistent values for the given round. If printing is enabled, you need to ensure that the printing thread prints a consistent version of the game board. If run in OUTPUT_ASCII mode, you need to ensure that the printing thread prints a consistent version of the game board. You will also need to ensure that you don’t have a race condition when updating the global live cell counter. Any synchronization steps that are only necessary for a specific command line option should not be included in runs that do not specify that option.

4.4. Partitioning

You will implement two different ways of parallelizing the computation: one partitions the board’s rows across threads, the other partitions the board’s columns across threads. As an example, suppose that the board is 8x8, and that there are 4 threads. Here is how the threads (with logical tids 0-3) would be assigned to the grid using row and column partitioning, respectively:

row partitioning (option 0)             column partitioning (option 1)
----------------                        ----------------
0 0 0 0 0 0 0 0                         0 0 1 1 2 2 3 3
0 0 0 0 0 0 0 0                         0 0 1 1 2 2 3 3
1 1 1 1 1 1 1 1                         0 0 1 1 2 2 3 3
1 1 1 1 1 1 1 1                         0 0 1 1 2 2 3 3
2 2 2 2 2 2 2 2                         0 0 1 1 2 2 3 3
2 2 2 2 2 2 2 2                         0 0 1 1 2 2 3 3
3 3 3 3 3 3 3 3                         0 0 1 1 2 2 3 3
3 3 3 3 3 3 3 3                         0 0 1 1 2 2 3 3

When the number of threads does not evenly divide the dimension by which you are partitioning, divide the rows (or columns) up so that there is at most a difference of 1 assigned row (or column) between any two threads. For example, for an 8x7 board and 3 threads, you would partition rows and columns like this among the 3 threads (with tids 0-2):

row partitioning (option 0)             column partitioning (option 1)
----------------                        ----------------
0 0 0 0 0 0 0                           0 0 0 1 1 2 2
0 0 0 0 0 0 0                           0 0 0 1 1 2 2
0 0 0 0 0 0 0                           0 0 0 1 1 2 2
1 1 1 1 1 1 1                           0 0 0 1 1 2 2
1 1 1 1 1 1 1                           0 0 0 1 1 2 2
1 1 1 1 1 1 1                           0 0 0 1 1 2 2
2 2 2 2 2 2 2                           0 0 0 1 1 2 2
2 2 2 2 2 2 2                           0 0 0 1 1 2 2

In this partitioning scheme, a single row (or column) is assigned to exactly one thread; a single row (or column) is never split between two or more threads. Thus, in the above example threads 0 and 1 have one more row each of work to do than thread 2 in the row-wise partitioning, and thread 0 has one more column of work to do than threads 1 and 2 in the column-wise partitioning. Note how the imbalance of rows/columns is assigned to threads in this example. Your solution should assign them in the same way: every thread should either be assigned the same number of rows or columns or differ by at most one.

4.5. Printing Partitioning Information

When the 5th command line option is 1, your program should print thread partitioning information. Each thread should print:

Its thread id (the logical id number that you assign, not the pthread_self value).
Its start and end row index values.
The total number of rows it is allocated.
Its start and end column index values.
The total number of columns it is allocated.

The threads will run in different orders, so you may see each thread’s output in any order, which is fine. If you’d like, you can try to avoid some interleaving of individual threads output by calling fflush after printf:

printf("tid %2d: ...\n", mytid, ...);
fflush(stdout);   // force the printf output to be printed to the terminal

4.5.1. Example Partitioning Output

Here are some example runs with different numbers of threads partitioning a 100x100 grid. (I’m using a config file I created named square100.txt included in your repo.) The total number of rows and columns per thread are shown in parentheses. Your output does not need to be identical to this, but it must include all the required parts.

# 9 threads, column-wise partitioning
$ ./gol square100.txt 0 9 1 1
tid  0: rows:  0:99 (100) cols:  0:11 (12)
tid  1: rows:  0:99 (100) cols: 12:22 (11)
tid  2: rows:  0:99 (100) cols: 23:33 (11)
tid  4: rows:  0:99 (100) cols: 45:55 (11)
tid  3: rows:  0:99 (100) cols: 34:44 (11)
tid  5: rows:  0:99 (100) cols: 56:66 (11)
tid  6: rows:  0:99 (100) cols: 67:77 (11)
tid  7: rows:  0:99 (100) cols: 78:88 (11)
tid  8: rows:  0:99 (100) cols: 89:99 (11)

# 6 threads, row-wise partitioning
$ ./gol square100.txt 0 6 0 1
tid  0: rows:  0:16 (17) cols:  0:99 (100)
tid  1: rows: 17:33 (17) cols:  0:99 (100)
tid  3: rows: 51:67 (17) cols:  0:99 (100)
tid  2: rows: 34:50 (17) cols:  0:99 (100)
tid  4: rows: 68:83 (16) cols:  0:99 (100)
tid  5: rows: 84:99 (16) cols:  0:99 (100)

# 17 threads, row-wise partitioning
$ ./gol square100.txt 0 17 0 1
tid  0: rows:  0: 5 (6) cols:  0:99 (100)
tid  1: rows:  6:11 (6) cols:  0:99 (100)
tid  3: rows: 18:23 (6) cols:  0:99 (100)
tid  2: rows: 12:17 (6) cols:  0:99 (100)
tid  4: rows: 24:29 (6) cols:  0:99 (100)
tid  5: rows: 30:35 (6) cols:  0:99 (100)
tid  6: rows: 36:41 (6) cols:  0:99 (100)
tid  7: rows: 42:47 (6) cols:  0:99 (100)
tid 10: rows: 60:65 (6) cols:  0:99 (100)
tid 11: rows: 66:71 (6) cols:  0:99 (100)
tid 12: rows: 72:77 (6) cols:  0:99 (100)
tid 13: rows: 78:83 (6) cols:  0:99 (100)
tid  8: rows: 48:53 (6) cols:  0:99 (100)
tid 15: rows: 90:94 (5) cols:  0:99 (100)
tid 14: rows: 84:89 (6) cols:  0:99 (100)
tid  9: rows: 54:59 (6) cols:  0:99 (100)
tid 16: rows: 95:99 (5) cols:  0:99 (100)

Note that for the run with 17 threads, there are enough threads to see "out of order" execution due to the exact scheduling of threads on the CPUs. This is fine and expected behavior (the exact ordering of thread printing depends on when threads are scheduled to run on cores by the OS CPU scheduler).

4.6. Correctness

In addition to printing out per-thread board partitioning information to verify correct portioning, you should also ensure that your parallel version solves GOL correctly. To do this run both your parallel version and your sequential version on some of the same input files in OUTPUT_ASCII run mode (run mode 1). The results should be identical.

If you see incorrect and/or odd output, either your partitioning of the GOL game board is incorrect, or you are missing synchronization, or both. You can remove the (multiple) calls to system("clear") to see the history of each round which should help you debug incorrect GOL computation. Of course, try this first using a small number of iterations!

4.7. Changes to ParaVis animation for pthreads

You should not be focusing on this section at all until after you have the checkpoint complete.

The ParaVis library supports animating pthread programs. You only need to make a few small changes to get the OUTPUT_VISI mode to work for the parallel version of GOL.

In the sequential version of gol.c, you called the function setup_animation. You will need to call that function in the parallel version too, but we will need to modify the setup_animation function to handle the fact that we now have many threads running the simulation.

Find the setup_animation function in your code. Notice that there is a call to function named init_pthread_animation. This function takes the number of threads as the first parameter:
```
visi_handle init_pthread_animation(int num_tids, int rows, int cols, char *name);
```
In the sequential version of gol.c, we hard-coded the number of threads to be 1 since only one pthread was created to run play_gol (which we hid in the connect_animation function, to be discussed in a moment):
```
// Previously in setup_animation, you had 1 thread hardcoded in setup_animation
int num_threads = 1;
data->handle = init_pthread_animation(num_threads, data->rows, data->cols, visi_name);
```
In the parallel version, you need the num_threads variable to equal the number of threads you are running in your simulation:
```
// Be sure to set the variable num_threads equal to the number of threads
// running the simulation.
int num_threads = data->num_threads; // this name may differ in your implementation
data->handle = init_pthread_animation(num_threads, data->rows, data->cols, visi_name);
```
As discussed in Section 3.3.1, you will comment out your call to connect_animation because that is now replaced with the loop that creates your threads.
You should leave the line to run_animation in main unchanged, but that function call must be called before you join your threads. You may need to reorganize the structure in this part of main to be sure you call run_animation before joining the threads.
In addition to the changes above that you made in main, your sequential code had a call to update_colors that set the color3 buffer. (You may have called this function something different, though we recommended you use the name update_colors at the time.)

You need to modify the update_colors function so that it is multi-threaded. As it’s currently written, this function colors every pixel of the image buffer. Since you call update_colors from play_gol, every thread calls this function. But, we don’t want every thread coloring every pixel. Instead, we only want each thread to color the pixels in its partition of the board. Update this function so that it uses the partitioning information to only color the pixels that the thread is responsible for.

In addition, you should have threads use different colors to color the board. The colors.h file (which you can open in your editor to view) contains an array called colors that you can use so that, e.g. thread 0 colors the board using colors[0], and thread 1 colors the board using colors[1], etc. You will need to consider how to use this array when there are more than 8 threads.
In play_gol, in addition to making sure that every thread calls update_colors, you need to make sure every thread also calls draw_ready when it is done updating the color3 buffer.

Further documentation about using the ParaVis interface is available as a comment at the top of the pthreadvisi_example.c program from Week 12. Also, more detailed documentation is available in the pthreadGridVisi.h header file that you can open in an editor:

$ vim /usr/local/include/qtvis/pthreadGridVisi.h

5. Sample Output

Here is output from a run of my gol program showing:

Example column-wise and row-wise partitioning output for different numbers of threads is shown in Section 4.5.1. Please follow our output format for your printing partitioning functionality.

Other program ouput should be identical to that from Lab 06.

Example runs showing the time improvement from the parallelization is shown below. This output was generated on a lab machine with 12 cores, using a configuration file (big.txt) that specified a 4000x4000 board and 500 iterations. The program was run for different numbers of threads (1, 4, 8, 12, and 20). Notice the time improvement with increased parallelization up to the number of cores (from 298 seconds for 1 thread to 50 seconds for 12 threads). Beyond the total number of cores (12, in the case), adding more threads does not help to improve total runtime.

$ ./gol big.txt 0 1 0 0
Total time: 297.943 seconds
After 500 rounds on 4000x4000, number of live cells is 17

$ ./gol big.txt 0 4 0 0
Total time: 78.617 seconds
After 500 rounds on 4000x4000, number of live cells is 17

$ ./gol big.txt 0 8 0 0
Total time: 71.223 seconds
After 500 rounds on 4000x4000, number of live cells is 17

$ ./gol big.txt 0 12 0 0
Total time: 50.128 seconds
After 500 rounds on 4000x4000, number of live cells is 17

$ ./gol big.txt 0 20 0 0
Total time: 53.965 seconds
After 500 rounds on 4000x4000, number of live cells is 17

These are runs with just the normal Makefile gcc command (there is no optimiztion compiler flag used to build this code). With compiler optimization each run would be much faster, but the trend of improvement from the multithreaded would remain.

6. Lab Requirements

Your program must satisfy all the requirements of the sequential gol lab, except that if you did not get the edges to work correctly, you do not need to fix that part (i.e. if your edge cells are not quite in your sequential gol, that is okay for your parallel solution).
Your program should handle the new command line arguments in addition to the previous ones.
The total_live global variable that should be updated each round to the number of live cells. Its value should be computed in parallel each round with each thread updating the portion of total_live value for a round from its portion of the grid.
The only other global variables you may add are any needed for pthread synchronization. Any values that threads need should be passed to them via the (void *arg) parameter. Look at the example pthread programs from Week 12, from in class, and in the textbook for some examples of how to do this. You will likely need to add fields to your struct gol_data type that are needed for threads to perform gol on their part of the grid (e.g. which rows and columns this thread is operating on and the logical thread id number).
The control flow in your sequential main function doesn’t need much change. You need to parse the additional command line arguments and make calls to pthread_create. If you find you are adding a lot of code to main, stop and create a function to do this work and call that function from main: main should not get a lot longer, and its main control flow really should not change.
For full credit, your solution should be correct, robust, and free of valgrind errors (when run in modes OUTPUT_ASCII or OUTPUT_NONE). "Still reachable" errors are okay.
You are NOT responsible for running valgrind in the OUTPUT_VISI mode (the ParaVis visualization mode). "Still reachable" errors are okay.
All TODO comments in the starting point code should be removed in your final submission. These are notes to you for what you need to add into the starting point code and where you need to add it.
Use good modular design, and comment your code well.
Last lab! 🎉 Good luck! 🥳

7. Getting Started

7.1. Get sequential gol.c ready for parallelizing

If you are working with a different partner than you had for Lab 6, start by deciding whose gol.c solution you will use for a starting point for this lab.
After copying your Lab 06 gol.c into your repo, make the modifications described in Section 3.3.1. Compile and run to make sure it works.
Then fix any any parts of your Lab 06 gol.c that will make parallelizing it difficult, and that are memory inefficient. As necessary, we may have included comments as part of your gradesheet that are in your Lab 6 repo. We can also look at your solution in lab on Wednesday and help you figure out changes that you may need to make.

7.2. Get Started on Parallelization

Start by reviewing Section 14.2 and 14.3 of the textbook and the in-lab examples.
The design_worksheet.txt file will not be graded, but it contains helpful ideas for outlining the design of your solution. You should consider reviewing this before starting to write code.
We recommend that you implement and test functionality in this order:
1. Add fields to the struct gol_data that you will need to let each thread know which part of the partitioned board it is responsible for and any code needed to initialize these fields. Some of these fields may be initialized by the main thread before spawning and other fields may be initialized by the threads themselves before they start playing rounds of gol. (You can always go back and add more fields to the struct later if you need to.)
2. Without spawning any threads, write the partitioning function, i.e. figure out which section of the board would be assigned to each thread. Test the function’s implementation first without spawning any threads by adding some calls in main to test different sizes and partitioning schemes (and remove these calls before moving on to the next step).
3. Implement the main thread create-join control flow, and test each thread’s row/column partitioning. Spawn the number of threads specified at the command line and have each thread print its partitioning information (including its logical tid value) in play_gol and then return before actually running the remainder of the code in play_gol. (Be sure to remove this return after you’ve tested this and before moving on to step (e) below).
4. Add code so that each thread prints out its own partitioning information only when the final command line argument is non-zero.
5. Write and test the code to play one round of the GOL simulation, with the work divided among the threads. This will include support for all run modes. Start with testing OUTPUT_ASCII on a small file with just one round. Once that is working, test OUTPUT_VISI to be sure things are working correctly.
6. Compute all rounds of the GOL simulation, avoiding race conditions.
7. Count the number of live cells in a multi-threaded way, avoiding race conditions.
8. Check timing for multi-threaded in OUTPUT_NONE mode. You should see performance improvements with multi-threaded vs. sequential for large boards.
9. Make sure all other requirements have been met.

8. Tips

Look at the Lab 06 page for lots of tips related to sequential GOL that may be useful here too.
Implement and test incrementally!
Most of your changes will be to the play_gol function. You should also add helper functions to implement any new functionality. main should not get much longer, so if you need to add alot of code to main, or you are adding the same code to multiple parts, functionize it and call it from main. Plus you can independently test this code by adding debugging calls to this function from main!
You will need to add fields to your struct gol_data type that are needed by threads to perform gol on their part of the grid.
Use gdb as you incrementally implement and test to find and fix bugs as you go. Use valgrind as you go to catch memory access errors as you make them.
As you test and debug, you may find it useful to use config files with small numbers of iterations. You may also want to comment out the call to system("clear") so you can examine the results of every intermediate iteration.
Running in OUTPUT_VISI and OUTPUT_ASCII are helpful for finding and debugging partitioning errors and other correctness errors.
ParaVis with colors.h: If you’d like to use some of the colors I have defined in colors.h you can set an RGB value in the color3 image array to a value in colors like this:
```
   // assume buff is name of the color3 array:
   buff[buff_i] = colors[3];  // set this pixel to the 3rd color (Yellow)
```
If you want to use the colors to help with debugging row and column partitioning you can create the rainbow pattern of my examples by having each thread be assigned a color based on its thread id, and have it print its color for dead cell values in its partition and print a common color (eg black) for cells that are alive. For example, if data→tid stores the value of a thread’s logical id, you could do something like this:
```
   if(live) {
     buff[buff_i] = c3_black;  // set live cells to black
   } else {
     // assumes your thread id is stored in your struct as tid;
     // modify the line below if you called it something different.
     buff[buff_i] = colors[((data->tid)%8)]; // dead to my tid color
   }
```
Don’t forget that the ParaVis library displays coordinate (0,0) at the bottom left of the visualization, and coordinate (rows-1, 0) at the top left.
The man pages for the various pthread functions are very helpful.
You can use the perror() function to print out error messages from failed system and pthread library calls. perror prints out detailed, human-readable information about the error that occurred. You call it passing in a string with program-specific error message. For example:
```
big_buff = malloc(...);
if (big_buff == NULL) {
  perror("malloc of big buffer failed");
  exit(1);
}
```
See the man page for perror for more information and for the include files you need to add to your program to use it.
You will need to use some synchronization in your solution. You may use any of the pthread synchronization primitives, including mutexes and barriers. Remember that before using any of these, you need to initialize them. You also need to make sure that all threads have access to the shared synchronization variables (either pass a reference to them at thread creation or declare them as static global variables). Here are some code snippets for using the barrier synchronization primitive:
```
// declare a global pthread_barrier_t
static pthread_barrier_t barrier;

// initialize it in a function (e.g., main) before using it:
if(pthread_barrier_init(&barrier, NULL, num_threads)){
   perror("pthread_barrier_init");
   exit(1);
}

// each thread can then call pthread_barrier_wait to synchronize:
ret = pthread_barrier_wait(&barrier);
if(ret != 0 && ret != PTHREAD_BARRIER_SERIAL_THREAD) {
  perror("pthread_barrier_wait");
  exit(1);
}
```

9. Submitting

Here are the commands to submit your solution (from one of you or your partner’s cs31/labs/Lab9-userID1-userID2 directory:

$ git add gol.c *.txt
$ git commit -m "Lab 9 completed"
$ git push

And of course, if you want to submit any nifty starting point world config files you come up with, please do:

$ git add coolworld.txt
$ git commit -m "cool world"
$ git push

If you have difficulty pushing your changes, see the "Troubleshooting" section and "can’t push" sections at the end of the Using Git for CS31 Labs page. And for more information and help with using git, see the git help page.

10. Handy Resources

General Lab Resources

Class EdSTEM page for questions and answers about lab assignment
CS31 github org
Git help and Git for CS31 labs
Some Useful Unix Commands
vi (and vim) quick reference
man and Manual pages documentation for libraries and commands

C programming Resource

Week 8 in-lab examples: ParaVis, and other sequential GOL related examples.

CS31 Lab 9: pthreads Game of Life