This lab should be done with your lab partner.
First, both you and your partner should run update31 to create the handin directory structure for lab 9, then you should copy your lab 6 solution into your lab9 subdirectory:
$ update31 $ cd cs31/labs/09 $ pwd /home/your_user_name/cs31/labs/09 $ cp ../06/gol.c . $ cp ../06/*.txt . $ ls Makefile gol.c and some .txt filesWith the 09 starting point code, I've given you a Makefile and some .txt input files. You should create more .txt input files to test larger sized and longer running threaded version, but use smaller ones (and printing command line options) to test for correctness.
You will implement a parallel version of Conway's Game of Life using your sequential solution as a starting point for this lab. See the lab 6 assignment to remind yourself about the sequential version of the program, and about the rules of the game. You will evaluate the scalability of your implementation as you increase the problem size and the number of threads.
Your lab 9 solution will use the same method as lab 6 for initializing a dynamically allocated board from an input file given as a command line argument, and will have the same print or not-print option after every iteration.
You should begin lab 9 by first fixing any errors in your lab 6 sequential solution that you copied over into your labs/09 subdirectory.
A run with no printing enabled should only printout the final timing result of the gettimeofday timer and should not include any calls to usleep in the run.
$ ./gol usage: ./gol infile.txt print[0:1] ntids partition[0:1] print_alloc[0:1]Here are some example command lines:
# run with config values read from file1.txt, do not print the board after each round # create 8 threads, use row-wise partitioning, print per-thread partitioning details: ./gol file1.txt 0 8 0 1 # run with config file file2.txt, print the board after each round, create # 15 threads, use column-wise partitioning, don't print per-thread partitioning: ./gol file2.txt 1 15 1 0
The print per-thread board allocation command line option will help you debug the grid cell allocation scheme to ensure that you are correctly allocating rows or columns across the worker threads.
Your program should handle badly formed command lines and check for bad values entered (like a negative number of threads). It should print out an error message and exit for badly formed command lines and handle most bad input values similarly.
row partitioning column partitioning ---------------- ---------------- 0 0 0 0 0 0 0 0 0 0 1 1 2 2 3 3 0 0 0 0 0 0 0 0 0 0 1 1 2 2 3 3 1 1 1 1 1 1 1 1 0 0 1 1 2 2 3 3 1 1 1 1 1 1 1 1 0 0 1 1 2 2 3 3 2 2 2 2 2 2 2 2 0 0 1 1 2 2 3 3 2 2 2 2 2 2 2 2 0 0 1 1 2 2 3 3 3 3 3 3 3 3 3 3 0 0 1 1 2 2 3 3 3 3 3 3 3 3 3 3 0 0 1 1 2 2 3 3When the number of threads does not evenly divide the dimension by which you are partitioning, divide the rows (or columns) up so that there is at most a difference of 1 assigned row (or column) between any two threads. For example, for an 8x7 board and 3 threads, you would partition rows and columns like this among the 3 threads (with tids 0-2):
row partitioning column partitioning ---------------- ---------------- 0 0 0 0 0 0 0 0 0 0 1 1 2 2 0 0 0 0 0 0 0 0 0 0 1 1 2 2 0 0 0 0 0 0 0 0 0 0 1 1 2 2 1 1 1 1 1 1 1 0 0 0 1 1 2 2 1 1 1 1 1 1 1 0 0 0 1 1 2 2 1 1 1 1 1 1 1 0 0 0 1 1 2 2 2 2 2 2 2 2 2 0 0 0 1 1 2 2 2 2 2 2 2 2 2 0 0 0 1 1 2 2In this partitioning scheme, a single row (or column) is assigned to exactly one thread; a single row (or column) is never split between two or more threads. Thus, in the above example threads 0 and 1 have one more row each of work to do than thread 2 in the row-wise partitioning, and thread 0 has one more column of work to do than threads 1 and 2 in the column-wise partitioning.
printf("tid %d my values ...\n", mytid, ...); fflush(stdout); // force the printf output to be printed to the terminalHere are some example runs of my program with different numbers of threads partitioning a 100x100 grid. The total number of rows and columns per thread are shown in parentheses (your output does not need to be identical to mine but must include all required parts):
# 9 threads, column-wise partitioning % gol big 0 9 1 1 tid 0: rows: 0:99 (100) cols: 0:11 (12) tid 1: rows: 0:99 (100) cols: 12:22 (11) tid 2: rows: 0:99 (100) cols: 23:33 (11) tid 4: rows: 0:99 (100) cols: 45:55 (11) tid 3: rows: 0:99 (100) cols: 34:44 (11) tid 5: rows: 0:99 (100) cols: 56:66 (11) tid 6: rows: 0:99 (100) cols: 67:77 (11) tid 7: rows: 0:99 (100) cols: 78:88 (11) tid 8: rows: 0:99 (100) cols: 89:99 (11) # 6 threads, row-wise partitioning % gol big 0 6 0 1 tid 0: rows: 0:16 (17) cols: 0:99 (100) tid 1: rows: 17:33 (17) cols: 0:99 (100) tid 3: rows: 51:67 (17) cols: 0:99 (100) tid 2: rows: 34:50 (17) cols: 0:99 (100) tid 4: rows: 68:83 (16) cols: 0:99 (100) tid 5: rows: 84:99 (16) cols: 0:99 (100) # 17 threads, row-wise partitioning % gol big 0 17 0 1 tid 0: rows: 0: 5 ( 6) cols: 0:99 (100) tid 1: rows: 6:11 ( 6) cols: 0:99 (100) tid 3: rows: 18:23 ( 6) cols: 0:99 (100) tid 2: rows: 12:17 ( 6) cols: 0:99 (100) tid 4: rows: 24:29 ( 6) cols: 0:99 (100) tid 5: rows: 30:35 ( 6) cols: 0:99 (100) tid 6: rows: 36:41 ( 6) cols: 0:99 (100) tid 7: rows: 42:47 ( 6) cols: 0:99 (100) tid 10: rows: 60:65 ( 6) cols: 0:99 (100) tid 11: rows: 66:71 ( 6) cols: 0:99 (100) tid 12: rows: 72:77 ( 6) cols: 0:99 (100) tid 13: rows: 78:83 ( 6) cols: 0:99 (100) tid 8: rows: 48:53 ( 6) cols: 0:99 (100) tid 15: rows: 90:94 ( 5) cols: 0:99 (100) tid 14: rows: 84:89 ( 6) cols: 0:99 (100) tid 9: rows: 54:59 ( 6) cols: 0:99 (100) tid 16: rows: 95:99 ( 5) cols: 0:99 (100)Note that for the 17 thread run there are enough threads to see "out of order" execution due to the exact scheduling of threads on the cpus.
$ ./gol oscillator.txt 1 4 0 0 start board: - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - @ @ @ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - end board: - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - @ - - - - - - - - - - @ - - - - - - - - - - @ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - total time for 21 iterations of 11x11 is 5.493663 secsIf you see odd output, either your parallelization of GOL is incorrect, or you are missing synchronization, or both. You can remove the call to system("clear") to see the history of each round to help you debug incorrect GOL computation (of course run for a small number of iterations).
In addition to the parallelization and correctness requirements described above, you solution must:
big_buff = malloc( ...); if(!big_buff) { perror("mallocing big buffer failed\n"); }See the man page for perror for more information and for the include files you need to add to your program.
// declare a global pthread_barrier_t var static pthread_barrier_t barrier; // initialize it somewhere before using it: if(pthread_barrier_init(&barrier, NULL, num_threads)){ perror("Error: with pthread barrier init\n"); exit(1); } // threads than can call pthread_barrier_wait to synchronize: ret = pthread_barrier_wait(&barrier); if(ret != 0 && ret != PTHREAD_BARRIER_SERIAL_THREAD) { perror("Error: can't wait on pthread barrier\n"); exit(1); }
Before running cs31handin: Make sure to run make clean to remove all executable files, and remove (or move) any extra or temporary files out of your submit directory. You can move files somewhere else using the mv command:
mv badfile ../. # move it up one directory mv gol1.c ../. # move a confusing duplicate .c file out of your handin # directoryOnce you are satisfied with your solution, hand it in by typing handin31 at the unix prompt.
Only one of you or your partner should run handin31 to submit your joint solutions If you accidentally both run it, send me email right away letting me know which of the two solutions I should keep and which I should discard (you don't want the grader to just guess which joint solution to grade).
You may run handin31 as many times as you like, and only the most recent submission will be recorded. This is useful if you realize, after handing in some programs, that you'd like to make a few more changes to them.