CS31 Lab 6: Extra Challenge
This challenge isn't for any extra points, but once you complete your implementation, see if you can tune your program's performance to run faster than mine.

Try running a large sized board for a large number of iterations. For example:

1024
1024
1000
3
9 8
9 9
9 10
My program (compiled with -g -Wall), took about 37 seconds to run on the CS lab machine named celery:
./gol startfile.txt 0
total time for 1000 iterations of 1024x1024 is 37.603214 secs
Note: always run timing experiments with command line option 0 that does not include the animation part (i.e. there is no clear, printing the board, nor usleep between iterations). This experiment was run on 11/7/17.

If you program is much slower, then try to improve your program's performance to see if you can get closer to mine, or beat it.

First, copy your existing solution to a new file named faster.c, so that you don't lose your correct, working gol solution in gol.c:

cp gol.c faster.c
Then try to improve the performance of the version that you have in faster.c (leave gol.c as your first working version). If yours is much slower than mine, think about your program's memory usage and see if you can improve the time by improving how it uses memory.

When you change your code you need to ensure that it still solves the correct problem. Verify that your changes still correctly implement gol by running the faster.c version on a small example with printing enabled.

Running Experiments

When timing code you always want to run large enough experiments to compare times (1 second vs. 2 seconds doesn't necessarily say anything about the two run times). You also want to time runs without output statements (printfs) and without calls to sleep. You should run each experiment on the same machine, and run multiple times to see check if there is a lot of variation between the measured run times of the same size. If there is a lot of variation, then you would want to try to explain why. For this problem there should be very little variation between the time of different runs of the same size and number of iterations. If you see an unusually long run, it is due to something else being run on the machine while you are timing your program.

To ensure that you are running experiments without interference from other processes using up a lot of CPU and/or RAM, you can see what else is going on using these commands:

The who command will list who else is logged into a machine. The top command will show cpu and memory usage of processes running on a machine. Here is a lot more information about tools for examining system state.

Using celery: To keep celery free as much as possible for other groups trying to do final timing to beat me, please first run timed runs on another machine and see how close you are getting.

You can, and should, do almost all your timing on any machine, just to see if you beat mine, you should run yours on the same machine as I ran my tests.

You can find the specs of different CS lab machines off the CS department lab help page, here: lab machine specs

bash for loop: At the bash shell prompt (bash is the name of the Unix shell program) you can write a bash loop to tell bash to repeat an action some number of times. The syntax is almost like a C for loop except do and done are in place of { and }, and you need double parens. Here is an example for loop to run gol on the same input 5 times (hit the enter key after each line, the "$" is the bash prompt):

$ for ((i=0; i< 5; i++))
> do
>  ./faster startfile.txt 0
> done
time command: You can also run gol in the time command to time the entire computation time (the above is from my gettimeofday timers in my code that do not include board initialization time):
$ time ./faster startfile.txt 0
total time for 1000 iterations of 1024x1024 is 6.397987 secs

real    0m6.402s
user    0m6.400s
sys     0m0.000s
real is wall time (total time), user and system are the portion of time the process spent running user-level code and operating system-level code. I may talk a bit about what these mean, but if you take the Operating Systems course, then these two times will make a lot more sense.

My times

Here are runtimes for different sized boards and interations of my solution run on celery (times for each are the average of 5 runs):
# these are timed runs compiled with gcc flags -g -Wall
#  gcc -g -Wall -o gol gol.c
#
total time for 500 iterations of 500x500 is  ~4.5 secs
total time for 500 iterations of 1000x1000 is ~17.8 secs
total time for 500 iterations of 2000x2000 is ~71.8 secs
total time for 500 iterations of 4000x4000 is ~287.2 secs
In these experiements the number of iterations is the same for different size boards. These show how the runtime grows with the problem size: this shows linear increase in time with increase in problem size (each 4 times increase in the problem size results in about a 4 times increase in execution time). If you think about the complexity of GOL, this is what you would expect--it is an 0(n) algorithm (where n is the number of grid cells) for a fixed number of iterations.

Compiler Optimization

Make sure you compare a version of your code that is compiled using the same compiler flags as used in the times shown (-g in the above times). You can often greatly improve your program's execution time by turning on compiler optimization during compilation. For example, if I compile my code without debugging information (no -g flag) and with the highest level of compiler optimization (-O3), I see a huge increase in its performance just from the compiler optimization. For example, here is my time running on celery for for my optimized gol program (this is the exact same C code as the results above, the only difference is compiling with -03):
faster startfile.txt 0
total time for 1000 iterations of 1024x1024 is 6.249315 secs

My times for compiler optimized version

Here are my times for my optimized gol program (it is the exact same C code in both sets of results):
# these are timed runs of my program compiled with the gcc flag -O3
#  gcc -O3 -o gol gol.c
#
total time for 500 iterations of 500x500 is ~0.7 secs
total time for 500 iterations of 1000x1000 is ~2.9 secs
total time for 500 iterations of 2000x2000 is ~12.0 secs
total time for 500 iterations of 4000x4000 is ~48.2 secs
Just compiling the code with -03 compiler optimization flag, results in code that is roughly 6 times faster than the version that is compiled with -g.

In general, you want to do development with the -g flag so that you can easily debug your program. The -g flag and the -O flags are not compatible, and -g wins. If you are running experiments on code you have already debugged and tested, then you may want to enable compiler optimization to improve its runtime.

handin

If you improve your faster.c version, add, commit and push it to your git repo and also add a QUESTIONNAIRE file telling me how you improved its performance and how well it performed (some results). To compare your faster.c to your gol.c solution, you do not need to run on celery, just run both on the same machine. If your solution is significantly faster than mine, then make sure to tell me that in your README file (and let me know in person or via email too).