CS87 Lab4: CUDA Fire Simulator

Due: Thursday Feb 25 before 1:00 am (late Wed night)

You will complete this lab with your lab 4 partner.

Project Introduction
For this assignment you and your partner will implement a forest fire simulator in CUDA. Your program will make use of a simple GPU animation library that will animate your fire simulation as it runs on the GPU.

Contents:

Programming in CUDA
Project Details
Project Requirements
Useful Functions and Resources
Submission and Demo

Ideas for Extra Extensions

Lab 4 Starting Point Repo
Both you and your partner should do the following:
  1. Get your Lab04 ssh-URL from the GitHub server for our class: CS87-s16
  2. On the CS system, cd into your cs87/labs subdirectory:
      cd cs87/labs

  3. Clone a local copy of your shared repo in your private cs87/labs subdirectory:
      git clone [your_Lab04_URL]

     Then cd into your Lab04-you-partner subdirectory.

If all was successful, you should see the following files when you run ls:
    Makefile  README.md  firesimulator.cu  inputfire.txt  myopengllib.cu  myopengllib.h

If this didn't work, or for more detailed instructions on git, see the Using Git page (follow the instructions for repos on Swarthmore's GitHub Enterprise server).

CUDA Programming

Getting Started

You need to set the following environment variables to compile CUDA code (you can add these to the bottom of your ~/.bashrc file):
# cuda:
export CUDA_HOME=/usr/local/cuda-7.5
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64
export PATH=${CUDA_HOME}/bin:${PATH}
I suggest starting by looking at and running two simple CUDA examples:
cp -r ~newhall/public/cs87/cuda_example .
The first, testgrid.cu, allows you to select one of three CUDA kernels to run. You can use it to test different grid, block and thread layouts, and "see" how they are mapped onto the 2D array.

The second, simple.cu, shows an example of how to write a CUDA program that uses myopengllib to simultaneously animate the computation on the GPU. You will use this library in your lab solution.

The header file, myopengllib.h, has comments about how to use the library. simple.cu is an example program that uses the library to animate a simple CUDA kernel where values in a 2-D grid are cyclically updated. Note how it registers animation and clean-up functions.

There are other sources of CUDA examples that you can investigate. See the Useful Resources section below.

The Programming Model in CUDA

The CUDA programming model consists of a global shared memory and a set of multi-threaded blocks that run in parallel. CUDA has very limited support for synchronization (only threads in the same thread block can synchronize their actions). As a result, CUDA programs are often written as purely parallel CUDA kernels that are run on the GPU, where code running on the CPU implements the synchronization steps: CUDA programs often have alternating steps of parallel execution on the GPU and sequential execution on the CPU.

A typical CUDA program may look like:

  1. The initialization phase and GPU memory allocation and copy phase: CUDA memory is allocated on the GPU by calling cudaMalloc. Often program data are initialized on the CPU in a CPU-side copy of the data in RAM, and then copied to the GPU using cudaMemcpy. GPU data can also be initialized on the GPU using a CUDA kernel, and then the cudaMemcpy does not need to be done. For example, initializing all elements in an array to 0 can be done very efficiently on the GPU.
  2. A main computation phase that consists of one or more calls to CUDA kernel functions. This could be a loop run on the CPU that makes calls to one or more CUDA kernels to perform sub-steps of the larger computation. Because there is almost no support for GPU thread synchronization, CUDA kernels usually implement the parallel parts of the computation and the CPU side implements the synchronization events. An embarrassingly parallel application could run as a single CUDA kernel call.
  3. There may be a sequential output phase where data are copied from the GPU to the CPU, using cudaMemcpy, and output in some form.
  4. A clean-up phase where CUDA and CPU memory is freed. cudaFree is used to free GPU memory allocated with cudaMalloc. Be sure to call cudaFree in any error handling code that cleans up state and exits.
In CUDA, parallelism is expressed in terms of a number of multi-threaded parallel blocks running on the GPU. The programmer explicitly maps parallelism in terms of blocks and threads onto portions of the GPU data that each thread will access "simultaneously" in parallel. All array data in CUDA (on the GPU) are single-dimensional. However, the blocks and threads specification can be structured multi-dimensionally to better match the programmer's view of their program. For example, for programs that process 2-D arrays, the CUDA programmer often specifies a 2-D layout of blocks where a block's 2-D x, y position may better map onto the programmer's view of the data. This is not to say that there is always a 1-1 mapping of blocks and threads to underlying data elements. There are limits to the number of blocks and the number of threads per block, which means that for larger data, a single thread must access a range of the underlying array.

GPU functions in CUDA

__global__ functions are CUDA kernel functions: functions that are called from the CPU and run on the GPU. They are invoked using this syntax:
    my_kernel_func<<< blocks, threads >>>(args ...);
__device__ functions are those that can be called only from other __device__ functions or from __global__ functions. They are for good modular code design of the GPU-side code. They are called using a similar syntax as any C function call. For example:
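// device helper: defined before the kernel that calls it
__device__ int findmax(int a, int b) {
  if(a > b) { 
    return a; 
  } 
  return b;
}

__global__ void my_kernel_function(int a, int *dev_array) {
  // 2-D layout of blocks: row offset is blockIdx.y times the grid's width
  int offset = blockIdx.x + blockIdx.y*gridDim.x;  

  int max = findmax(a, dev_array[offset]);   
  ...

}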

Memory in CUDA

GPU memory needs to be explicitly allocated (cudaMalloc) and explicitly freed (cudaFree). If initial values for the data are on the CPU, they need to be copied to the GPU-side data (cudaMemcpy). When you program in CUDA you need to think carefully about what is running on the CPU on data stored in RAM, and what is running on the GPU on data stored on the GPU. Memory allocated on the GPU (via cudaMalloc) stays on the GPU between kernel calls. If the CPU wants intermediate or final results, they have to be explicitly copied from the GPU to the CPU.

In CUDA all arrays are 1-dimensional, so each parallel thread's location in the multi-dimensional layout of blocks and threads needs to be explicitly mapped onto an offset into a 1-dimensional CUDA array. Often there is not a perfect 1-1 thread to data mapping, and the programmer needs to handle this case: either to avoid accessing invalid memory locations beyond the bounds of an array (when there are more threads than data elements), or to ensure that every data element is processed (when there are fewer threads than data elements).

For this lab, you can assume that if you use a 2-D layout of blocks (dim3 blocks(DIM,DIM)) that there are enough GPU threads to each handle a single element in a 512x512 array.
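The mapping from a thread's 2-D position to a 1-D array offset can be sketched in plain C. This is only a model of the arithmetic, not kernel code: the parameters stand in for CUDA's built-in blockIdx, threadIdx and blockDim values, and the function name cell_offset and its -1 "out of bounds" return value are hypothetical choices for illustration.

```c
/* Plain-C model of the index arithmetic a kernel would do.
   In a kernel, block_x/block_y would be blockIdx.x/blockIdx.y,
   thread_x/thread_y would be threadIdx.x/threadIdx.y, and
   block_dim_x/block_dim_y would be blockDim.x/blockDim.y. */
int cell_offset(int block_x, int block_y, int thread_x, int thread_y,
                int block_dim_x, int block_dim_y, int rows, int cols) {
  int col = block_x * block_dim_x + thread_x;
  int row = block_y * block_dim_y + thread_y;

  /* more threads than data elements: this thread has no cell to handle */
  if (row >= rows || col >= cols) {
    return -1;
  }
  /* row-major offset into the 1-D array */
  return row * cols + col;
}
```

With a dim3 blocks(DIM,DIM) layout and one thread per block, block_dim_x and block_dim_y are 1 and thread_x and thread_y are 0, so the offset reduces to block_x + block_y*cols.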

Fire Simulator Lab Details
For this lab you will implement a discrete event simulator in CUDA. The discrete event simulation is similar to GOL so that you can focus more on the CUDA parts for this lab.

The forest fire simulator is a discrete event simulator of a 2-dimensional non-torus world, where each cell is either:

  1. part of a LAKE
  2. part of a forest that is UNBURNED
  3. part of a forest that is BURNING
  4. part of a forest that has already BURNED

In addition to a cell being in one of these different states, also associated with each cell is its temperature. A cell's temperature range depends on its state:

  1. 60 degrees for UNBURNED forest cells
  2. 300 to 1000 to 60 for a BURNING forest cell. A burning cell goes through increasing and decreasing temperature phases. It starts at the ignition temperature of 300 degrees and increases up to a max of 1000 degrees. Once it reaches 1000 degrees its temperature starts decreasing back down to 60 degrees, at which point it becomes BURNED.
  3. X degrees for a BURNED cell: you can pick a temperature, but pick one that no UNBURNED or BURNING forest cell can ever be.
  4. Y degrees for a LAKE cell: you can pick a temperature, but pick one that no forest cell can be.
Define and use constants for these temperatures and for sizes in your solution.

Your simulator should take the following command line arguments (all are optional arguments):

./firesimulator {-i iters -d step -p prob | -f filename}
 -i iters     number of iterations to run
 -d step      rate at which a burning cell's temp increases or decreases each step
 -p prob      probability a cell will catch fire if one of its neighbors is burning 
 -f filename  read in configuration info from a file

Your program should use default values for any values not given as command line arguments. Use 1,000 iterations, a step size of 20, and a probability of 0.2 as the default values.

Options -i, -d and -p are not compatible with -f. The file format is discussed below.
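One possible way to handle these arguments is with the POSIX getopt function. The sketch below is only an illustration under stated assumptions: the struct sim_options, the function parse_options, and the choice to return -1 on any error are hypothetical names and conventions, not part of the starting point code.

```c
#define _POSIX_C_SOURCE 200809L  /* for getopt, optarg, optind */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* hypothetical container for the simulator's settings */
typedef struct {
  int iters;
  int step;
  float prob;
  const char *filename;  /* NULL unless -f was given */
} sim_options;

/* Fills opts from the command line; returns 0 on success, -1 on bad usage. */
int parse_options(int argc, char *argv[], sim_options *opts) {
  int c;
  int saw_idp = 0;  /* set if any of -i, -d, -p was given */

  /* defaults from the lab description */
  opts->iters = 1000;
  opts->step = 20;
  opts->prob = 0.2f;
  opts->filename = NULL;

  optind = 1;  /* start scanning at argv[1] */
  while ((c = getopt(argc, argv, "i:d:p:f:")) != -1) {
    switch (c) {
      case 'i': opts->iters = atoi(optarg); saw_idp = 1; break;
      case 'd': opts->step = atoi(optarg); saw_idp = 1; break;
      case 'p': opts->prob = (float)atof(optarg); saw_idp = 1; break;
      case 'f': opts->filename = optarg; break;
      default:  return -1;  /* unknown option or missing argument */
    }
  }

  /* -f is not compatible with -i, -d or -p */
  if (opts->filename != NULL && saw_idp) {
    return -1;
  }
  /* simple sanity checks on the values */
  if (opts->iters < 0 || opts->step <= 0 ||
      opts->prob < 0.0f || opts->prob > 1.0f) {
    return -1;
  }
  return 0;
}
```

A main function would call parse_options(argc, argv, &opts) first, print a usage message and exit if it returns -1, and read the input file when opts.filename is not NULL.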

Initialize your world to some default configuration (unless the -f command line is given). The default should start a fire in the center of the world (just a single cell...like a lightning strike). It should also contain a couple lakes (contiguous regions of some size of lake cells).
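As a rough sketch of the CPU-side initialization, the default world could be built like this before it is copied to the GPU. The lake positions, the function names fill_lake and init_default_world, and the use of a separate array of states are all assumptions made for illustration; your own representation may differ.

```c
/* cell states from the lab description */
typedef enum { LAKE, UNBURNED, BURNING, BURNED } cell_state;

/* Marks the rectangle with upper left (r0,c0) and lower right (r1,c1),
   inclusive, as LAKE in an n x n world stored as a 1-D array. */
void fill_lake(cell_state *world, int n, int r0, int c0, int r1, int c1) {
  for (int i = r0; i <= r1 && i < n; i++) {
    for (int j = c0; j <= c1 && j < n; j++) {
      world[i * n + j] = LAKE;
    }
  }
}

/* Default world: all forest, two lakes, one lightning strike in the center. */
void init_default_world(cell_state *world, int n) {
  for (int k = 0; k < n * n; k++) {
    world[k] = UNBURNED;
  }
  /* hypothetical lake placements, scaled to the world size */
  fill_lake(world, n, n / 8, n / 8, n / 4, n / 3);
  fill_lake(world, n, (5 * n) / 8, n / 2 + n / 8, (3 * n) / 4, (7 * n) / 8);

  /* the strike is set last so that the center cell always starts BURNING */
  world[(n / 2) * n + n / 2] = BURNING;
}
```

The same fill_lake helper could also be reused for each lake rectangle when the world is configured from a file.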

After simulating the given number of steps your program should print out the cumulative GPU run time and exit. At each time step, a cell's state and/or temperature may change according to these rules:

  1. if a cell is a LAKE, it stays a LAKE
  2. if a cell is BURNED, it stays BURNED forever
  3. if a cell is UNBURNED, then it either starts on fire or stays UNBURNED.

    To decide if an UNBURNED cell starts on fire:

    1. look at the state of its immediate neighbors to the north, south, east and west. The world is not a torus, so each cell has up to 4 neighbors; edge cells have only 2 or 3 neighbors.
    2. if at least one neighboring cell is on fire, then the cell will catch fire with the probability passed in on the command line (or with the default probability if none is given).
    If an UNBURNED cell changes state to BURNING, its new temperature jumps to 300 degrees F and its temperature will start increasing.
  4. if a cell is BURNING, then it burns at a constant rate for some number of time steps. However, its temperature first increases from 300 (the fire igniting temp) up to 1000 degrees, and then it decreases from 1000 back down to 60 degrees, at which point it becomes a BURNED cell.

    The rate at which its temperature increases or decreases is given by a command line argument -d, or use a default value of 20.

    A BURNING cell's state may change based on its new temperature: if its new temperature is <= 60, then this cell is now done burning and its state is now BURNED. Its temperature is set to the BURNED temperature value that you use.
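The rules above can be sketched as a per-cell update function in plain C; the same logic could later be moved into a CUDA kernel that runs once per cell. Several things here are assumptions for illustration only: the cell struct, the rising flag used to remember whether a burning cell is heating up or cooling down, the BURNED temperature of 0, and the function names. The random draw r is passed in as a parameter so that the sketch makes no assumption about how random numbers are generated.

```c
/* cell states and temperatures from the lab description */
typedef enum { LAKE, UNBURNED, BURNING, BURNED } cell_state;

#define TEMP_UNBURNED  60
#define TEMP_IGNITE   300
#define TEMP_MAX     1000
#define TEMP_BURNED     0  /* assumption: a value no forest cell can reach */

typedef struct {
  cell_state state;
  int temp;
  int rising;  /* hypothetical flag: 1 while heating up, 0 while cooling */
} cell;

/* Returns 1 if any N/S/E/W neighbor of (i,j) is BURNING. The world is not
   a torus, so neighbors that would fall off the n x n grid are skipped. */
int has_burning_neighbor(const cell *world, int n, int i, int j) {
  if (i > 0     && world[(i - 1) * n + j].state == BURNING) return 1;
  if (i < n - 1 && world[(i + 1) * n + j].state == BURNING) return 1;
  if (j > 0     && world[i * n + (j - 1)].state == BURNING) return 1;
  if (j < n - 1 && world[i * n + (j + 1)].state == BURNING) return 1;
  return 0;
}

/* Computes the next value of one cell. r is a random draw in [0,1). */
cell step_cell(cell c, int burning_neighbor, int step, float prob, float r) {
  switch (c.state) {
    case LAKE:
    case BURNED:
      break;  /* these never change */
    case UNBURNED:
      if (burning_neighbor && r < prob) {
        c.state = BURNING;
        c.temp = TEMP_IGNITE;
        c.rising = 1;
      }
      break;
    case BURNING:
      if (c.rising) {
        c.temp += step;
        if (c.temp >= TEMP_MAX) {  /* reached the peak: start cooling */
          c.temp = TEMP_MAX;
          c.rising = 0;
        }
      } else {
        c.temp -= step;
        if (c.temp <= TEMP_UNBURNED) {  /* done burning */
          c.state = BURNED;
          c.temp = TEMP_BURNED;
        }
      }
      break;
  }
  return c;
}
```

Because each cell's next value depends on its neighbors' current states, one way to use this is to read from the current world and write the results into a second copy, then swap the two after each time step.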

Here are a few screenshots of a run, ./firesimulator -i 1000 -p 0.2 -d 20, showing a fire starting in the center and spreading to neighboring forest cells over time. In my simulator, unburned forest cells are green, burning forest cells are red, burned forest cells are black, and lake cells are blue; note that my very rectangular lakes do not burn. (Note: the OpenGL graphics display has point (0,0) in the lower left corner, whereas in your view of your program data (0,0) is in the upper left corner, so things look flipped about a horizontal axis through the middle.)

Input file format

If run with an input file (the -f command line option), the program configuration values are all read in from the file. The file's format should be:
line 1: number of iterations
line 2: step size
line 3: probability
line 4: the lightning strike cell (its (i,j) coordinates)
line 5: number of lakes
lines 6 to numlakes+5: one line per lake, giving the (i,j) coordinate pairs
of the upper left corner and lower right corner of each rectangular lake
The lake coordinates are given in terms of the 2-D array of cell values that you initialize on the CPU. All cells specified in that rectangle should be lake cells, all others should be forest cells. For example:
800
40
0.3
250 400
2
20 30 50 70
100 60 120 110
This will run a simulation for 800 iterations, with a temperature step size of 40 degrees per increase or decrease, and with a probability of 30%. It will start with an initial world containing 2 lakes: one with upper left corner at (20,30) and lower right at (50,70), the other with upper left corner at (100,60) and lower right at (120,110). All other cells will be UNBURNED forest cells, except cell (250,400), which will start as BURNING. It is fine if the lakes overlap; the lakes in the world from my example simulation would look less rectangular if I overlapped several lake rectangles.
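A minimal sketch of reading this file format with fscanf follows. It assumes the values are separated only by whitespace, as in the example above. The struct names, the function name read_config, and the MAX_LAKES cap are hypothetical choices for illustration.

```c
#include <stdio.h>

#define MAX_LAKES 16  /* hypothetical cap on the number of lakes */

/* one rectangular lake: upper left (r0,c0), lower right (r1,c1) */
typedef struct {
  int r0, c0, r1, c1;
} lake_rect;

typedef struct {
  int iters;
  int step;
  float prob;
  int strike_i, strike_j;  /* the lightning strike cell */
  int num_lakes;
  lake_rect lakes[MAX_LAKES];
} fire_config;

/* Reads the configuration from an open file.
   Returns 0 on success, -1 if the file does not match the format. */
int read_config(FILE *f, fire_config *cfg) {
  /* lines 1 to 5: iterations, step size, probability, strike cell, lakes */
  if (fscanf(f, "%d %d %f %d %d %d", &cfg->iters, &cfg->step, &cfg->prob,
             &cfg->strike_i, &cfg->strike_j, &cfg->num_lakes) != 6) {
    return -1;
  }
  if (cfg->num_lakes < 0 || cfg->num_lakes > MAX_LAKES) {
    return -1;
  }
  /* one line of four coordinates per lake */
  for (int k = 0; k < cfg->num_lakes; k++) {
    lake_rect *l = &cfg->lakes[k];
    if (fscanf(f, "%d %d %d %d", &l->r0, &l->c0, &l->r1, &l->c1) != 4) {
      return -1;
    }
  }
  return 0;
}
```

The caller would open the file named by -f with fopen, call read_config, close the file, and then mark each lake rectangle and the strike cell in the CPU-side world.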

Project Requirements


Useful Functions and Resources


Submit and Demo
Before the due date, you or your partner should push your solution from one of your local repos to the GitHub remote repo. (It doesn't hurt if you both push, but the last pushed version before the due date is the one I will grade, so be careful that you are pushing the version you want to submit for grading.)

From one of your local repos (in your ~you/cs87/labs/Lab04-partner1-partner2 subdirectory)

git add *.cu 
git commit
git push

If you have git problems, take a look at the "Troubleshooting" section of the Using git page.

Demo

You and your partner will sign up for a 15 minute demo slot (outside my door) to demo your fire simulator. Think about, and practice, different scenarios to demonstrate both correctness and error handling.

Some ideas for Extra Extensions
These parts are not required, and do not try any of these until you have all the required functionality implemented and tested.

If you implement some extensions to the basic simulator, please do so in a separate .cu file and build a separate binary so that I can still easily test your solution to the required parts of the lab assignment.

Here are a few suggestions for some things to try to improve the simulation or the performance (you are also welcome to come up with your own ideas for extensions):