CUDA Fractal Generation

From CS40: Computer Graphics
Andrew Danner, Computer Science Department, Swarthmore College
This lab will have you practice writing 2D CUDA kernels for generating fractal images, in particular the Julia Set. By experimenting with the grid and block dimensions of the kernel, you will see how these parameters affect performance on multiple GPU configurations.

Project Goals:

Getting Started
We will start with the demo provided with the CUDAVis library. For the moment, we are more interested in timing a static image, so we will modify the animate_julia function to use a single seed value and to not update the ticks for animating. A replacement function is given below with changes indicated.
static void animate_julia(uchar3 *devPtr, void *my_data) {

  my_cuda_data *data = (my_cuda_data *)my_data;
  dim3 blocks(data->size, data->size); // one block per pixel

  float im = data->im; //CHANGED
  float re = data->re; //CHANGED
  GPUTimer timer;
  julia_kernel<<<blocks, 1>>>(devPtr, data->size, re, im); // one thread per block
  printf("Frame generation time: %7.2f ms\n", timer.elapsed());
}
As written, this kernel assigns an entire CUDA block to each pixel. Is this optimal, or can this be improved? Your assignment is to modify the CUDA kernel julia_kernel or add additional CUDA kernels that can leverage both CUDA threads and blocks. Your kernels should not assume that the total number of CUDA compute elements (threads/blocks) matches the number of pixels, as the initial example does. You should handle the cases when some blocks or threads may need to process multiple pixels, as well as cases where some threads in a block may need to be idle while other threads are working.
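One common way to meet both requirements is a 2D grid-stride loop. Each thread starts at its global (x, y) index and advances by the total number of threads in each dimension. The sketch below is only an illustration under stated assumptions, not the lab's solution. The names julia_kernel_strided and julia_steps, the iteration limit, the coordinate scale, and the red-only coloring are all hypothetical, since the body of the original julia_kernel is not shown here.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: iterate z = z*z + c and count the steps for which
// |z|^2 stays at or below 4, up to max_steps.
__device__ int julia_steps(float zr, float zi, float cr, float ci, int max_steps) {
  int n = 0;
  while (n < max_steps && zr * zr + zi * zi <= 4.0f) {
    float t = zr * zr - zi * zi + cr;
    zi = 2.0f * zr * zi + ci;
    zr = t;
    ++n;
  }
  return n;
}

// Sketch of a kernel that works for any grid and block shape.
// Threads whose starting index lies outside the image never enter the loops
// (they stay idle). Threads in a grid smaller than the image loop and
// process several pixels each.
__global__ void julia_kernel_strided(uchar3 *pixels, int size, float re, float im) {
  int x0 = blockIdx.x * blockDim.x + threadIdx.x;  // global starting column
  int y0 = blockIdx.y * blockDim.y + threadIdx.y;  // global starting row
  int stride_x = gridDim.x * blockDim.x;           // total threads in x
  int stride_y = gridDim.y * blockDim.y;           // total threads in y
  const int max_steps = 200;   // assumed iteration limit
  const float scale = 1.5f;    // assumed half-width of the viewed region

  for (int y = y0; y < size; y += stride_y) {
    for (int x = x0; x < size; x += stride_x) {
      // Map the pixel to a point in [-scale, scale] x [-scale, scale].
      float zr = scale * (2.0f * x / size - 1.0f);
      float zi = scale * (2.0f * y / size - 1.0f);
      int n = julia_steps(zr, zi, re, im, max_steps);
      unsigned char shade = (unsigned char)(255 * n / max_steps);
      pixels[y * size + x] = make_uchar3(shade, 0, 0);  // assumed coloring
    }
  }
}
```

Because the loop bounds depend only on size, the same kernel produces the same image whether it is launched with one thread, one block per pixel, or anything in between. That makes it convenient for comparing configurations.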

After making modifications to your kernels, modify your animate_julia function to call your kernel appropriately.
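As a rough sketch of what such a call might look like, the helper below picks a grid with ceiling division so that partial blocks at the image edge are still launched. It optionally caps the grid so that each thread handles several pixels, and it times the launch. It uses CUDA events from the runtime API rather than the library's GPUTimer. The names demo_kernel, ceil_div, timed_launch, and max_blocks_per_dim are hypothetical. demo_kernel is only a stand-in for your own kernel, which would take the re and im arguments as well.

```cuda
#include <cuda_runtime.h>

// Stand-in kernel (hypothetical): writes a simple pattern using a 2D
// grid-stride loop so that any launch shape covers the whole image.
__global__ void demo_kernel(uchar3 *pixels, int size) {
  int stride_x = gridDim.x * blockDim.x;
  int stride_y = gridDim.y * blockDim.y;
  for (int y = blockIdx.y * blockDim.y + threadIdx.y; y < size; y += stride_y) {
    for (int x = blockIdx.x * blockDim.x + threadIdx.x; x < size; x += stride_x) {
      pixels[y * size + x] =
          make_uchar3((unsigned char)(x % 256), (unsigned char)(y % 256), 0);
    }
  }
}

// Number of blocks of width d needed to cover n items, rounding up.
static unsigned int ceil_div(int n, int d) { return (n + d - 1) / d; }

// Launch demo_kernel with the given block shape and return the elapsed
// time in milliseconds as measured by CUDA events.
static float timed_launch(uchar3 *dev_pixels, int size, dim3 threads,
                          unsigned int max_blocks_per_dim) {
  dim3 blocks(ceil_div(size, threads.x), ceil_div(size, threads.y));
  // Capping the grid forces some threads to process more than one pixel.
  if (blocks.x > max_blocks_per_dim) blocks.x = max_blocks_per_dim;
  if (blocks.y > max_blocks_per_dim) blocks.y = max_blocks_per_dim;

  cudaEvent_t start, stop;
  cudaEventCreate(&start);
  cudaEventCreate(&stop);

  cudaEventRecord(start);
  demo_kernel<<<blocks, threads>>>(dev_pixels, size);
  cudaEventRecord(stop);
  cudaEventSynchronize(stop);  // wait until the kernel has finished

  float ms = 0.0f;
  cudaEventElapsedTime(&ms, start, stop);
  cudaEventDestroy(start);
  cudaEventDestroy(stop);
  return ms;
}
```

Calling timed_launch with block shapes such as dim3(8, 8), dim3(16, 16), and dim3(32, 32) gives one simple way to compare configurations. A single measurement can be noisy, so repeating each launch a few times and discarding the first run is a reasonable precaution.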

Run experiments with your code to answer the following questions:
Once you have completed the basic modifications and answered the questions above, feel free to modify the program to explore extensions that interest you. Below are some possibilities.