Qt5 Parallel Visualization Library

Creating your own demos

Getting Started

We recommend following the install instructions first to install the QtVis library. Once it is installed, you can download the QtVis Demos repo for examples of writing your own demos. While some demos are included directly in the library code and you can add your demos there, the separate demos repo simplifies the process by excluding all library code that you do not need to modify to create your own demo.
Overview
The DataVis class is an abstract C++ class that provides the primary interface for creating a new visualization. To write a new demo, a user will typically create a new class that inherits from DataVis and implements the update() method, which must update a color buffer allocated by the DataVis class. Once this derived class is implemented, the user simply connects it to a QTViewer instance and then runs the viewer. As a demonstration, the file main.cpp shows the complete program for an example visualization called GradientVis, which we describe next.
#include <qtViewer.h>
#include "gradientVis.h"

int main(int argc, char *argv[]) {
  QTViewer viewer(argc, argv, 10, 10, "QtCPU");

  int width = 50;
  int height = 50;
  DataVis* vis = new GradientVis(width,height);
  viewer.setAnimation(vis);

  return viewer.run();
}
QTViewer
Using the QTViewer class is fairly boilerplate for any demo. The constructor
QTViewer(int argc, char* argv[], int w, int h, QString title="Demo");
requires the command line arguments argc and argv, which Qt needs but this application typically does not use. The next two parameters specify the desired width and height of the application window. Note: these parameters are currently ignored, as this feature is not yet implemented. An optional fifth string parameter specifies the title of the animation window.

Aside from the constructor, QTViewer has only two other public methods. The setAnimation(DataVis* vis) method connects a user-defined animation to the viewer. The viewer will repeatedly call the update() method of this animation to visualize the demo.

The run() method enters the primary graphical user interface (GUI) event loop of the viewer and starts running the animation. This method only returns when the application quits.
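
As a rough mental model of how the viewer drives an animation, the sketch below uses a hypothetical Visualization base class and a hypothetical runFrames helper in place of DataVis and QTViewer. These names and the fixed frame count are assumptions made for illustration only; the real viewer is driven by the Qt event loop rather than a plain for loop.

```cpp
#include <cassert>

// Hypothetical stand-in for DataVis: one pure virtual update() method.
class Visualization {
public:
  virtual ~Visualization() {}
  virtual void update() = 0;
};

// Hypothetical demo class: counts how often update() has been called.
class CountingVis : public Visualization {
public:
  CountingVis() : m_ticks(0) {}
  void update() override { m_ticks += 1; }
  int ticks() const { return m_ticks; }
private:
  int m_ticks;
};

// Hypothetical stand-in for the viewer: calls update() once per frame.
void runFrames(Visualization* vis, int frames) {
  assert(vis != nullptr && frames >= 0);
  for (int i = 0; i < frames; i++) {
    vis->update();
  }
}
```

The point of the sketch is the division of labor: the derived class only knows how to produce the next frame, and the caller decides when that happens.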

A First Demo

While this library is primarily intended for visualizing parallel code, our first demo will illustrate a serial application that displays a scrolling vertical red/blue gradient. To do this, we first create a class GradientVis, declared in gradientVis.h, which inherits from DataVisCPU. DataVisCPU is itself derived from DataVis, but allocates space for the color buffer in the main memory of the computer. This class can be used for serial, pthreads, and OpenMP applications. A separate DataVisCUDA class is used for CUDA applications where the color buffer resides in GPU memory.

The primary role of the GradientVis class, or any demo class, is to implement the update() method. Our implementation adds one member variable, m_ticks, to keep track of how many times the update() method has been called. The resulting header file, gradientVis.h, is quite small:

#pragma once

#include <dataVisCPU.h>

/* A single threaded gradient visualization */
class GradientVis: public DataVisCPU {

public:
  GradientVis(int w, int h);
  virtual ~GradientVis();

  void update();

private:
  int m_ticks;
};

The implementation of the GradientVis class is provided in gradientVis.cpp. The constructor simply needs to initialize the base class and m_ticks.

GradientVis::GradientVis(int w, int h) :
   DataVisCPU(w,h), m_ticks(0){
   /* do nothing */
}
There is no dynamic memory allocation in the derived class, so the destructor can be empty.
GradientVis::~GradientVis(){
  /* do nothing */
}

All the real work is in the update() method, shown below:

void GradientVis::update() {
  int off;
  unsigned char val;
  int c;
  for(int r=0; r<m_height; r++){
    for(c=0; c<m_width; c++){
      off = r*m_width+c;
      val = (unsigned char) (128. * r / m_height);
      val = (val+m_ticks)%128;
      m_image.buffer[off].r=val;
      m_image.buffer[off].g=0;
      m_image.buffer[off].b=128-val;
    }
  }
  m_ticks += 1;
}
The GradientVis class and its underlying base class DataVisCPU contain a color buffer of the size specified during construction. In main.cpp above, this was set to 50 by 50 pixels. These dimensions are exposed as the protected member variables m_height and m_width in the DataVis base class and can be used by our update() method. Additionally, each DataVis object stores an ImageBuffer object, m_image, which is a simple struct containing the dimensions of the color buffer and the buffer itself, an array of m_height*m_width color3 objects stored in row-major order. Row 0 is the bottom of the image. The individual red, green, and blue components of each pixel can be accessed through the r, g, and b members of the color3 struct. Each component is an unsigned char in the range 0 to 255.
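
To make the row-major layout concrete, here is a small sketch using stand-in color3 and ImageBuffer structs that contain only the fields mentioned in this document (the library's own definitions may hold more). The pixelOffset and setPixel helpers are hypothetical names introduced for illustration and are not part of QtVis.

```cpp
#include <cassert>

// Stand-ins with only the fields used in this document.
struct color3 {
  unsigned char r, g, b;
};

struct ImageBuffer {
  int width;
  int height;
  color3* buffer;
};

// Row-major offset of the pixel at row r, column c (row 0 is the bottom row).
int pixelOffset(const ImageBuffer& img, int r, int c) {
  assert(r >= 0 && r < img.height && c >= 0 && c < img.width);
  return r * img.width + c;
}

// Hypothetical helper: write one pixel of the color buffer.
void setPixel(ImageBuffer& img, int r, int c, color3 color) {
  img.buffer[pixelOffset(img, r, c)] = color;
}
```

For the 50 by 50 buffer in main.cpp, the pixel at row 1, column 0 sits at offset 50, and the top-right pixel (row 49, column 49) sits at offset 2499.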

Our initial demo uses the row number to compute the color and then rotates the color through the rows using the m_ticks variable to animate the scene.
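
The arithmetic can be checked in isolation. The sketch below pulls the per-row computation out of update() into two hypothetical helper functions, gradientRed and gradientBlue, which are not part of the demo code.

```cpp
#include <cassert>

// Red component for row r of an image with the given height after `ticks`
// updates, following the arithmetic in GradientVis::update().
unsigned char gradientRed(int r, int height, int ticks) {
  assert(height > 0 && r >= 0 && r < height && ticks >= 0);
  unsigned char val = (unsigned char)(128. * r / height);
  return (unsigned char)((val + ticks) % 128);
}

// Blue component: the complement of red within the 0..128 range.
unsigned char gradientBlue(int r, int height, int ticks) {
  return (unsigned char)(128 - gradientRed(r, height, ticks));
}
```

With a height of 50, row 25 starts with red 64 and blue 64. After 100 updates, red becomes (64 + 100) % 128 = 36 and blue becomes 92, which is what makes the gradient appear to scroll.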

Building with CMake

With an implementation of the update() method populating the m_image color buffer in GradientVis and this class connected to the QTViewer in main.cpp, we are ready to build and test.

For this step, we are assuming you have built and installed the QtVis library. If this is the case, compiling a new demo is fairly easy. We just ask CMake to find the QtVis library, then compile our source and link against the library and all of its dependencies. The demos file CMakeLists.txt has a more complicated example for supporting multiple demos and multiple, optional, parallel frameworks. For this simple demo, the following CMakeLists.txt should suffice.

cmake_minimum_required(VERSION 3.11)

project(qtdemos LANGUAGES CXX)

find_package(QtVis REQUIRED)

#Add user generated files here and name of executable
add_executable(cpuDemo main.cpp gradientVis.cpp)

#Link executable to vis library
target_link_libraries(cpuDemo QtVis::qtvis)

The first few lines are boilerplate. If you are not using CUDA, you may be able to use a slightly earlier version of CMake, but it would probably need to be at least version 3.5. You can change the project name to anything you would like.

The primary step needed for your demo is an add_executable(...) step that specifies the desired name of the compiled executable and the list of source files that need to be compiled, in our case, main.cpp and gradientVis.cpp. This command will generate the needed commands to compile your source.

The final step is to link your compiled source to third party libraries. Again, we specify the name of the executable (matching the line above) and then the list of libraries. By including the QtVis::qtvis name, this will include the qtvis library and all its dependencies, so we do not need to list them explicitly.

An OpenMP Demo

Having seen the single threaded demo above, we can use the same general approach for OpenMP demos. The three sample files for this example are available in the qtogl-demos/openMP folder. openMPDemo.cpp is almost identical to main.cpp in the previous demo, except that we replace the GradientVis class and header file with a new OpenMPVis class and header file. For most parallel frameworks, only minimal changes to the main function are needed to switch frameworks (CUDA has an extra step described below).

openMPVis.h follows the same setup as gradientVis.h in inheriting from DataVisCPU and adding an m_ticks member variable. The constructor and destructor implementation in openMPVis.cpp also follow the same pattern. The only difference in update() is the addition of the line

#pragma omp parallel for private(c, off, val)
in front of the outer for loop. This line instructs the compiler to parallelize the loop over rows. By default, OpenMP will only make private copies of the loop variable for each thread. In this example, however, each thread needs a private copy of c, off, and val, so we add the clause private(c, off, val). As a fun example of how this library can be used as a debugging tool, try leaving the private(...) clause off and see how the output changes.
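
An alternative to the private(...) clause is to declare the temporaries inside the loop body, since variables declared inside a parallel region are private to each thread. The sketch below is not the demo's code: fillGradient is a hypothetical free function that writes only the red channel into a std::vector so that it can be compiled without QtVis.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fill a width*height array of red values with the scrolling row gradient.
// c, off and val are declared inside the parallel loop, so each thread gets
// its own copy without a private(...) clause. If OpenMP is not enabled at
// compile time, the pragma is ignored and the loop simply runs serially.
void fillGradient(std::vector<unsigned char>& red, int width, int height,
                  int ticks) {
  assert(width > 0 && height > 0 && ticks >= 0);
  assert(red.size() == (std::size_t)width * height);
  #pragma omp parallel for
  for (int r = 0; r < height; r++) {
    for (int c = 0; c < width; c++) {
      int off = r * width + c;
      unsigned char val = (unsigned char)(128. * r / height);
      red[off] = (unsigned char)((val + ticks) % 128);
    }
  }
}
```

Either style gives the same result. Loop-local declarations make it harder to accidentally share a variable, while the private(...) clause keeps the body identical to the serial version.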

If the QtVis library was compiled with OpenMP support, the CMakeLists.txt file is similar to the example above. We simply need to change the names of the executable and source files.

#...header same as single threaded example...

add_executable(openMPDemo openMPDemo.cpp openMPVis.cpp)
target_link_libraries(openMPDemo QtVis::qtvis)

A pthreads Demo

Our final demo that uses DataVisCPU for image buffers stored in main memory is a pthreads example available in the qtogl-demos/pthreads folder. threadDemo.cpp is almost identical to main.cpp in the first demo, except that we replace the GradientVis class and header file with a new PThreadVis class and header file. The PThreadVis constructor takes the number of threads to use as its first parameter.
DataVis* vis = new PThreadVis(2, width, height);
In this demo, each thread will create its own vertical red to blue gradient in a portion of the overall image buffer, so you can see how many threads are running when you execute the demo.
A more complicated visualization header file

As we move to the pthreadVis.h header file, we begin to see a little more complexity than in the previous examples. We still inherit from DataVisCPU, but there are more member variables to maintain. For each thread, we keep track of its handle in the pthread_t* m_threads array, and any information needed by each thread is stored in a separate threadInfo* m_tinfo array. We also keep track of the total number of threads in int m_numThreads. To coordinate the writing of a single buffer by all threads, we use a pthread_barrier_t m_barrier.

The pthread_barrier_t type is not provided by OSX implementations of pthreads, but it is provided by the QtVis library if you want to use the library on OSX. Our complete header file pthreadVis.h for Linux and OSX is shown below:

#pragma once

#include <dataVisCPU.h>
#include <pthread.h>

#ifdef __APPLE__
#include <osx/pthread_barrier.h>
#endif

typedef struct {
  int nThreads;
  int id;
  ImageBuffer* img;
  pthread_barrier_t* barrier;
} threadInfo;

/* A PThreads Demo */
class PThreadVis: public DataVisCPU {

public:
  PThreadVis(int numThreads, int w, int h);
  virtual ~PThreadVis();

  void update();

private:
  int m_numThreads;
  pthread_t* m_threads;
  threadInfo* m_tinfo;
  pthread_barrier_t m_barrier;

};
Since each thread will need access to the barrier and the image buffer, we make these elements part of the threadInfo and show how they are populated below.
Implementing the PThreadVis constructor
The implementation of PThreadVis is a little more complicated than the previous examples. Ultimately, we still want to write to the ImageBuffer in parallel, but since we are using threads, the setup is a bit different. The constructor first initializes the base class DataVisCPU as in the previous examples. It then allocates space for the m_threads and m_tinfo arrays. Each thread will have access to the same ImageBuffer and pthread_barrier_t through pointers to the shared objects. Ignore the synchronization initialization for the moment. The for loop creates each of the threads using pthread_create, passing the name of a locally defined function void *threadUpdate(void* info). As we will see soon, this function replaces most of the update() method in the previous demos.

Since the m_threads and m_tinfo arrays are allocated dynamically in the constructor, we delete them in the destructor.

PThreadVis::PThreadVis(int numThreads, int w, int h) :
   DataVisCPU(w,h), m_numThreads(numThreads),
   m_threads(nullptr), m_tinfo(nullptr) {
  int i;
  m_threads = new pthread_t[m_numThreads];
  m_tinfo = new threadInfo[m_numThreads];
  m_tinfo[0].nThreads=m_numThreads;
  m_tinfo[0].img=&m_image;
  m_tinfo[0].barrier=&m_barrier;
  pthread_barrier_init(&m_barrier, nullptr, m_numThreads+1);
  for(i=0;i<m_numThreads;i++){
    m_tinfo[i]=m_tinfo[0];
    m_tinfo[i].id=i;
    pthread_create(&m_threads[i], nullptr, threadUpdate, (void*)&m_tinfo[i]);
  }
}

PThreadVis::~PThreadVis(){
  pthread_barrier_wait(&m_barrier);
  pthread_barrier_destroy(&m_barrier);
  delete [] m_threads; m_threads=nullptr;
  delete [] m_tinfo; m_tinfo=nullptr;
}

Once created, each thread independently executes the threadUpdate function described below.

threadUpdate
Each thread in this demo lives for the entire lifetime of the application. In this particular demo, each thread writes to its own subsection of the image in each update step, so a thread does not need to coordinate with other threads on where to write. However, since a thread cannot update the next image until all other threads have finished writing and the QTViewer class calls the update() method again, each thread must coordinate with the others on when to write. We describe how each thread updates its own subimage first and then describe how to handle synchronization with pthread barriers.

threadUpdate receives the information it needs through the void* info parameter, which was originally a threadInfo pointer passed to pthread_create in the constructor. We begin by casting info back to a threadInfo pointer and extracting the needed information. We use the image size, the total number of threads, and the current thread id to compute which rows of the image this thread should process. This information is stored in the variables rowstart and rowstop.

void *threadUpdate(void* info){
  threadInfo* tinfo = (threadInfo*) info;
  int off;
  int w,h;
  int rowstart, rowstop, maxrows;
  int ticks = 0;
  unsigned char val;
  w= tinfo->img->width;
  h= tinfo->img->height;

  maxrows = h/tinfo->nThreads;
  if(h%tinfo->nThreads > 0) { maxrows++; }
  rowstart=maxrows*tinfo->id;
  rowstop=rowstart+maxrows;
  if(rowstop > h) { rowstop = h; }

  /* do stuff */
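
The row partitioning above can be tried on its own. The sketch below wraps the same arithmetic in a hypothetical rowsForThread helper that returns a small RowRange struct; neither name appears in the demo code.

```cpp
#include <cassert>

// Half-open row range [start, stop) assigned to one thread.
struct RowRange {
  int start;
  int stop;
};

// Hypothetical helper mirroring the rowstart/rowstop arithmetic in
// threadUpdate.
RowRange rowsForThread(int h, int nThreads, int id) {
  assert(h > 0 && nThreads > 0 && id >= 0 && id < nThreads);
  int maxrows = h / nThreads;
  if (h % nThreads > 0) { maxrows++; }    // round up so every row is covered
  RowRange range;
  range.start = maxrows * id;
  range.stop = range.start + maxrows;
  if (range.stop > h) { range.stop = h; } // last thread may get fewer rows
  return range;
}
```

For a 50-row image and 4 threads, maxrows is 13, so the threads receive rows [0, 13), [13, 26), [26, 39), and [39, 50). When the number of threads is large relative to the image height, a trailing thread can end up with start >= stop, in which case its row loop simply does nothing.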
We now update only the rows between rowstart and rowstop similar to the approach used in the two previous examples.
  /* update subimage */
  for(int r=rowstart; r<rowstop; r++){
    for(int c=0; c<w; c++){
      off = r*w+c;
      /* measure r from rowstart so each thread's value stays below 128 */
      val = (unsigned char) (128. * (r - rowstart) / maxrows);
      val = (val+ticks)%128;
      tinfo->img->buffer[off].r=val;
      tinfo->img->buffer[off].g=0;
      tinfo->img->buffer[off].b=128-val;
    }
  }
After completing the update of its subimage, a thread must wait until the other threads have completed their updates and update() is called again to refresh the entire image. We can coordinate most of this with a pthread barrier in our threadUpdate function by adding a barrier wait inside a while loop as follows:
while(true){
  /* update subimage as above */
  /* wait until next update time */
  pthread_barrier_wait(tinfo->barrier);
  /* increment local tick counter */
  ticks++;
}
This completes the threadUpdate implementation, but how do we coordinate the writing of subimages by other threads with the primary QTViewer update loop? If we are careful, we can use the same barrier as shown below.
Coordinating with the update loop
The threads running threadUpdate(...) are created during the construction of the PThreadVis object, but the primary event loop of the QTViewer class is created outside of this context. How can we implement the virtual method update() so that each call is coordinated with the writes to the ImageBuffer by all the concurrent threads? We can use the same m_barrier created in the constructor. A barrier blocks execution until a certain number of threads have called wait on the barrier. By setting this number to one more than the number of worker threads executing threadUpdate, we can synchronize the worker threads with the main update() loop. We initialize our barrier in the constructor prior to creating the threads.
pthread_barrier_init(&m_barrier, nullptr, m_numThreads+1);
for(...){
 ...
}
Since only m_numThreads are calling wait in threadUpdate(...), we add an additional wait call inside our implementation of update() to sync the update loop with the worker threads. This is the only line we need in update() as threadUpdate(...) does all the image writing.
void PThreadVis::update() {
  pthread_barrier_wait(&m_barrier);
}
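
The "workers plus one" barrier pattern can be sketched without pthreads or QtVis. The example below is an illustration under several assumptions: SimpleBarrier is a hypothetical barrier built from std::mutex and std::condition_variable as a portable stand-in for pthread_barrier_t, runBarrierFrames is a hypothetical driver, and the workers stop after a fixed number of frames so they can be joined, unlike the demo's workers, which loop forever.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Minimal reusable barrier; a stand-in for pthread_barrier_t.
class SimpleBarrier {
public:
  explicit SimpleBarrier(int count)
      : m_count(count), m_waiting(0), m_generation(0) {
    assert(count > 0);
  }
  void wait() {
    std::unique_lock<std::mutex> lock(m_mutex);
    const unsigned long gen = m_generation;
    m_waiting++;
    if (m_waiting == m_count) {
      // last arrival: release everyone and reset for the next round
      m_waiting = 0;
      m_generation++;
      m_cv.notify_all();
    } else {
      m_cv.wait(lock, [this, gen] { return m_generation != gen; });
    }
  }
private:
  std::mutex m_mutex;
  std::condition_variable m_cv;
  int m_count;
  int m_waiting;
  unsigned long m_generation;
};

// Run numThreads workers for a fixed number of frames and return each
// worker's local tick counter.
std::vector<int> runBarrierFrames(int numThreads, int frames) {
  assert(numThreads > 0 && frames >= 0);
  SimpleBarrier barrier(numThreads + 1);  // workers plus the update loop
  std::vector<int> ticks(numThreads, 0);
  std::vector<std::thread> workers;
  for (int id = 0; id < numThreads; id++) {
    workers.emplace_back([&barrier, &ticks, id, frames] {
      for (int f = 0; f < frames; f++) {
        /* a real worker would write its subimage here */
        barrier.wait();  // wait until next update time
        ticks[id]++;     // increment local tick counter
      }
    });
  }
  for (int f = 0; f < frames; f++) {
    barrier.wait();      // plays the role of PThreadVis::update()
  }
  for (std::thread& t : workers) { t.join(); }
  return ticks;
}
```

Each call to wait() in the final loop releases all workers for exactly one more frame, which is why every worker's tick counter ends up equal to the number of frames. On some platforms this sketch needs the compiler's thread support flag (for example -pthread) to link.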

We can clean up the barrier in the destructor with

pthread_barrier_wait(&m_barrier); /* one last wait sync with workers */
pthread_barrier_destroy(&m_barrier);

While considerably more complicated than the single threaded or OpenMP examples, the pthreads example outline can be readily modified for other pthreads demos by changing the threadInfo struct and the threadUpdate(...) function for your particular application. The use of the barrier for proper synchronization should be a common pattern for a wide range of applications.

Building a pthreads example

If the QtVis library was compiled with pthreads support, the CMakeLists.txt file is similar to the examples above. We simply need to change the names of the executable and source files.

#... header same as the examples above ...

add_executable(threadDemo threadDemo.cpp pthreadVis.cpp)
target_link_libraries(threadDemo QtVis::qtvis)

A CUDA Demo

The code in the qtogl-demos/cuda folder illustrates how to use the QtVis library for CUDA animations. cudaDemo.cpp has the same general structure as the other main(...) functions above, but there are a few extra steps required for CUDA. First, in CUDA, the color buffer data is stored directly on the GPU, avoiding the overhead of regularly transferring data between the CPU and GPU. To build our CUDA applications, we use a different derived DataVis class called DataVisCUDA instead of the DataVisCPU of previous examples.

Additionally, because CUDA code is compiled with separate compiler options, we occasionally need to be careful when combining CUDA code with other complex C++ code like that found in Qt. For this reason, we introduce a special abstract class Animator that contains one pure virtual method, virtual void update(ImageBuffer* img). Instead of writing a new class that derives directly from a DataVis class as in previous examples, CUDA users write a class that derives from Animator and connect it to a concrete DataVisCUDA instance.

The QTViewer class deletes the DataVis object upon exiting, but neither QTViewer nor DataVisCUDA deletes the Animator instance, so the user is responsible for this final cleanup. The new main() program is shown below for a ripple animation we will describe next.

#include "rippleKernel.h"
#include <qtViewer.h>
#include <dataVisCUDA.h> /* for CUDA applications */

int main(int argc, char *argv[]) {
  QTViewer viewer(argc, argv, 10, 10, "QtCUDA");
  int width = 800;
  int height = 800;

  /* Note: DataVisCUDA instead of DataVis or DataVisCPU */
  DataVisCUDA* vis = new DataVisCUDA(width,height);

  /* Note: create animator and connect to DataVisCUDA object */
  Animator* kern = new RippleKernel();
  vis->setAnimator(kern);

  /* Same as other demos */
  viewer.setAnimation(vis);
  int res = viewer.run();

  /* Cleanup */
  /* The viewer only returns once the OpenGL context has been destroyed.
     Currently, the viewer deletes the vis object on the user's behalf,
     since attempting to delete it after the OpenGL context has been
     destroyed causes problems. For CUDA users, however, the Animator is
     not automatically destroyed and must be cleaned up manually. */
  delete kern; kern=nullptr;
  return res;
}
Creating an Animator
The primary user responsibility in creating a new CUDA animation is writing a new class that implements the update(ImageBuffer* img) method of the abstract Animator class. In this example, we will create a greyscale ripple effect centered on the image in the class RippleKernel. The header file rippleKernel.h declares a straightforward subclass of Animator. As in the serial and OpenMP examples, we add an m_ticks member variable to keep track of the number of times update() has been called in the DataVis class by the QTViewer object. Since the constructor and destructor are trivial, we provide their implementation in the header file.
#pragma once

#include <animator.h>
#include <cuda.h>

class RippleKernel: public Animator {

  public:
    RippleKernel(): m_ticks(0){ };
   ~RippleKernel(){ /* do nothing */};

    void update(ImageBuffer* img);

  private:

    int m_ticks; /* number of timesteps in animation */
};
The only thing left to do is provide an implementation of the update(ImageBuffer* img) in rippleKernel.cu that uses CUDA to populate the color buffer.

Given a pointer to the color buffer in GPU memory, the dimensions of the image, and the number of ticks, the CUDA kernel can execute in parallel with the following code (Example originally from section 5.2.2 of CUDA by Example by Sanders and Kandrot):

__global__ void kernel(color3 *ptr, int w, int h, int ticks) {
  // map from threadIdx/BlockIdx to pixel position
  int x = threadIdx.x + blockIdx.x * blockDim.x;
  int y = threadIdx.y + blockIdx.y * blockDim.y;
  int offset = x + y * w;

  // compute distance from center of image
  float fx = x - w / 2;
  float fy = y - h / 2;
  float d = sqrtf(fx * fx + fy * fy);

  // use distance to modulate grey value intensity
  unsigned char grey = (unsigned char)(128.0f +
                                       127.0f * cos(d / 10.0f - ticks / 7.0f) /
                                           (d / 10.0f + 1.0f));
  if(x<w && y<h){
    ptr[offset].r = grey;
    ptr[offset].g = grey;
    ptr[offset].b = grey;
  }
}
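
When experimenting with the formula, it can help to evaluate it on the CPU first. The sketch below is a plain C++ transcription of the kernel's per-pixel arithmetic in a hypothetical function rippleGrey; it is not part of the demo and uses std::sqrt and std::cos in place of the device math functions.

```cpp
#include <cassert>
#include <cmath>

// CPU version of the per-pixel arithmetic in the ripple kernel.
unsigned char rippleGrey(int x, int y, int w, int h, int ticks) {
  assert(w > 0 && h > 0 && x >= 0 && x < w && y >= 0 && y < h);
  // distance from the center of the image
  float fx = x - w / 2;
  float fy = y - h / 2;
  float d = std::sqrt(fx * fx + fy * fy);
  // cosine wave in d, shifted by ticks and damped with distance
  return (unsigned char)(128.0f +
                         127.0f * std::cos(d / 10.0f - ticks / 7.0f) /
                             (d / 10.0f + 1.0f));
}
```

At the center pixel with ticks equal to 0, the distance is 0 and the cosine is 1, so the grey value is 128 + 127 = 255. Because the damping term d / 10 + 1 is at least 1, the result always stays between 1 and 255 and the cast to unsigned char is safe.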

The implementation of update(ImageBuffer* img) simply needs to unpack the necessary information from img and call the kernel on each update. The DataVisCUDA class automatically allocates the color buffer on the GPU and stores the GPU pointer in img->buffer. The image dimensions are also part of the ImageBuffer struct. A full implementation of update is below:

void RippleKernel::update(ImageBuffer* img) {
  int tdim = 8; // number of threads in x/y direction per block

  int w = img->width;
  int h = img->height;

  /* set up grid dimension */
  dim3 blocks((w+(tdim-1)) / tdim, (h+(tdim-1)) / tdim);
  /* set up block dimension */
  dim3 threads_block(tdim, tdim);

  /* call the CUDA kernel with grid dimension */
  kernel<<<blocks, threads_block>>>(img->buffer, w, h, m_ticks);

  /* step size controls speed of animation */
  m_ticks += 2;
}
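
The grid dimension expression is a ceiling division, which is easy to verify in plain C++. The helper names blocksFor and idleThreads below are hypothetical and introduced only for this sketch.

```cpp
#include <cassert>

// Number of blocks of size tdim needed to cover n pixels (ceiling division).
int blocksFor(int n, int tdim) {
  assert(n > 0 && tdim > 0);
  return (n + (tdim - 1)) / tdim;
}

// Threads launched along one dimension that fall outside the image.
int idleThreads(int n, int tdim) {
  return blocksFor(n, tdim) * tdim - n;
}
```

For the 800 pixel wide image in cudaDemo.cpp and tdim equal to 8, exactly 100 blocks are needed and no threads are idle. For a width of 50, 7 blocks are launched, covering 56 threads, so 6 threads per row land outside the image. This is why the kernel checks x<w && y<h before writing.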
Building a CUDA example
Building a CUDA example with CMake requires a little extra work, as the CUDA-specific code needs to be compiled separately from the Qt code. In CMake 3.10.3+, though, compiling with CUDA is easier because CUDA is a supported first-class language, and you can request CUDA support by listing it as a language in your project description:
project(qtdemos LANGUAGES CXX CUDA)
Now we can compile CUDA code simply by listing source files with a .cu extension. Since this CUDA code depends on some includes and libraries that are part of QtVis, we note this dependency with the target_link_libraries command.
add_library(cudademos rippleKernel.cu)
target_link_libraries(cudademos QtVis::qtviscuda)
This creates a small library for our cudademos. The final step is to add our main executable as in the previous examples and link it to our new library along with the primary QtVis library.
add_executable(cudaDemo cudaDemo.cpp)
target_link_libraries(cudaDemo cudademos QtVis::qtvis)
If you know you have the QtVis library compiled with CUDA support, your CMakeLists.txt can be simply:
cmake_minimum_required(VERSION 3.11)
project(qtdemos LANGUAGES CXX CUDA)
find_package(QtVis REQUIRED)

add_library(cudademos rippleKernel.cu)
target_link_libraries(cudademos QtVis::qtviscuda)

add_executable(cudaDemo cudaDemo.cpp)
target_link_libraries(cudaDemo cudademos QtVis::qtvis)

More demos

There are more CUDA demos in the repo, including more complex examples of using auxiliary buffers for storing additional GPU data beyond color information in userBufferKernel and a fractal example in juliaKernel. If you have a nice demo you'd like to share or add, let me know.

Building with Makefiles

VERBOSE=1 make
then do some stuff... TODO