CS31: Week 12

Create a week12 subdirectory in your weeklylab subdirectory and copy over some files:

    cd cs31/weeklylab
    pwd
    mkdir week12
    ls
    cd week12
    pwd
    cp ~kwebb/public/cs31/week12/* .
    ls
    Makefile  data_large.txt  data_small.txt  findmax.c

Then type make to build the executable files.

Finding the max, in parallel.

For this exercise, we'll write a C program that searches through a (potentially very large) array to find (1) the maximum data value and (2) the number of times that value appears in the data set. Since we're learning about threads, we'll divide up our data set into partitions and assign one thread to each partition. The steps we go through here will be similar to what you'll need to do when parallelizing the game of life for lab 9.

You can invoke the program with two arguments, the file name to use when reading data and the number of threads:

# Use the "data_small.txt" input file and divide it among four threads
$ ./findmax data_small.txt 4

The code to read in the input file, parse_file(), is provided for you. It returns an array of integers and sets the num_values variable so that you know how many data points you have. The provided code also sets num_threads to the value of the second parameter.

Thread accounting.

In some ways, creating threads resembles the way you created processes. For example, with processes, we used a pid_t to represent the process ID of the child process. Here, we'll use a pthread_t to represent a new thread. The exercise code allocates an array of pthread_t types, one for every thread. The thread library will fill in the details of the pthread_t when you call pthread_create. Thus, when you're creating thread N, give pthread_create the address of the Nth element of the pthread_t array, and it will fill in the thread's information so that you can reference that thread again later (e.g., join).

Also like processes, you may want to wait until a thread has terminated (and check it's status) before moving forward. For processes, we used waitpid(). For threads, we'll use pthread_join(), which takes the pthread_t of the thread you want to wait on. The pthread_join() function will also give you the thread's return value, if you want it (we don't care for this exercise).

Unlike fork(), with threads we don't create a clone of the current process. Instead, we give pthread_create the name of the function that we want it to execute in a separate thread. The function you tell it to execute must be of the form:

void *function(void *argument)

That is, the function must return a pointer (NULL is fine, if you don't care about returning anything), and it must accept exactly one pointer as an argument (NULL also fine here, if you don't want to pass anything in). The type of both pointers is the generic "void *", since the thread library has no idea what you're planning to do in the thread. This means we'll have to do some casting when we pass arguments.

Passing arguments.

Since pthread_create() only allows us to pass one argument to a thread (a pointer to something, we'll have to be creative if we want to start up a thread with more than one piece of information. The common solution is to pass a pointer to a struct, where the struct has a field for every piece of information you'd like the thread to have. The provided exercise code already defines a struct named thread_args and creates an array of those structs with one element per thread. When creating thread N, we'll fill in the parameters for thread N in the Nth element of the array. We can then pass pthread_create() a pointer to the Nth array element.

When the thread starts up, it can access the pointer argument to get access to the struct you passed it. Since the argument is declared as generic type "void *", you'll need to cast it to the type you know it is: a struct thread_args pointer. After doing that, you'll be able to use it's fields as you normally would with a struct pointer (i.e., the -> operator).

Your job...

Use pthread_create and pthread_join to create and reap threads, respectively. Initially, you can pass them an argument of NULL. We'll replace that will useful information (e.g., the partition the thread is responsible for) in the next step.

Partition up the input array so that each thread gets roughly the same number of values to check. You can add fields to the thread_args struct to let each thread know which indices of the array it's responsible for checking.

Write the (relatively simple) thread body code so that each thread will search its partition to find the maximum value and the number of times it appears.

Add synchronization (pthread_mutex_t's) and update the global_max and global_count results safely (no race conditions).

Bonus: Use another synchronization primitive, a barrier (pthread_barrier_t) to pause every thread when they're half-way through their partition, only to resume when EVERY thread has reached the half-way point. This is similar to what you'll do in the GOL, to keep every thread on the same simulation round.

Synchronization

To fully solve this exercise (and lab 9), you'll need to use two forms of synchronization: mutex locks and barriers. When using one of these primitives, each thread that wishes to use it must have a pointer to the same pthread_XXX_t.

pthread_mutex_t: a type that provides a mutual exclusion lock. You can initialize a lock with pthread_mutex_init() and clean it up with pthread_mutex_destroy(). A thread can acquire a lock with pthread_mutex_lock() and release it with pthread_mutex_unlock(). Only one thread may hold a lock at a time, and any attempt by a thread to acquire a lock that is already held will cause the caller to be blocked until the lock becomes free.

pthread_barrier_t: a type the provides a "thread barrier" such that every thread must reach the barrier before moving forward. This is useful when you need to prevent any one thread from getting too far ahead of the others (e.g., in the game of life, you want to keep every thread on the same round of the simulation). Like mutex locks, you must first initialize a barrier with pthread_barrier_init(), and you can destroy it when you're finished with it via pthread_barrier_destroy(). During initialization, you specify the number of threads that much reach the barrier before any are allowed to proceed beyond it. To use the barrier, threads call pthread_barrier_wait(), which will block every thread that calls it until the target number has been reached.

Game of Life, in parallel.

Link to Lab 9