CS87 Mini Lab3: OpenMP Matrix Multiply

Due: Friday Feb 12 before 11:59pm (in about 24 hours)
A Mini Lab is one that I anticipate you can complete in a couple of hours; you should finish or be close to finishing by the end of a Thursday lab session. The purpose of Mini Labs is to introduce you to a parallel or distributed programming language/utility without having you solve a larger problem using the language/utility.

I give you only about 24 hours to complete a mini-lab because I want you to stop working on it and get back to focusing your effort on the regular lab assignment.

If you don't get a mini lab fully working, just submit what you tried. Mini Labs do not count very much towards your final grade, and not nearly as much as regular Labs--they are mini.
They are graded on a did-you-try-it-out-or-not scale.

Lab Details
For this mini lab, you will start with a sequential implementation of matrix multiply, and then use OpenMP to parallelize the code.

Parallel Matrix Multiply

The starting point code contains a complete implementation of sequential Matrix Multiply. The code runs some iterations of multiplying two matrices together. This is an example of a kernel benchmark program: it is likely not so useful as a stand-alone program, but instead implements a common sub-operation that might be part of larger parallel programs.

Your job in this lab is to use OpenMP to parallelize the code.

OpenMP

You do not need to learn an enormous amount of OpenMP to solve this problem. You will need to use #pragma omp parallel to fork a set of threads to do something in parallel, and you will want to add a parallel for loop and maybe some synchronization.
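As a rough sketch of what a parallel for loop can look like on a matrix multiply: the function name mm_parallel, the flat row-major layout, and the meaning of n and m below are assumptions for illustration, not taken from matrixmult.c, so adapt the idea to whatever the starting point code actually uses.

```c
/* Hypothetical helper (not from the starting point code):
 * computes C = A * B, where A is n x m, B is m x n, and C is n x n,
 * all stored row-major in flat arrays. */
void mm_parallel(const double *A, const double *B, double *C, int n, int m)
{
    /* Fork a team of threads and divide the iterations of the i loop
     * among them. Variables declared inside the loop body (j, k, sum)
     * are private to each thread. */
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < m; k++) {
                sum += A[i * m + k] * B[k * n + j];
            }
            /* Each element (i, j) is written by exactly one thread,
             * so no synchronization is needed here. */
            C[i * n + j] = sum;
        }
    }
    /* Implicit barrier and join: only the main thread continues. */
}
```

With gcc, the pragma only takes effect when you compile with -fopenmp; without that flag the same code runs sequentially, which is handy for checking that the parallel and sequential versions produce the same answer.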

You should be careful to stick with the fork-join, fork-join, fork-join model of OpenMP; don't do things in the parallel parts that are really not parallel or you will get some weird/unexpected behavior. Do not try to "optimize" your code by reducing fork-join blocks. You should, however, think about minimizing other parallel overheads as you design a solution; your goal is a solution designed such that there is a performance improvement from parallelization. If your 1 thread execution wins out over the multi-thread ones, think about how you can remove some parallel overhead (think of space/time trade-offs, think about synchronization costs, ...). Make sure you are comparing runs for large enough problem sizes (N and M) with enough iterations.
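To make the point about synchronization costs concrete, here is a small sketch of one common way to avoid locking on every update: let each thread accumulate into its own private copy and combine the copies once at the join, which is what the OpenMP reduction clause does. The function matrix_sum is a hypothetical example and is not part of the starting point code; the same idea applies to any shared value your threads update.

```c
/* Hypothetical example: sum all elements of a rows x cols matrix
 * stored row-major in a flat array. */
double matrix_sum(const double *X, int rows, int cols)
{
    double total = 0.0;

    /* reduction(+:total) gives each thread a private total that starts
     * at 0 and adds the private totals together when the loop ends.
     * Protecting "total += ..." with a critical section instead would
     * also be correct, but would serialize every single addition. */
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < cols; j++) {
            total += X[i * cols + j];
        }
    }
    return total;
}
```

Note that this is still one fork-join block per call; the saving comes from synchronizing once per thread rather than once per element.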

I encourage you to try different partitionings of all or some of the matrices and see if you get different timed results. For example, see if you can partition one or more matrices by rows or by columns across threads:

row                                  column
---                                  ------
1 1 1 1 1 1 1 1                      1 1 2 2 3 3 4 4 
1 1 1 1 1 1 1 1                      1 1 2 2 3 3 4 4 
2 2 2 2 2 2 2 2                      1 1 2 2 3 3 4 4 
2 2 2 2 2 2 2 2                      1 1 2 2 3 3 4 4 
3 3 3 3 3 3 3 3                      1 1 2 2 3 3 4 4 
3 3 3 3 3 3 3 3                      1 1 2 2 3 3 4 4 
4 4 4 4 4 4 4 4                      1 1 2 2 3 3 4 4 
4 4 4 4 4 4 4 4                      1 1 2 2 3 3 4 4 
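The two layouts in the picture can be reproduced with a little index arithmetic. The sketch below is only an illustration: block_range and owner_map are hypothetical helpers, not functions from the starting point code, and they assume a square n x n matrix stored row-major in a flat array.

```c
/* Hypothetical helper: compute the half-open range [*start, *end) of
 * the n indices owned by thread tid out of nthreads, handing out
 * contiguous blocks. If n does not divide evenly, the first
 * (n % nthreads) threads each get one extra index. */
void block_range(int tid, int nthreads, int n, int *start, int *end)
{
    int base = n / nthreads;
    int extra = n % nthreads;
    *start = tid * base + (tid < extra ? tid : extra);
    *end = *start + base + (tid < extra ? 1 : 0);
}

/* Hypothetical helper: fill owner[i*n + j] with the 1-based id of the
 * thread that owns element (i, j) of an n x n matrix, partitioned by
 * rows (by_rows != 0) or by columns (by_rows == 0). */
void owner_map(int *owner, int n, int nthreads, int by_rows)
{
    for (int t = 0; t < nthreads; t++) {
        int start, end;
        block_range(t, nthreads, n, &start, &end);
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                int idx = by_rows ? i : j;   /* index being partitioned */
                if (idx >= start && idx < end) {
                    owner[i * n + j] = t + 1;
                }
            }
        }
    }
}
```

Inside a #pragma omp parallel region, each thread could call block_range with omp_get_thread_num() and omp_get_num_threads() to find its own rows or columns. A parallel for with schedule(static) gives a similar contiguous division of the loop iterations automatically, so whether you partition by rows or by columns then comes down to which loop you parallelize.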


Starting Point Code and Tips for Getting Started
  1. Both you and your partner should:
    1. Create a cs87/labs subdirectory on the CS system:
      mkdir cs87
      mkdir cs87/labs
      cd cs87/labs
      
    2. Get your Lab03 ssh-URL from the GitHub server for our class: CS87-s16
    3. On the CS system, cd into your cs87/labs subdirectory
    4. Clone a local copy of your shared repo in your private cs87/labs subdirectory:
      git clone [your_Lab03_URL]
      
      Then cd into your Lab03-you-partner subdirectory.
    If all was successful, you should see the following files when you run ls:
    Makefile README matrixmult.c
    
    If this didn't work, or for more detailed instructions on git see: the Using Git page (follow the instructions for repos on Swarthmore's GitHub Enterprise server).

Starting Point files

Getting Started

  1. I suggest first trying out my simple OpenMP examples in my public directory:
    cp  -r ~newhall/public/cs87/openMP_examples .
    
  2. Then take a look at the matrixmult.c file, then try compiling and running it to understand what it does.
  3. Then try to add in some OpenMP code to parallelize parts of the matrix multiply program.

With the starting point code, the sizes of N and M are tiny and the DEBUG definition is on. This will print out matrices and debug info as the code runs. Once you have something working, comment out DEBUG and make N and M big and try some timed runs to see if you get performance improvements with your parallel solutions. For example:

time ./mm_par 1000 0
time ./mm_seq 1000 0
Note: these executables take at least two command line options. The first is the number of iterations, the second specifies row-wise or column-wise partitioning, and an optional third gives a partitioning block size. The row/column-wise and block-size options are there if you want to use them; you don't have to. They just give the starting point code a few more command line options that you can use if you'd like.

Useful Functions and Resources
Submit
Before the due date, you or your partner should push your solution from one of your local repos to the GitHub remote repo. (It doesn't hurt if you both push, but the last version pushed before the due date is the one I will grade, so be careful that you are pushing the version you want to submit for grading.)

From one of your local repos (in your ~you/cs87/labs/Lab0X-partner1-partner2 subdirectory)

git add *.c 
git commit
git push

If you have git problems, take a look at the "Troubleshooting" section of the Using Git page.