• Both Parts Due 11:59pm Tuesday, February 26

1. Handy References

2. Lab 4 Goals:

  • Gain experience using pointers and dynamic memory allocation (malloc) in C.

  • Practice using gdb and valgrind to debug C memory bugs.

  • Convert C code to equivalent IA32 assembly instructions.

3. Part 1. C Pointers and Memory Allocation.

Experimental scientists often want to compute some simple statistical analyses over the set of experimental data. A useful tool would be a program that could compute these statistical results for any an arbitrarily sized data set (i.e. it would work for 10 or for 10 million data values without re-compilation).

For this lab, you will implement the program started in stats.c that takes a single command line argument, which is the name of a file of data values (floats, one per line), and computes and prints out a set of statistics about the data values.

3.1. Compiling stats.c

Use an editor to open the stats.c file. Notice that it includes the readfile library code that it links in as well as linking in the math library.

Compile stats.c file using the Makefile by running make. You can see how the executable is built from a .c, a .o, and explicitly linking in the math library (-lm). By including the math library, you gain access to the sqrt function, which you will need for your standard deviation calculations.

3.2. Structuring your code

stats.c also contains the prototype for the getvalues function that you need to implement and has some code to get the filename command line argument.

You should structure your program in the following way:

  1. Make a call to getvalues, passing in the filename of the file containing the data values, and passing in two pointers: the address of an int variable to store the size of the array (number of values read in) and the address of an int variable to store the total array capacity.

    The function will return an array of float values that stores the values read in from the file, or NULL on error (e.g., malloc() fails or the file cannot be opened).

  2. Compute the mean (average), the median (the middle value), the standard deviation of the set of values and print them out. Note: You will likely need to sort the values prior to computing the median. Luckily, you already wrote code to sort floats in lab 2!

  3. Print out the results, plus information about the number of values in the data set and the amount of unused capacity in the array storing the values.

The starting point code comes with two input files that you can use to test your solution:

./stats small.txt
Results:
--------
num values:         16
      mean:      1.812
    median:      1.500
   std dev:      1.031
unused array capacity: 4

Note: Much like you use \n to insert a new line, you can use \t to insert a tab character for pretty output formatting like this.

*./stats large.txt*
Results:
--------
num values:         94
      mean:      1.161
    median:      1.000
   std dev:      0.788
unused array capacity: 66

3.3. Statistic Functions

The statistics you need to compute on the set of values are the following:

  1. mean: the average of the set of values. For example, if the set is: 5, 6, 4, 2, 7, the mean is 4.8 (24.0/5).

  2. median: the middle value in the set of values. For example, if the set is: 5, 6, 4, 2, 7, the median value is 5 (2 and 4 are smaller and 6 and 7 are larger). If you have an odd number of values, the median is the middle value, as in the previous example. If you have an even number of values, use the average (arithmetic mean) of the two middle values. For example, if your numbers were 1, 3, 6, and 10, your median would be the average of the two middle values: (3+6) / 2 = 9 / 2 = 4.5.

  3. stddev: is given by the following formula:

    Standard Deviation

    Where N is the number of data values, xi is the ith data value, and X-bar is the mean of the values.

Feel free to discuss the math on Piazza or in person with your classmates. While we’re using it to ensure that your pointers are working correctly, the math itself is not the focus of this assignment..

3.4. Requirements

  • getvalues: The array of values must be dynamically allocated on the heap by calling malloc. You should start out allocating an array of 20 float values. As you read in values into the current array, if you run out of capacity:

    1. Call malloc to allocate space for a new array that is twice the size of the current full one.

    2. Copy values from the old full array to the new array (and make the new array the current one).

    3. Free the space allocated by the old array by calling free.

      When all of the data values have been read in from the file, the function should return the filled, dynamically allocated, array to the caller (the function’s return type is float *). The size and capacity of the array should be "passed" back to the caller via the pointer parameters.

      There are other ways to do this type of alloc and re-alloc in C. However, this is the way we want you to do it for this assignment: make sure you start out with a dynamically allocated array of 20 floats, then each time it fills up, allocate a new array of twice the current size, copy values from the old to new, and free the old.
  • For full credit, your program must be free of valgrind errors. Do not forget to close the file you opened! If you don’t, valgrind will likely report a memory leak.

  • Your code should be commented, modular, robust, and use meaningful variable and function names. This includes having a top-level comment describing your program and listing your and your partner’s names and the date. In addition, every function should include a brief description of its behavior. You should not use any global variables for this assignment.

  • It should be evident that you applied top-down design when constructing your submission (e.g., there are multiple functions, each with a specific, documented role).

  • You should not assume that we will test your code with the sample input files that have been provided.

3.5. Tips

  • Try getting getvalues to work without the re-allocation and copying part first (for fewer than 20 values). Once that works, then go back and get it to work for larger numbers of input values that require mallocing up new heap space, copying the old values to the new larger space, and freeing up the old space.

  • For increased precision, use the C double type to store and compute the mean and the square root.

  • See the lab 2 assignment for documentation about using the readfile library (and also look at the readfile.h comments for how to use its functions).

  • Take a look at the weekly lab code and in-class exercises to remind yourself about malloc, free, pointer variables, dereferencing pointer variables, dynamically allocated arrays, and passing pointers to functions.

4. Part 2. Writing a Swap Function in IA32.

In the file named swap.s finish the implementation of the a swap function that swaps two int values:

void swap(int *x, int *y);

This function takes two ints passed as pointers (the parameters point to the storage location of their argument variables). After the call, the two int variables passed to swap should have their values swapped. A call to swap would look like (see mainswap.c for another example):

 int a, b;
 a = 10;
 b = 8;
 printf("%d %d\n", a, b);  // prints: 10 8
 swap(&a, &b);  // swap the values stored in a and b
 printf("%d %d\n", a, b);  // prints: 8 10

The swap.s file already contains some IA32 instructions that set up up and tear down the stack frame associated with the swap function. You will add IA32 instructions to the body of this IA32 function to perform swapping operation.

In mainswap.c is code to test your swap implementation. Run make to build an executable, mainswap, that links in your swap.s solution and calls it. Use this to test your solution for correctness.

4.1. Requirements

  • As its name implies, your swap function should swap the values stored in the given input pointers.

  • Your code should access the function’s parameter values of x and y on the stack in the caller’s stack frame. They can be accessed relative to the %ebp pointer. You may find this stack diagram to be helpful. In the case of swap, the stack will look something like:

    esp ---->    # swap's stack frame:
    
    ebp ---->    # main's stack frame:
    ebp+0x8:   first parameter value (pointer to main's a variable)
    ebp+0xc:   second parameter value (pointer to main's b variable)
  • Briefly comment your IA32 instructions with what they are doing in terms of the C code. As an example:

movl 0x8(%ebp), %eax   # load x into register %eax
addl $1, %eax          # x + 1

4.2. Tips

  1. Start by figuring out what the C code version of swap would look like. Draw a picture of the stack, draw where parameter and argument values are and trace through the instruction execution to help you determine what IA32 instructions you need to use to implement swap.

  2. Implement and test incrementally. For example, first see if you can set the value pointed to by x to the value pointed to by y. Once that works, then see if you can complete the function.

  3. You can use gdb (or ddd) to debug your solution. Set a breakpoint in swap, and use disass, ni, and print $reg to see values as you go. If you want to see the value referred to by an address, you need to tell gdb what type the address points to. For example, if you want to see an int value at address 0x1234, do:

# re-cast address as an int pointer (int *), then dereference the pointer *
(gdb) *(int *)(0x1234)

5. Submitting

Please remove any debugging output prior to submitting.

To submit your code, simply commit your changes locally using git add and git commit. Then run git push while in your lab directory. Only one partner needs to run the final push, but make sure both partners have pulled and merged each others changes. See the section on Using a shared repo on the git help page.