Checkpoint for Weekly Lab 2 (Sept 16): By the time you come to lab, your EmployeeList and EmployeeNode classes should be implemented and passing all provided tests, including checking for memory access errors and memory leaks.

1. Overview

Due by 11:59 p.m., Saturday, September 19, 2020.

The task for this week’s lab: given employee data from two separate files, combine the contents into a sorted list and output the results to file. You will implement a linked-list data structure to maintain your sorted list of employee data.

While this lab serves primarily as a warm-up exercise and reminder of C++ programming, we will also introduce new concepts. The learning objectives for this assignment:

  • implement a data structure (linked lists) using low-level memory management

  • reacquaint students to programming in C/C++

  • introduce binary file I/O

  • manage heap memory, including using valgrind to debug memory errors

  • introduce the const keyword and review C-style strings (char *)

1.1. Getting Started

If you have not already done so, first create a course directory for this course, and add a lab subdirectory for your lab repos:

mkdir -p cs44
mkdir -p cs44/labs
cd cs44/labs

We will be using git repos hosted on the college’s GitHub server for labs in this class. If you have not used git or the college’s GitHub server before, here are some detailed instructions on using git for CS44 labs.

Next find your git repo for this lab assignment off the GitHub server for our class: cs44-f20

Clone your git repo (Lab1-userID1-userID2) containing starting point files into your labs directory:

cd ~/cs44/labs
git clone [the ssh url to your your repo]
cd Lab1-userID1-userID2

If all was successful, you should see the following files (highlighted files require modification):

  • Makefile - pre-defined. You may edit this file to add extra source files or execution commands.

  • sortEmployees.cpp - your main program for reading and sorting employee data.

  • employee.[cpp/h] - the implementation of an EmployeeList class that will represent a list of sorted employees in a linked list.

  • testEmployeeList.cpp - a test file for your EmployeeList class. You will need to add more tests in here to ensure your program is correct.

  • input/ - directory of sample input files in either text format (.txt extension) or binary (.dat).

1.2. Deliverables

The following will be evaluated for your lab grade:

2. Linked List Implementation

In employee.cpp/h, you will implement an EmployeeList and related EmployeeNode class to manage a sorted linked list of employee data. The data will be sorted based on the name field of EmployeeNode. For both classes, you may add private methods, but you may not add public methods or any additional data members (private or public).

2.1. EmployeeNode class

The data members and public methods have been provided. Begin by implementing the EmployeeNode class in employee.cpp. This class represents one employee as a node in the linked list:

  • name is a C-style string containing the employees name.

  • salary is an integer.

  • next is a pointer to the next EmployeeNode in the linked list.

The constructor should initialize all data members (e.g., set pointers to nullptr) and the destructor should deallocate name (but only if it is not null).

Each of the above data members has a corresponding setter and getter method. Note that c-style strings are arrays of characters, so you will need to allocate and deallocate memory. See the section on c-strings below for details.

2.2. EmployeeList class

EmployeeList is a linked list with a limited interface — there is only one type of method for inserting and no methods for removing. Additionally, there is only one data member for the linked list: the head of the list.

You will need to implement the class methods:

  • A constructor that initializes head to nullptr.

  • A destructor that deletes each EmployeeNode in the list.

  • insert() adds a new employee to the list such that the resulting list is in sorted order by the name field. In other words, you are performing insertion sort. Your method should create an EmployeeNode with the given information and traverse the list to find the correct spot to insert the new EmployeeNode.

  • getNumEmployees() and getAverageSalary() must traverse the linked list to accumulate the respective statistics.

  • writeToFile() writes the entire EmployeeList, in sorted order, to the binary file specified by the parameter filename. See details on the structure of the binary files and writing to binary files. The file output should mirror the structure of the file that is read in by readFromFile() in the main program. Return 0 on success and a non-zero error value on failure (e.g., illegal file name). Be sure to specify the meaning of error values in the function comments.

  • print() is useful for debugging; it output the contents of the list in a formatted table.

2.3. Developing and Testing

We have provided a few tests in testEmployeeList.cpp. To begin your lab, you should first look at these tests to understand the usage for EmployeeNode and EmployeeList. Next, in employee.cpp, finish creating function stubs for all methods not already stubbed out for you. Be sure to include (and understand) the function comments that explain the usage of each method. When this is complete, you should be able to make and run the tests.

$ make testEmployeeList
$ ./testEmployeeList

While your program will compile, it will immediately fail since you have not implemented any methods. Use incremental development to implement one method at a time, starting with the EmployeeNode class. Note that the provided tests are not unit tests - they test the full functionality of the EmployeeNode and EmployeeList. You will need to add your own tests to help verify, and debug, your implementation.

Also note that the provided tests are not complete; while passing them is a good indication, most of the tests require you to manually verify the printed output is correct. In addition, it does not test all corner cases for a linked list and nor does it test your writeToFile method. Do not move on to the main program until you are fully satisfied with your test suite and confident in the correctness of your solution. Also be sure that valgrind on your tests does not find any memory errors or leaks.

3. Main program

In sortEmployees.cpp, you will write a main program that, at a high level:

  • reads in employee data from two unsorted binary files

  • combines the data from two files together in a sorted list of employees

  • write the resulting sorted list of employee data to a binary file.

  • prints formatted information to standard output as it makes progress (see example runs).

3.1. General Requirements

  • All of your code should demonstrate good design. Your main program should written in a top-down, modular fashion.

  • Each function should have a single, clear purpose.

  • Each function should have a comment explaining its purpose as well as parameters, return values, and error conditions.

  • Your program must be free of memory-access errors. This includes both memory leaks and out of bounds memory access. Recall that C++ does not report this to you; you need to test your code and use tools such as valgrind and gdb. See Compiling, Debugging and Linking Tips for a reminder on how to use these tools.

3.2. Usage and Sample Runs

$ ./sortEmployees input/infile1.dat input/infile2.dat result.dat

Your program takes in three arguments: two binary files to read and the name of the file to write the resulting list. Incorrect usage should result in an error message e.g.,

$ ./sortEmployees
usage: sortEmployees file1 file2 resultfile

See here for the expected output.

3.3. Reading the input files

In sortEmployees.cpp, you must implement a function to read from binary files:

int readFromFile (char *filename, EmployeeList *list)

This function reads in employee records from the binary file specified by the parameter filename and adds each employee record list. Return 0 on success and any non-zero error value on failure (there may be multiple reasons for failure; in the function comments, be sure to specify the error associated with each value you return). See Reading Binary Files for tips on how to complete this function

3.4. Formatted Output

Your program should use formatted output (e.g., printf) to display the employee data as it is read from the input files (i.e, in readFromFile()). Your program should finish by displaying the sorted list of employee records. See these example runs.

4. Binary File Format

You are used to reading files that are in text format - all data is stored as ASCII characters regardless of whether it is actually a number or character. Alternatively, we could store the data in a format more native to computing - as binary. You cannot open and read these files in a text editor (because they aren’t text). But they can be much more predicable in terms of how data is formatted.

4.1. Employee Data Files

Each employee file is a binary file that stores a sequence of variable length employee records. Each employee record has three fields: the length of the name (4 byte integer), the actual name (variable length), and a salary (4 byte integer). The format of an employee record in the binary file is as follows:

  ----------------------------------------------------------------------------
 |    4 byte integer     |     N character string     |    4 byte integer     |
 | Number of characters  |        Employee name       |    Employee salary    |
 | in employee name (N)  |    (not null-terminated)   |                       |
  ----------------------------------------------------------------------------

4.2. Reading binary files

To read integers and character strings from a binary file you can use the following code as an example:

  //open file specified in filename. Second parameter specifies that
  // the file is for input and is in binary format
  std::fstream infile(filename, ios::in | ios::binary);
  int nameLen;
  char *name;

  // read 4 bytes into memory location of nameLen
  // read assumes everything is char, so we will here we typecast
  // the address of nameLen (the destination) as a char *
  infile.read((char*)&nameLen, sizeof(int));

  //Allocate space and read characters for string
  name = new char[nameLen+1];
  infile.read(name, nameLen);
  // TODO: remember to null terminate strings with '\0'
  // TODO: add code to finish reading file
  infile.close();

You should reference the fstream library for details on how to read files. Also, you will need to loop until you read all of the records in a file. You should use the eof() (end of file) function to do so. Here is an illustrative example with tips on how to use it properly.

4.3. Writing binary files

To write integers and character strings to a binary file you can use the following example code:

  std::fstream outfile(filename, ios::out | ios::binary);
  //You need to figure out how to set these variables and loop through the list
  int nameLen = ...;
  char *name = ...;
  ...

  outfile.write((char*)&nameLen, sizeof(int));//write the length of the name
  outfile.write(name, nameLen); //write the name using the specific length
  outfile.close(); //close the file when done writing

Remember to close the file when done writing, otherwise your results may not be saved to disk.

5. Tips and Hints

5.1. c-strings

See Prof. Newhall’s help page on Strings in C for a refresher on using c-strings. In particular, you will need to use strcpy, strlen, printf, among others. Also, recall that you cannot use comparison operators (<,>,==, etc.) with c-strings. Instead you must strcmp.

Since your code is compiled in C++, you can use new and delete instead of malloc and free e.g.,

char *staticVar = "this is a string"; //a static c-string
char *dynamicVar = nullptr; // we well allocate/deallocate space for this c-string

printf("%s\n", staticVar); //Prints out "this is a string"
len = strlen(staticVar);

dynamicVar = new char[len+1]; //add one for the null termination '\0'
strcpy(dynamicVar, staticVar); //copies "this is a string"

printf("%s\n", dynamicVar); //should print "this is a string"

delete [] dynamicVar //clean up when done
dynamicVar = nullptr; // set pointer to null to detect memory access errors

C-strings are arrays; allocating them dynamically requires that you also clean up memory when it is no longer being used. This also means that, c-strings variables are pointers - treat them as such.

5.2. const modifier

const is a modifier that guarantees that an entity will not be modified while in scope. You will see it used is a modifier to a variable’s type. const signifies that a modified after it is declared. When used for a function parameter as in EmployeeList::insert and EmployeeNode::setName, the keyword states that the value of the parameter cannot be modified in that function. EmployeeList::insert() can only read the values that have been sent it, it is not allowed to modify them.

Note that nothing changes on the caller’s end of function use. This simply provides a contract guarantee to a user that their data cannot be modified by a called function. It is enforced by the compiler (attempting to modify a constant will lead to a compile-time error message).

Practically, this means that setName cannot simply store the pointer. This is illegal:

void EmployeeNode::setName(char const *nameIn){
  this->name = nameIn; //type mismatch
}

For one, this is bad coding; by sharing the pointer, you create potential bugs since the value of this→name is now directly accessible outside the scope of the EmployeeNode class. Also, this will not compile as the type of this→name (char *) is different than the type of nameIn (char const *nameIn). You will, instead, need to allocate new space for this→name and copy in the contents of nameIn.

5.3. Error messages

Above, we mentioned that you will return error statuses via an int for your read and write functions. This is common practice, particularly in C programs where exceptions are not used. To make this aspect of your code more readable, define and use meaningful constants. For example:

//static makes this accessible throughout this file
//const ensures the value cannot be changed
static const int ERROR_NULL_FILENAME = -1;

/* Function purpose
 * @params ...
 * @return ...
 * @error returns ERROR_NULL_FILENAME if the filename pointer is nullptr
 */
int readFromFile(char *filename, EmployeeList *list){
  //...
  //some code
  if (filename == nullptr){
    return ERROR_NULL_FILENAME; //returns a -1, but someone reading your
                                //code cares more about the reason for error
  }
  //...rest of program

Instead of returning -1, the function returns a variable (this variable’s value happens to be -1 but this is abstracted). The key is that in main, you can check for this by comparing the return value to the error values you expect to find. For example, if you store the function return value in main using status, you can write:

  if (status == ERROR_NULL_FILENAME){
    //handle this error
  }

This can be easier to understand (and less prone to bugs) than the equivalent statement:

  if(status == -1){
    //handle this error
  }

In future labs, we will instead use C++ exceptions to throw exceptions when an error occurs (instead of returning an error code) and try/catch blocks around code that could evoke an exception. If you prefer, you can use exceptions instead of error statements for this lab, but that is not required or expected.

5.4. Testing your code

It is difficult to verify the output since it is not text. Here are some strategies:

  • To verify that your writeToFile function works, try using the output file of one run as one of the input files in a subsequent run. Read errors mean that the write phase was not correct. This will allow you to see the contents of the file as the employee records get printed to screen when it is read.

  • Run wc (word count) or ls -l to get the number of bytes in the output and input files. The output file should have exactly the same number of bytes as the sum of the size of the two inputs.

  • Generate a hex dump of a binary file using xxd.

    $ xxd infile1.dat
    00000000: 0c00 0000 5475 7269 6e67 2c20 416c 616e  ....Turing, Alan
    00000010: c8af 0000 0d00 0000 486f 7070 6572 2c20  ........Hopper,
    00000020: 4772 6163 6550 c300 000d 0000 004c 6f76  GraceP.......Lov
    00000030: 656c 6163 652c 2041 6461 3aaa 0000 1000  elace, Ada:.....
    00000040: 0000 4261 6262 6167 652c 2043 6861 726c  ..Babbage, Charl
    00000050: 6573 ed11 0100                           es....
  • We will also test your code with held-aside input files, so you should try create other input files for both debugging purposes early on (i.e., simple tests) as well as more stressful tests. To create new binary input files, use the createbin program provided in ~newhall/public/cs44/. It will take in a text file that you create and output its binary version, which you can then feed into your sortEmployees program:

~newhall/public/cs44/createbin yourtextfile binaryoutfile

In input/, we have given you infile1.dat and infile2.dat as sample inputs. You will also find the corresponding text files so you can see how to format your input to createbin.

6. Submitting your lab

Review the lab deliverables to ensure you have completed all of your work. Before the due date, push your solution to github from one of your local repos to the GitHub remote repo.

From your local repo (in your ~/cs44/labs/Lab1-userID1-userID2 subdirectory)

make clean
git add employee.h employee.cpp sortEmployees.cpp
git commit -m "my correct and well commented solution for grading"
git push

Verify that the results appear (e.g., by viewing the the repository on cs44-f20). You will receive deductions for submitting code that does not run or repos with merge conflicts. Also, note that the time stamp of your final submission is used to record late days, so please do not update your repo until after the late period has ended.

If that doesn’t work, take a look at the "Troubleshooting" section of the Using git for CS44 labs and the Using git pages. At this point, you should submit the required Lab 1 Questionnaire (each teammate must do this).

7. Handy References