CS44 Lab 0: Binary I/O

Due by 11:59 p.m., Wednesday, Sept 7, 2016

This assignment is to be done individually. Most subsequent lab assignments will be done with a partner.

The task for this week's lab is fairly manageable: given employee data from two separate files, combine the contents into a single sorted list and output the results to a file. You will implement a linked-list data structure to maintain your sorted list of employee data. This simulates, at a low level, a common operation for storing raw data in a DBMS.

While this lab serves primarily as a warm-up exercise and reminder of C++ programming, we will also introduce new concepts. The learning objectives for this assignment:

Lab 0 Starting point
First create a cs44 directory in your home directory, and add a labs subdirectory to it:
mkdir cs44
cd cs44
mkdir labs
cd labs
We will be using git repos hosted on the college's GitHub server for labs in this class. If you have not used git or the college's GitHub server before, see the Using Git page for instructions (follow the instructions for repos on Swarthmore's GitHub Enterprise server).

Next, find your git repo for this lab assignment on the GitHub server for our class: CS44-f16

Clone your git repo with the lab 0 starting point files into your labs directory:

cd ~/cs44/labs
git clone [the ssh URL to your repo]
Then cd into your Lab0-you subdirectory. If all was successful, you should see the lab's starting point files (files highlighted in blue require modification). If this didn't work, or for more detailed instructions on git, see the Using Git page (follow the instructions for repos on Swarthmore's GitHub Enterprise server).

Run the following to create a symlink to the createbin program, which you can use to create binary files from ASCII files (more info below about how to run it):

make setup

Implementation Details

EmployeeList class

In employee.cpp/h, you will implement an EmployeeList class and a related EmployeeNode class to manage a sorted linked list of employee data. The data will be sorted on the name field. Each employee (stored as an EmployeeNode) is described by a name (C-string) and a salary (int). A list of more specific requirements and details:
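One way to keep the list sorted is to insert each new employee directly at its correct position, so the list never needs a separate sorting pass. The sketch below assumes a simplified interface; the member names (insertSorted, getHead, getSize) and the public data fields are hypothetical and are not the ones required by employee.h.

```cpp
#include <cstring>

// Hypothetical node: owns a heap-allocated, null-terminated copy of the name.
class EmployeeNode {
public:
  EmployeeNode(const char *n, int s) : salary(s), next(nullptr) {
    name = new char[std::strlen(n) + 1];
    std::strcpy(name, n);
  }
  ~EmployeeNode() { delete[] name; }
  // Copying would double-free the name buffer, so forbid it.
  EmployeeNode(const EmployeeNode &) = delete;
  EmployeeNode &operator=(const EmployeeNode &) = delete;

  char *name;
  int salary;
  EmployeeNode *next;
};

// Hypothetical list that stays sorted by name (strcmp order) on every insert.
class EmployeeList {
public:
  EmployeeList() : head(nullptr), size(0) {}
  ~EmployeeList() {
    while (head != nullptr) {
      EmployeeNode *doomed = head;
      head = head->next;
      delete doomed;
    }
  }
  EmployeeList(const EmployeeList &) = delete;
  EmployeeList &operator=(const EmployeeList &) = delete;

  void insertSorted(const char *name, int salary) {
    EmployeeNode *node = new EmployeeNode(name, salary);
    // 'link' points at the pointer to update: head, or some node's next field.
    // Using <= 0 places a duplicate name after existing equal names.
    EmployeeNode **link = &head;
    while (*link != nullptr && std::strcmp((*link)->name, name) <= 0) {
      link = &(*link)->next;
    }
    node->next = *link;
    *link = node;
    ++size;
  }

  EmployeeNode *getHead() const { return head; }
  int getSize() const { return size; }

private:
  EmployeeNode *head;
  int size;
};
```

Walking a pointer-to-pointer avoids special-casing insertion at the head of the list, which is a common source of bugs (empty list, new smallest name).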

Main program

In sortEmployees, you will then write a main program that reads employee data from two unsorted binary files (the format of these files is described below), merges the data from the two files in sorted order on employee name, and outputs the resulting sorted list of employee data to a binary file. You may use any sorting algorithm, but pick one that works well for this application. Your program's output should match the sample output regardless of how you sort.

Your program will also print information to standard output as it makes progress. Specifically, it should use formatted output to display the employee data as read from the input files, and finish by displaying the sorted list of employee data along with the average salary (see the Sample Output section for an example). The three files used by your program (two input and one output) are passed to your program as command line arguments. See Tips and Hints for example usage.
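Formatted output and the average salary can be handled with iomanip column widths and a running total. This is a minimal sketch: the helper names (formatEmployee, SalaryStats) are hypothetical, and the column widths are placeholders, since your output must match the lab's sample output exactly.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: formats one employee as a fixed-width row, with the name
// left-justified in 20 columns and the salary right-justified in 10 columns.
// These widths are placeholders; choose ones that reproduce the sample output.
std::string formatEmployee(const char *name, int salary) {
  std::ostringstream row;
  row << std::left << std::setw(20) << name
      << std::right << std::setw(10) << salary;
  return row.str();
}

// Hypothetical accumulator for the average salary printed at the end.
struct SalaryStats {
  long long total = 0;  // wider than int so summing many salaries cannot overflow
  int count = 0;

  void add(int salary) {
    total += salary;
    ++count;
  }

  // Floating-point division avoids integer truncation; an empty list yields 0.
  double average() const {
    return count == 0 ? 0.0 : static_cast<double>(total) / count;
  }
};
```

While traversing the sorted list you might call `std::cout << formatEmployee(name, salary) << '\n';` for each node and `stats.add(salary)` alongside it, then print `stats.average()` once at the end.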

Some more specifics to consider for your main program:

Tips and Hints

Format of Employee Files

Each employee file is a binary file that stores a sequence of variable-length employee records. Each employee record has two fields: name (variable length) and salary (4-byte integer). The format of an employee record in the binary file is as follows:

 |    4-byte integer     |     N character string     |    4-byte integer     |
 | Number of characters  |        Employee name       |    Employee salary    |
 | in employee name (N)  |    (not null-terminated)   |                       |
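Because the name length comes first, a reader must consume each record in three steps: length, name bytes, salary. Below is a minimal sketch of a single-record reader; readEmployee is a hypothetical helper, and it uses std::string for the name buffer (a swap from the lab's new[]-allocated C-strings) to avoid manual memory management.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>  // std::istringstream: lets you test the reader on in-memory bytes
#include <string>

// Hypothetical helper (not part of the starter code): reads one record laid out as
//   [4-byte int N][N name bytes, no '\0'][4-byte int salary]
// Returns false at a clean end of file or on a short or corrupt read.
// Like the lab's snippets, it assumes the file was written with the same
// int size and byte order as the machine reading it.
bool readEmployee(std::istream &in, std::string &name, int &salary) {
  int nameLen = 0;
  if (!in.read(reinterpret_cast<char *>(&nameLen), sizeof(int))) {
    return false;  // no more records (or a truncated length field)
  }
  if (nameLen < 0) {
    return false;  // a negative length means the file is corrupt
  }
  // name.c_str() later yields a null-terminated C-string for an EmployeeNode.
  name.assign(static_cast<std::size_t>(nameLen), '\0');
  if (nameLen > 0 && !in.read(&name[0], nameLen)) {
    return false;  // file ended in the middle of the name
  }
  return static_cast<bool>(in.read(reinterpret_cast<char *>(&salary), sizeof(int)));
}
```

A loop such as `while (readEmployee(infile, name, salary)) { ... }` then processes every record and stops cleanly at end of file.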

Reading and Writing binary files

To read integers and character strings from a binary file you can use:

  // Open the file specified in filename. The second parameter specifies that
  // the file is for input and is in binary format.
  fstream infile(filename, ios::in | ios::binary);
  int nameLen;
  char *name;

  // Read 4 bytes into the memory location of nameLen.
  // Here we typecast the address of nameLen (the destination) as a char *.
  infile.read((char*)&nameLen, sizeof(int));

  // Allocate space (one extra byte for '\0') and read the string's characters
  name = new char[nameLen+1];
  infile.read(name, nameLen);
  // The file does not store the '\0', so null-terminate the string yourself
  name[nameLen] = '\0';

To write integers and character strings to a binary file you can use:

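  // Open the file for output in binary format
  fstream outfile(filename, ios::out | ios::binary);
  int nameLen = ...;
  char *name = ...;
  // Write the 4-byte length, then exactly nameLen characters (no '\0')
  outfile.write((char*)&nameLen, sizeof(int));
  outfile.write(name, nameLen);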
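The write snippet above stops after the name; a complete record also needs the salary. Here is a minimal sketch of a full-record writer; writeEmployee is a hypothetical helper, and it writes to any std::ostream so the same code works for an fstream or an in-memory stream.

```cpp
#include <cstring>
#include <ostream>
#include <sstream>  // std::ostringstream: lets you inspect the bytes without a file
#include <string>

// Hypothetical helper (not part of the starter code): writes one record as
//   [4-byte int N][N name bytes, no '\0'][4-byte int salary]
// N is strlen(name), so the C-string's terminating '\0' is not written.
// Returns true if every write succeeded.
bool writeEmployee(std::ostream &out, const char *name, int salary) {
  int nameLen = static_cast<int>(std::strlen(name));
  out.write(reinterpret_cast<const char *>(&nameLen), sizeof(int));
  out.write(name, nameLen);
  out.write(reinterpret_cast<const char *>(&salary), sizeof(int));
  return static_cast<bool>(out);
}
```

Each record written this way occupies 2 * sizeof(int) + strlen(name) bytes, which is exactly what the reader expects to consume.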

When you are done implementing the employee list classes, you should be able to type make to compile your code into an executable called sortEmployees. Run this executable to test your code.

Sample Output

# a run with the wrong number of command line args, should exit with message
$ ./sortEmployees 
usage: sortEmployees 'file1' 'file2' 'resultfile'

# a run with correct number of command line args:
$ ./sortEmployees input/infile1.dat input/infile2.dat result.dat
See here for the expected output.
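The first sample run shows the program exiting with a usage message when the argument count is wrong. Since argc also counts the program name, three file names means argc == 4. This is a minimal sketch; checkArgs is a hypothetical helper, and printing to std::cerr rather than std::cout is an assumption.

```cpp
#include <iostream>

// Hypothetical helper for main(): the program expects three file names, and
// argc also counts the program name, so a correct run has argc == 4.
// On a wrong count it prints the usage line from the sample output.
// Printing to std::cerr (rather than std::cout) is an assumption.
bool checkArgs(int argc) {
  if (argc != 4) {
    std::cerr << "usage: sortEmployees 'file1' 'file2' 'resultfile'" << std::endl;
    return false;
  }
  return true;
}
```

In main, `if (!checkArgs(argc)) { return 1; }` comes first; after that, argv[1] and argv[2] name the input files and argv[3] names the result file.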

The input data files are given to you with the starting point code. These files, however, do not test all corner cases for the linked list, so be sure to design further tests. You should also design a strategy for verifying that your output files are correct. For example, you can use createbin to create a binary file from your expected result in text format, and then diff your actual result against the createbin result.
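As one additional check alongside the createbin and diff comparison, a small program can confirm that a result file is well formed and sorted. This is a minimal sketch under stated assumptions: isSortedByName and isSortedFile are hypothetical helpers, and std::string's byte-wise comparison is assumed to agree with the strcmp order your list uses (true for ASCII names).

```cpp
#include <cstddef>
#include <fstream>
#include <istream>
#include <sstream>  // std::istringstream: lets you run the checker on in-memory bytes
#include <string>

// Hypothetical checker (not part of the starter code): reads every record in
// the lab's [int N][N name bytes][int salary] format and reports whether the
// names appear in nondecreasing order. It also returns false if the stream
// ends partway through a record.
bool isSortedByName(std::istream &in) {
  std::string prev;
  bool first = true;
  int nameLen = 0;
  while (in.read(reinterpret_cast<char *>(&nameLen), sizeof(int))) {
    if (nameLen < 0) {
      return false;  // corrupt length field
    }
    std::string name(static_cast<std::size_t>(nameLen), '\0');
    if (nameLen > 0 && !in.read(&name[0], nameLen)) {
      return false;  // truncated name
    }
    int salary = 0;
    if (!in.read(reinterpret_cast<char *>(&salary), sizeof(int))) {
      return false;  // truncated salary
    }
    if (!first && name < prev) {
      return false;  // out of order
    }
    prev = name;
    first = false;
  }
  // The loop ends when reading a length field fails. If zero bytes were
  // extracted, that was a clean end of file; otherwise the field was cut off.
  return in.gcount() == 0;
}

// Convenience wrapper for checking an output file such as result.dat.
bool isSortedFile(const char *filename) {
  std::ifstream in(filename, std::ios::in | std::ios::binary);
  return in.is_open() && isSortedByName(in);
}
```

This checks ordering and record framing only, not that every input record made it into the output, so it complements rather than replaces comparing against an expected result.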

Submitting your lab
Before the due date, push your solution from one of your local repos to the GitHub remote repo.

From your local repo (in your ~you/cs44/labs/Lab0-you subdirectory):

git add *
git commit -m "my correct and well commented solution for grading"
git push

If that doesn't work, take a look at the "Troubleshooting" section of the Using git page. Also, be sure to complete the README.md file.