CS44 Lab 0: Binary I/O

Due by 11:59 p.m., Sunday, February 2, 2014

Introduction

This assignment is to be done individually. Most subsequent lab assignments will be done with a partner.

The purpose of this assignment is to get you acquainted with some particular aspects of C++ and Unix in order to prepare you for future lab assignments. While many of the requirements serve as a reminder/warm-up, you will encounter several concepts that may be new to you including:

The goal of this lab is as follows: given employee data from two separate binary files that are potentially sorted, output a binary file with all records combined in sorted order. This simulates, at a low level, a common operation for storing the raw data for a DBMS.

To get started, run update44 to obtain the starting point files in ~/cs44/labs/0/. You should obtain the following files (files highlighted in blue require modification):

When you are ready to submit your lab, use handin44. Recall that only files in the ~/cs44/labs/0 subdirectory will be submitted. You may submit as many times as you wish; only the most recent copy will be saved.

Assignment

You will first implement an EmployeeList and related EmployeeNode class to manage a sorted linked list of employee data. Each employee (stored as an EmployeeNode) will be described with a name (C-string) and salary (int). You will then write a main program that reads in employee data from two unsorted binary files (the format of these files is described below), merges the data from the two files together, sorted by employee name, and outputs the resulting sorted list of employee data to a binary file.
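As a rough illustration of keeping a linked list sorted by a C-string key, here is a minimal sketch. The names DemoNode, insertSorted, and destroyList are hypothetical and are not taken from the starting point code; the provided EmployeeNode and EmployeeList classes may be organized quite differently.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical node type for illustration only; the provided
// EmployeeNode class may be laid out differently.
struct DemoNode {
    char *name;      // heap-allocated, null-terminated C-string
    int salary;
    DemoNode *next;
};

// Insert a copy of (name, salary) so the list stays sorted by name.
// Returns the (possibly new) head of the list.
DemoNode *insertSorted(DemoNode *head, const char *name, int salary) {
    assert(name != nullptr);
    DemoNode *node = new DemoNode;
    node->name = new char[std::strlen(name) + 1];   // +1 for '\0'
    std::strcpy(node->name, name);
    node->salary = salary;

    // Advance a pointer-to-pointer while the current name sorts before
    // the new one; this handles the empty list and head insertion
    // without special cases.
    DemoNode **link = &head;
    while (*link != nullptr && std::strcmp((*link)->name, name) < 0) {
        link = &(*link)->next;
    }
    node->next = *link;
    *link = node;
    return head;
}

// Free every node together with its name buffer.
void destroyList(DemoNode *head) {
    while (head != nullptr) {
        DemoNode *next = head->next;
        delete[] head->name;
        delete head;
        head = next;
    }
}
```

Because every insertion preserves the ordering, inserting the records from both input files into one list yields the merged, sorted result without a separate sorting pass.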

Additionally, as your program reads each employee's information from an input file, it should print it to stdout in tabular format. Before your program writes the merged, sorted data from the two input files, it should also print that data to stdout in tabular format, followed by the total number of employees and the average salary (see the sample run of my program below as an example). The three files used by your program (two input and one output) will be passed to your program via command line arguments.
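One possible way to produce the tabular output and the summary lines is with the stream manipulators from <iomanip>. The sketch below is only an illustration: the function names printHeader, printRow, and printSummary are hypothetical, and the 21-character name column is an assumption inferred from the sample run.

```cpp
#include <cassert>
#include <iomanip>
#include <ostream>
#include <sstream>
#include <string>

// Assumed column width, chosen to match the spacing in the sample run.
const int NAME_WIDTH = 21;

// Print the two header lines of the table.
void printHeader(std::ostream &out) {
    out << std::left << std::setw(NAME_WIDTH) << "Name" << "Salary" << '\n';
    out << std::left << std::setw(NAME_WIDTH) << "----" << "------" << '\n';
}

// Print one row: the name left-aligned and padded, then the salary.
void printRow(std::ostream &out, const char *name, int salary) {
    out << std::left << std::setw(NAME_WIDTH) << name << salary << '\n';
}

// Print the employee count and the average salary with two decimals.
// The sum is a long long so that many large salaries cannot overflow.
void printSummary(std::ostream &out, long long totalSalary, int count) {
    assert(count > 0);
    std::ios_base::fmtflags oldFlags = out.flags();
    std::streamsize oldPrecision = out.precision();
    double average = static_cast<double>(totalSalary) / count;
    out << "Number of total employees: " << count << '\n';
    out << "Average salary: " << std::fixed << std::setprecision(2)
        << average << '\n';
    out.flags(oldFlags);            // restore the caller's formatting
    out.precision(oldPrecision);
}
```

Passing std::cout as the stream prints to stdout; taking a std::ostream& parameter also makes the functions easy to check against a std::ostringstream.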

Specifics

The starting point code for the EmployeeNode and EmployeeList classes is provided in employee.[cpp/h]. You are free to add any private methods that aid you with the implementation (think good modular design). DO NOT change the name field to a C++ string (it should be a C-style string; i.e., char *). The other main file to edit is sortEmployees.cpp, which implements the main program. Below are a few detailed specifics to help you with development.

As always, your program should use good design, have good error handling, and be well commented. For example, each file should have a top-level comment describing its purpose and author; each function should have a comment describing its purpose and usage (i.e., parameters, return values, and error conditions). In addition, it should be free of all memory access errors. Run valgrind to find and fix any memory access errors in your program (see the "Compiling, Debugging and Linking Tips" link for information on how to use gdb and valgrind).
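As one example of what error handling might look like here, the sketch below checks the number of command line arguments and whether an input file could be opened. The helper names checkArgs and openInput and the wording of the "cannot open" message are assumptions made for illustration; only the usage message is taken from the sample run.

```cpp
#include <cassert>
#include <fstream>
#include <iostream>
#include <sstream>

// Returns true when exactly three file names follow the program name.
// Otherwise writes the usage message to `err` and returns false.
bool checkArgs(int argc, std::ostream &err) {
    if (argc != 4) {
        err << "usage: sortEmployees 'file1' 'file2' 'resultfile'\n";
        return false;
    }
    return true;
}

// Opens `path` for binary input. Returns false, after reporting the
// problem on `err`, if the file cannot be opened.
bool openInput(std::ifstream &in, const char *path, std::ostream &err) {
    in.open(path, std::ios::in | std::ios::binary);
    if (!in.is_open()) {
        err << "error: cannot open " << path << '\n';   // assumed wording
        return false;
    }
    return true;
}
```

In main, a failed check would typically be followed by returning a nonzero exit status, for example: if (!checkArgs(argc, std::cerr)) return 1;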


Tips and Hints

Format of Employee Files

Each employee file is a binary file that stores a sequence of variable-length employee records. Each employee record has two fields: name (variable length) and salary (4-byte integer). In the file, the name is preceded by a 4-byte integer giving its length. The format of an employee record in the binary file is as follows:

  ----------------------------------------------------------------------------
 |    4 byte integer     |     N character string     |    4 byte integer     |
 | Number of characters  |        Employee name       |    Employee salary    |
 | in employee name (N)  |    (not null-terminated)   |                       |
  ----------------------------------------------------------------------------

Reading and Writing binary files

To read integers and character strings from a binary file you can use:

	//open file specified in filename. Second parameter specifies that
	// the file is for input and is in binary format
	fstream infile(filename, ios::in | ios::binary);
	int nameLen;
	char *name;

	// read 4 bytes into memory location of nameLen 
	// here we typecast the address of nameLen (the destination) as a char *
	infile.read((char*)&nameLen, sizeof(int));

	// Allocate space (one extra byte for the terminator) and read the characters
	name = new char[nameLen+1];
	infile.read(name, nameLen);
	// the file does not store a terminator, so add '\0' ourselves
	name[nameLen] = '\0';
	...
	infile.close();

To write integers and character strings to a binary file you can use:

	fstream outfile(filename, ios::out | ios::binary);
	int nameLen = ...;
	char *name = ...;
	...
	outfile.write((char*)&nameLen, sizeof(int));
	outfile.write(name, nameLen);
	outfile.close();
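Putting the two fragments together, the following sketch writes and reads one complete record (length, name, salary) in the format described above. The function names writeRecord and readRecord are hypothetical. They accept any std::ostream or std::istream, so they work with an fstream opened in binary mode as well as with an in-memory stringstream for testing. The sketch assumes that int is 4 bytes and that files are read on a machine with the same byte order as the one that wrote them.

```cpp
#include <cassert>
#include <cstring>
#include <istream>
#include <ostream>
#include <sstream>

// Write one record: 4-byte length, the name bytes (no '\0'), 4-byte salary.
// Returns false if any of the writes failed.
bool writeRecord(std::ostream &out, const char *name, int salary) {
    int nameLen = static_cast<int>(std::strlen(name));
    out.write(reinterpret_cast<const char *>(&nameLen), sizeof(int));
    out.write(name, nameLen);
    out.write(reinterpret_cast<const char *>(&salary), sizeof(int));
    return out.good();
}

// Read one record. On success, *name points to a new[]-allocated,
// null-terminated string that the caller must release with delete[].
// Returns false at end of file or if the record is incomplete.
bool readRecord(std::istream &in, char **name, int *salary) {
    int nameLen = 0;
    if (!in.read(reinterpret_cast<char *>(&nameLen), sizeof(int)) ||
        nameLen < 0) {
        return false;
    }
    char *buf = new char[nameLen + 1];      // +1 for the terminator
    if (!in.read(buf, nameLen) ||
        !in.read(reinterpret_cast<char *>(salary), sizeof(int))) {
        delete[] buf;                       // do not leak on a short read
        return false;
    }
    buf[nameLen] = '\0';                    // not stored in the file
    *name = buf;
    return true;
}
```

A reading loop can then be written as while (readRecord(infile, &name, &salary)) { ... delete[] name; }, which stops cleanly at end of file.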

When you are done implementing the employee list classes, you should be able to type make and have your code compile to give an executable called sortEmployees. Run this executable to test your code.

Sample Output

# a run with the wrong number of command line args, should exit with message
$ ./sortEmployees 
usage: sortEmployees 'file1' 'file2' 'resultfile'

# a run with correct number of command line args:
$ ./sortEmployees input/infile1.dat input/infile2.dat result.dat

Reading File 1: input/infile1.dat

Name                 Salary
----                 ------
Turing, Alan         45000
Hopper, Grace        50000
Lovelace, Ada        43578
Babbage, Charles     70125

Reading File 2: input/infile2.dat

Name                 Salary
----                 ------
Soni, Ameet          150
Danner, Tex          85
Wicentowski, Rich    100000
Tyson, Neil deGrasse 850269

Result:

Name                 Salary
----                 ------
Babbage, Charles     70125
Danner, Tex          85
Hopper, Grace        50000
Lovelace, Ada        43578
Soni, Ameet          150
Turing, Alan         45000
Tyson, Neil deGrasse 850269
Wicentowski, Rich    100000

Number of total employees: 8
Average salary: 144900.88

These input data files are given to you with the starting point code. You can test your program and diff your result file with mine to see if your program works correctly for this example. You should design a strategy for verifying your output files are correct. There are several hints in this document.
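One possible verification strategy, sketched below, is a small stand-alone checker that re-reads a result file and confirms that its records are well formed and ordered by name. The function name countIfSorted is hypothetical. Since this is a separate test aid rather than part of the employee classes, the sketch uses std::string for convenience; it also assumes a 4-byte int, as in the record format above.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Returns the number of records in the stream if every record is complete
// and the names are in nondecreasing order; returns -1 otherwise.
int countIfSorted(std::istream &in) {
    std::string prev;
    int count = 0;
    int nameLen = 0;
    while (in.read(reinterpret_cast<char *>(&nameLen), sizeof(int))) {
        if (nameLen < 0) return -1;                  // corrupt length field
        std::string name(static_cast<std::size_t>(nameLen), '\0');
        int salary = 0;
        if (nameLen > 0 && !in.read(&name[0], nameLen)) return -1;
        if (!in.read(reinterpret_cast<char *>(&salary), sizeof(int))) return -1;
        if (count > 0 && name < prev) return -1;     // out of order
        prev = name;
        ++count;
    }
    // A clean end of file reads zero bytes; leftover bytes mean the file
    // ends with a truncated record.
    if (in.gcount() != 0) return -1;
    return count;
}
```

To use it, open the result file with std::ifstream in binary mode and pass the stream to countIfSorted; the returned count can be compared against the total printed by your program.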


Submitting your lab

Submit using handin44. Please run make clean before submitting to keep file sizes down. Also, be sure to complete the README file.