Project 1: Warmup C++ Assignment

Due: Friday Sept. 14 before 1am (very late Thursday night)

Note: This assignment is to be done individually. All subsequent assignments should be done with a partner.


Introduction

The purpose of this assignment is to get you acquainted with some particular aspects of C++ and Unix in order to prepare you for the course project. It will illustrate C-style strings and the iostream and fstream libraries, as well as the basic OO features of C++.

Getting Started

Create a new working directory for this programming assignment and copy the project 1 starting point into it. You should work on each project in its own protected subdirectory (e.g. ~/cs44/proj1). For example:

	mkdir ~/cs44
	chmod 700 ~/cs44
	mkdir ~/cs44/proj1
	cp -r  /home/newhall/public/cs44/proj1/*  ~/cs44/proj1/.
  

You should now have all the files that you need to work on the assignment. Your job is to complete the implementation in these starting point files. Use make to compile. A starting point Makefile is provided; if you add additional .h or .C files, you will need to add them to the Makefile.

What You Have to Do

You will write a program that reads in employee data from two unsorted binary files (the format of these files is described below), merges the data from the two files together in sorted order by employee name, and outputs the resulting sorted list of employee data to a binary file. As your program reads each employee's information from an input file, it should print it to stdout in tabular format. Before your program writes the merged, sorted data to the output file, it should also print that data to stdout in tabular format, followed by the total number of employees and the average salary (see the sample run of my program below as an example). '\t' is the tab character, which can be used to get nice tabular output. The three files used by your program (two input and one output) will be passed to your program via command line arguments.
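
As a rough illustration of the tabular output and the summary lines, here is a minimal sketch. The function names (printRow, averageSalary, summary) are hypothetical and are not part of the starting point code, and the one-decimal formatting is only inferred from the sample run. Note that a single '\t' lines columns up only when the names have similar lengths.

```cpp
#include <cassert>
#include <iomanip>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: print one table row as name, a tab, then salary.
void printRow(std::ostream &out, const char *name, int salary) {
    out << name << '\t' << salary << '\n';
}

// Average of count salaries. Returns 0.0 for an empty list so that we
// never divide by zero.
double averageSalary(const int *salaries, int count) {
    assert(count <= 0 || salaries != nullptr);
    if (count <= 0) {
        return 0.0;
    }
    long total = 0;
    for (int i = 0; i < count; ++i) {
        total += salaries[i];
    }
    // Cast before dividing; int / int would truncate the fraction.
    return static_cast<double>(total) / count;
}

// Build the two summary lines, with one digit after the decimal point.
std::string summary(const int *salaries, int count) {
    std::ostringstream out;
    out << "Number of employees in merged list = " << count << '\n'
        << "Average salary in merged list = "
        << std::fixed << std::setprecision(1)
        << averageSalary(salaries, count) << '\n';
    return out.str();
}
```

In your program the salaries would come from walking the employee list rather than from an array; the array here just keeps the sketch short.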

Your program will read employee data into a singly linked list. The employee list should be maintained sorted alphabetically by employee name. You should not assume that the employee records in the input file are already in sorted order. However, your program should write the list to an output binary file in sorted order.
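
One way to keep a singly linked list sorted as records arrive is to insert each new node at its sorted position. The sketch below assumes a hypothetical SketchNode struct and free functions; the provided EmployeeNode and EmployeeList classes will look different, so treat this as an outline of the technique rather than the expected design.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical node type for illustration only.
struct SketchNode {
    char *name;        // heap-allocated, null-terminated C-style string
    int salary;
    SketchNode *next;
};

// Insert a copy of (name, salary) so that the list stays sorted by name
// in strcmp order. Returns the (possibly new) head of the list.
SketchNode *insertSorted(SketchNode *head, const char *name, int salary) {
    assert(name != nullptr);
    SketchNode *node = new SketchNode;
    node->name = new char[std::strlen(name) + 1];   // +1 for '\0'
    std::strcpy(node->name, name);
    node->salary = salary;

    // Walk a pointer to the link we may rewrite, so that inserting at the
    // head of the list needs no special case.
    SketchNode **link = &head;
    while (*link != nullptr && std::strcmp((*link)->name, name) <= 0) {
        link = &(*link)->next;
    }
    node->next = *link;
    *link = node;
    return head;
}

// Free every node and the name string that each node owns.
void freeList(SketchNode *head) {
    while (head != nullptr) {
        SketchNode *next = head->next;
        delete[] head->name;
        delete head;
        head = next;
    }
}
```

Because every insert leaves the list sorted, reading both input files into the same list produces the merged order with no separate sort step.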

Your program should use good OO design, have good error handling, and be well commented. In addition, it should be free of all memory access errors. Run valgrind to find and fix any memory access errors in your program (see the "Compiling, Debugging and Linking Tips" link for information on how to use gdb and valgrind).

Details

Starting point code for the EmployeeNode and EmployeeList classes is provided in employee.[Ch]. You are free to add any private methods that would help you with the implementation (think good modular design). Do not change the name field to string (it should be a C-style char *). You definitely should add more comments to these files (read the C++ Code Style Guide to get an idea of what I expect).

Format of Employee Files

Each employee file is a binary file that stores a sequence of variable length employee records. Each employee record has two fields: name (variable length) and salary (4 byte integer). The format of an employee record in the binary file is as follows:
  ----------------------------------------------------------------------------
 |    4 byte integer     |     N character string     |    4 byte integer     |
 | Number of characters  |        Employee name       |    Employee salary    |
 | in employee name (N)  |    (not null-terminated)   |                       |
  ----------------------------------------------------------------------------

The difference between a binary file and an ascii file is best demonstrated by how each stores numerical values. For example, the integer value 12345 is 11000000111001 in binary. In a binary file, an int is stored in 4 bytes. The integer value 12345 would be stored in a binary file as:

 00000000000000000011000000111001
 --------------------------------
|    4 byte int value 12345      |
 --------------------------------
 	4 bytes to store 12345

In an ascii file, 12345 would be stored in 5 bytes (one byte for each digit):

 00110001  00110010  00110011  00110100  00110101
 -------------------------------------------------
| 1 byte  | 1 byte  | 1 byte  | 1 byte  | 1 byte  |
| ascii   | ascii   | ascii   | ascii   | ascii   |
| char '1'| char '2'| char '3'| char '4'| char '5'|
 -------------------------------------------------
 		5 bytes to store 12345
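
The size difference can be checked in memory without creating any files. This is a small sketch with hypothetical function names. It writes to string streams instead of files, and the order of the 4 bytes in the binary form depends on the machine (for example, x86 stores the low-order byte first).

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Bytes produced by an unformatted (binary) write of an int: a copy of
// the int's raw memory, sizeof(int) bytes long.
std::string asBinaryBytes(int value) {
    std::ostringstream out;
    out.write(reinterpret_cast<const char *>(&value), sizeof(int));
    return out.str();
}

// Bytes produced by formatted (ascii) output: one char per digit.
std::string asAsciiBytes(int value) {
    std::ostringstream out;
    out << value;
    return out.str();
}

// The sizes described above: sizeof(int) bytes (4 on the machines this
// assignment targets) in binary form, 5 chars in ascii form.
void checkSizes() {
    assert(asBinaryBytes(12345).size() == sizeof(int));
    assert(asAsciiBytes(12345) == "12345");
}
```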

To read integers and character strings from a binary file you can use:

	// | is the bitwise OR operator (it OR's the bits of the operands)
	// this tells the constructor to open the file as a binary (ios::binary)
	// file that will be read (ios::in)
	fstream infile(filename, ios::in | ios::binary);
	int nameLen;
	char *name;
	...

	// read 4 bytes into memory location of nameLen
	// here we typecast the address of nameLen (the destination) as a char *
	infile.read((char*)&nameLen, sizeof(int));
	...
	// name must already point to space for at least nameLen + 1 chars
	infile.read(name, nameLen);
	// remember to null terminate strings with '\0'
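
Putting those reads together for one complete record might look like the sketch below. readRecord and readRecordSelfCheck are hypothetical names, not part of the starting point code, and taking a std::istream & (rather than an fstream) is a choice made here so that the function can also be exercised with an in-memory stream.

```cpp
#include <cassert>
#include <cstring>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads one record from in. On success, name points to
// a newly allocated, null-terminated string (the caller must delete[] it),
// salary holds the record's salary, and the function returns true.
// Returns false at end of file or if the record is incomplete.
bool readRecord(std::istream &in, char *&name, int &salary) {
    int nameLen = 0;
    if (!in.read(reinterpret_cast<char *>(&nameLen), sizeof(int))) {
        return false;                       // nothing more to read
    }
    if (nameLen < 0) {
        return false;                       // corrupt length field
    }
    char *buf = new char[nameLen + 1];      // +1 leaves room for '\0'
    int pay = 0;
    if (!in.read(buf, nameLen) ||
        !in.read(reinterpret_cast<char *>(&pay), sizeof(int))) {
        delete[] buf;                       // do not leak on a truncated record
        return false;
    }
    buf[nameLen] = '\0';                    // the file stores no terminator
    name = buf;
    salary = pay;
    return true;
}

// In-memory check that needs no data file: build one record by hand,
// read it back, and confirm that a second read reports end of input.
void readRecordSelfCheck() {
    const char *text = "Amdahl";
    int len = static_cast<int>(std::strlen(text));
    int pay = 30;
    std::string bytes;
    bytes.append(reinterpret_cast<const char *>(&len), sizeof(int));
    bytes.append(text, len);
    bytes.append(reinterpret_cast<const char *>(&pay), sizeof(int));

    std::istringstream in(bytes);
    char *name = nullptr;
    int salary = 0;
    assert(readRecord(in, name, salary));
    assert(std::strcmp(name, text) == 0 && salary == pay);
    delete[] name;
    assert(!readRecord(in, name, salary));
}
```

A read loop can then simply call readRecord until it returns false, inserting each record into the list as it goes.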

To write integers and character strings to a binary file you can use:

	fstream outfile(filename, ios::out | ios::binary);
	int nameLen;
	char *name;
	...
	outfile.write((char*)&nameLen, sizeof(int));
	...
	outfile.write(name, nameLen);
	outfile.close();
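
The same calls can be combined to write one complete record in the file format described above. This is a sketch with hypothetical helper names (writeRecord, recordBytes). It accepts any std::ostream, so the bytes can be inspected in memory before you point it at a real output file.

```cpp
#include <cassert>
#include <cstring>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: writes one record as the 4-byte name length, the
// name's characters (without the '\0'), then the 4-byte salary.
// Returns false if the stream reported an error.
bool writeRecord(std::ostream &out, const char *name, int salary) {
    assert(name != nullptr);
    int nameLen = static_cast<int>(std::strlen(name));
    out.write(reinterpret_cast<const char *>(&nameLen), sizeof(int));
    out.write(name, nameLen);               // nameLen excludes the terminator
    out.write(reinterpret_cast<const char *>(&salary), sizeof(int));
    return out.good();
}

// Convenience for testing without touching disk: the exact bytes that
// writeRecord produces for one record.
std::string recordBytes(const char *name, int salary) {
    std::ostringstream out;
    writeRecord(out, name, salary);
    return out.str();
}
```

Writing the merged list is then a walk over the sorted list that calls writeRecord once per node.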

When you are done implementing the employee list classes, you should be able to type make and have your code compile to give an executable called elproc. Run this executable to test your code.

Sample Output

# a run with the wrong number of command line args, should exit with msg
%  ./elproc 
usage: elproc 'file1' 'file2' 'resultfile'

# a run with correct number of command line args:
% ./elproc  infile1.dat infile2.dat outfile

input file1:

Name                    Salary
-----                   ------
Smith, Jo               40
Hopper, Grace           50
Lovelace, Ada           20
Turing, Alan            30
Amdahl                  30

input file2: 

Name                    Salary
-----                   ------
Marley, Bob             20
Stone, Sylvester        40
Cervenka, Exene         30
Richman, Jonathan       70
Brown, James            20
Cash, June Carter       40
Green, Al               10

merged files:

Name                    Salary
-----                   ------
Amdahl                  30
Brown, James            20
Cash, June Carter       40
Cervenka, Exene         30
Green, Al               10
Hopper, Grace           50
Lovelace, Ada           20
Marley, Bob             20
Richman, Jonathan       70
Smith, Jo               40
Stone, Sylvester        40
Turing, Alan            30

Number of employees in merged list = 12 
Average salary in merged list = 33.3  

These input data files and the expected result file, result.dat, are given to you with the starting point code. You can test your program and diff your result file with mine to see if your program works correctly for this example.

When you hand in your code, I will also test it with other input files, so you should try other cases as well. To create new binary input files, use the createbin program provided with the starting point code:

	./createbin asciiinputfile binaryoutfile

A sample ascii file in the correct format (asciifile) is provided.

Handing in Your Code

Create a tar file of the entire subdirectory containing your solution source code (do a make clean before creating the tar file, so that you are not including object and executable files in your submission). Then submit your solution tar file using cs44handin.

Good luck!