CS44: Database Management Systems

Lab 0: C++ Warmup/Binary IO

Due by 11:59 p.m., Wednesday, Sept 12, 2018.

This is an individual lab. You are to complete this lab on your own, although you may discuss the lab concepts with your classmates. Remember the Academic Integrity Policy: do not show your code to anyone outside of the course staff and do not look at anyone else’s code for this lab. If you need help, please post on the Piazza forum or contact the instructor. If you have any doubts about what is okay and what is not, it’s much safer to ask than to risk violating the Academic Integrity Policy.

Overview

The task for this week’s lab: given employee data from two separate files, combine the contents into a sorted list and output the results to file. You will implement a linked-list data structure to maintain your sorted list of employee data. This will simulate, at a low-level, a common operation for storing raw data in a DBMS.

While this lab serves primarily as a warm-up exercise and reminder of C++ programming, we will also introduce new concepts. The learning objectives for this assignment:

You may want to read a bit about binary files before getting started.

Getting Started

First create a cs44 directory in your home directory, and add a labs subdirectory to it:

mkdir cs44
cd cs44
mkdir labs
cd labs
pwd

We will be using git repos hosted on the college’s GitHub server for labs in this class. If you have not used git or the college’s GitHub server before, here are some instructions on Using Git (follow the instructions for repos on Swarthmore’s GitHub Enterprise server).

Next find your git repo for this lab assignment off the GitHub server for our class: CPSC44-F18

Clone your git repo (Lab0-userID) containing starting point files into your labs directory:

cd ~/cs44/labs
git clone [the ssh url to your your repo]
cd Lab0-userID

If all was successful, you should see the following files (highlighted files require modification):

Linked List Implementation

In employee.cpp/h, you will implement an EmployeeList and related EmployeeNode class to manage a sorted linked list of employee data. The data will be sorted based on the name field of EmployeeNode.

EmployeeNode class

The data members and public methods have been provided. You may add private methods, but shouldn’t need to. Begin by implementing the EmployeeNode class in employee.cpp. This class represents one node in a linked list:

The constructor should initialize all data members (e.g., set pointers to NULL) and the destructor should deallocate name if it is not NULL.

Each of the above data members has a corresponding setter and getter method. Note that c-style strings are arrays of characters, so you will need to allocate and deallocate memory. See the section on c-strings below for details.

EmployeeList class

EmployeeList is a linked list with a limited interface – there is only one type of method for inserting and no methods for removing. While this may seem like a poor design, it is common to implement limited data structures to accomplish specific tasks efficiently and predictably. Do not modify the public interface, or add additional data members. You may add private methods, if needed.

There is only one data member for the linked list: the head of the list. You will need to implement the class methods:

Main program

In sortEmployees.cpp, you will write a main program that, at a high level:

General Requirements

Usage and Sample Runs

$ ./sortEmployees input/infile1.dat input/infile2.dat result.dat

Your program takes in three arguments: two binary files to read and the name of the file to write the resulting list. Incorrect usage should result in an error message e.g.,

$ ./sortEmployees
usage: sortEmployees file1 file2 resultfile

See here for the expected output.

Reading the input files

In sortEmployees.cpp, you must implement a function to read from binary files:

int readFromFile (char const *filename, EmployeeList *list)

This function reads in employee records from the binary file specified by the parameter filename and adds each employee record list. Return 0 on success and any non-zero error value on failure (there may be multiple reasons for failure; in the function comments, be sure to specify the error associated with each value you return). See below for tips on how to complete this function

Formatted Output

Your program should use formatted output (e.g., printf) to display the employee data as it is read from the input files (i.e, in readFromFile()). Your program should finish by displaying the sorted list of employee records. See examples below.

Binary File Format

You are used to reading files that are in text format - all data is stored as ASCII characters regardless of whether it is actually a number or character. Alternatively, we could store the data in a format more native to computing - as binary. You cannot open and read these files in a text editor (because they aren’t text). But they can be much more predicable in terms of how data is formatted.

Employee Data Files

Each employee file is a binary file that stores a sequence of variable length employee records. Each employee record has three fields: the length of the name (4 byte integer), the actual name (variable length), and a salary (4 byte integer). The format of an employee record in the binary file is as follows:

  ----------------------------------------------------------------------------
 |    4 byte integer     |     N character string     |    4 byte integer     |
 | Number of characters  |        Employee name       |    Employee salary    |
 | in employee name (N)  |    (not null-terminated)   |                       |
  ----------------------------------------------------------------------------

Reading binary files

To read integers and character strings from a binary file you can use the following code as an example:

  //open file specified in filename. Second parameter specifies that
  // the file is for input and is in binary format
  fstream infile(filename, ios::in | ios::binary);
  int nameLen;
  char *name;

  // read 4 bytes into memory location of nameLen
  // read assumes everything is char, so we will here we typecast
  // the address of nameLen (the destination) as a char *
  infile.read((char*)&nameLen, sizeof(int));

  //Allocate space and read characters for string
  name = new char[nameLen+1];
  infile.read(name, nameLen);
  // TODO: remember to null terminate strings with '\\0'
  // TODO: add code to finish reading file
  infile.close();

You should reference the fstream library for details on how to read files. Also, you will need to loop until you read all of the records in a file. You should use the eof() (end of file) function to do so. Here is an illustrative example with tips on how to use it properly.

Writing binary files

To write integers and character strings to a binary file you can use:

  fstream outfile(filename, ios::out | ios::binary);
  //You need to figure out how to set these variables and loop through the list
  int nameLen = ...;
  char *name = ...;
  ...

  outfile.write((char*)&nameLen, sizeof(int));//write the length of the name
  outfile.write(name, nameLen); //write the name using the specific length
  outfile.close(); //close the file when done writing

Remember to close the file when done writing, otherwise your results may not be saved to disk.

Tips and Hints

c-strings

See Prof. Newhall’s help page on Strings in C for a refresher on using c-strings. In particular, you will need to use strcpy, strlen, printf, among others. Since your code is compiled in C++, you can use new and delete instead of malloc and free e.g.,

char *staticVar = "this is a string"; //a static c-string
char *dynamicVar = NULL; // we well allocate/deallocate space for this c-string

printf("%s\n", staticVar); //Prints out "this is a string"
len = strlen(staticVar);

dynamicVar = new char[len+1]; //add one for the null termination '\0'
strcpy(dynamicVar, staticVar,len); //copies "this is a string"

printf("%s\n", dynamicVar); //should print "this is a string"

delete [] dynamicVar //clean up when done

C-strings are arrays; allocating them dynamically requires that you also clean up memory when it is no longer being used. This also means that, c-strings variables are pointers - treat that as such.

const modifier

const is a modifier that guarantees that an entity will not be modified while in scope. You will see it used in several contexts throughout the given code. There are two main uses:

Note that nothing changes on the caller’s end of function use. This simply provides a contract guarantee to a user that their data cannot be modified by a called function. It is enforced by the compiler (attempting to modify a constant will lead to a compile-time error message).

Error messages

Above, I mention that you will return error statuses via an int for your read and write functions. This is common practice, particularly in C programs where exceptions are not used. To make this aspect of your code more readable, consider using meaningful constant values. For example:

//Global static variable
static const int ERROR_NULL_FILENAME = -1;

/* Function purpose
 * @params ...
 * @return ...
 * @error returns ERROR_NULL_FILENAME if the filename pointer is NULL
 */
int readFromFile(char const *filename, EmployeeList *list){
  //...
  //some code
  if (filename == NULL){
    return ERROR_NULL_FILENAME; //returns a -1, but someone reading your
                                //code cares more about the reason for error
  }
  //...rest of program

Instead of returning -1, the function returns a variable (this variable’s value happens to be -1 but this is abstracted). The key is that in main, you can check for this by comparing the return value to the error values you expect to find. For example, if you store the function return value in main using status, you can write:

  if (status == ERROR_NULL_FILENAME){
    //handle this error
  }

This can be easier to understand (and less prone to bugs) than the equivalent statement:

  if(status == -1){
    //handle this error
  }

Testing your code

It is difficult to verify the output since it is not text. Here are some strategies:

./createbin yourtextfile binaryoutfile

In input/, I have given you infile1.dat and infile2.dat as sample inputs. I have also provided the corresponding text files so you can see how to format your input to createbin.

Submitting your lab

Before the due date, push your solution to github from one of your local repos to the GitHub remote repo.

From your local repo (in your ~userID/cs44/labs/Lab0-userID subdirectory)

git add *
git commit -m "my correct and well commented solution for grading"
git push

Verify that the results appear correct online. You will receive deductions for submitting code that does not run or repos with merge conflicts. Also, note that I take the time stamp of your final submission to record late days, so please do not update your repo until after the late period has ended.

If that doesn’t work, take a look at the “Troubleshooting” section of the Using git page. Also, be sure to complete the README.md file.