CS44 Lab 3: Heap Pages

Due by 11:59 p.m., Saturday, March 22, 2014
`
Introduction

This assignment is to be done with a partner. You may not work with other groups, and the share of workload must be even between both partners. Failing to do either is a violation of the department Academic Integrity policy.

The goal of the WiscDB projects is to allow students to learn about the internals of a data processing engine. In the first assignment, you built a buffer manager, on top of an I/O Layer that I provide, to understand the role of a system in managing main memory efficiently.

In this second part, we will drill into this I/O layer to understand details of managing data in a file. You will complete the heap file representation of the data - that is, one where records are stored in an unsorted manner. This will allow you to directly simulate the physical structure of data on disk. Recall that a file is a collection of pages which themselves maintain a collection of records. I have provided you with the File class, which provides a logical ordering of pages for you (e.g., a linked list representation of pages). You must complete the I/O layer by implementing the Page class, which provides the physical layout of records on a disk page. The code base is quite extensive and, as with Lab 1, will require much reading of API documentation and thorough testing. You are responsible for making significant progress early on the in assignment as waiting until the last few days will not be manageable. The goals of this assignment include:

To get started, run update44 to obtain the starting point files in ~/cs44/labs/3/. You should obtain the following files (files highlighted in blue require modification):

When you are ready to submit your lab, use handin44. Recall that only files in the ~/cs44/labs/3 subdirectory will be submitted. You may submit as many times as you wish; only the most recent copy will be saved.

The WiscDB I/O Layer: Heap Files
The lowest layer of the WiscDB database systems is the I/O layer. This layer consists of two classes: a file (class File) and a page (class Page) class. These classes use C++ exceptions to handle the occurrence of any unexpected event. Implementation of the File class, and the exception classes are provided to you. The code has been adequately commented to help you with understanding how it does what it does.

At a high-level, the File class abstracts the physical structure of files from upper-levels of the database. It stores a linked-page structure of heap-file pages that allow iteration of contents of a file and the storage of records. The File class allows the upper level of the system to:

Before reading further you should first read the documentation for the File class while reviewing the concept of a heap file in Chapter 9 of the textbook. Again, this implementation has been completed and thus the details of managing a collection of pages is an abstraction for both higher layers (e.g., the buffer manager) as well as lower layers (i.e., the page layer).


Heap Page

The Page class describes the structure and set of operations that can be performed on a slotted page. A page stores a set of records and supports operations such as:

Specifically, you will implement a variable-length slot representation. Review your material from lecture and the textbook. Our representation is very similar to the one depicted on Figure 9.7 with the following alteration: the header and slot directory is stored at the beginning of the page (the directory grows forward) and the heap space is filled from back to front. The structure is illustrated below:

Your job in this assignment is to complete the implementation of the Page class. You will have to pay close attention to pointer arithmetic and the conceptual mapping of data in a general array of data. At a high level, you are provided the following structures to represent a page object:

Thus, a page is simply the concatenation of the header and data array.


Implementation of the Page class

The Page class consists of several public methods that are utilized as the interface to a heap page as well as a set of private methods which are intended as either helper methods or restricted access methods to be used by friend classes (such as File and PageIterator). The following is a list of methods in no particular order. You should not simply implement each method one-by-one; you need to carefully plan your implementation in tandem with a testing strategy. Failure to do so will lead to significant problems as the data array structure is handled directly by you and not C++.

Provided methods

These methods have been defined for you:

Basic accessor/mutator methods to implement

Inserting a record

These are the set of methods involved in adding a record to a page:

Getting a record and updating a record

To obtain a record from a page, you will implement the following:

Delete a Record

There are two methods for deleting a record:
Coding and Testing

Acknowledgements
The code base was provided and developed by the University of Wisconsin Database Group, with many modifications at the top level.
Submitting your lab

Submit using handin44. Please run make clean before submitting to keep file sizes down. Also, be sure to complete the README file.