CS44 Lab 3: Heap Pages

Due by 11:59 p.m., Thursday, Oct 20, 2016
but I encourage you to complete it before fall break
Checkpoint: due during Lab on Oct 3
demo basic insertRecord and getRecord functionality
(pass test1 and your own test2 that tests getRecord)
Introduction

This assignment is to be done with your assigned Lab 3 partner. You may not work with other groups, and the share of workload must be even between both partners. Failing to do either is a violation of the department Academic Integrity policy. Please read over Expectations for Working with Partners on CS Lab work

Lab 3 Partner List for Lab Section A
Lab 3 Partner List for Lab Section B


Lab 3 Goals

The goal of the WiscDB projects is to allow students to learn about the internals of a data processing engine. In the first assignment, you built a buffer manager, on top of an I/O Layer that I provide, to understand the role of a system in managing main memory efficiently.

In this second part, we will drill into this I/O layer to understand details of managing data in a file. You will complete the heap file representation of the data - that is, one where records are stored in an unsorted manner. This will allow you to directly simulate the physical structure of data on disk and help prepare you for future labs. Recall that a file is a collection of pages which themselves maintain a collection of records. I have provided you with the File class, which provides a logical ordering of pages for you (e.g., a linked list representation of pages). You must complete the I/O layer by implementing the Page class, which provides the physical layout of records on a disk page.

The code base is quite extensive and, as with the BufferManager lab, will require much reading of API documentation and thorough testing. You are responsible for making significant progress early on the in assignment as waiting until the last few days will not be manageable.

The goals of this assignment include:

Lab 3 Starting point
First, find your Lab3-partner1-partner2 git repo off the GitHub server for our class: CS44-f16

Next, clone your Lab 3 git repo into your cs44/labs subdirectory, cd into your repo, and run make setup:

cd
cd cs44/labs
git clone [the ssh url to your repo]
cd Lab3-partner1-partner2
make setup
Here are some instructions on Using Git page (follow the instructions for repos on Swarthmore's GitHub Enterprise server).

If all was successful, you should see the following files and symlinks (files highlighted in blue require modification):

Running make setup will add symlinks to execptions, include, lib directories. You can open .h and .cpp files in exections and include to see definitions. It is probably easiest to refer to the online documentation for exceptions.


The WiscDB I/O Layer: Heap Files
The lowest layer of the WiscDB database systems is the I/O layer. This layer consists of two classes: a file (class File) and a page (class Page) class. These classes use C++ exceptions to handle the occurrence of any unexpected event. Implementation of the File class, and the exception classes are provided to you. The code has been adequately commented to help you with understanding how it does what it does.

At a high-level, the File class abstracts the physical structure of files from upper-levels of the database. It stores a linked-page structure of heap-file pages that allow iteration of contents of a file and the storage of records. The File class allows the upper level of the system to:

Before reading further you should first read the documentation for the File class while reviewing the concept of a heap file in Chapter 9 of the textbook. Again, this implementation has been completed and thus the details of managing a collection of pages is an abstraction for both higher layers (e.g., the buffer manager) as well as lower layers (i.e., the page layer).


Heap Page

The Page class describes the structure and set of operations that can be performed on a slotted heap page. A page stores a set of records and supports operations such as:

Specifically, you will implement a variable-length slot representation. Review your material from lecture and the textbook. Our representation is very similar to the one depicted on Figure 9.7 with the following alteration: the header and slot directory is stored at the beginning of the page (the directory grows forward) and the heap space is filled from back to front. The structure is illustrated below:

In your Heap Page implementation:

Your job in this assignment is to complete the implementation of the Page class (read the page.h file, implement page.cpp). You will have to pay close attention to pointer type and make use of sizeof to access different types within a page (an array of bytes/char).

You are provided the following structures to represent a page object, defined in page.h (DO NOT ADD or MODIFY DATA MEMBERS):

Thus, a page is simply the concatenation of the header and data array. So another view of the Heap File Page organization shown above, is in terms of the wiscdb Page class definition. The Page class has two fields (header and data). The data field is used to store the slot directory and the data records. The slot directory grows into freespace from one end of the data array, the record data compacted from the other end:


Implementation of the Page class

The Page class consists of several public methods that are utilized as the interface to a heap page as well as a set of private methods which are intended as either helper methods or restricted access methods to be used by friend classes (such as File and PageIterator). The following is a list of methods in no particular order. You should not simply implement each method one-by-one; you need to carefully plan your implementation in tandem with a testing strategy. Failure to do so will lead to significant problems as the data array structure is handled directly by you.

Provided methods

These methods have been implemented for you:

Basic accessor/mutator methods to implement

Inserting a record

Records to be inserted into a page are passed as an C-style string (an array of char with a terminating '\0' character). When you insert a new record in a Heap Page, you should NOT store the terminating '\0' part of each record (i.e. the record "Tia" should take up 3 bytes of space in the page and not 4).

Record Id consists of the (slot_number, page_number). NOTE:slot_numbers start at 1 not 0.

These are the set of methods involved in adding a record to a page:

Getting a record

To obtain a record from a page, you will implement the following:

Delete a Record

There are two methods for deleting a record:

Updating a record


Coding and Testing

In the BufferPool lab, you spent most of your time understanding the tests I provided; only a few methods were not covered in those tests. For this lab, you will be given much less guidance by me. As usual, you are more than welcome to discuss testing strategies with your classmates (but not the testing code). Think carefully about how your implementation could go wrong. For example, are you properly calculating the free space available? You can test this by inserting the exact amount of data a page should fit and see if it succeeds. Then, see if you Page correctly fails if you try to insert even a little more.

You can test your heap Page implementation for correctness without having the buffer manager layer. Create a file, allocate a Page in the file, and do your testing on that Page. You are not testing the full Heap File implementation, and you are not testing your Heap Page as it would be used in a full DBMS system implementation (where there would be a BufferPool into which copies of file pages would be loaded to be accessed and modified).


Acknowledgements

The code base was provided and developed by the University of Wisconsin Database Group.


Submitting your lab
Before the Due Date, push your solution to github from one of your local repos to the GitHub remote repo.

From your local repo (in your ~you/cs44/labs/Lab03-partner1-partner2 subdirectory)

git add *
git commit -m "our correct, robust, and well commented solution for grading"
git push

If that doesn't work, take a look at the "Troubleshooting" section of the Using git page. Also, be sure to complete the README.md file, add and push it.