CS44: Database Management Systems

Lab 3: B+ Tree Index

Due by 11:59 p.m., Sunday, November 18, 2018.

This is a partnered lab. You are to complete this lab with one other person, who must attend the same lab as you. You may discuss the lab concepts with other classmates. Please use Teammaker to set your lab partner preferences. You can choose “Random” if you do not have a partner. Remember the Academic Integrity Policy: do not show your code/solution to anyone outside of the course staff and your lab partner. Do not look at any other group’s code/solution for this lab (current or past). If you need help, please post on Piazza.

Checkpoint 1 (Week 9) - demonstrate that your program can:

Checkpoint 2 (Week 10) - demonstrate that your program can:

Introduction

A key distinction in data structures designed for database management systems is the fact that data (i.e., relations) are stored in files on secondary storage (usually disk). In addition to discussing the effect of data structure design, we must also discuss each file type’s particular organization and how that organization lends itself to efficient evaluation of typical operations: scan, equality search, range search, insertion, and deletion.

A DBMS index is a structure to access a relation quickly. For this assignment, you will implement a B+ Tree index that will store data entries in the form of [key, rid] pairs (i.e., Alternative 2 from the course textbook). The actual tuples are stored in a file separate from the index; entries in the index file “point to” the location of the tuple using a RecordId. You will implement a B+ tree, including the full search and insert algorithms as discussed in class. Insert must be capable of dealing with overflows (at any level of the B+ tree) by splitting pages (no redistribution will be done). The goals of this assignment include:

There are number of design choices that you need to make, and you probably need to reserve a big chunk of time for testing and debugging. So, start working on this assignment early; you are unlikely to finish this project if you start it just a week before the deadline.

Getting Started

The goal of the WiscDB projects is to allow students to learn about the internals of a data processing engine. In the first assignment, you built a buffer manager. In part two, you dove into the I/O Layer to implement a heap page. In this last part, you will construct a B+ tree class to index relations on disk.

You will notice a few differences from previous labs. Records were simply strings, often with arbitrary values. In this lab, they encode actual tuples from a relation. In addition, instead of a Heap Page, you will use (modified) page representations to map nodes on to disk and you will use a buffer manager to manage the interface between your B+ tree and the actual data on disk.

The code base is quite extensive and will require much reading of API documentation and thorough testing.

Click to see a summary of files

Both you and your partner will share a repo named Lab3-userID1-userID2. Note that the lab will not release until you have both marked your partner preferences on Teammaker. You should find your repo on the GitHub server for our class: CPSC44-F18

Clone your Lab 3 git repo containing starting point files into your labs directory:

cd ~/cs44/labs
git clone [the ssh url to your your repo]
cd Lab3-userID1-userID2

If you need help, see the instructions on Using Git (follow the instructions for repos on Swarthmore’s GitHub Enterprise server).

If all was successful, you should see the following files (highlighted files require modification):

As in previous labs, you need to run make setup (just once) to create symlinks to shared libraries that are needed for your programs to run:

cd ~cs44/Lab3-userID1-userID2/
make setup

I/O Layer: Updated Interface for Files and Pages

To help get you started, I have provided you with an implementation of few new classes: PageFile, RawFile, and FileScanner. The PageFile and RawFile classes are derived from the File class. These classes implement a file interface in two different ways:

FileScanner Class

The FileScanner class is used to scan records in a file. You will use this class for the base relation, and not for the index file. The file main.cpp file contains code which shows how to use this class. The public member functions of this class are described below.

Be sure to examine/understand this function for using the FileScanner. You will need to use the FileScanner in your B+ tree construction.

A B+ Tree Index

Your assignment is to implement a B+ tree index. To simplify the task, we will make the following assumptions:

The index will be built directly on top of the previously described I/O Layer (the RawFile and the Page classes). An index will need to store its data in a file on disk, and the file will need a name (see below). Recall that RawFile gives unformatted Page objects , so you will need to map the various types of data (i.e., leaf node, internal node, header page) into a raw unformatted page (an unstructured array of bytes).

To start, examine the public interface of the BTreeIndex. You can (and should!) add new private methods to this class for e.g., helper functions and good modular design. But do not modify the public interfaces or add data members.

Constructor

Insert

Range Search Methods

Destructor and Print Methods

Tips and Additional Details

Here is some additional information to help complete the lab:

Testing and Design

A large part of your time on this lab will be spent designing and testing your program. When designing your solution, be sure to consider that there are two types of nodes - leaf and non-leaf nodes

You should develop a testing strategy that parallels your design. The provided test is not useful for incremental development. You should devise smaller tests. Ideas include:

Submitting your lab

Before the due date, push your solution to github from one of your local repos to the GitHub remote repo. Only one of you or your partner needs to do this.

From your local repo (in your ~cs44/labs/Lab3-userID1-userID2subdirectory)

make clean
git add *
git commit -m "our correct, robust, and well commented solution for grading"
git push

If that doesn’t work, take a look at the “Troubleshooting” section of the Using Git page. Also, be sure to complete the README.md file, add and push it.