CS44 Fall 2007

Project 2: HeapFile Page

Due Friday, September 28 before 1am (late Thurs night)

For this and all subsequent projects, you should work with your project partner and submit a single joint solution.
Introduction

In this assignment, you will implement the page structure for the Heap File layer. You will be given a library containing the lower layers (Buffer Manager and Disk Space Manager), and some driver routines to test the code.

Begin by reviewing HeapFile organization. In particular, you should review the description of Page and record formats for Heap Files in chapter 9 of the text. A HeapFile is seen as a collection of records. Internally, records are stored on a collection of HFPage objects. For this assignment, you will implement just the HFPage class, and not all of the HeapFile code.

Your HFPages will use the page format for storing variable sized records that uses a slot array, similar to the one shown in figure 9.7 of the text. However, in the description in the text, the slot directory is located at the end of the page, and grows toward the beginning. In your implementation, the slot array will be at the beginning of the page (immediately after the page header) and will grow towards the end of the page. Variable-sized records will be stored starting at the end of the page and moving towards the top of the page. All records should be compacted on the page, and all available free space should be between the end of the slot array and the end of the compacted records (remember they grow from the bottom up). You will need to write the code so the records themselves are placed starting at the end of the page and growing towards the beginning of the page; be very careful with your pointer arithmetic.

Keep in mind that in order to add a record to a page, there has to be room for the record in the data area and also room for a new slot in the slot area (unless there happens to be a pre-allocated slot that's empty). The HFPage is illustrated below:


Getting Started

Create a working directory for this project, then copy the starting point file:

% cd cs44
% mkdir proj2
% cp -r ~newhall/public/cs44/proj2/* .
src contains staring point code (.h and .C files that you will implement)
include contains .h files for other layers of Minibase (don't change these)
lib contains the library of lower Minibase layers used by this layer.

In src, do 'make depend' to build dependencies. If you make the project, it will create an executable named hfpage . Right now, it does not work; you will need to fill in the bodies of the HFPage class methods. Sample output of a correct implementation is available in sample_output.

Design Overview and Implementation Details

void HFPage::init(PageId pageNo): This member function is used to initialize a new heap file page with page number pageNo. It should set the following data members to reasonable defaults: nextPage, PrevPage, slotCnt, curPage, usedPtr, freeSpace. You will find the definitions of these data members in hfpage.h. The nextPage and prevPage data members are used for keeping track of pages in a HeapFile. A good default unknown value for a PageId is INVALID_PAGE, as defined in page.h. Note that usedPtr is an offset into the data array, not a pointer.

PageId HFPage::getNextPage(): This member function should return the page id stored in the nextPage data member.

PageId HFPage::getPrevPage(): This member function should return the page id stored in the prevPage data member.

void HFPage::setNextPage(PageId pageNo): This member function sets the nextPage data member.

void HFPage::setPrevPage(PageId pageNo): This member function sets the prevPage data member.

Status HFPage::insertRecord(char* recPtr, int reclen, RID& rid): This member function should add a new record to the page. It returns OK if everything went OK, and DONE if sufficient space does not exist on the page for the new record. If it returns OK, it should set rid to be the RID of the new record (otherwise it can leave rid untouched.) Please note in the parameter list recPtr is a char pointer and RID& denotes passed by reference. The Status enumerated type is defined in new_error.h if you're curious about it. Use memcpy to copy the record to its correct position in the page. Your implementation should use unused slots in the slot array, before creating a new one.

DONE is a special code for non-errors that are nonetheless not "OK": it generally means "finished" or "not found." FAIL is for errors that happen outside the bounds of a subsystem.

The RID struct is defined in include/minirel.h:

struct RID {
  PageID pageNo;
  int    slotNo;

  int operator == (const RID rid) const
  	{ return (pageNo == rid.pageNo) && (slotNo == rid.slotNo); };

  int operator != (const RID rid) const
  	{ return (pageNo != rid.pageNo) || (slotNo != rid.slotNo); };
};
The pageNo identifies a physical page number (something that the buffer manager and the DB layers understand) in the file. The slotNo specifies an entry in the slot array on the page (something your code will set when a record is inserted on an HFPage).

The operators ==, !=, and << are overloaded for RID types, so that you can compare two RID object by doing somehthing like the following:

	struct RID rid1, rid2;
	...
	if(rid1 == rid2) { ... }
	if(rid1 != rid2) { ... }
	cout << rid1 << endl;

Status HFPage::deleteRecord(const RID& rid): This member function deletes the record with RID rid from the page, compacting the hole created. Compacting the hole, in turn, requires that all the offsets (in the slot array) of all records after the hole be adjusted by the size of the hole, because you are moving these records to "fill" the hole. You should leave a "hole" in the slot array for the slot which pointed to the deleted record, if necessary, to make sure that the rids of the remaining records do not change. The slot array can be compacted only if the record corresponding to the last slot is being deleted. It returns OK if everything goes OK, or FAIL otherwise.

Status HFPage::firstRecord(RID& firstRid): This routine should set firstRid to be the rid of the "first" record on the page. The order in which you return records from a page is entirely up to you. If you find a first record, return OK, else return DONE.

Status HFPage::nextRecord(RID curRid, RID& nextRid): Given a valid current RID, curRid, this member function stores the next RID on the page in the nextRid variable. Again, the order of your return records is up to you, but do make sure you return each record exactly once if someone continually calls nextRecord! Don't worry about changes to the page between successive calls (e.g. records inserted to or deleted from the page). If you find a next RID, return OK, else return DONE. In case of an error, return FAIL.

Status HFPage::getRecord(RID rid, char * recPtr, int& recLen): Given a rid, this routine copies the associated record into the memory address *recPtr. You may assume that the memory pointed by *recPtr has been allocated by the caller. RecLen is set to the number of bytes that the record occupies. If all goes well, return OK, else return FAIL.

Status HFPage::returnRecord(RID rid, char*& recPtr, int& recLen): This routine is very similar to HFPage::getRecord, except in this case you do not copy the record into a caller-provided pointer, but instead you set the caller's recPtr to point directly to the record on the page. Again, return either OK or FAIL.

int HFPage::available_space(void): This routine should return the amount of space available for a new record that is left on the page. For instance, if all slots are full and there are 100 bytes of free space on the page, this method should return (100 - sizeof(slot_t)) bytes. This accounts for the fact that sizeof(slot_t) bytes must be reserved for a new slot and cannot be used by a new record.

bool HFPage::empty(void): Returns true if the page has no records in it, and false otherwise.

Please follow the Minibase Error Protocol.

An example of how to use this protocol is given in the file ErrProc.sample. Basically, the first detector of an error returns MINIBASE_FIRST_ERROR(status_value, error_number). The caller will either handle the error, or more typically pass it on to higher layers, by calling MINIBASE_CHAIN_ERROR(status_value, errornumber). status_value is one of the Status enumerated types defined in include/new_error.h. These correspond to layers in the DBMS or data structures in the DBMS. Do not add new values. For this assignment, use HEAPFILE as the status_value. If our code first detects an error, we would do something like this:

return MINIBASE_FIRST_ERROR(HEAPFILE, INVALID_SLOTNO);

If an call to a lower layer returns an error, then we typically pass it on to a higher layer (it is also possible, but unlikely, that we handle it and don't pass it on):
status = MINIBASE_DB->write_page(pid, &bufPool[i]);
if(status != OK) {
  return MINIBASE_CHAIN_ERROR(HEAPFILE, status);
}
For this assignment, I don't think code you write will need to invoke MINIBASE_CHAIN_ERROR, but there are cases where you will need to call MINIBASE_FIRST_ERROR. Because you are not implementing the full HeapFile layer, the Heapfile error codes are pre-defined for you and you cannot change them (they are in include/heapfile.h). For the next assignment, you will define your own error codes.

Test code

In hfp_driver.C are a number of functions that look like HfpDriver::testX(). sample_output contains correct output for exactly these tests (your correct submitted program should produce the same output). As you implement and test your code incrementally, you likely do not want to start by running all of these tests. What you will want to do is either comment out calls to them in TestDriver::runAllTests in test_driver.C, and add them back in one at a time as you add more functionality and/or comment out all or part of the body of the HfpDriver::testX functions, and as you add and test more functionality, uncomment more and more parts. You should additionally add test code to test conditions/cases that may not be fully tested in the test code I'm giving you.

When you submit your solution, test that your solution's output on the tests I give you is identical to my sample output (with the exception of page memory addresses which may vary); you should not submit a solution with extra debugging output.


What to Turn In

Create a submit subdirectory of your working directory, and copy into it all your source files that I need to compile, run and test your buffer manger solution (.C, .h, and Makefile). In this directory add a README file with the following information:

  1. Your name and your partner's name
  2. The number of late days that you have used so far
  3. A brief description of any features or error handling that you have not implemented in your submitted solution. In addition, if you submit an incomplete solution, it would be a good idea to tell us how to run and test your code for the parts that you did complete (you may want to submit a copy of your own test_driver.C file (name something like test_driver_mine.C) that contains your tests).

Then create a tar file of your submit subdirectory and submit it via cs44handin before the due date.