CS35 Lab 9B: AVL Trees

Due by 11:59 p.m., Tuesday, April 1, 2014
Your work for this lab consists of two parts:
  1. Complete and test a templated implementation of an AVL tree, AVLTree.
  2. Extend your plagiarism detector for Part A to use your AVLTree to build an index of essays and detect plagiarised documents.
While implementing AVLTree sounds lengthy, 95% was already done in last week's lab. All non-insert and remove methods work the same when comparing a LinkedBST to AVLTree, and we've seen pseudocode for most of the other functions. Below, I will detail the modifications needed to maintain the balanced AVL property. At a high level, you will need to accomplish the following:
  1. Initial setup - get and create files to for your AVLTree implementation
  2. Modify insert/remove - modify insertInSubtree and removeFromSubtree to balance
  3. Implement balancing and rotations - store node heights, detect imbalances, and implement rotations
  4. Test your AVL Tree - add tests for balancing to testBST
  5. Extend cheatDetector - add AVLTree option to your main program; run benchmark tests.

Introduction

The AVLTree is very similar to the LinkedBST from Lab 08, except it maintains the AVL balance property to guarantee that the tree is balanced. (The AVL balance property is that, for each node in the tree, the height of that node's left and right subtrees differs by at most one.)

To efficiently maintain the AVL property, each AVLTreeNode stores its current height in addition to the other data stored by a LinkedBSTNode. In sum, each AVLTreeNode contains:

Our algorithms define an empty child node to have a height of -1 for simplicity, but an empty child node is not an actual node. It is the pointer NULL. You will need to be careful to check that children nodes are not NULL before attempting to access information such as a child node's height or children.

To maintain a roughly O(lg n) tree height, you will extend your LinkedBST implementation to detect imbalances after insertions and removes and then use one of four tree rotation methods to fix the imbalance.


Getting started

TODO:

First, run update35. This should add the following files:

Neither of these files needs modifications. Instead, all of you work will be done in AVLTree-private-inl.h. Since the vast majority of methods will be the same as your LinkedBST private methods, you'll use your linkedBST-private-inl.h file as a starting point. To do so, do the following:
  1. Copy your linkedBST-private-inl.h file:
    $ cp linkedBST-private-inl.h AVLTree-private-inl.h
    
  2. Replace all instances of BSTNode with AVLTreeNode and LinkedBST with AVLTree. This is done using search and replace. Each editor has a built in way of doing this, but we can do it easily with a linux program called sed:
    $ sed -i 's/BSTNode/AVLTreeNode/g' AVLTree-private-inl.h 
    $ sed -i 's/LinkedBST/AVLTree/g' AVLTree-private-inl.h 
    

Since you are using your LinkedBST as your starting point, you are responsible for ensuring you correct all errors from Lab 8. Any lingering errors from Lab 8 will lead to further deductions in Lab 9.

Lastly, open up your Makefile and remove the comment # on line 8 so that the AVLTree files will be included in compilation.


Inserts/Removes in AVLTree

TODO:

To insert in an element an AVL Tree, we follow the normal protocol. As we saw in class, the insertInSubtree method should handle the 4 cases (i.e., base case, duplicate key, smaller key, larger key) for recursion the same as LinkedBST. Once that work is complete, current is the root of a sub-tree that may now be unbalanced and require a rotation. This check must be done at every call since the imbalance could be detected at any step back up the tree. Your overall method should follow this protocol:

  1. Handle the 4 cases the same as LinkedBST
  2. Before returning:
    1. Calculate height of node
    2. Check for imbalances
    3. Fix imbalances with rotations
  3. Step 2 may have resulted in a new root for the sub-tree. Return the root of the sub-tree to the calling function
Implementing this should take 1 to 2 lines of code at most. You completed #1 already from Lab 8. #2 should be handled by another function (balance()) specified below. #3 should take the value returned by balance and return it.

removeFromSubtree follows the same outline. Before every return statement, you should call balance to handle all of the work of updating heights and checking for imbalances.


Balancing and rotations

TODO:

To complete the AVLTree, you should complete each of the rotation functions discussed in class. Their prototypes can be found in AVLTree.h file. There is one rotation function for each of the four cases discussed in class, and each returns the new root of the sub-tree to the calling function:

Each rotation should do the following:

These functions are called by the balance function (which is in turn called by insertInSubtree and removeFromSubtree). balance does much of the book-keeping to maintain the AVL property. To complete this method, you should take the following steps:

  1. If current is NULL, return current. There is no work to do.
  2. Update current's height. You should implement this routine in computeHeightFromChildren. Remember to account for potential empty children (NULL).
  3. Check for a left imbalance (left child is 2 taller than right)
    • If so, check which of left's children is taller. Call either rightRotate (left grandchild is bigger) or leftRightRotate (right grandchild is bigger)
  4. Check for a right imbalance (right child is 2 taller than left)
    • If so, check which of right's children is taller. Call either leftRotate (right grandchild is bigger) or rightLeftRotate (left grandchild is bigger)
  5. Return the root of the sub-tree. Note that it may have changed.


Unit-testing the AVLTree

You should test the AVLTree implementation using the testBST program from last week. This should stress test the AVLTree implementation for a large number of random insertions and deletions. In addition, your tests should confirm that each of your rotation functions behaves as expected for known examples of the tree. While you cannot access private methods, you can come up with specific examples that guarantee certain rotations will occur and then use public methods such getHeight() or getPreOrder() to inspect the contents of the tree.


Extending cheatDetector

Return to the write-up for Part A and ensure that your cheatDetector can work with either LinkedBSTs or AVLTrees. The details are found in that write-up. Once that is complete, run your program with both and pay attention to the performance. How do the heights of trees change by using balancing? You will need to answer a few questions for the README.


Extra Credit

At this point, you have seen two different BST implementations. Your task for the extra credit is to design your own BST implementation that optimizes the overall runtime of your cheatDetector program. Can you create a balanced BST that has lower average height than your AVLTree? How does the actual running time compare? Can you create a balanced BST implementation with higher average height, but lower overall running time?

The metrics for comparison are intentionally vague. At a minimum, you have a nontrivial program (cheatDetector) that uses BSTs. cheatDetector runs for long enough that it's worth trying to optimize runtime. One way to try to do that might be to pay close attention to the average height of the tree, but there are other ways.

When you hand in your lab, you should also add a section to the README file stating that you're doing the extra credit portion. You should also have a brief (1-2 paragraphs) description of how your BST implementation differs from the two we've seen in class.

As always, doing the extra credit portion is 100% optional, and the gain you get in terms of your final grade is not proportional to what you get by putting effort into the main part of the lab.


Submitting your work

Submit your solution with the first part using handin35