CS35: Data Structures and Algorithms

Lab 06 Notes

Lab 06 Agenda

Clone lab06

This is the same as in previous labs. Remember to run ssh-add when you log in so that you are not repeatedly asked for your ssh password. Go to the CS35 github org to find the clone link for your repo on the web interface.
$ cd ~/cs35/labs
$ git clone <link you got from github> ./lab06

Setup Symbolic Link

Just like in previous labs, we establish a symbolic link to a shared directory that contains files to test your program. You can create the link by executing the following commands:
$ cd ~/cs35/labs/lab06
$ ln -s /usr/local/doc/lab06-data/ ./test_data

Note that the path /usr/local/doc/lab06-data/ exists only on the machines on the CS network, so the link will not work if you clone your code to your personal computer.

BST Remove
Removing an element from a Binary Search Tree can sometimes be tricky. Insertion always happens at a leaf, but sometimes we want to remove a key that is in the middle or even at the root of the tree. Below is a pseudocode sketch of a BST removal algorithm. Much like the insert case, we will offload most of the work to a recursive helper function that returns the root of a new subtree.
void remove(key):
  /* Easy! */
  root = removeFromSubtree(root, key);
What does this recursive helper function look like? To remove a key, we first must find it. If it isn't there, we can't do much besides throw an exception, so let's focus first on finding the appropriate BST node that contains the key.

BSTNode removeFromSubtree(current, key):

  if current == nullptr:
    throw error("key not found in remove")

  if key < current->key:
    current->left = removeFromSubtree(current->left, key)
    return current
  else if key > current->key:
    current->right = removeFromSubtree(current->right, key)
    return current
  else:
    /* we found the key */
    /* TODO: remove this key at current node */
Once we have found the key, we must be careful about how we remove the node that contains it. Since current is a BST node, it can have zero, one, or two children. We will look at each of these cases separately. With some careful thought, some of these cases can be combined to reduce code duplication.

Consider first a leaf node with zero children. Removing such a node is easy. We simply delete the node, decrement the size of the entire BST, and return nullptr as the root of the new, now empty subtree.

   if current has no children:
     delete current
     size--
     return nullptr

If the node to be removed has only one child, we can replace the entire subtree rooted at the current node with the subtree rooted at the non-empty child without affecting the binary search tree property. Suppose, e.g., that the left child is non-empty and the right child is empty. We could do the following:

   if current->right==nullptr and current->left != nullptr:
     newSubtreeRoot = current->left
     delete current
     size--
     return newSubtreeRoot
Handling the case where the left is empty but the right is not is symmetric.
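As noted earlier, some of these cases can be combined. One possible way to do this, sketched below in C++, is to treat "zero children" and "one child" as a single "at most one child" case: whichever child is non-empty (or nullptr if both are empty) becomes the new subtree root. The Node struct, the function name removeNodeWithAtMostOneChild, and passing size by reference are assumptions made for illustration; your lab's BSTNode class and size member will look different.

```cpp
// Hypothetical minimal node type for illustration only.
struct Node {
    int key;
    Node* left;
    Node* right;
};

// Assumes current is non-null and has at most one child.
// Deletes current, decrements size, and returns the root of the
// replacement subtree (nullptr if current was a leaf).
Node* removeNodeWithAtMostOneChild(Node* current, int& size) {
    // If the left child is empty, fall back to the right child,
    // which may itself be nullptr (the leaf case).
    Node* newSubtreeRoot =
        (current->left != nullptr) ? current->left : current->right;
    delete current;
    size--;
    return newSubtreeRoot;
}
```

With this helper, the leaf case and both one-child cases need no separate branches.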

We've delayed long enough. We must finally deal with the tricky case of removing a key in a node with two children. We cannot simply cut out the node, as there is nowhere to connect two extra children and preserve the BST property without making significant changes to the overall tree. Instead, we will find a substitute key to replace the key we are removing. A good substitute key should be close to the original key so that replacing the removed key with the substitute preserves the BST property. There are two valid choices for the substitute key (the predecessor or the successor of the removed key), but to keep the algorithm clear, we will select one: the predecessor. The predecessor of a key $k$ is the largest key that is strictly smaller than $k$. When the current node has a non-empty left subtree, this predecessor can be found by searching for the maximum key in that left subtree. Note that the node holding this maximum cannot have a right child, so removing it later falls into one of the easy cases above.
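The search for the maximum key in a subtree can be sketched in C++ as follows: keep following right pointers until there are none left. The Node struct and the name maxNodeInSubtree are hypothetical and used only for illustration. This version returns the node rather than the key, which is a small variation on the MaxInSubtree helper used in the pseudocode.

```cpp
// Hypothetical minimal node type for illustration only.
struct Node {
    int key;
    Node* left;
    Node* right;
};

// Returns the node holding the largest key in the subtree rooted at
// current, or nullptr if the subtree is empty.
Node* maxNodeInSubtree(Node* current) {
    if (current == nullptr) {
        return nullptr;
    }
    // In a BST, larger keys are always to the right, so the
    // rightmost node holds the maximum.
    while (current->right != nullptr) {
        current = current->right;
    }
    return current;
}
```

Calling maxNodeInSubtree(current->left) on a node with a non-empty left subtree yields the node that stores the predecessor of current->key.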

  if current has two non-empty children:
     /* find substitute key in left subtree */
     kmax = MaxInSubtree(current->left)
     value = find(kmax)
     /* replace current key/value with substitute */
     current->key = kmax
     current->value = value

     /* but now kmax is in two places, remove duplicate */
     current->left =
        removeFromSubtree(current->left, kmax)
     return current
     /* we don't need to delete current or decrement size here, the recursive
        call will handle this */
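Putting the pieces together, the whole helper might look roughly like the C++ sketch below. It is a sketch under stated assumptions, not the lab solution: the Node struct with int keys and std::string values is hypothetical, size is passed by reference instead of being a member of a BST class, the zero-child and one-child cases are merged, and the substitute value is read directly from the predecessor node instead of calling find(kmax).

```cpp
#include <stdexcept>
#include <string>

// Hypothetical minimal node type for illustration only.
struct Node {
    int key;
    std::string value;
    Node* left;
    Node* right;
};

// Removes key from the subtree rooted at current and returns the root
// of the resulting subtree. Throws if the key is not present.
Node* removeFromSubtree(Node* current, int key, int& size) {
    if (current == nullptr) {
        throw std::runtime_error("key not found in remove");
    }
    if (key < current->key) {
        current->left = removeFromSubtree(current->left, key, size);
        return current;
    }
    if (key > current->key) {
        current->right = removeFromSubtree(current->right, key, size);
        return current;
    }

    // We found the key. Zero or one child: splice the node out.
    if (current->left == nullptr || current->right == nullptr) {
        Node* newSubtreeRoot =
            (current->left != nullptr) ? current->left : current->right;
        delete current;
        size--;
        return newSubtreeRoot;
    }

    // Two children: find the predecessor (maximum of the left subtree).
    Node* pred = current->left;
    while (pred->right != nullptr) {
        pred = pred->right;
    }
    // Replace the current key/value with the substitute.
    current->key = pred->key;
    current->value = pred->value;
    // The substitute key is now in two places; remove the duplicate.
    // The recursive call deletes that node and decrements size.
    current->left = removeFromSubtree(current->left, current->key, size);
    return current;
}
```

For example, removing the root key 5 from a tree with keys 1, 3, 4, 5, 8 (where 3 is the left child of 5 and has children 1 and 4) copies 4 into the root and then deletes the old leaf holding 4.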