CS35 Lab 7: A Binary Search Tree

Due by 11:59 p.m., Wednesday, October 26, 2011

Your work for this lab consists of two parts:

  1. Complete and test a templated implementation of a binary search tree, LinkedBst.
  2. Use the LinkedBst to analyze the word frequencies of input text.

As usual, you can get the initial files for this lab by running update35. When you run update35 you will obtain a large number of files for this lab, but you need to edit only three of them to complete this lab. The files that you will need to edit are highlighted below:

Below we introduce the Pair class, Iterator class, and QueueBasedIterator implementation; you'll need to understand -- but not modify -- these classes to complete your lab. Below that introduction we describe the work you'll need to complete the LinkedBst implementation.



The Pair class

We have implemented a templated Pair class to store a a pair of arbitrary-type data. You do not need to modify our implemenation, but you will need to understand Pairs enough to return an iterator of key-value pairs for your tree traversals.

A Pair is essentially just a container to publicly store two pieces of data:

  template <typename F, typename S>
  class Pair {
    public:
      F first;
      S second;
      Pair(F f, S s);
  };

Because the data is publicly stored within the Pair, the data can be accessed and changed directly from any other scope of the program. For example:

  int main() {
    Pair<string,int> p("Hello", 42);
    cout << p.first << ", " << p.second << endl;
    p.first = "Bye!";
    cout << p.first << endl;
    return 0;
  }



The Iterator interface and QueueBasedIterator implementation

A common programming task is to execute some code for each item in a data structure. An iterator is an object that provides a consistent interface for traversing all data from some container class. (A container is just a generic term for some class that contains other data. Lists, stacks, queues, and trees are all examples of container classes.)

We have defined one possible Iterator interface and a generic queue-based implementation, QueueBasedIterator, of that interface. Our Iterator interface is:

  template <typename T>
  class Iterator {
    public:
      bool hasNext();
      T    getNext();
  };
The idea is that the getNext() function returns each item from the container once, and the hasNext() function returns true until each item in the container has been returned with getNext().

Suppose that our LinkedList class included a getIterator() function that returned an iterator for the list. (Our actual implementation did not include this function.) You would then be able to use the iterator as:

  int main() {
    LinkedList<int> myList;
    // ... insert some items into the list.

    Iterator<int> iter = myList.getIterator();
    while (iter.hasNext()) {
      int someItem = iter.getNext();
      // execute some code for the current item, someItem...
    }
    return 0;
  }
to execute some arbitrary code for each item in the list. See testBst.cc for some example uses of the LinkedBst iterators.

The QueueBasedIterator is a queue-based implementation (surprise!) of the Iterator interface. In addition to the required iterator methods it also provides an insert(T item) method, which allows an arbitrary container to insert items into the iterator, to be accessed later with the getNext() function. Your LinkedBst traversal functions should use insert to place items into an iterator. Your testBst.cc and wordCount.cc programs should use hasNext() and getNext() to access the items in the iterator.



A binary search tree implementation

Your main work for this lab is to complete and test the LinkedBst implementation. In linkedbst.h we have defined templated LinkedBst and LinkedBstNode classes for you, and partially implemented these classes in linkedbst.inl. Each incomplete function in linkedbst.inl is marked with a TODO comment.

Our LinkedBst itself contains just two pieces of data:

Each LinkedBstNode contains:

As described in lecture, most of the public functions defined by the Bst interface are simple calls to private recursive helper functions, which perform the required operation on some subtree of the tree.

One peculiarity of our implementation is that our helper update functions (insertInSubtree and removeFromSubtree) return a pointer to root of the subtree they modified. This pointer is usually the same node as the node passed as an argument to the function (current), but can be a new node (for insertInSubtree) or a different previously-existing node (for removeFromSubtree) if their modification changes the root node of the subtree. This peculiarity greatly simplifies the binary search tree implementation because it allows a node's parent to easily maintain a correct tree structure if the modification changes the parent's child node.

Each tree traversal function returns a pointer to an iterator of all key-value pairs in the tree, in the order specified by the tree traversal. For a LinkedBst<K,V> storing keys of type K and values of type V, each key-value pair has type Pair<K,V>. This means that a pointer to an iterator of key-value pairs has type Iterator< Pair<K,V> >*, a templated type of a templated type! (This is sometimes called a nested template type.) Be warned that g++ requires a space between the > for the inner template and the > for the outer template. If you forget the space you might get some error such as:

  testBst.cc:42:  error: '>>' should be '> >' within a nested template 
  argument list
See the LinkedBst::getPostOrderIterator function for an example of how to create our iterators for the tree traversals, and testBst.cc for examples of how to use them.



Unit-testing with assert

assert(bool condition) is a C function that takes a boolean condition as an argument and causes your program to immediately crash if that boolean condition is false. (assert does nothing if that condition is true.) This is useful because it allows you to test parts of your program as you write, checking that the program values are what you expect when your program runs.

To use assert you must #include <assert.h>. See testBst.cc for many examples of its use. You should add code to testBst.cc as you implement LinkedBst features. Be warned that testBst already tests some LinkedBst features that you must implement; testBst will crash if you run it before implementing those features.



Computing word-count frequencies

Inside wordCount.cc, implement a program that reads text input and uses a LinkedBst to count the number of occurences of each distinct word. After all words from the input have been read, use one of the LinkedBst iterators to print every word (in alphabetical order) and its frequency.

Here is one example execution of this program:

  $  ./wordCount
  this is a test
  of this    program to see
  if this program behaves like 
  a nice program    <CTRL-D>

  a: 2
  behaves: 1
  if: 1
  is: 1
  like: 1
  nice: 1
  of: 1
  program: 3
  see: 1
  test: 1
  this: 3
  to: 1

  $ 

You may assume that input words do not contain punctuation and are all lower case. Notice that this program reads whitespace-delimited words until the text input ends with an end-of-file (caused by the user typing CONTROL-D). For an example of another program that does this, see
cs35/f11/library/inclass/w04/arraylist/reverse.cc.



Submitting your work

Once you are satisfied with your program, hand it in by typing handin35 at the unix prompt.

You may run handin35 as many times as you like, and only the most recent submission will be recorded. This is useful if you realize after handing in some programs that you'd like to make a few more changes to them.