CS35 Lab 7: Binary Search Trees

Due by 11:59pm Sunday, November 1, 2015

Over the next few weeks, you will implement a multi-part lab that will create a plagiarism detector. Given a set of documents (e.g., essays from a high school English class), your algorithm will identify significant matches between pairs of documents that indicate shared use of the same material. Some of the documents may be source documents (e.g., a Wikipedia entry) while others may be actual essays.

This will require a fast dictionary implementation. This week, you'll concentrate on creating a well-tested binary search tree implementation.

To get started, clone lab7-<partner-names>.

Files: There are a few different files involved this week:

Implementation details

The main work for this lab is to implement and test the LinkedBST class. Most, if not all, of your implementation work will happen in linkedBST-private-inl.h.

Templates: In these classes, the keys and values are both templated. For this lab, you only need to worry about testing with string keys. We'll address more general key types in the future.

The LinkedBST itself tracks just two pieces of data:

The BSTNode class is as we declared it in lecture, except templated.

Pairs: We have implemented a templated Pair class to store a pair of arbitrary-type data. You do not need to modify our implementation, but you will need to understand Pairs enough to return an iterator of key-value pairs for your tree traversals.

A Pair is essentially just a container to publicly store two pieces of data:

  template <typename F, typename S>
  class Pair {
    public:
      F first;
      S second;
      Pair(F f, S s);
  };

Because the data is publicly stored within the Pair, the data can be accessed and changed directly from any other scope of the program. For example:

  int main() {
    Pair<string,int> p("Hello", 42);
    cout << p.first << ", " << p.second << endl;
    p.first = "Bye!";
    cout << p.first << endl;
    return 0;
  }

The Pair class is useful for dictionary data structures, such as the LinkedBST, where we store a key-value pair for each piece of data.



Traversals

Tree traversals turn a tree structure into a linear one, enforcing a particular order on the elements. You will implement four different traversals on binary search trees for this assignment, with each traversal returning a Queue containing Pairs of the keys and values, in order.

An example tree, and the definitions of the orders, are below. Note that the example just shows keys; your traversal should include Pairs of keys and values in the resulting Queue, not just the keys.

      F
    /   \
   /     \
  C       H
 / \     / \
A   D   G   I

Each tree traversal function returns a pointer to an iterator of all key-value pairs in the tree, in the order specified by the tree traversal. For a LinkedBST<K,V> storing keys of type K and values of type V, each key-value pair has type Pair<K,V>. This means that a pointer to an iterator of key-value pairs has type Queue< Pair<K,V> >*, a templated type of a templated type! (This is sometimes called a nested template type.) Be warned that the compiler requires a space between the > for the inner template and the > for the outer template. If you forget the space you might get some error such as:

  testBST.cpp:42:  error: '>>' should be '> >' within 
    a nested template argument list
Wikipedia has some nice pictures of tree traversals for reference (and you can also derive some useful test cases from it). Note that our program isn't "displaying" the elements as that article suggests, but rather adding them to a Queue.



Handing In

As always, make sure to run git push by the deadline. Also, please make sure the code you want graded is on the master branch.