CS35 Lab 7: A Binary Search Tree

Due by 11:59 p.m., Wednesday, March 21, 2012 Written questions due in lab, Friday, March 23, 2012

Your work for this lab consists of two parts:

  1. Complete and test a templated implementation of a binary search tree, LinkedBst.
  2. Use the LinkedBst to build an efficient Morse code encoder for translating English phrases into Morse code.

As usual, you can get the initial files for this lab by running update35. When you run update35 you will obtain a large number of files for this lab, but you need to edit only three of them to complete this lab. The files that you will need to edit are highlighted below:

Below we introduce the Pair class, Iterator class, and QueueBasedIterator implementation; you'll need to understand -- but not modify -- these classes to complete your lab. Below that introduction we describe the work you'll need to complete the LinkedBst implementation.



The Pair class

We have implemented a templated Pair class to store a a pair of arbitrary-type data. You do not need to modify our implemenation, but you will need to understand Pairs enough to return an iterator of key-value pairs for your tree traversals.

A Pair is essentially just a container to publicly store two pieces of data:

  template <typename F, typename S>
  class Pair {
    public:
      F first;
      S second;
      Pair(F f, S s);
  };

Because the data is publicly stored within the Pair, the data can be accessed and changed directly from any other scope of the program. For example:

  int main() {
    Pair<string,int> p("Hello", 42);
    cout << p.first << ", " << p.second << endl;
    p.first = "Bye!";
    cout << p.first << endl;
    return 0;
  }



The Iterator interface and QueueBasedIterator implementation

A common programming task is to execute some code for each item in a data structure. An iterator is an object that provides a consistent interface for traversing all data from some container class. (A container is just a generic term for some class that contains other data. Lists, stacks, queues, and trees are all examples of container classes.)

We have defined one possible Iterator interface and a generic queue-based implementation, QueueBasedIterator, of that interface. Our Iterator interface is:

  template <typename T>
  class Iterator {
    public:
      bool hasNext();
      T    getNext();
  };
The idea is that the getNext() function returns each item from the container once, and the hasNext() function returns true until each item in the container has been returned with getNext().

Suppose that our LinkedList class included a getIterator() function that returned an iterator for the list. (Our actual implementation did not include this function.) You would then be able to use the iterator as:

  int main() {
    LinkedList<int> myList;
    // ... insert some items into the list.

    Iterator<int> iter = myList.getIterator();
    while (iter.hasNext()) {
      int someItem = iter.getNext();
      // execute some code for the current item, someItem...
    }
    return 0;
  }
to execute some arbitrary code for each item in the list. See testBst.cpp for some example uses of the LinkedBst iterators.

The QueueBasedIterator is a queue-based implementation (surprise!) of the Iterator interface. In addition to the required iterator methods it also provides an insert(T item) method, which allows an arbitrary container to insert items into the iterator, to be accessed later with the getNext() function. Your LinkedBst traversal functions should use insert to place items into an iterator. Your testBst.cpp and morseCode.cpp programs should use hasNext() and getNext() to access the items in the iterator.



A binary search tree implementation

Your main work for this lab is to complete and test the LinkedBst implementation. In linkedbst.h we have defined templated LinkedBst and LinkedBstNode classes for you, and partially implemented these classes in linkedbst.inl. Each incomplete function in linkedbst.inl is marked with a TODO comment.

Our LinkedBst itself contains just two pieces of data:

Each LinkedBstNode contains:

As described in lecture, most of the public functions defined by the Bst interface are simple calls to private recursive helper functions, which perform the required operation on some subtree of the tree.

One peculiarity of our implementation is that our helper update functions (insertInSubtree and removeFromSubtree) return a pointer to root of the subtree they modified. This pointer is usually the same node as the node passed as an argument to the function (current), but can be a new node (for insertInSubtree) or a different previously-existing node (for removeFromSubtree) if their modification changes the root node of the subtree. This peculiarity greatly simplifies the binary search tree implementation because it allows a node's parent to easily maintain a correct tree structure if the modification changes the parent's child node.

Each tree traversal function returns a pointer to an iterator of all key-value pairs in the tree, in the order specified by the tree traversal. For a LinkedBst<K,V> storing keys of type K and values of type V, each key-value pair has type Pair<K,V>. This means that a pointer to an iterator of key-value pairs has type Iterator< Pair<K,V> >*, a templated type of a templated type! (This is sometimes called a nested template type.) Be warned that g++ requires a space between the > for the inner template and the > for the outer template. If you forget the space you might get some error such as:

  testBst.cpp:42:  error: '>>' should be '> >' within a nested template 
  argument list
See the LinkedBst::getPostOrderIterator function for an example of how to create our iterators for the tree traversals, and testBst.cpp for examples of how to use them.



Unit-testing with assert

assert(bool condition) is a C function that takes a boolean condition as an argument and causes your program to immediately crash if that boolean condition is false. (assert does nothing if that condition is true.) This is useful because it allows you to test parts of your program as you write, checking that the program values are what you expect when your program runs.

To use assert you must #include <assert.h>. See testBst.cpp for many examples of its use. You should add code to testBst.cpp as you implement LinkedBst features. Be warned that testBst already tests some LinkedBst features that you must implement; testBst will crash if you run it before implementing those features.



Encoding Morse Code

Inside morseCode.cpp, you will use LinkedBst to store an English to Morse code translator. Morse code was one of the first methods for long-distance electronic communication. It transmits text-based information as a series of on-off tones (or clicks). Each character in the alphabet translates into a series of "dots" (short pulses) and "dashes" (three times as long as a short pulse). For example, the distress signal "SOS" is communicated as "... --- ...". For this lab, you will create a simple program to translate an English phrase into Morse code.

To do this, you will represent the English alphabet in a binary search tree, with they key being the English letter and the data value being the translation of that letter into Morse code. While the size of data is small (you could just as easily use an array look-up table), this will provide a simple illustration of how to use a BST, while allowing us to ignore issues of an unbalanced tree for now.

You must use good top-level design for your program. The general outline of your task is as follows:

  1. Load and construct a English to Morse Code translation into a LinkedBST. You have been provided two different encoding files, which represent the same translation, but in a different order. See the Tips and Hints section for a description of the file format and tips on performing file input.
  2. Once you build the tree, you will read in a phrase from the user. (See the Tips and Hints for suggestions on handling multiple-word phrases using cin). For each character in the word, you will translate it into Morse code and output the final signal. Each character should be separated by a space and each word by a new line.
  3. Print out the height of the tree and number of nodes in the tree.
  4. Print out the key-value pairs as if you would encounter them in the tree, level by level (the details of how to do this are left to you. See sample output below. HINT: what kind of traversal is this?)
  5. Print out the translation dictionary encoded by the tree in alphabetical order.
  6. Any traversal of the tree must utilize one of the defined iterators for the LinkedBst

Here is one example execution of this program:

  $  ./morseCode 

Enter filename for English to Morse code translator: encoding1
Enter phrase: My name is Inigo Montoya
The encoding is: 
-- -.-- 
-. .- -- . 
.. ... 
.. -. .. --. --- 
-- --- -. - --- -.-- .- 

***************************************

The height of the tree is: 25
There are 26 nodes in the tree


Translator in alphabetical order: 
A | .-
B | -...
C | -.-.
D | -..
E | .
F | ..-.
G | --.
H | ....
I | ..
J | .---
K | -.-
L | .-..
M | --
N | -.
O | ---
P | .--.
Q | --.-
R | .-.
S | ...
T | -
U | ..--
V | ...-
W | .--
X | -..-
Y | -.--
Z | --..


Tree in order by level
A | .-
B | -...
C | -.-.
D | -..
E | .
F | ..-.
G | --.
H | ....
I | ..
J | .---
K | -.-
L | .-..
M | --
N | -.
O | ---
P | .--.
Q | --.-
R | .-.
S | ...
T | -
U | ..--
V | ...-
W | .--
X | -..-
Y | -.--
Z | --..

In your testFiles directory, I have provided two simple input files (test0.in, test1.in) with their respective ouputs (test0.out, test1.out). You should think of more tests to be thorough.



Tips and Hints

File Input

You will need to read in the English-to-Morse code translation dictionary from file. We used file input in the DNA simulation lab, so be sure to look at your solution there. But in general, you will need to do the following:
  1. Include the fstream library at the top of you main program
      #include <fstream>
      
  2. Declare an ifstream object and open a file in read mode ( filename is a string variable). You should check to make sure the file actually opened.
      ifstream infile;
      infile.open(filename.c_str(), ifstream::in);
      if( !infile.is_open() ){
        //handle error here
        return;
      }
      
  3. Read information from the file much as you would using cin. The file format is as follows: one line per character-code pair, with the character and code separated by a space. E.g.
      A .-
      B -...
      
    You should be able to read these separately using the >> operator E.g.,
      char letter;
      string code;
    
      infile >> letter >> code; //reads one line from the file
      
  4. the eof() method of infile returns true when there is no more data left to read in the file. You should use this to determine when you are done reading in information. E.g.
      infile.eof()  //is true if the file is done
      

Reading in the user phrase

One downside with the >> operator is that it reads in information until it reaches the first white space character. So for the entry:
Hello world
The code:
string sentence;
cin >> sentence;
will only read the first word, "Hello", into sentence since it hits a whitespace and returns. "world" will stay on the buffer and be read the next time you use >>. You could get around this by using the getline() method. For example:
string sentence;
getline(cin, sentence);
will read the entire entry until it hits a newline character i.e., sentence will have the value "Hello world". Note: if you use getline, you must use it throughout your program when reading from cin; mixing in the >> operator may cause probems as getline won't wait for a user entry.

Modifying linkedbst.h

You should not need to modify the declaration for LinkedBst. The allowed exceptions: The last exception is if you want to use pass-by-value to send your Morse code binary search tree to a function. I recommend allocating your tree on the heap instead and passing a pointer to the tree to any functions you have.

Misc



Submitting your work

Once you are satisfied with your program, hand it in by typing handin35 at the unix prompt.

You may run handin35 as many times as you like, and only the most recent submission will be recorded. This is useful if you realize after handing in some programs that you'd like to make a few more changes to them.



Analysis Questions

Once you have completed the programming portion of the lab, consider the following questions. Please hand in your solutions in lab on Friday.

  1. What was the total height for a tree using encoding1? How about for encoding2?

  2. For each encoding file, draw the tree as it would appear after a few insertions. You don't need to insert all 26 codes, but insert at least the first 8 lines. Your illustration only needs to show the key for each node; you can ignore the values. Use the convetions show in lecture.

  3. Briefly indicate if there is a difference in the efficiency between the two trees. Justify your answer in a sentence or two (HINT: what property do we seek in a tree)

  4. It is well known that certain letters in the alphabet are used more commonly then others. In a few sentences, discuss at a high-level how you might account for this in constructing a better Morse code tree. Where would you want popular letters to be? How could you modify our algorithm to accomplish this goal without changing the definition of our binary search tree implementation?