CS35 Lab 7: Morse Code Encoder

Due by 11:59 p.m., Friday, November 2, 2012 Written questions due in class, Thursday, November 1, 2012

Your work for this lab consists of two parts:

  1. Complete and test a templated implementation of a binary search tree, LinkedBST.
  2. Use the LinkedBST to build an efficient Morse code encoder for translating English phrases into Morse code.

As usual, you can get the initial files for this lab by running update35. When you run update35 you will obtain a large number of files for this lab, but you need to edit only three of them to complete this lab. The files that you will need to edit are highlighted below:

Below, we introduce the Pair class (which you do not need to modify) and describe the work you'll need to complete the LinkedBST implementation.



The Pair class

We have implemented a templated Pair class to store a pair of arbitrary-type data. You do not need to modify our implementation, but you will need to understand Pairs enough to return an iterator of key-value pairs for your tree traversals.

A Pair is essentially just a container to publicly store two pieces of data:

  template <typename F, typename S>
  class Pair {
    public:
      F first;
      S second;
      Pair(F f, S s);
  };

Because the data is publicly stored within the Pair, the data can be accessed and changed directly from any other scope of the program. For example:

  int main() {
    Pair<string,int> p("Hello", 42);
    cout << p.first << ", " << p.second << endl;
    p.first = "Bye!";
    cout << p.first << endl;
    return 0;
  }

The Pair class is useful for dictionary data structures, such as the LinkedBST, where we store a key-value pair for each piece of data.



Iterators

A common programming task is to execute some code for each item in a data structure. An iterator is an object that provides a consistent interface for traversing all data (essentially, providing a linear interpretation of a non-linear data structure).

In this program, you will use Queues as an iterator; you will enqueue key-value pairs using an appropriate traversal algorithm (e.g., in-order, pre-order, level-order, post-order). The idea is that the dequeue() function returns each item from the original tree once, and the isEmpty() function returns false until each item in the container has been returned with dequeue(). In normal implementations, a special Iterator class is used to perform these operations, but essentially it is a Queue. Iterators are widely used for many different types of data structures (such as hash tables and graphs, as we will see later).



A binary search tree implementation

Your main work for this lab is to complete and test the LinkedBST implementation. In linkedBST.h we have defined templated LinkedBST and BSTNode classes for you, and partially implemented these classes in linkedBST.inl. Each incomplete function in linkedBST.inl is marked with a TODO comment.

Our LinkedBST itself contains just two pieces of data:

Each BSTNode contains:

As described in lecture, most of the public functions defined by the BST interface are simple calls to private recursive helper functions, which perform the required operation on some subtree of the tree.

One peculiarity of our implementation is that our helper update functions (insertInSubtree and removeFromSubtree) return a pointer to root of the subtree they modified. This pointer is usually the same node as the node passed as an argument to the function (current), but can be a new node (for insertInSubtree) or a different previously-existing node (for removeFromSubtree) if their modification changes the root node of the subtree. This peculiarity greatly simplifies the binary search tree implementation because it allows a node's parent to easily maintain a correct tree structure if the modification changes the parent's child node.

Each tree traversal function returns a pointer to an iterator of all key-value pairs in the tree, in the order specified by the tree traversal. For a LinkedBST<K,V> storing keys of type K and values of type V, each key-value pair has type Pair<K,V>. This means that a pointer to an iterator of key-value pairs has type Queue< Pair<K,V> >*, a templated type of a templated type! (This is sometimes called a nested template type.) Be warned that g++ requires a space between the > for the inner template and the > for the outer template. If you forget the space you might get some error such as:

  testBST.cpp:42:  error: '>>' should be '> >' within a nested template 
  argument list
See the LinkedBST::getPostOrder function for an example of how to create our iterators for the tree traversals, and testBST.cpp for examples of how to use them.



Unit-testing with assert

assert(bool condition) is a C function that takes a boolean condition as an argument and causes your program to immediately crash if that boolean condition is false. (assert does nothing if that condition is true.) This is useful because it allows you to test parts of your program as you write, checking that the program values are what you expect when your program runs.

To use assert you must #include <assert.h>. See testBST.cpp for many examples of its use. You should add code to testBST.cpp as you implement LinkedBST features. Be warned that testBST already tests some LinkedBST features that you must implement; testBST will crash if you run it before implementing those features.



Encoding Morse Code

Inside morseCode.cpp, you will use LinkedBST to store an English to Morse code translator. Morse code was one of the first methods for long-distance electronic communication. It transmits text-based information as a series of on-off tones (or clicks). Each character in the alphabet translates into a series of "dots" (short pulses) and "dashes" (three times as long as a short pulse). For example, the distress signal "SOS" is communicated as "... --- ...". For this lab, you will create a simple program to translate an English phrase into Morse code.

To do this, you will represent the English alphabet in a binary search tree, with they key being the English letter and the data value being the translation of that letter into Morse code. While the size of data is small (you could just as easily use an array look-up table), this will provide a simple illustration of how to use a BST, while allowing us to ignore issues of an unbalanced tree for now.

You must use good top-level design for your program. The general outline of your task is as follows:

  1. Load and construct a English to Morse Code translation into a LinkedBST. You have been provided two different encoding files, which represent the same translation, but in a different order. See the Tips and Hints section for a description of the file format and tips on performing file input.
  2. Once you build the tree, you will read in a phrase from the user. (See the Tips and Hints for suggestions on handling multiple-word phrases using cin). For each character in the word, you will translate it into Morse code and output the final signal. Each character should be separated by a space and each word by a new line.
  3. Print out the height of the tree and number of nodes in the tree.
  4. Print out the key-value pairs as if you would encounter them in the tree, level by level (the details of how to do this are left to you. See sample output below. HINT: what kind of traversal is this?)
  5. Print out the translation dictionary encoded by the tree in alphabetical order.
  6. Any traversal of the tree must utilize one of the defined iterators for the LinkedBST

Here is one example execution of this program:

  $  ./morseCode 

Enter filename for English to Morse code translator: testFiles/encoding1
Enter phrase: My name is Inigo Montoya
The encoding is: 
-- -.-- 
-. .- -- . 
.. ... 
.. -. .. --. --- 
-- --- -. - --- -.-- .- 

***************************************

The height of the tree is: 25
There are 26 nodes in the tree


Translator in alphabetical order: 
A | .-
B | -...
C | -.-.
D | -..
E | .
F | ..-.
G | --.
H | ....
I | ..
J | .---
K | -.-
L | .-..
M | --
N | -.
O | ---
P | .--.
Q | --.-
R | .-.
S | ...
T | -
U | ..--
V | ...-
W | .--
X | -..-
Y | -.--
Z | --..


Tree in order by level
A | .-
B | -...
C | -.-.
D | -..
E | .
F | ..-.
G | --.
H | ....
I | ..
J | .---
K | -.-
L | .-..
M | --
N | -.
O | ---
P | .--.
Q | --.-
R | .-.
S | ...
T | -
U | ..--
V | ...-
W | .--
X | -..-
Y | -.--
Z | --..

In your testFiles directory, I have provided two simple input files (test0.in, test1.in). You should come up with more test cases to thoroughly test your program



Tips and Hints

File Input

You will need to read in the English-to-Morse code translation dictionary from file. We used file input in the DNA simulation lab, so be sure to look at your solution there. But in general, you will need to do the following:
  1. Include the fstream library at the top of you main program
      #include <fstream>
      
  2. Declare an ifstream object and open a file in read mode ( filename is a string variable). You should check to make sure the file actually opened.
      ifstream infile;
      infile.open(filename.c_str(), ifstream::in);
      if( !infile.is_open() ){
        //handle error here
        return;
      }
      
  3. Read information from the file much as you would using cin. The file format is as follows: one line per character-code pair, with the character and code separated by a space. E.g.
      A .-
      B -...
      
    You should be able to read these separately using the >> operator E.g.,
      char letter;
      string code;
    
      infile >> letter >> code; //reads one line from the file
      
  4. the eof() method of infile returns true when there is no more data left to read in the file. You should use this to determine when you are done reading in information. E.g.
      infile.eof()  //is true if the file is done
      
    Moreover, eof() does not return true after reading the last line. It returns true after you try reading past the end of the file. For example, a file containing:
      A
      B
      C
      
    eof doesn't get set to true after the third read (i.e., when it reads C). It is set to true the fourth time you read from the file. To utilize this, you should put the reading as the last line of the while loop. If you do not, your loop will go an extra iteration, trying to insert the C a second time. You will then do a priming read before the loop. In other words:
      infile >> letter;
      while(!infile.eof()){
        //Do stuff with letter
        //Etc,
        //This is the last line of loop:
        infile >> letter;
      }
      

Reading in the user phrase

One downside with the >> operator is that it reads in information until it reaches the first white space character. So for the entry:
Hello world
The code:
string sentence;
cin >> sentence;
will only read the first word, "Hello", into sentence since it hits a whitespace and returns. "world" will stay on the buffer and be read the next time you use >>. You could get around this by using the getline() method. For example:
string sentence;
getline(cin, sentence);
will read the entire entry until it hits a newline character i.e., sentence will have the value "Hello world". Note: if you use getline, you must use it throughout your program when reading from cin; mixing in the >> operator may cause probems as getline won't wait for a user entry.

String operations

For many of you, this is the first time working in detail with strings. Strings are an object in C++, and work very similarly to strings in Python. You can access individual characters of a string by using the indexing operator []. For example sentence[0] will return the first character of the string sentence as a char. Note, that in in C++ chars are different than strings. For example, to see if the first character in the string is the letter A you would do:
if(sentence[0] == 'A'){
  //Do something
}
Notice the single quotation marks around A signifiying it is a char. Double quotation marks would make it a c-string.

The string class has many important methods that you fill find useful, including length() which returns the length of the string. E.g.,

int l = phrase.length();
There are other useful functions you can read about here

Modifying linkedBST.h

You should not need to modify the declaration for LinkedBST. The allowed exceptions are only to create additional private methods

Misc



Submitting your work

Once you are satisfied with your program, hand it in by typing handin35 at the unix prompt.

You may run handin35 as many times as you like, and only the most recent submission will be recorded. This is useful if you realize after handing in some programs that you'd like to make a few more changes to them.



Analysis Questions
  1. A property of binary trees is that knowing the pre-order and in-order traversal of a tree, the original tree structure can be inferred exactly.
    • Given the following pre-order and in-order traversals of a tree, reconstruct the original tree:
      PRE-ORDER: E,X,A,M,F,U,N
      IN-ORDER: M,A,F,X,U,E,N
      
    • List all of the leaves of the tree
    • List all nodes with depth 2
    • What is the height of the tree?

  2. For the following applications, briefly explain which of the 4 traversal algorithms in class would best provide a solution
    • Given the hierarchical outline for a book (in tree format), print the Table of Contents. For example, the root of the tree would be the title, all Level 1 nodes are the chapters, all Level 2 nodes are the sections of the respective chapters.

    • For each directory on a disk, calculate the amount of disk usage. For, example, running the du command on your labs directory produces:
      $ du -h cs35/labs/
      472K	cs35/labs/06
      6.0M	cs35/labs/01/input
      6.0M	cs35/labs/01
      2.2M	cs35/labs/04
      44K	cs35/labs/03
      48K	cs35/labs/02
      16K	cs35/labs/05/inputs
      120K	cs35/labs/05
      28K	cs35/labs/07/library
      20K	cs35/labs/07/testFiles
      304K	cs35/labs/07
      9.2M	cs35/labs/
      
    • Any mathematical formula can be set up as a parse tree, where the internal nodes are operators (e.g., *,+,-,/) and the leaves are the operands. For example, the formula (3*5+1)*2 is the tree:
              *
            /   \
           +      2
          / \
         *   1
        / \
       3   5
      
      What traversal would produce the correct order of operations?