CS35 Lab 08: Scrabble Assistant

Due 11:59pm Wednesday, November 29 2017

The main goals for this lab are to gain experience working with hash tables and to evaluate real-world data structure performance by running experiments. To that end, you will:

  1. Implement a hash table using separate chaining.
  2. Apply your hash table implementation to create a simple program to assist Scrabble players.
  3. Compare the performance of your hash table implementation to that of a BST by running a series of experiments.

As with most assignments for the rest of the semester, you will be working with a partner. Both partners should be present and working on the code together. You will both be responsible for understanding all concepts, so dividing and conquering is not an option. The academic integrity policy applies to the entire pair; you cannot work or share code with anyone outside your partner.

You and your lab partner will share the same git repository, which is named lab08-<user1>-<user2>.git. Please access your lab by cloning the appropriate git repository, e.g.:

$ git clone git@github.swarthmore.edu:CS35-F17/lab08-mgagne1-adanner1.git
You can also get the ssh git url from CS35 github org.

Starting Code

Below is an overview of the files required for submitting this lab. Those highlighted in blue will require implementation on your part. Those highlighted in black are complete and should not be modified except for comments.

Part I: Hash Table Implementation.
For the first part of this lab assignment, you will implement a hash table, by modifying hashTable.h and hashTable-inl.h. Unit tests have been provided; once your hash table implementation is complete, these unit tests should pass. Below are some implementation details and suggestions:

Expanding Capacity

If the load factor ever exceeds the maximum load factor of your hash table, you will need to expand the table. This is conceptually similar to expanding capacity for ArrayList or a BinaryHeap. There are a couple of differences. One is that when to expand differs here -- it depends on the load factor, not on whether the array is full. A more subtle difference is that it's not enough to just copy table entries over. Each key/value pair in the hash table needs to be rehashed, because which bucket it goes into depends on the capacity of the table itself! Below we itemize what tasks need to be done to properly expand capacity. Remember, your hash table should never have load factor higher than the maximum load factor. As soon as this happens, you should expand capacity.

Removing From a vector

The vector class provides a means to remove an element from the end of a list (pop_back) but doesn't give a clear interface for removing elements from the middle. We suggest two possible approaches.

In the first approach, you can use pop_back, by swapping the item at the end of the vector with the item at index i that you want to remove. Then simply call pop_back to remove the item that is now at the end.

The second approach uses some advanced features of the STL vector class. Vectors to provide a general method called erase which can delete entire contiguous regions:

vector myVector = ...;
myVector.erase(myVector.begin()+start_index, myVector.begin()+end_index);
The above function works much like a Python slice: the start index is included, but the end index is not, e.g., erasing with start/end indices (1,3) would remove the second and third indices (i.e. indices 1 and 2) from the vector. If you want to remove a single element from the vector, you can do something like:
vector myVector = ...; // whatever defines/initializes the vector
int index = ...;            // the index of the element you want to remove
myVector.erase(myVector.begin() + index, myVector.begin() + index+1);
For more information, consult the vector documentation in cplusplus.com.
Part II: Scrabble Assistant.

Once you have a working hash table, you can start writing your Scrabble Assistant. We'll design the application in two parts: a ScrabbleAssistant class and a separate main program. The main program (which we've written for you) provides a simple menu interface and makes calls to the ScrabbleAssistant, which encapsulates all of the Scrabble Assistant features. The ScrabbleAssistant class should load an English-language dictionary (allow the user to specify a dictionary, but use /usr/share/dict/words as a default) into a hash table and support the following functions:

When we say that anagramsOf should take constant time, we mean that its running time should not depend on the size of the English-language dictionary you are using. The running time is however allowed to depend on the length of the string.

Implementing anagramsOf and findWords will require some algorithm design. You're welcome to include any data members you need for the ScrabbleAssistant, as well as any private helper functions you'd like.

In order to make anagramsOf run in constant time, it will be helpful to preprocess the word list in the ScrabbleAssistant constructor and maintain a table of anagrams. First, write a helper function to sort the letters in a string. (e.g. sortString("cat") should return "act"). You don't have to do this yourself; instead, use the std::sort function provided in the algorithm library.

#include <algorithm>
string sortLetters(string s) {
  sort(s.begin(), s.end());
  return s;
}
Then, create a dictionary that maps each sorted string to a vector of all words that sort to that string.

anagramsOf should provide all anagrams of a given set of letters, but won't be enough to provide all legal plays given a set of letters. To do this, you'll also need to evaluate all legal plays using fewer letters as well. Implementing findWords will require some clever design. One way to implement findWords is to break down the task into the following subtasks:

Once the above code is complete, your ScrabbleAssistant will be complete!
Part III: Dictionary Experiments.

The Experiment

For the final part of this lab, you'll run some experiments comparing the performance of several dictionary implementations, including a BST, your hash table using a good hash function, and your hash table using a suboptimal hash function. The experiment will add some number of elements to each dictionary and then retrieve them. It reports the average runtime for each insert and get operation.

There are two classes used in this experiment, both defined in experimentClasses. The StringIntGoodHash and StringIntBadHash classes both contain a string and an int data member. These classes are identical except for the hash functions, given in experimentHashes. It is your responsability to understand how these hash functions differ and exploit these differences in the experiment.

Your Tasks

Most of this code has been written for you in experiment.cpp. Your only implementation task is to implement getExperimentKeys function in experimentUtil.cpp. This function is used by the experiment to determine which keys to add to the dictionary. Your task is to create a set of keys tailor-made to cause several collisions for StringIntBadHash. Note that every key returned by getExperimentKeys must be distinct; the experiment will not run if you create duplicate keys.

Once you have written your getExperimentKeys function, use the experiment program to produce a graph much as you did for the mystery functions program from Lab03. This program does not require any arguments, so you can run it as e.g.:

$ ./experiment | gnuplot
This will both save a file called experiment.png as well as open a new window showing you the results of your experiment. If you successfully exploited the bad hash function, the runtime of the bad hash experiment should be comparitively high.

After completing your experiment, create a brief written summary that contains:

A couple of paragraphs should be sufficient. Please include a .tex or .pdf of your writeup, as well as the png file you created. If you provide .tex only and the grader cannot compile your writeup, you will receive no credit for the writeup.
Memory Management

Your program is required to run without memory errors; run valgrind to eliminate them. For this lab, memory leaks are not a priority. Small memory leaks are acceptable and will not significantly affect your grade. While "lost" memory should not occur in large amounts (e.g. millions of bytes), we encourage you to not spend hours and hours tracking down every last memory leak.

Commenting and Coding Style Requirements
Graders will assign minor penalties for poor commenting and coding style.
Please review the CS35 Coding Style for good C++ style.
Survey
When you have completed your lab, answer a few short survey questions in the file README.md.
Summary

When you have completed your lab, you should have

Submit

Once you are satisfied with your code, hand it in using git. Remember the add, commit, push development cycle. You can push as many times as you like, but only the most recent submission will be graded. You may want to run git status to confirm all modifications have been pushed.