CS35: Lab 8: Scrabble Assistant

Due Wednesday, November 24th at 11:59pm.

Overview

The main goal for this lab is to gain experience working with hash tables. To that end, you will:

  1. Implement two new versions of the Dictionary ADT:

    • A LinearDictionary for a simpler dictionary implementation for small sets of elements.

    • A HashTable implementation that uses separate chaining (each chain will itself be a LinearDictionary).

  2. Apply your hash table implementation to create a simple program to assist Scrabble players find legal plays.

As usual, you can select your partner using Teammaker.

The URL for your repository will be:

git@github.swarthmore.edu:cs35-f21/lab08-<your-teamname>

Starting Code

Your starting code can be found in the appropriate repository for your team. The following is a description of the repository’s initial contents; bolded files are those which you must change in completing the lab.

  • adts/: As with your last lab, the ADT interface declarations are stored here.

  • Makefile: the build instructions for this lab.

  • linearDictionary.h/-inl.h: The LinearDictionary class you will be implementing to store collisions within your HashTable.

  • hashTable.h/-inl.h: the HashTable class implementation files.

  • hashFunctions.h/cpp: Simple hash functions for integer and string keys, similar to hashes we saw in class.

  • tests.cpp: Unit test wrapper that will call provided tests for all three classes you will define. The tests for the classes are in testLinearDictionary.cpp, testHashTable.cpp, and testScrabbleAssistant.cpp.

  • manualTests.cpp: A "sandbox" test program that you can use while you develop your program.

  • main.cpp: a simple Scrabble Assistant application.

  • scrabbleAssistant.h/cpp: a C++ class encapsulating the functionality of the Scrabble assistant.

Part I: Linear Dictionary Implementation

We recommend that you implement a LinearDictionary as a vector of key-value pairs. All of the operations should run in \(O(n)\) time. You may be thinking: “Why would I want such an inefficient implementation of the Dictionary ADT?” The answer is that, for very small dictionaries, simpler implementations perform much better as they have a smaller constant overhead.

We will use the LinearDictionary to represent the chain of collisions in our HashTable. Because we expect each chain to be relatively short, the LinearDictionary provides a nice abstraction that will be efficient enough for our purposes.

Incrementally develop and test your linear dictionary as you implement using manualTests.cpp. We recommend starting with your constructor, insert, and either get or contains. Once you are satisfied with your dictionary’s ability to add and find items, finish implementing the remaining methods one at a time.

Unit tests have been provided for this lab; once you have completed your LinearDictionary implementation, the appropriate unit tests should pass.

$ make tests
$ ./tests linearDictionary
Success: 11 tests passed.
Test time: 0.06 seconds.

Removing From a vector

Implementing the various methods will require you to manipulate the underlying vector you are using to store key-value pairs.

For the most part, this is straightforward (e.g., using vector::size() to get the size of the LinearDictionary, or vector::push_back() to insert). The exception to this is implementing LinearDictionary::remove(), which requires removing an element in the middle of the vector. The vector class provides a means to remove an element from the end of a list (vector::pop_back()) but doesn’t give a simple interface for removing elements e.g. from an arbitrary index. In linearDictionary.h/inl.h we have fully implemented a helper method, removeFromVector() to make this task simpler by specifying a vector. The method takes the vector you wish to delete from and the index for locating the element to remove. For example:

vector<string> myVec = {"a","b","c","d"};
//myVec is now ["a","b","c","d"]
removeFromVector(myVec,2); //removes the element at index 2 -> "c"
//myVec is now ["a","b","d"]

Concatenating vectors

It may be useful to you in this lab to concatenate one vector to another. Here’s an example of how to do this. The vector a contains the integers 10 and 20. The vector b contains the integers 30 and 40. You can insert all the elements of b to the end of vector a as shown below:

vector<int> a = {10, 20};
vector<int> b = {30, 40};
// concatenate b onto the end of a
a.insert(a.end(), b.begin(), b.end());

Part II: Hash Table Implementation

Hash Table

Once your LinearDictionary implementation is complete, you can begin implementing a chaining hash table as we discussed in class. The figure at right shows an example of a chaining hash table with size 3, capacity 4, and storing integer keys with string values.

Your implementation will be in the files hashTable.h and hashTable-inl.h. Here are some important details about this class:

  • We recommend that you use a dynamically allocated array of statically-allocated LinearDictionary objects to represent the hash table.

  • The HashTable class has two constructors. One of the constructors allows the user to specify a maximum load factor; the other should use a sensible default value (e.g. 0.75). In either case, select a reasonable default table capacity, (e.g., capacity=5).

  • All operations on your HashTable (other than getKeys and getItems) must run in \(O(1)\) average time.

  • The HashTable must maintain a reasonable load factor in order to operate efficiently. You will create a private helper method called expandCapacity to assist with increasing the table’s capacity when the load factor becomes too high.

  • Note that hashing functions have been provided for you in hashFunctions.h and hashFunctions.cpp for keys of type int and string.

Incrementally develop and test your hash table in a similar manner as LinearDictionary e.g., write tests along the way using manualTests.cpp Start with your constructor, insert, and get to test that you can insert and find items. Then complete your implementation by implementing and testing one method at a time.

Once you are able to pass all of your own tests in manualTests.cpp then try our unit tests:

$ make tests
$ ./tests hashTable
Success: 14 tests passed.
Test time: 0.50 seconds.

Part III: Scrabble Assistant

Once you have a working hash table, you can start writing your Scrabble Assistant. We’ll design the application in two parts: a ScrabbleAssistant class and a separate main program. The main program (which we’ve written for you) provides a simple menu interface and makes calls to the ScrabbleAssistant, which encapsulates all of the Scrabble Assistant features. The ScrabbleAssistant class should load an English-language dictionary (the main program allows the user to specify a dictionary on the command line, but uses /usr/share/dict/words as a default) into a hash table and support the following functions:

  • spellCheck: given a word, this function should return true if and only if the word is in the dictionary. Implementing this should be straightforward and should run in \(O(1)\) time.

  • anagramsOf: given a string of letters, this function returns a vector of all anagrams (i.e., permutations of these letters) which represent valid words. For example the letters "tca" will return "cat" and "act" as they contain exactly the same letters. This function should also take constant time in terms of the size of the dictionary (it is allowed to depend on the length of the string).

  • findWords: given a string of letters, this function returns a vector of all legal words that the player could spell with the given letters.

You’re welcome to include any data members you need for the ScrabbleAssistant, as well as any private helper methods you’d like.

As you incrementally implement your solution, test your ScrabbleAssistant using the manualTests.cpp file. Once you are convinced that your implementation is working, then try our unit tests:

$ make tests
$ ./tests scrabbleAssistant
Success: 4 tests passed.
Test time: 0.88 seconds.

The ScrabbleAssistant constructor

Your ScrabbleAssistant constructor should use a HashTable data member to store all of the legal words provided in the vector parameter called words. Remember that a Dictionary ADT stores keys with their associated values. In this application, we only care about the keys, which are the legal words in a real-world dictionary. It will take some work to insert all of these words in the beginning, but after that we should be able to look up any word in constant time.

Create your constructor and then implement the spellCheck method.

Finding Anagrams

In order to make anagramsOf run in constant time, it will be helpful to maintain an additional data member in the ScrabbleAssistant constructor. We can also keep a HashTable of anagrams. First, write a helper function to sort the letters in a string. (e.g. sortString("cat") should return "act"). You don’t have to do this yourself; instead, use the std::sort function provided in the algorithm library.

#include <algorithm>
string sortLetters(string s) {
  sort(s.begin(), s.end());
  return s;
}

Then, create a HashTable that maps each sorted string to a vector of all words that sort to that string. For each word in the dictionary:

  • sort the letters of the word to form the key

  • if the sorted letters are not already a key in the dictionary, add it with a vector containing just the current word.

  • if the key exists, then update the vector to include the word

For example, the search "tca" will sort to "act". Our dictionary should return a vector containing all legal words with the letters "act" (i.e., anagramDictionary.get("act") should return ["act", "cat"]). Be sure to write your own tests to verify that anagrams are being constructed properly. If no anagrams exist, your function should return an empty vector (not throw a runtime_error).

Optional improvement: Now that you’ve created a second HashTable to store all legal anagrams of words, think about how you could use this one table to also implement spellcheck, still in constant time. If you’d like to, you can eliminate the legal words HashTable and just implement all methods using the anagrams table.

This anagram dictionary will quickly provide you all possible legal words that use the whole set of letters in the search. In Scrabble, however, you are not required to play all letters in your hand. To account for this, you’ll also need to evaluate all legal plays using fewer letters as well in the function findWords. Implementing findWords will require some clever design. One way to implement findWords is to break down the task into the following subtasks:

  1. Find the power set of the given string

    • The power set is the set of all subsets of the given letters. For example, "abcd" should return all (set) combinations of size 4 ("abcd"), size 3 ("abc", "abd", "acd", "bcd"), size 2 ("ab", "ac", "ad", "bc", "bd"," cd"), and size 1 ("a", "b", "c", "d"). We have provided this for you in the method stringPowerSet.

  2. Get all anagrams of each string in the power set (using anagramsOf)

  3. Accumulate all the anagrams in a single vector

  4. Remove Duplicates: When you use the stringPowerSet method, you will likely find at some point that you have a vector of words that contain several duplicates. It might be helpful to create a removeDuplicates function that eliminates any extra copies of a word. There are several ways of accomplishing this:

    • Use a double-for loop that compares all pairs of words in the vector, looking for pairs that are the same. This will run in \(O(n^2)\) time.

    • First, sort the vector of words. Then, make a linear scan over the list, eliminating adjacent duplicates. This will take \(O(n log n)\) time using an efficient sorting algorithm.

    • Create a hash table with words as keys. Then, insert each word in the hash table assuming it isn’t already there. This should take \(O(1)\) time per word, or \(O(n)\) overall time.

Once the above code is complete, your ScrabbleAssistant will be complete!

Memory Management

For this lab, your program is required to run without memory errors or leaks. Because we are using large data structures in this lab, valgrind will run very slowly. We recommend that you run valgrind on each section of the lab separately as shown below. However, each test may still run quite slowly, so be patient.

  • Test the LinearDictionary: valgrind ./tests linearDictionary

  • Test the HashTable: valgrind ./tests hashTable

  • Test the ScrabbleAssistant: valgrind ./tests scrabbleAssistant

Coding Style Requirements

You are required to observe good coding practices. You have seen these requirements many times, and should know them by now. If you’re unsure, check previous labs where the style requirements were stated explicitly.

Summary of Requirements

When you are finished, you should have

  • A completed implementation of LinearDictionary and the helper methods

  • A completed implementation of HashTable and the helper methods

  • A completed implementation of ScrabbleAssistant

  • No memory errors

  • No memory leaks

  • Code which conforms to the style requirements