CS35: Lab 10

Introduction

Your work for this lab consists of three parts:

Complete an implementation of a HashTable, which implements the Dictionary interface.
Use your HashTable to implement a program that will allow a user to check if a word is a valid word in the dictionary. Your program will also allow the user to enter a set of letters and (i) receive all legal anagrams of that word and (ii) receive all legal plays using the set of letters.
You will test the efficiency of your HashTable by varying properties such as the load factor, initial capacity, and hash function and report your results.

As usual, you can get the initial files for this lab via Github. The files that you will need to edit are highlighted in blue:

dictionary.h: Defines a Dictionary, a pure virtual interface for a dictionary.
hashTable.h: Defines a HashTable implementation of the Dictionary.
hashTable-inl.h: Implements the HashTable class.
testHashTable.cpp: A program to test your implementation of the HashTable class.
scrabbleSolver.cpp/scrabbleSolver.h/scrabbleMain.cpp: Your main program for helping a Scrabble player achieve success
testScrabbleSolver.cpp: A program to test helpers for your Scrabble solver.

Completing the HashTable implementation

You should start by completing and testing the HashTable class. Our HashTable uses an array of vectors to store the data within the hash table, using chaining to resolve collisions. Note that this is not a vector of vectors. buckets is a standard C++ array, where each index contains an actual vector object (not a pointer). You should be able to draw this out as a memory diagram.

Each vector stores key-value pairs; for keys of type K and values of type V each key-value pair therefore has type Pair<K,V>.

You must implement methods for the HashTable interface. Feel free to divide the implementation into multiple files (or not).

You must test each HashTable method you implement using the testHash program. We have provided the data members you should keep track of, and implementations of hashing methods for strings and integers.

Do not forget to use exceptions if a function was not called properly. For example, if there are no items in the table, which functions should throw exceptions? If a search process fails, which functions can handle that properly and which should throw an exception?

Scrabble Checker

Scrabble is a popular board game based on a players ability to generate words from a given set of letters (called tiles). While there are more details needed to learn the game, all that you need to know for this assignment is that a player may only play actual words. That is, a legal play is is a word that exists in an English dictionary. Second, an important skill is being able to generate a list of possible plays given a finite set of letters in no particular order.

Your goal for the main program is to help a user learn about possible word plays. As in lab 9, we will use command-line options to provide input from the user. The program should be run with:

./scrabbleMain <dictionary-to-use> <action> <letters>

dictionary-to-use is a file path to a dictionary (for example /usr/local/doc/sowpods.txt). Keep in mind that you can create your own small sample dictionaries for testing, just by creating a text file with a different word on each line.
action is one of "check", "anagrams", or "plays", and decides what to do with the input. For check, the program should print out whether or not the given letters, in the order given, are a word in the dictionary. For anagrams, the program should print out all the anagrams of the given letters that appear in the dictionary. For plays, the program should print all the words in the dictionary that can be made using any combination of the given letters.

The two main technical parts to implement are a spell checker and an anagram solver. You will solve both problems using dictionaries. There are not too many other details, we want to encourage creativity for this lab.

To solve both problems, you will need to load a dictionary of known words. In the simplest case, this is quite similar to other problems you've seen in this course before, like in the spellChecker. For example, for check, all you need is a dictionary that will index all known words, which is simply a problem of verifying if a word is in the data structure or not.

For the anagram solver, you will need to store a dictionary of letter combinations mapping to the associated list of words that can be created from it. This is slightly more complicated. One tip to consider is that anagrams are not order specific, so it may be best to sort all of the letters first. For example the letters A,E,R form the words ARE, ERA, and EAR. If the user had given the letters in a different order, e.g. R,A,E the answer would be the same. So it is probably best to store your sequence of letters in sorted order and then sort the letters given by the user to search the dictionary. This requires implementing a small sorting function on strings – feel free to use any appropriate algorithm for this.

There are a few ways to generate all legal plays. Perhaps the easiest way is to generate a list of all possible subsets of letters, and then use the anagram solver on each subset of letters. Implement the anagram solver first. Make sure to avoid printing out the same legal plays multiple times. One way to do this is to build up a dictionary of legal plays as you iterate over subsets. You can print out your dictionary after you finish iterating.

Additional Requirements

Your program must use good design principles. Your main program should be concise and easy to follow
You must use a Dictionary data structure to solve both the spell-checking and anagram options.
Your searches/dictionaries should be case insensitive. To make this easy, you should convert all words ready from the file into lower case before inserting them into the Dictionary data structure. You should also convert user searches.
Loading the sowpods dictionary should take less than 10 seconds on department machines, and none of the actions should take more than a few seconds to complete beyond the dictionary loading. If you're having performance issues, do test on smaller dictionaries to make sure your algorithms are correct before trying to make them faster.

Sample Runs

$ time ./scrabbleMain /usr/local/doc/sowpods.txt check leader
leader is in the dictionary.

real  0m1.160s
user  0m1.135s
sys 0m0.024s

$ time ./scrabbleMain /usr/local/doc/sowpods.txt anagrams leader
dealer
laered
leader
leared
redeal

real  0m1.354s
user  0m1.322s
sys 0m0.032s 

$ time ./scrabbleMain /usr/local/doc/sowpods.txt plays leader
ale
lea
a
arle
earl
laer
lare
lear
rale
real
are
ear
era
dele
leed
lard
leer
lere
reel
deer
dere
dree
ered
rede
reed
laree
leare
leear
arede
deare
eared
dealer
laered
leader
leared
redeal
ad
da
ae
ea
de
ed
al
la
ar
alee
eale
lar
dale
deal
lade
lead
dae
ee
alder
arled
lader
ared
dare
dear
eard
rade
read
dal
lad
eel
lee
dee
el
ard
rad
ere
ree
del
eld
led
er
re
red
elder
lered

real  0m1.334s
user  0m1.305s
sys 0m0.028s

Vectors

The Standard Template Library (STL) contains many templated implementations of data structures covered in this course, plus many beyond. That is, you do not, in fact, need to copy-paste a List implementation every time you need one! In this lab, you will use the vector class which is most closely related to the List interface, and you can think of as essentially a built-in ArrayList implementation. You can read about its usage here.

One detail of the implementation that isn't immediately obvious is how to remove an element from a vector. For our data structures, we had a remove method that took an index as a number. Vectors require that you use a different notion of indexing when removing, based on an offset from the beginning of the vector. So to remove an element at a particular index i, from a vector v use:

v.erase(v.begin() + i);

Other than this detail, the provided documentation should be enough to let you use vectors as if they were array lists without too much trouble. There is an example of creating and populating a vector in readFileToList, so you can see the interface in action.

Written Portion

The final task is to measure some aspects of your HashTable implementation. As discussed in class, the performance of HashTables depend on several factors (e.g., initial choice of capacity, hash function, load factor, etc.). Pick one or more of these factors as the independent variable in your experimental design. Use reportStats to gather information about several possible dependent variables (i.e., the y-axis) including the number of key-comparisons, the percentage of non-zero buckets in the table, the maximum size of any list, the average size of non-zero buckets, etc. This is by no means an exhaustive set of possible dependent and independent variables.

Once you explore some possible pairings of variables, produce graphs that show a couple interesting relationships. This can be done with one or two graphs. For example, you may have one graph showing the effect of initial capacity (x-axis) on comparisons (y-axis) with a curve for two different hash functions. If you completely separate tests, show them in two graphs (e.g., one for capacity vs comparisons, one for load factor maximum versus average list length). Write a short paragraph describing how you generated the data (e.g., inserted 1,000,000 values into a HashTable, varying the maximum load factor each time). Write another sentence or two summarizing your findings.

We should be able to get a feel for how you generated your statistics by running:

$ ./scrabbleMain report-stats

In the starter code, this option does nothing. Put the code you use for your experiments here.

Submitting your work

As usual, git push by the deadline to submit. Note the extended deadline this week!

CS35 Lab 10: Scrabble Solver

Additional Requirements

Sample Runs