CS35 Lab 10: Scrabble Solver

Due by 11:59pm on Monday, November 26, Written portion due in class, Tuesday, November 27.
I encourage you to work with a partner on this lab. For this lab, you may work on the written portion with your lab partner.

Introduction

Your work for this lab consists of three parts:

  1. Complete an implementation of a HashTable, which implements the Dictionary interface.
  2. Use your HashTable to implement a program that will allow a user to check if a word is a valid word in the dictionary. Also your program will allow the user to enter a set of letters and receive all legal anagrams of that word.
  3. You will test the efficiency of your HashTable by varying properties such as the load factor, initial capacity, and hash function and report your results.

As usual, you can get the initial files for this lab by running update35. The new files are listed here, and the files that you will need to edit are highlighted in blue:



Completing the HashTable implementation

You should start by completing and testing the HashTable class. Our HashTable uses an array of vectors to store the data within the hash table, using chaining to resolve collisions. Note that this is not a vector of vectors. table is a standard C++ array, where each index contains an actual vector object (not a pointer). You should be able to draw this out as a memory diagram.

Each vector stores key-value pairs; for keys of type K and values of type V each key-value pair therefore has type Pair<K,V>.

Please complete each function marked with a TODO comment in hashtable.inl. You must test each HashTable function you implement using the testHash program. I have provided the data members you should keep track of and the implementation of insert. You should first read and understand these portions. I have also provided two sample hash functions, one for ints and one for strings. You may modify these as you see fit.

Note that I have defined a variable MAX_ALPHA. Load factor is generally noted as the letter alpha, and thus the highest load factor allowed is specified using MAX_ALPHA. Also note the odd declaration for the constructor:

 HashTable(int capacity = 53, double maxLoadFactor = 0.8);
This says that the user can specify two parameters, capacity and maxLoadFactor when constructing a HashTable. If the user omits these arguments, the default values are 53 and 0.8 respectively. This is a general technique for specifying optional parameters in C++.

Lastly, do not forget to use exceptions if a function was not called properly. For example, if there are no items in the table, which functions should be called? If a search process fails, which functions can handle that properly and which should throw an exception?

Scrabble Checker

Scrabble is a popular board game based on a players ability to generate words from a given set of letters (called tiles). While there are more details needed to learn the game, all that you need to know for this assignment is that a player may only play actual words. That is, a legal play is is a word that exists in an English dictionary. Second, an important skill is being able to generate a list of possible plays given a finite set of letters in no particular order.

Your goal for the main program is to help a user learn about possible word plays. (Of course, this tool will NEVER be used to cheat in a real game. Scout's honor :) ). Your program will greet the user and provide a simple interface with three options: exit, cheat a little, or cheat a lot. In more marketable terms:

  1. Exit the program loop
  2. Verify a word is in the dictionary. This option will ask the user for a "play" and check if the play is legal, i.e., if it is an actual word
  3. Generate a list of legal plays. This option will ask the user for a series of letters; duplicate letters are allowed. You will then return all of the words that use those letter in any order.

These two parts are respectively the roles of a spell checker and an anagram solver. You will solve both problems using dictionaries. There are not too many other details, I want to encourage creativity for this lab. For the anagram solver, you can assume all letters have to be used. You are more than welcome to solve the harder problem of considering all subset of letters.

To solve both problems, you will need to load a dictionary of known words. Fortunately, there is an example of this in testHash.cpp. You will need to load the words in the file /usr/share/dict/words. You will need to construct dictionaries to solve each problem. The first dictionary will index all known words, and is simply a problem of verifying if a word is in the data structure or not.

For the anagram solver, you will need to store a dictionary of anagrams and the associated list of words that can be created from it. This is slightly more complicated. One tip to consider is that anagrams are not order specific, so it may be best to sort all of the letters first. For example the letters c,a,t form the words cat, act, and tac. If the user had given me the letters in a different order, e.g. t,c,a the answered would be the same. So it is probably best to store your sequence of letters in sorted order and then sort the letters given by the user to search the dictionary.

Additional Requirements

Sample Run

./scrabbleCheat 
These are you options: 

	(0) Exit
	(1) See if a play is legal
	(2) Find all legal plays for a set of tiles


Which option would you like? 1

Enter the word you wish to play: quilt
Congratulations, you have found a legal play!

These are you options: 

	(0) Exit
	(1) See if a play is legal
	(2) Find all legal plays for a set of tiles


Which option would you like? 1

Enter the word you wish to play: QuIlT
Congratulations, you have found a legal play!

These are you options: 

	(0) Exit
	(1) See if a play is legal
	(2) Find all legal plays for a set of tiles


Which option would you like? 1

Enter the word you wish to play: blerg
Sorry, that is not a legal play

These are you options: 

	(0) Exit
	(1) See if a play is legal
	(2) Find all legal plays for a set of tiles


Which option would you like? 2

Enter the letters you wish to play: tca
Your legal plays are: 
	act
	cat

These are you options: 

	(0) Exit
	(1) See if a play is legal
	(2) Find all legal plays for a set of tiles


Which option would you like? 2

Enter the letters you wish to play: ACT
Your legal plays are: 
	act
	cat

These are you options: 

	(0) Exit
	(1) See if a play is legal
	(2) Find all legal plays for a set of tiles


Which option would you like? 0

Thank you for playing Scrabble Solver!


Vectors

The Standard Template Library (STL) contains many templated implementations of data structures covered in this course, plus many beyond. In this lab, you will use the vector class which is most closely related to the List interface and uses something akin to the ArrayList implementation. You can read about its usage here.



Written Portion

In testHash.cpp or a separate file, explore some of the parameters of your HashTable implementation. As discussed in class, the performance of HashTables depend on several factors (e.g., initial choice of capacity, hash function, load factor, etc.). Pick one or more of these factors as the independent variable in your experimental design. Use reportStats to gather information about several possible dependent variables (i.e., the y-axis) including the number of key-comparisons, the percentage of non-zero buckets in the table, the maximum size of any list, the average size of non-zero buckets, etc. This is by no means an exhaustive set of possible dependent and independent variables.

Once you explore some possible pairings of variables, produce graphs that show a couple interesting relationships. This can be done with one or two graphs. For example, you may have one graph showing the effect of initial capacity (x-axis) on comparisons (y-axis) with a curve for two different hash functions. If you completely separate tests, show them in two graphs (e.g., one for capacity vs comparisons, one for load factor maximum versus average list length). Write a short paragraph describing how you generated the data (e.g., inserted 1,000,000 values into a HashTable, varying the maximum load factor each time). Write another sentence or two summarizing your findings.

This portion can be done with your lab partner. You will hand in your solution in during class on Tuesday, November 27. I suggest modifying testHash.cpp. When you hand in the normal part of the lab, be sure that testHash.cpp shows the original hash table tests. You can encapsulate your experiments in additional functions, not be deleting the original tests.

Extension - Define Subscript Operator

In class, we discussed how dictionaries are like arrays but with generic indexing. The subscript operator in arrays (e.g., C++ standard arrays, Python lists) is very similar to the get operation. In fact, let us overload the subscript operator for CircularArrayLists. First, let us recall how getItem(int i) is defined:

template <typename T>
T CircularArrayList<T>::getItem(int i) {
  if (i < 0 || i >= size) {
    throw std::runtime_error("Attempted to getItem out of bounds.");
  }
  return values[(headPos+i)%capacity];
}
Hopefully that is nothing new! Now let us see how the indexing operator is defined for this class:
template <typename T>
T& CircularArrayList<T>::operator[](int i) {
  if (i < 0 || i >= size) {
    throw std::runtime_error("Attempted to index out of bounds");
  }
  return values[(headPos+i)%capacity];  
}

Notice that this looks almost exactly the same. The main difference is in the return value. getItem returns a value of type T. Just as we pass-by-value in C++, by default we return-by-value. That means that the value returned is a copy of the original item stored in the CircularArrayList.

The subscript operator is used to get values, and thus the getItem return-by-value would be suitable. But we also wand to use indexing to modify or set values e.g., list[i] = 7. To do this, we need to have access to the actual piece of memory where the data is stored. So, we return a data type of T&. This tells C++ not to return a copy, but to return the actual chunk of memory where the data is stored so that it can be modified.

Try defining the indexing operator for Dictionaries. Unleash the power of polymorphism!



Submitting your work

Once you are satisfied with your program, hand it in by typing handin35 at the unix prompt.

You may run handin35 as many times as you like, and only the most recent submission will be recorded. This is useful if you realize after handing in some programs that you'd like to make a few more changes to them.