Data Structures and Algorithms

Lab 10: Scrabble Assistant

Due on Sunday, April 17th at 11:59 PM. This is a team lab. You and your assigned lab partner(s) will complete this lab together. Make sure that you are familiar with the Partner Etiquette guidelines. You may discuss the concepts of this lab with other classmates, but you may not share your code with anyone other than course staff and your lab partner(s). Do not look at solutions written by students other than your team. If your team needs help, please post on the Piazza forum or contact the instructor or ninjas. If you have any doubts about what is okay and what is not, it’s much safer to ask than to risk violating the Academic Integrity Policy.

Overview

In this lab, you will write a hash table implementation for a dictionary data structure. You will then use that implementation to create a tool which assists in finding solutions to tiled word games (such as Scrabble). In the process, you will learn about additional parts of the C++ standard libraries. You will also provide a written analysis of the performance of your hash table under different circumstances.

Starting Code

Crossword letter tiles

Your starting code consists of the following files. Files that you will need to change are bolded below.

More C++ Standard Library Tools

This section points out some C++ standard library tools which will be helpful in completing your lab.

pair

This lab does not provide you with a Pair class; in fact, one already exists in the C++ standard library (spelled in lower case: pair). You can create and use a pair as follows:

#include <utility>

pair<int,char> foo(int x) {
    pair<int,char> p(5,'z'); // statically allocated pair
    p.first += x; // update first field of the pair
    if (p.first > 10) { // read first field
        p.second = 'q'; // update second field
    }
    return p; // return the pair; note that this is a statically-allocated object, so the return is *copied*
}

sort

The C++ standard library also provides a sorting algorithm through a template function called sort, although the interface is somewhat peculiar. It uses an object of type RandomAccessIterator which, in this case, is used to mark locations within a data structure. This allows you to sort parts of e.g. a list. To sort a vector, you might write:

#include <algorithm>
#include <vector>

void sort_a_vector(vector<int>* vec) {
    sort(vec->begin(), vec->end());
}

Here, the begin and end method produce iterators. These iterators can be used for far more than just marking an index and are an important abstraction, but we needn’t worry about that just to use the sort template function.

Note that vec above is a pointer. If we passed in a vector (instead of a pointer to one), we would be sorting a copy of the vector (since it would be copied when it was passed in).

erase

The vector class provides a means to remove an element from the end of a list (pop_back) but doesn’t give a clear interface for removing elements e.g. from the front of a list. In fact, vectors provide a general function called erase which can delete entire contiguous regions:

void remove_range_from_vector(vector<int>* vec, int start_index, int end_index) {
    vec->erase(vec.begin() + start_index, vec.begin() + end_index);
}

The above function works much like a Python slice: the start index is included but the end index is not. Thus, remove_range_from_vector(1,3) would remove the second and third elements (indices 1 and 2) from the vector.

Scrabble Assistant

Dictionary

Once you have a working (i.e. tested) HashTable, you can write the Scrabble Assistant. We will design the application in two parts: a ScrabbleAssistant class (which does all of the interesting work) and a scrabble_assistant_main function (which provides the interface that allows the end user to interact with that object).

The ScrabbleAssistant class has four methods. One is used to tell the ScrabbleAssistant about a new word; when you first create the object, you will call this method for each word that appears in the dictionary and the ScrabbleAssistant will use that call to update its data strutures and remember which words exist. Once it has been informed of these words, the ScrabbleAssistant has three primary functions.

Spellchecking

The spellcheck method determines whether a given word is in the dictionary. This would be trivial except that we have a performance requirement: spellcheck must run in \(O(1)\) time!

In order to satisfy this constant time requirement of spellchecking, you’ll need to maintain a mapping in which the keys are dictionary words and the values are anything at all (e.g. true, 0, or the word itself). Since get is constant time on HashTable objects, you can determine whether a word is in the dictionary just by trying to get it as a key.

Anagrams

The anagrams_of method determines the dictionary words which can be created by rearranging a given string of characters. For instance, the word “bat” contains exactly the same letters as the word “tab” and no other English words contain exactly those letters.

In order to find anagrams, you should first write a simple function which sorts the letters in a string. For instance, the string "bat" would sort to "abt". (Note: "tab" would sort to the same string.) Then, when a new word is added to the ScrabbleAssistant, you will sort it and store it in a Dictionary that maps from sorted letter strings to lists of words that can be spelled by them.

In this example, for instance, that Dictionary would map the string "abt" to a list containing the strings "bat" and "tab". Later, when a user wants to know the anagrams of "bat", you can sort the string and use the Dictionary to find this list.

Word-Finding

The ultimate goal of the ScrabbleAssistant is to help a player determine which words they could play in a word tile game like Scrabble. This goal is met by the find_words method which, given a string of letters, will give each of the words that the player can form from that string. This includes words that are made of some subset of those letters.

A simple algorithm to address this would be to write an additional function which, given a string, produces all of the sorted combinations of letters than can be made from that string. For instance, the string "abt" would produce the following list of strings:

a b t ab at bt abt

Note that, for a string of length \(n\), this vector is of size \(O(2^n)\). That’s okay.

To find words for the game, you can simply compute this vector of possible strings and then determine the legal anagrams for each one. Given an input (such as "bat"), you can sort it ("abt"), produce the above vector, and then determine the anagrams for each such string.

Writing the Main Program

Pressed Switch

Once you’ve written and tested your ScrabbleAssistant class, the only thing you have left to do is to write a main program that allows the user to interact with it. In order to make this program useful, we will allow it to be used in one of two ways: batch mode (in which we operate entirely with files) and interactive mode (in which we interact with the user directly). This is a lot easier than it might sound.

To launch your program in interactive mode, we run it as follows:

scrabble_assistant ./test_data/dictionary.txt

Here, the first command-line argument is the dictionary to use. The user may then give any of the following three commands:

If the user gives you an incorrect command, just reply with a sensible error message.

To launch your program in batch mode, we run it as follows:

scrabble_assistant ./test_data/dictionary.txt ./test_data/some_input.txt ./test_data/some_output.txt

In this case, it reads commands from the second file and writes the result lines to the third file.

This is very easy to implement. You will find a function, answer_scrabble_assistant_questions, in scrabble_assistant_main.cpp. This function takes an istream and an ostream. istream, for instance, is a superclass of both the type of cout as well as of ifstream. That means that, in scrabble_assistant_main, you can call answer_scrabble_assistant_questions either with cin and cout (in interactive mode) or with an ifstream and an ofstream that you created (in batch mode). From within answer_scrabble_assistant_questions, you won’t have to worry about which mode you’re in!

Written Portion

Scientist Conducting Experiment

The HashTable class relies upon external guarantees to achieve its performance: many operations have \(O(1)\) time complexity as long as the hash function is “good”, the load factor is “low enough”, and so on. In order to get a more precise understanding of how HashTable works, you will devise and conduct a small experiment.

Your starter code includes a simple program called stat_collection; further, your HashTable class contains a method called print_stats. You should implement print_stats to provide some information about the internal state of the hash table. Here are some examples:

For your experiment, you should pick an independent variable that you will control (e.g. size of table, numerical distance between keys added to table, load factor, etc.) and a dependent variable that you will observe (e.g. one of the above). Then, write code that prints the statistics for HashTables at varying values of your independent variable. Create one or two graphs explaining this. Then, create a write-up in which you discuss how you collected your data (at least one paragraph) and discussing the results (at least one paragraph).

You may use any software you like to create the graphs as long as the final result is accessible in a common image format (PNG or JPG preferred) or as a PDF or LaTeX source. Your write-up should be either a text document, a PDF, or LaTeX source. You can, for instance, use gnuplot to generate your graph if you are so inclined; there’s a brief tutorial here. You can also use a tool such as Microsoft Excel or OpenOffice Calc if you like, as long as your hand-in meets the requirements above.

Testing

As usual, you have been provided RUN_MAIN and CHECK_FILES_EQUAL. You should write tests in tests.cpp for your HashTable data structure, your ScrabbleAssistant class, and your scrabble_assistant_main function.

Memory

Your program is required to run without any memory leaks or errors; you must free up all of the memory that you allocate and you must only use memory that you have allocated and initialized. We will use valgrind to determine if you have done this correctly, so you should as well. You should also consider using valgrind to find bugs in your program as you proceed; it’s one of two very powerful C++ debugging tools that we’ve discussed.

Coding Style Requirements

You are required to observe some good coding practices:

Well-dressed people (style)

Peer Review

As this is a team assignment, you are required to complete a peer review of your lab partner or partners. You must do this even if everything is fine. You should complete this peer review after you have turned in your assignment. The peer review comes in the form of a simple Google Form and can be accessed here; in most cases, it will likely take less than a minute to complete. If you have trouble accessing this peer review, please make sure that you are logged into Google using your Swarthmore credentials.

Your peer review will be kept private; only course staff will have access to this information. You can also update your peer review after you have submitted it. However, if you do not submit a peer review, you will lose points on your lab. Please don’t forget!

Summary of Requirements

When you are finished, you should have