CS35 Lab 10: Using a hash table to cache query results

Due by 11:59 p.m., Wednesday, November 16, 2011

In Lab 09, every query's result was computed from the word frequency trees, even if the user entered the same query repeatedly. In this lab you will use a dictionary to cache search results, making the search process more efficient when the user repeats a search. Before computing a query result, you will first check a dictionary to learn if this query has been answered before; if so, you can return (a copy of) the result from the dictionary without any further processing. If the query has not been answered before, you compute the query result as in Lab 09, save the result in the dictionary so it can be used later, and then return it.

Your work for this lab consists of three parts:

  1. Complete an implementation of a HashTable, which implements the Dictionary interface.
  2. Augment the BinaryHeap from Lab 09 with a copy constructor, which constructs a new BinaryHeap from another BinaryHeap.
  3. Extend your search program from Lab 09 to use a hash table to cache query results, returning a PriorityQueue from the query result cache when possible instead of recomputing the query result.

As usual, you can get the initial files for this lab by running update35. The new files are listed here, and the files that you will need to edit are highlighted in red:

You will also need to include your queryprocessor.h, queryprocessor.cc, and sortedSearch.cc solutions from Lab 09 and modify the QueryProcessor files to support query result caching.



Completing the HashTable implementation

You should start by completing and testing the HashTable class. Our HashTable uses an array of CircularArrayLists to store the data within the hash table, using chaining to resolve collisions. Each CircularArrayList stores key-value pairs; for keys of type K and values of type V each key-value pair therefore has type Pair<K,V>.

Please complete each function marked with a TODO comment in hashtable.inl. You must test each HashTable function you implement using the testHash program.



Implementing the BinaryHeap's copy constructor

Once you've completed and tested the HashTable class, continue by implementing and testing the BinaryHeap copy constructor. A copy constructor is a constructor that creates a copy of a class from another instance of the same class, in this case a BinaryHeap. (The parameter for a copy constructor actually needs to be a const reference to an instance of the same class, but this is a C++ detail that you don't need to learn.)

To copy an object you can usually directly copy (by value) most data from the source object, but you must avoid copying some data (such as a pointer to a memory location) that would make the source object and the copy dependent on each other. For an example of a copy constructor, see the newly-provided CircularArrayList class.



The QueryProcessor and your main sortedSearch program

If your sortedSearch program from Lab 09 was well-designed, you can complete this portion of the lab by editing only your queryprocessor.h and queryprocessor.inl implementation, without modifying sortedSearch.cc.

As described in the introduction above, your modified QueryProcessor should first check a dictionary to learn if this query has been answered before; if so, it can return a copy of the result (using the BinaryHeap's copy constructor!) from the dictionary without any further processing. (It's important to return a copy -- and not a value from the cache itself -- to prevent the main program from modifying the BinaryHeaps inside the query cache.) If the query has not been answered before, you compute the query result as in Lab 09, save the results in the dictionary, and then return them.

Your query result cache should be a hash table using query strings as keys and PriorityQueue pointers (BinaryHeap pointers) as values. To maximize the chance that a query string uses the cache, you should first convert the query string into a canonical form before using it as the cache key. To create the cache key from a search query, do the following:

  1. Convert the search query to all lower case, using the tolower function. You'll need to #include <cctype> before using tolower.
  2. Split the search query into whitespace-delimited words using the split function.
  3. Remove all ignored words from the search query.
  4. Sort the remaining words.
  5. Concatenate the sorted remaining words into a single string, with each word separated by a space.
The final string (after step 5) will be the cache key for the search query.

For example, if "of" is an ignored word, the search query "department of computer science" would be converted to the cache key "computer department science". Similarly, "Computer Science department" would also be converted to "computer department science", which would allow the cache to return a saved query result instead of recomputing the result.

Please print output to tell us when a query result is newly-computed and when a query result was used from the cache. For example:

  $  ./sortedSearch urls.txt ignore.txt
  Enter your search query:  tree
  Searching for "tree":
  Cache miss.
      www.cs.swarthmore.edu/~charlie/cs35/f11/labs/lab08.php: 21
      www.cs.swarthmore.edu/~charlie/cs35/f11/labs/lab07.php: 13

  Enter your search query:  node
  Searching for "node":
  Cache miss.
      www.cs.swarthmore.edu/~charlie/cs35/f11/labs/lab07.php: 11
      www.cs.swarthmore.edu/~charlie/cs35/f11/labs/lab08.php: 10
      www.cs.swarthmore.edu/~newhall/index.php: 1

  Enter your search query:  tree node
  Searching for "tree node":
  Cache miss.
      www.cs.swarthmore.edu/~charlie/cs35/f11/labs/lab08.php: 31
      www.cs.swarthmore.edu/~charlie/cs35/f11/labs/lab07.php: 24
      www.cs.swarthmore.edu/~newhall/index.php: 1

  Enter your search query:  tree node
  Searching for "tree node":
  Returning query result from cache!
      www.cs.swarthmore.edu/~charlie/cs35/f11/labs/lab08.php: 31
      www.cs.swarthmore.edu/~charlie/cs35/f11/labs/lab07.php: 24
      www.cs.swarthmore.edu/~newhall/index.php: 1

  Enter your search query:  A Node Tree
  Searching for "A Node Tree":
  Returning query result from cache!
      www.cs.swarthmore.edu/~charlie/cs35/f11/labs/lab08.php: 31
      www.cs.swarthmore.edu/~charlie/cs35/f11/labs/lab07.php: 24
      www.cs.swarthmore.edu/~newhall/index.php: 1

  Enter your search query:  the a nOdE trEE the a
  Searching for "the a nOdE trEE the a":
  Returning query result from cache!
      www.cs.swarthmore.edu/~charlie/cs35/f11/labs/lab08.php: 31
      www.cs.swarthmore.edu/~charlie/cs35/f11/labs/lab07.php: 24
      www.cs.swarthmore.edu/~newhall/index.php: 1

  Enter your search query:  <CTRL-D>
  $


Submitting your work

Once you are satisfied with your program, hand it in by typing handin35 at the unix prompt.

You may run handin35 as many times as you like, and only the most recent submission will be recorded. This is useful if you realize after handing in some programs that you'd like to make a few more changes to them.