In Lab 09, every query's result was computed from the word frequency trees, even if the user entered the same query repeatedly. In this lab you will use a dictionary to cache search results, making the search process more efficient when the user repeats a search. Before computing a query result, you will first check a dictionary to learn if this query has been answered before; if so, you can return (a copy of) the result from the dictionary without any further processing. If the query has not been answered before, you compute the query result as in Lab 09, save the result in the dictionary so it can be used later, and then return it.
Your work for this lab consists of three parts:
As usual, you can get the initial files for this lab by running update35. The new files are listed here, and the files that you will need to edit are highlighted in red:
You should start by completing and testing the HashTable class. Our HashTable uses an array of CircularArrayLists to store the data within the hash table, using chaining to resolve collisions. Each CircularArrayList stores key-value pairs; for keys of type K and values of type V each key-value pair therefore has type Pair<K,V>.
Please complete each function marked with a TODO comment in hashtable.inl. You must test each HashTable function you implement using the testHash program.
Once you've completed and tested the HashTable class, continue by implementing and testing the BinaryHeap copy constructor. A copy constructor is a constructor that creates a copy of a class from another instance of the same class, in this case a BinaryHeap. (The parameter for a copy constructor actually needs to be a const reference to an instance of the same class, but this is a C++ detail that you don't need to learn.)
To copy an object you can usually directly copy (by value) most data from the source object, but you must avoid copying some data (such as a pointer to a memory location) that would make the source object and the copy dependent on each other. For an example of a copy constructor, see the newly-provided CircularArrayList class.
If your sortedSearch program from Lab 09 was well-designed, you can complete this portion of the lab by editing only your queryprocessor.h and queryprocessor.inl implementation, without modifying sortedSearch.cc.
As described in the introduction above, your modified QueryProcessor should first check a dictionary to learn if this query has been answered before; if so, it can return a copy of the result (using the BinaryHeap's copy constructor!) from the dictionary without any further processing. (It's important to return a copy -- and not a value from the cache itself -- to prevent the main program from modifying the BinaryHeaps inside the query cache.) If the query has not been answered before, you compute the query result as in Lab 09, save the results in the dictionary, and then return them.
Your query result cache should be a hash table using query strings as keys and PriorityQueue pointers (BinaryHeap pointers) as values. To maximize the chance that a query string uses the cache, you should first convert the query string into a canonical form before using it as the cache key. To create the cache key from a search query, do the following:
For example, if "of" is an ignored word, the search query "department of computer science" would be converted to the cache key "computer department science". Similarly, "Computer Science department" would also be converted to "computer department science", which would allow the cache to return a saved query result instead of recomputing the result.
Please print output to tell us when a query result is newly-computed and when a query result was used from the cache. For example:
$ ./sortedSearch urls.txt ignore.txt Enter your search query: tree Searching for "tree": Cache miss. www.cs.swarthmore.edu/~charlie/cs35/f11/labs/lab08.php: 21 www.cs.swarthmore.edu/~charlie/cs35/f11/labs/lab07.php: 13 Enter your search query: node Searching for "node": Cache miss. www.cs.swarthmore.edu/~charlie/cs35/f11/labs/lab07.php: 11 www.cs.swarthmore.edu/~charlie/cs35/f11/labs/lab08.php: 10 www.cs.swarthmore.edu/~newhall/index.php: 1 Enter your search query: tree node Searching for "tree node": Cache miss. www.cs.swarthmore.edu/~charlie/cs35/f11/labs/lab08.php: 31 www.cs.swarthmore.edu/~charlie/cs35/f11/labs/lab07.php: 24 www.cs.swarthmore.edu/~newhall/index.php: 1 Enter your search query: tree node Searching for "tree node": Returning query result from cache! www.cs.swarthmore.edu/~charlie/cs35/f11/labs/lab08.php: 31 www.cs.swarthmore.edu/~charlie/cs35/f11/labs/lab07.php: 24 www.cs.swarthmore.edu/~newhall/index.php: 1 Enter your search query: A Node Tree Searching for "A Node Tree": Returning query result from cache! www.cs.swarthmore.edu/~charlie/cs35/f11/labs/lab08.php: 31 www.cs.swarthmore.edu/~charlie/cs35/f11/labs/lab07.php: 24 www.cs.swarthmore.edu/~newhall/index.php: 1 Enter your search query: the a nOdE trEE the a Searching for "the a nOdE trEE the a": Returning query result from cache! www.cs.swarthmore.edu/~charlie/cs35/f11/labs/lab08.php: 31 www.cs.swarthmore.edu/~charlie/cs35/f11/labs/lab07.php: 24 www.cs.swarthmore.edu/~newhall/index.php: 1 Enter your search query: <CTRL-D> $
Once you are satisfied with your program, hand it in by typing handin35 at the unix prompt.
You may run handin35 as many times as you like, and only the most recent submission will be recorded. This is useful if you realize after handing in some programs that you'd like to make a few more changes to them.