CS 35

Program 9

due Tuesday 10 April by midnight
You may work in pairs on this if you like.
Implement a Cs35hashset class that uses what Weiss calls 'separate chaining hashing' (many call it 'open hashing'). You do NOT have to implement all the Java HashSet methods. You DO have to implement it so that you can count the number of probes in building it and/or using it. In addition I want you to implement constructor(s), add, contains, size. Use Java's hashCode method.

You may use Java's ArrayList or LinkedList for the each list of elements with the same hashvalue or you may roll your own. BUT, because you have to be able to count the number of probes involved in the Cs35hashset operations, you cannot use the contains method of the built-in Java ArrayList or LinkedList class.

Once you have a working Cs35hashset class, I want you to build a CS35hashset of Strings with load factor 0.8 from the words in: /usr/share/dict/american-english. Print out the length of the longest chain and the average chain length per word (i.e. divide by the number of words not the tablesize). Also print out the total number of probes to build the set and the average number of probes per word.

Then look up, in the set constructed above, every word in /home/cfk/pub/cs35/week6/wordfreq/crusoe.txt counting the number of probes. Output the total number of probes, the total number of words in crusoe.txt and the average number of probes per word to complete this task. In doing this latter task, I have in mind that you read the crusoe.txt file linearly looking up each word as it comes along. Thus common words will be looked up often contributing frequently to both the probe count and the total word count.

In your README file, in addition to instructions on how to run your program, compare your results with the theoretical results Weiss reports in section 20.5.