CS35: Lab #8

You may work with a partner on this assignment if you wish. Be sure to include both names in your README.

Hash maps with separate chaining

Implement the Map interface using separate chaining as described in section 9.2.5 of the text. Use the code in class/09-Map as a starting point. At least one of your constructors should support a floating point load factor and a initial capacity as parameters. Your implementation should double the array size and rehash the keys when the load capacity is exceeded. You implementation should include a method collisions() that returns the total number of collisions observed in the hash table. To test your implementation, build Maps on the words in /usr/share/dict/words using a load factor of 0.2, 0.4, 0.6, and 0.8. Code for running such an experiment is in the file class/09-Map/TestCollisions.java. Display the total number of collisions and the average number of collisions per word for each load factor. How do your results compare to the HashTableMap that uses linear probing? Compute a histogram of the chain lengths in each bucket in the final hash table for each load factor. An example histogram is shown below. This histogram is completely made up and may not look at all like the actual distribution.

length     freq (%)
0          20.00
1          40.00
2          30.00
3           5.00
4           2.00
5           1.00
6           1.00
7           0.50
9           0.25
10          0.24
15          0.01

Summarize the results of your experiments in a README file. Is hashing a viable alternative to tree-based searching? Is chaining better or worse than linear probing?

Submitting

Along with your Java source code, you should hand in a README file that takes the from of a short lab report. That is, you should

State your Research Questions - What are you exploring in this lab? What are the important parameters that you can control and systematically vary?
Describe your experimental setup - implementation details, description of data, etc.
Report your results - Nicely format your results. Perhaps one table with implementation and maximum load factor as rows, and observed load factor, total number of collisions, maximum number of consecutive collisions. And a second table that has shows the comparative distributions of consecutive collisions.
List your conclusions - What can you say about the comparative advantages and disadvantages of each implementations based on your results?

Your README need not be long-winded. Rather convey you experiment in a concise manner.
These files will be imported automatically via handin35.

CS35 Lab #8:

Hash Tables