You may work with a partner on this assignment if you wish. Be sure to include both names in your README.
Hash maps with separate chaining
Implement the
Map interface using separate chaining as described in
section 9.2.5 of the text. Use the code in
class/09-Map as a
starting point. At least one of your constructors should support a floating
point load factor and a initial capacity as parameters. Your implementation
should double the array size and rehash the keys when the load capacity is
exceeded. You implementation should include a method
collisions() that
returns the total number of collisions observed in the hash table. To test your
implementation, build Maps on the words in
/usr/share/dict/words using
a load factor of 0.2, 0.4, 0.6, and 0.8. Code for running such an experiment is in the file
class/09-Map/TestCollisions.java. Display the total number of collisions
and the average number of collisions per word for each load factor. How do your
results compare to the
HashTableMap that uses linear probing? Compute
a histogram of the chain lengths in each bucket in the final hash table for
each load factor. An example histogram is shown below. This histogram is
completely made up and may not look at all like the actual distribution.
length freq (%)
0 20.00
1 40.00
2 30.00
3 5.00
4 2.00
5 1.00
6 1.00
7 0.50
9 0.25
10 0.24
15 0.01
Summarize the results of your experiments in a
README file. Is hashing a viable alternative to tree-based searching? Is chaining better or worse than linear probing?
Submitting
Along with your Java source code, you should hand in a README file that takes the from of a short
lab report.
That is, you should
- State your Research Questions - What are you exploring in this lab? What are the important parameters that you can control and systematically vary?
- Describe your experimental setup - implementation details, description of data, etc.
- Report your results - Nicely format your results. Perhaps one table with implementation and maximum load factor as rows, and observed load factor, total number of collisions, maximum number of consecutive collisions. And a second table that has shows the comparative distributions of consecutive collisions.
- List your conclusions - What can you say about the comparative advantages and disadvantages of each implementations based on your results?
Your README need not be long-winded. Rather convey you experiment in a concise manner.
These files will be imported automatically via handin35.