CS21 Lab8: Spell Checker

Due 11:59pm Tuesday, March 30

A skeleton version of the program spellchecker.py will appear in your cs21/labs/08 when you run update21. In addition, this will copy two documents called letter and correct that can be used for testing, as well as the start of a list of words to ignore in the file called ignore.

You may work with a partner on this assignment. The best way to work as a team is to sit together to design and to code your solution. Take turns typing and watching. If you work with a partner, put your name and the name of your partner in the list of authors at the top of your program. Only one partner should run handin21 to submit the files for the group. Both partners will receive the same grade.

Overview

For this assignment you will implement an automated spell checking program that uses both searching and sorting. You should continue to practice using top-down design, and focus on breaking up the problem into appropriate functions. Also be sure to test your program incrementally.

You will use a dictionary of words that is available in a file on our system to determine whether each word in the document is a valid word. However, some valid words like Swarthmore will not appear in this dictionary. Therefore you will need another file of words, created by you, that should also be considered correct.

The goal of your program is to report all misspelled words found in the document. For example, suppose you had the following document called letter:

Dear Sam,

How are you?  I'm fine, but I mis seeing you more oftn.  Hope allis
well with your family, especialy your brother. I'll be stoping by next
Satuday when I get home from Swarthmore.  Love, Alice
Running the spell checker on this document should result in the following interaction with the user:
Enter filename of document to spell check: letter
Enter filename of list of words to ignore: ignore

The following errors were found by the spell checker:
mis
oftn
allis
especialy
stoping
Satuday
In the above example, the ignore file only contained the words Swarthmore and Chopp. The names Alice and Sam are actually in the dictionary.

1. Looking up words in the standard dictionary

Focus initially on checking whether the words in the document are found in the standard dictionary.

Implement and test this functionality before moving on to the next step.

2. Handling the words in the ignore file

Once you have the basic spell checking functionality implemented from above, you can add the ability to ignore other words not found in the dictionary. Many spell checking programs use this ignore list idea. The ignore list often contains proper nouns (e.g. Chopp), well known acronyms (e.g. WWW) that do not appear in a dictionary, but are also not misspellings.

With the lab starting point code is an example ignore file that you can add entries to. Just be sure to keep in the line break after the very last line in this file (just add new words to the top of the file to avoid making this change).


Requirements
  1. Your main function should be short and concise. You must break the problem up into suitable functions.
  2. You must use binary search to check for words in the dictionary.
  3. You must use selection sort to sort the list of words to ignore.

Optional Extenstions

Note:These are not required parts of the assignment.

Once you have the full functionality implemented and tested. You can try adding some additional features. Here are some suggestions:

  1. Change your program so that it only lists a misspelled word one time, rather than once per misspelled occurrence in the document.
  2. Handle words inside quotes and parentheses (i.e. "dogs" and (dogs) should be considered correctly spelled words.
  3. Ignore non-alphabetic collections of characters, like numbers.

These features should not be added by using the ignore file, but should be added to the spell checking functionality.

A good test case for these changes is something like the following that contains quotes, numbers, and parens:

Barak Obama said today, "I'm pleased with the passage of the Health
Care bill."  After 8 months of negotiations (which were sometimes
quite heated), the bill is a reality.

With the above features added, your program should indicate that this document has no misspelled words.

Submit

Once you are satisfied with your program, hand it in by typing handin21 in a terminal window.