CS21 Lab8: Spell Checker

Due 11:59pm Tuesday, March 30

For this program you may work with one partner (you may not work in groups larger than two).

If you work with a partner:

The best way to work with a partner is to work together in the lab on all parts of the program together. You can work together on a copy of the code in one partner's cs21/labs/08 subdirectory and then email the file as an attachment to the other partner so that he/she can save a copy of your joint solution in his/her cs21/labs/08/ subdirectory too.

Run update21, if you haven't already, to create the cs21/labs/08 directory. Then cd into your cs21/labs/08 directory and create the python programs for lab 08 in this directory (handin21 looks for your lab 08 assignments in your cs21/labs/08 directory):

$ update21

$ cd cs21/labs/08
$ vim spellchecker.py
With the labs/08/ directory we are giving you an example ignorefile and document file you can use to test your program. You should test your program on other documents and ignore files as well.

As you create your own documents and ignore files for testing, make sure that there is a blank line at the very end.

Overview

For this assignment you will implement an automated spell checking program that uses both searching and sorting. You should continue to practice using top-down design, and focus on breaking up the problem into appropriate functions. Also be sure to test your program incrementally.

You will use a dictionary of words that is available in a file on our system to determine whether each word in the document is a valid word. However, some valid words like Swarthmore will not appear in this dictionary. Therefore you will need another file of words, created by you, that should also be considered correct.

The goal of your program is to report all misspelled words found in the document. For example, suppose you had the following document called letter:

Dear Sam,

How are you?  I'm fine, but I mis seeing you more oftn.  
Hope allis well with your family, especialy your brother. 
I'll be stoping by next Satuday when I get home from 
Swarthmore.  Love, Alice
And suppose that you have a file named ignore with the following contents (Alice and Sam are in the system dictionary file):
Swarthmore
WWW

Running the spell checker on this document should result in the following interaction with the user:
Enter filename of document to spell check: letter
Enter filename of list of words to ignore: ignore

The following errors were found by the spell checker:
mis
oftn
allis
especialy
stoping
Satuday

1. Looking up words in the standard dictionary

Focus initially on checking whether the words in the document are found in the standard dictionary.

Some notes on File I/O in Python

Implement and test this functionality before moving on to the next step.

2. Handling the words in the ignore file

Once you have the basic spell checking functionality implemented from above, you can add the ability to ignore other words not found in the dictionary. Many spell checking programs use this ignore list idea. The ignore list often contains proper nouns (e.g. Chopp), well known acronyms (e.g. WWW) that do not appear in a dictionary, but are also not misspellings.

With the lab starting point code is an example ignore file that you can add entries to. Just be sure to keep in the line break after the very last line in this file (just add new words in the middle to avoid making this change).


Requirements
  1. You should use Top-Down-Design in designing your program. Your main function should be short, containing calls to high-level functions that implement different parts of the larger program. Think about writing good generic functions that you can use in multiple places in your program.
  2. You must use Binary Search (don't use linear search)
  3. You must implement Bubble Sort (don't use another sorting algorithm)

Sample Output
This is sample output from the example document and ingorefile you grabbed with update21. Note:The first example is just the Part 1 solution, the second example is the full solution (both Parts 1 and 2).

The sample output is using the document and ignorefiles that you got with update21.

Part 1 Output:

$ python spellchecker.py

Enter filename of document to spell check: document

The following errors were found by the spell checker:

Swarthmore
WWW
Swarthmore's
students'
department:
Danner
Horner
Kelemen
Knerr
Lia
Meeden
Newhall
Rothera
Turnbull
Wicentowski

Part 2 Output:

$ python spellchecker.py

Enter filename of document to spell check: document
Enter filename of list of words to ignore: ignorefile

The following errors were found by the spell checker:

Swarthmore's
students'
department:
Lia

Optional Extenstions

Note:These are not required parts of the assignment.

Once you have the full functionality implemented and tested. You can try adding some additional features. Here are some suggestions:

  1. Change your program so that it only lists a misspelled word one time, rather than once per misspelled occurrence in the document.
  2. Handle words inside quotes and parentheses (i.e. "dogs" and (dogs) should be considered correctly spelled words.
  3. Ignore non-alphabetic collections of characters, like numbers.

These features should not be added by using the ignore file, but should be added to the spell checking functionality.

A good test case for these changes is something like the following that contains quotes, numbers, and parens:

Barak Obama said today, "I'm pleased with the passage of the Health
Care bill."  After 8 months of negotiations (which were sometimes
quite heated), the bill is a reality.
With the above features added, your program should indicate that this document has no misspelled words.
Submit

Once you are satisfied with your program, hand it in by typing handin21 in a terminal window.