CS35 Lab 5: Spell Checker

Due by 11:59pm Sunday, October 4, 2015
In this lab, you will implement a C++ linked list. You will then write a program that will load a dictionary into a linked list and perform basic operations on the dictionary. Your program will also have a feature which allows users to use your dictionary to check and correct spelling of documents.

To get started, clone lab5-<partner-names>.

This lab will be done in pairs. Both partners should be present and working on the code together. You will both be responsible for understanding all concepts, so dividing and conquering is not an option. The academic integrity policy applies to the entire pair; you cannot work or share code with anyone outside of your partner.

You will earn reasonable partial credit for this lab if you complete and test your linked list implementation. So be sure to work in a modular fashion, testing and completing your linked list first before implementing the main program.



Implementation details

Here, I list the requirements for each major component of the lab. See the Implementation Strategy section below for hints on how to begin attacking the problem.

You have been given the following files (blue indicates you will need to modify/implement in this file):

Makefile

There are several options in the Makefile for this week.

The vg- options are provided as conveniences. The binaries that are run are called testStringList for tests, and spellCheck for the main implementation. You can also separately run, for example:
$ make testStringList
$ valgrind ./testStringList
Which does the same thing as make vg-test. This can be useful if you want to run:
$ valgrind --leak-check=full ./testStringList
to get more information about memory leaks.

StringList

I have provided a StringList interface. It is very similar to the interface given in lecture, but adds two additional methods that will be needed for this lab. search takes a string value and returns the index of the first occurence in the list (or -1 if value is not in the list). deleteItem deletes the ith element of the list.

You must define and implement a LinkedStringList class that implements the StringList interface, without altering the StringList interface.

LinkedStringList (and its Requirements)

Begin by thinking about your methods in pseudocode, and then implement them one-by-one. Some requirements to note:

Using Valgrind

valgrind is a tool that runs C++ binaries and checks for certain incorrect uses of memory. In particular, it checks for three kinds problems that could show up in your projects:

To use Valgrind, take the command you use to run a binary, say ./main, and add the valgrind command in front:
$ valgrind ./main
A report will appear that looks like:
... Lots of lines like the below
==59100== Invalid read of size 8
==59100==    at 0x10006EF72: std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::basic_string(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) (in /usr/lib/libc++.1.dylib)
==59100==    by 0x100009617: ASList::get(int) (in ./run_tests)
==59100==    by 0x100004DEC: Testa_test::RunImpl() const (in ./run_tests)
==59100==    by 0x10000EECA: void UnitTest::ExecuteTest<UnitTest::Test>(UnitTest::Test&, UnitTest::TestDetails const&, bool) (in ./run_tests)
==59100==    by 0x10000EDB9: UnitTest::Test::Run() (in ./run_tests)
==59100==    by 0x10000FBDC: UnitTest::TestRunner::RunTest(UnitTest::TestResults*, UnitTest::Test*, int) const (in ./run_tests)
==59100==    by 0x10000FE4D: int UnitTest::TestRunner::RunTestsIf<UnitTest::True>(UnitTest::TestList const&, char const*, UnitTest::True const&, int) const (in ./run_tests)
==59100==    by 0x10000F7FF: UnitTest::RunAllTests() (in ./run_tests)
==59100==    by 0x100005363: main (in ./run_tests)
==59100==  Address 0x10081f7b8 is 8 bytes after a block of size 80 alloc'd
==59100==    at 0x100024EA1: malloc (vg_replace_malloc.c:303)
==59100==    by 0x10006C43D: operator new(unsigned long) (in /usr/lib/libc++.1.dylib)
==59100==    by 0x1000096A4: ASList::expandCapacity() (in ./run_tests)
==59100==    by 0x100009AAC: ASList::addHead(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >) (in ./run_tests)
==59100==    by 0x1000032BB: Testa_test::RunImpl() const (in ./run_tests)
==59100==    by 0x10000EECA: void UnitTest::ExecuteTest<UnitTest::Test>(UnitTest::Test&, UnitTest::TestDetails const&, bool) (in ./run_tests)
==59100==    by 0x10000EDB9: UnitTest::Test::Run() (in ./run_tests)
==59100==    by 0x10000FBDC: UnitTest::TestRunner::RunTest(UnitTest::TestResults*, UnitTest::Test*, int) const (in ./run_tests)
==59100==    by 0x10000FE4D: int UnitTest::TestRunner::RunTestsIf<UnitTest::True>(UnitTest::TestList const&, char const*, UnitTest::True const&, int) const (in ./run_tests)
==59100==    by 0x10000F7FF: UnitTest::RunAllTests() (in ./run_tests)
==59100==    by 0x100005363: main (in ./run_tests)
==59100== 
Success: 1 tests passed.
Test time: 0.08 seconds.
==59100== 
==59100== HEAP SUMMARY:
==59100==     in use at exit: 39,544 bytes in 435 blocks
==59100==   total heap usage: 521 allocs, 86 frees, 46,067 bytes allocated
==59100== 
==59100== LEAK SUMMARY:
==59100==    definitely lost: 80 bytes in 1 blocks
==59100==    indirectly lost: 0 bytes in 0 blocks
==59100==      possibly lost: 0 bytes in 0 blocks
==59100==    still reachable: 436 bytes in 5 blocks
==59100==         suppressed: 39,028 bytes in 429 blocks
==59100== Rerun with --leak-check=full to see details of leaked memory
==59100== 
==59100== For counts of detected and suppressed errors, rerun with: -v
==59100== ERROR SUMMARY: 29 errors from 29 contexts (suppressed: 0 from 0)

The parts you need to care about are:

  1. The definitely/indirectly/possibly lost rows. These indicate cases where some value was allocated but never freed, for example by forgetting a delete statement at the end of a method, or a destructor that doesn’t correctly delete sub-pieces of a data structure. In this case, I removed the use of delete in the destructor of ASList, causing an array to not be freed.
  2. The ERROR SUMMARY, and at least a little bit about the lines above it. In this example (which we’ll go over in lab), I used an implementation of the array-backed list that didn't increase capacity correctly, causing an out-of-bounds read for a number of tests. This was reported by Valgrind as a “read error,” and the error indicates that it happens inside the ASList::get method.
When there are no problems, the report will look like:
==59194== LEAK SUMMARY:
==59194==    definitely lost: 0 bytes in 0 blocks
==59194==    indirectly lost: 0 bytes in 0 blocks
==59194==      possibly lost: 0 bytes in 0 blocks
==59194==    still reachable: 436 bytes in 5 blocks
==59194==         suppressed: 38,868 bytes in 427 blocks
==59194== 
==59194== For counts of detected and suppressed errors, rerun with: -v
==59194== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 17 from 17)

For this lab, you need to ensure that there are no errors or leaks when running the tests or main program with valgrind.

testStringList

You must modify your testStringList.cpp file to make sure your LinkedStringList implementation is working. The ninjas will ask you to show your test file to help you with errors.

Main SpellCheck Program (and its requirements)

Once you have thoroughly tested your LinkedStringList implementation, you should implement the main program. This program has been started for you. To keep the program clean, we divided the main program into three files. spellCheckMain.cpp only contains the main loop. Modify this to get it working, but any large implementation details belong in the helper files. Those details should use good top-down design (e.g. one function per feature). Any functions should be declared and defined in spellCheck.h/.cpp respectively. We provided readFileToList() and printListToFile() for you. To complete the main program:

A couple pointers when implementing these methods:
An implementation strategy

You should always program incrementally such that you can test your work as you go. Here is one suggested strategy:

  1. Begin by declaring the LinkedStringList class in linkedStringList.h. Remember that you must declare all data members and at least all methods specified in the StringList interface before proceeding.
  2. Start implementing your class in linkedStringList.cpp I suggest stubbing all functions (that is, defining the function and giving a dummy return value). This will let you compile the class properly without having done any actual implementation. An example function stub is:
    /* getItem - returns the value for the ith item in the list
     * @param i, the index of the string to return
     * @return a string containing an element of the list
     */
    string LinkedStringList::getItem(int i) {
      //TODO: implement this
     
      return "EXAMPLE WORD";
    }
    
  3. Check to see if your class compiles, fix any errors before proceeding. Now, begin implementing one method at a time. I suggest starting with insertAtHead or insertAtTail. Your procedure should be to test and implement one method at a time as much as possible. (Obviously, it's difficult to test insertAtHead without getItem, for example, but you should be able to proceed somewhat incrementally).
  4. Implement the destructor last. Carefully consider the ways the class might allocate memory on the heap, and be sure that there is a corresponding call to delete for each call to new.
At this point you should have a complete working LinkedStringList class, as well as a very thorough set of tests.

You should not under any circumstances begin the spell check implementation until you have thoroughly verified the correctness of your LinkedStringList class. Any bugs in your LinkedStringList class will hinder progress on the main program. Begin working on spellCheck.cpp by implementing and testing one feature of the program at a time. Be sure that your solution follows good design/modularity guidelines.


Tips and Hints

Throwing and Handling Exceptions

Your linked list should handle invalid usage by throwing a runtime_error. There is an example in the ASList implementation from class. For example, if you may want to handle an attempt to get an invalid item:

// At the top of the file
#include <stdexcept>

// In an implementation
string LinkedStringList::getItem(int index) {
  if (/* check for out of bounds here */) {
    throw runtime_error("Attempt to get out of bounds item");
  }
}
When testing, you can check for an error with:
//          some erroneous call   the type of error it throws
CHECK_THROW(mylist->getItem(), runtime_error);
Don't forget to include the stdexcept library when testing or throwing exceptions.

Handling Upper Case Letters

All of the words in your dictionary will be lower case. It will be nice to be able to spell-check, say, the first word in a sentence without losing the case. One way to do this is to store the word from the document as is, but search for a lower-case version of the word in the dictionary. To do this, you'll need a function that converts a string to lower case. Here is one such implementation:

string lower(string word) {
  string toReturn = "";
  for(int i = 0; i < word.length(); i++){
    toReturn += tolower(word[i]);
  }
  return toReturn;
}

Handling Punctuation

I recommend handling punctuation in the same way. Store each "word" without removing punctuation, but strip this away when checking for the word in the dictionary. For example, the following code removes possible periods from the end of a word.

string stripPunctuation(string word) {
  string toReturn = word;

  if(toReturn[len-1] == '.')) {
    toReturn.erase(len-1); // erase is a string method
  }

  return toReturn;
}

Sample Run
The following illustrates how your main program might operate:
$ cat q.txt 
The quick broown fox jumps over the lazy dog.
$ ./spellCheck 
Enter dictionary name, or "sowpods" for scrabble dictionary: sowpods

Loading /usr/local/doc/sowpods.txt...
Dictionary Loading Complete
Dictionary contains 267752 words.

(0) Exit
(1) Add word
(2) Remove word
(3) Check word
(4) Spellcheck document

Enter Choice: 1

What word do you want to add? oogabooga

(0) Exit
(1) Add word
(2) Remove word
(3) Check word
(4) Spellcheck document

Enter Choice: 3

Enter the word you want to check: oogabooga
oogabooga is in the dictionary.

(0) Exit
(1) Add word
(2) Remove word
(3) Check word
(4) Spellcheck document

Enter Choice: 2

What word do you want to remove? oogabooga

(0) Exit
(1) Add word
(2) Remove word
(3) Check word
(4) Spellcheck document

Enter Choice: 3

Enter the word you want to check: oogabooga
oogabooga is NOT in the dictionary.

(0) Exit
(1) Add word
(2) Remove word
(3) Check word
(4) Spellcheck document

Enter Choice: 1

What word do you want to add? bear
Warning: bear is already in the dictionary!

(0) Exit
(1) Add word
(2) Remove word
(3) Check word
(4) Spellcheck document

Enter Choice: 4

Enter name of file to spell check: input/q.txt
[broown] is not in the dictionary.
What do you want to do?
  (1) add broown to dictionary.
  (2) ignore word
  (3) correct spelling
3
Enter correction now
brown

(0) Exit
(1) Add word
(2) Remove word
(3) Check word
(4) Spellcheck document

Enter Choice: 4

Enter name of file to spell check: input/gettysburg.txt
[battle-field] is not in the dictionary.
What do you want to do?
  (1) add battle-field to dictionary.
  (2) ignore word
  (3) correct spelling
1
[--] is not in the dictionary.
What do you want to do?
  (1) add -- to dictionary.
  (2) ignore word
  (3) correct spelling
2
[--] is not in the dictionary.
What do you want to do?
  (1) add -- to dictionary.
  (2) ignore word
  (3) correct spelling
2
[--] is not in the dictionary.
What do you want to do?
  (1) add -- to dictionary.
  (2) ignore word
  (3) correct spelling
1

(0) Exit
(1) Add word
(2) Remove word
(3) Check word
(4) Spellcheck document

Enter Choice: 0

Goodbye!
$ more input/q.txt
The quick brown fox jumps over the lazy dog. 
Note that the input file q.txt now has no spelling mistakes.
Submit

Once you are satisfied with your program, hand it in by typing handin35 at the unix prompt.

Run git push to hand in your submission (and to save your code intermittently).