Week 7, Wednesday: File IO, Top-Down Design



Idea of Top-Down Design (TDD):

Here is the main() function from a tic-tac-toe design:

def main():
  """run one game of tic-tac-toe"""

  # print rules of game??
  board = [" "]*9
  display(board)

  while True:
    userturn(board)
    printline()
    display(board)
    result = checkIfGameOver(board)
    if result != None:
      break
    computerTurn(board)
    display(board)
    printline()
    result = checkIfGameOver(board)
    if result != None:
      break

  # game is over, print results
  if result == "x":
    print "X wins!"
  elif result == "o":
    print "O wins!"
  else:
    print "it's a draw...."

And here are two function stubs for the above main():

def checkIfGameOver(board):
  """
  check if someone has won, or a draw.
  return "x" if X wins, "o" if O wins, "d" if draw, 
         None if game not over
  """
  return None

def computerTurn(board):
  """find best spot and play it"""

  board[5] = "o"
  return

I may or may not know how to code up these functions, but I at least have an idea of what they should do, including what they need from main() (the parameters), what the are going to do, and what they are going to return to main(). Plus, since I have written the function stubs, the whole design runs, and I can type in each function (one at a time!) and test them as I go.

For example, when implementing the computerTurn() function, a simple first step might be to have it choose a random location. That at least allows me to play the game and test out other functions. Once I know the rest of the game works, I can come back to computerTurn() and make it better.

Your turn: write a design for spellchecker.py

Suppose we were writing a spelling checker. One use would be to check a file and show any misspelled words, possibly providing alternative spellings or suggested words. Here is a quick example:

$ cat typos.txt 
teh
tthe
seperate
beleive
colum
wierd
peple

So the file has a lot of misspelled words in it. Now we run the spellchecker on that file:

$ python spellchecker.py 

--- spell checker ---

filename to check: typos.txt

0 teh ['eh', 'the', 'tea', 'tee', 'ten']
             eh      3344994
            the  19401194714
            tea     12125831
            tee      3744469
            ten     30258816
********* best word:  the
====================
1 tthe ['the', 'tithe']
            the  19401194714
          tithe       244972
********* best word:  the
====================
2 seperate ['separate']
       separate     32765953
********* best word:  separate
====================
3 beleive ['believe']
        believe     71132619
********* best word:  believe
====================
4 colum ['column']
         column     26162181
********* best word:  column
====================
5 wierd ['weird', 'wired', 'wield']
          weird      7222149
          wired      2892180
          wield       381835
********* best word:  weird
====================
6 peple ['people']
         people    382195698
********* best word:  people
====================

Using a variety of algorithms, we find all misspelled words, and then try to figure out what the user meant to type. To do this, we make use of two files:

$ cat /usr/share/dict/words

That is just a system word list. It has one word per line, and some 99,000+ words in it.

$ head /usr/local/doc/wordcts.txt 
a 7841087012
aardvark 93030
aardvarks 11762
abaci 5391
aback 272920
abacus 149548
abacuses 4473
abaft 19188
abalone 237386
abalones 6544

This file has words and counts in it, with each count being a measure of how common the word is in the English language (basically, they just counted how often a word was used in a lot of books).

Given those two files, can you write the spell-checker program?

design of spellchecker.py

Our algorithm will be something like this:

That looks like a lot of work, but is not so bad if we write the design and implement things one function at a time. Here is a start to the main() function:

def main():
  filename = raw_input("filename to check: ")
  words = readFile(filename)
  english = readSystemWords("/usr/share/dict/words")
  print(len(english))
  print(english[45000])

And here are the two function stubs:

def readSystemWords(filename):
  """open system file, read words into a list"""
  words = []
  return words

def readFile(filename):
  """
  open given filename, read in text,
  split into words, return list of words
  """
  words = []
  return words

The functions do not do anything useful, yet, but main() runs! The next step would be to keep adding to main(), until it is all written out. And each time you call a function in main(), write the stub.