Week 7: top-down design (TDD)

Monday

This week and next week we are working on top-down design. This is a useful technique for writing larger, more complex programs. Today (Monday) we will learn about file input and output, so we can use data stored in files in our programs. On Wednesday we will start learning TDD.

Files

Motivations and possible uses:

  • video game data files (read in terrain data; keep track of high scores)

  • grader program: store student grade data in a file (don’t have to type it in each time the program runs)

  • iTunes: how does iTunes keep track of number of plays for each song??

Files can be plain text files, like the ones you edit with Atom.

syntax

The basic syntax for opening a file is:

myfile = open(filename,mode)

where filename is the name of a file, and mode is the mode used for opening: usually read ("r") or write ("w") mode. Both arguments are strings, and myfile is just the variable I picked to store the file object returned by the open() function.

Here is an example of opening a file called poem.txt for reading, and storing the file object in a variable called infile:

infile = open("poem.txt", "r")

examples

Once you have a file object, you can use the input and output methods on the object.

OUTPUT

Here’s how to open a file for writing (note: myfile is a variable name that I choose, and "newfile" is the name of the file to write to):

$ python3
>>> myfile = open("newfile", 'w')
>>> myfile.write("write this to the file \n")
>>> myfile.write("and this.... \n")
>>> myfile.close()

and here are the results of this:

$ ls
newfile
$ cat newfile
write this to the file
and this....

What happens if we leave out the \n on each line??
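One way to answer that question is to try it. The sketch below writes the same two strings without the trailing \n and then reads the file back. The tempfile scratch directory and the read() call are additions for this illustration (so the example does not touch your own files); they are not part of the session above.

```python
import os
import tempfile

# Work in a scratch directory so nothing in the current directory is overwritten.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "newfile")

    myfile = open(path, "w")
    myfile.write("write this to the file ")   # no \n this time
    myfile.write("and this.... ")
    myfile.close()

    infile = open(path, "r")
    contents = infile.read()    # read() returns the whole file as one string
    infile.close()

print(repr(contents))
```

This prints 'write this to the file and this.... ' as a single string: write() never adds a newline on its own, so both pieces end up on the same line of the file.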

INPUT

I have a file called words.txt with a few words in it:

$ cat words.txt
happy
computer
lemon
zebra

To open a file for reading, use 'r' mode:

>>> infile = open("words.txt", 'r')

File words.txt must exist, otherwise we get an error.
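The error in question is a FileNotFoundError. Here is a minimal sketch of what happens, assuming you want the program to print a message instead of crashing. The try/except statement may not have been covered yet, and the scratch directory is only there to guarantee that the file really is missing.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    # The scratch directory is brand new and empty, so this file does not exist.
    filename = os.path.join(tmpdir, "words.txt")
    try:
        infile = open(filename, "r")
        opened = True
        infile.close()
    except FileNotFoundError:
        # open() in "r" mode raises this error when the file is missing
        opened = False
        print("Cannot open the file: it does not exist")
```

Without the try/except, the same open() call would stop the program with a traceback ending in FileNotFoundError.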

The infile variable, which holds a file object, can be used as a sequence (e.g., in a for loop!):

>>> for line in infile:
...    print(line)
...
happy
computer
lemon
zebra

We can use the for loop like above, or we could use the file methods: readline() to read one line, readlines() to read them all at once.

>>> # need to close and reopen to get back to start of file
>>> infile.close()
>>>
>>> infile = open("words.txt", "r")
>>> word = infile.readline()
>>> print(word)
happy
>>> word = infile.readline()
>>> print(word)
computer
>>> infile.close()
>>> infile = open("words.txt", "r")
>>> words = infile.readlines()
>>> print(words)
['happy\n', 'computer\n', 'lemon\n', 'zebra\n']

So readlines() reads in EVERYTHING and puts each line into a python list. NOTE: the newline characters are still part of each line! Sometimes you want to read in EVERYTHING, all at once. Sometimes it’s better to read data in line-by-line and process each line as you go (use the for loop: for line in infile)
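To make the two approaches concrete, here is a small sketch that reads the same words both ways and removes the trailing newline from each line. It creates its own copy of words.txt in a scratch directory (an assumption made so the example runs anywhere) and uses the str method strip(), which is covered later in these notes.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    # Make a scratch copy of words.txt so the example is self-contained.
    path = os.path.join(tmpdir, "words.txt")
    outfile = open(path, "w")
    outfile.write("happy\ncomputer\nlemon\nzebra\n")
    outfile.close()

    # Option 1: read EVERYTHING at once, then clean up each line
    infile = open(path, "r")
    lines = infile.readlines()
    infile.close()
    all_at_once = []
    for line in lines:
        all_at_once.append(line.strip())    # strip() removes the "\n"

    # Option 2: process each line as it is read from the file
    infile = open(path, "r")
    line_by_line = []
    for line in infile:
        line_by_line.append(line.strip())
    infile.close()

print(all_at_once)
print(line_by_line)
```

Both lists come out the same. The difference is that option 1 holds the whole file in memory before any processing starts, while option 2 only needs one line at a time.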

File I/O Notes:

  • reading from and writing to files is usually S L O W

  • for this reason, we usually read in data at the beginning of a program and store it in a list or other data structure (i.e., if we need the data throughout the program, it’s much faster to refer to the list rather than the file)

  • also, reading from a file is similar to watching a movie on a VHS tape: at the end of the movie, you have to rewind the tape to get back to the beginning. Once the for line in infile loop above finishes, we are at the end of the file. You can "rewind" the file by closing and reopening it (or by calling the file’s seek() method, e.g. infile.seek(0))
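Here is a short sketch of the "rewind" idea using seek(0), which moves back to the start of the file. The scratch copy of words.txt is an assumption added so the example runs on its own.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "words.txt")
    outfile = open(path, "w")
    outfile.write("happy\ncomputer\nlemon\nzebra\n")
    outfile.close()

    infile = open(path, "r")
    first_pass = infile.readlines()    # reads all 4 lines
    at_end = infile.readlines()        # already at the end: nothing left to read
    infile.seek(0)                     # "rewind" to the beginning of the file
    second_pass = infile.readlines()   # reads all 4 lines again
    infile.close()

print(len(first_pass), len(at_end), len(second_pass))
```

The middle readlines() call returns an empty list because the first call already consumed the whole file.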

str methods: strip() and split()

Suppose we have a file of usernames and grades, like this:

$ cat grades.txt
saul:     93
thibault: 92.5
lauri:   100
andy:     70
jeff:     67.5
kevin:    85

If I want to find the average of all of those grades, I need to read in each line, then somehow pull out the grade and store it. This is where the str methods strip() and split() can be used. Here are examples of each:

>>> infile = open("grades.txt", "r")
>>> line = infile.readline()
>>> print(line)
saul:     93
>>> data = line.split(":")
>>> print(data)
['saul', '     93\n']
>>> grade = float(data[1])
>>> print(grade)
93.0

So split() just splits a string and returns the results as a list. In the above example we split the string on a colon (":"). Here are a few more examples of split(). By default (with no arguments given), it splits the string on whitespace.

>>> S = "a,b,c,d,e,f,g"
>>> L = S.split(",")
>>> print(L)
['a', 'b', 'c', 'd', 'e', 'f', 'g']
>>> phrase = "the quick brown fox jumped over the lazy dog"
>>> words = phrase.split()
>>> print(words)
['the', 'quick', 'brown', 'fox', 'jumped', 'over', 'the', 'lazy', 'dog']

And strip() will strip off leading and trailing characters. Again, by default it strips off whitespace (spaces, tabs, and newlines). If you provide a string argument, it instead strips any of the characters in that string from both ends:

>>> S = "    hello\n"
>>> print(S)
    hello

>>> print(S.strip())
hello
>>>
>>> word = "Hello!!!!!"
>>> print(word.strip("!"))
Hello
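The two methods are often chained together when reading data files. As a sketch, here is a hypothetical helper (the name parse_grade_line is made up for this example) that turns one line of grades.txt into a name and a number:

```python
def parse_grade_line(line):
    """split one 'name: grade' line into a name and a float grade"""
    data = line.strip().split(":")   # strip the newline first, then split on the colon
    name = data[0]
    grade = float(data[1])           # float() ignores the leading spaces
    return name, grade

name, grade = parse_grade_line("thibault: 92.5\n")
print(name, grade)
```

Calling strip() before split() means the last item in the list no longer carries the "\n", which matters whenever that item stays a string instead of being converted to a number.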

Your turn

Can you write a program to read the grades.txt file into a python list of grades, and then calculate the average grade?

Here’s an example of what we want:

$ python3 grader.py
[93.0, 92.5, 100.0, 70.0, 67.5, 85.0]
average grade:  84.7

challenge

Once you have the grades in a list, can you find the highest and lowest grades?

$ python3 grader.py
[93.0, 92.5, 100.0, 70.0, 67.5, 85.0]
average grade:  84.7
highest grade: 100.0
 lowest grade:  67.5

Wednesday

top-down design

As we write bigger and more complex programs, designing them first will save time in the long run. Similar to writing an outline for a paper, using top-down design means we write out main() first, using functions we assume will work (and that we will write, later). We also decide on any data structures we will need.

What we want to avoid is writing a whole bunch of functions and code, and then realizing we forgot something, which changes some of the functions, which changes other functions, and so on.

Furthermore, we want a way to test each function as we write it. Many first-time programmers will write lots of functions and then start testing them. If each function has a few bugs in it, this makes it a lot harder to debug.

A typical top-down design includes the following:

  • main() all written out

  • function stubs written (def with params, function comment, dummy return value)

  • data structures clearly defined (store data in a list? an object? a list of objects? something else?)

  • the design should run without any syntax errors

  • running the design should show how the program will eventually work

Once you have the design all written out, you can now attack each function one at a time. Get the function to work. Test it to make sure it works, then move on to the next function.
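As a small illustration of that workflow, here is a sketch of a function stub, the finished function, and the kind of quick test you might run as soon as it is written. The countVowels() function and its test values are made up for this example; they are not part of any program in these notes, and the assert statement is just one possible way to test.

```python
def countVowelsStub(word):
    """count the vowels in word, return the count"""
    return 0     # dummy return value: right type, wrong answer, but the design runs

def countVowels(word):
    """count the vowels in word, return the count"""
    count = 0
    for ch in word:
        if ch in "aeiou":
            count += 1
    return count

# quick tests, run right after writing the function
assert countVowels("zebra") == 2
assert countVowels("rhythm") == 0
print("countVowels passed its tests")
```

The stub and the finished function have the same parameters and return type, so main() does not need to change when one replaces the other.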

TDD example

Suppose we want to write this square word game:

$ python3 squarewords.py
l|l|e
d|c|o
r|r|a

word 1? corralled
Correct! Score = 10

g|p|i
l|l|a
g|i|n

word 2?
Incorrect...word was: pillaging   Score = 0

d|i|n
g|a|c
c|o|r

word 3? according
Correct! Score = 10

s|n|i
d|a|o
o|t|n

word 4? quit

The game uses 9-letter words, and displays them in a 3x3 box, where the letters run either vertically or horizontally, and the start of the word can be anywhere in the 3x3 box. The user’s job is to figure out each 9-letter word.

Here’s my TDD for the above program:

"""
squareword puzzle game as example of tdd

J. Knerr
Fall 2019
"""

from random import *

def main():
    words = read9("/usr/share/dict/words")
    score = 0
    wordnum = 0
    done = False
    while not done:
        word = words[wordnum]
        display(word)
        answer = getInput(wordnum)
        if answer == "quit":
            done = True
        elif answer == word:
            score += 10
            print("Correct! Score = %d" % (score))
        else:
            score -= 10
            print("Incorrect...word was: %s   Score = %d" % (word,score))
        wordnum += 1

def display(word):
    """display word in 3x3 grid with random start position"""
    print(word)

def getInput(n):
    """get user's guess, make sure it's valid, return valid guess"""
    guess = input("word %d? " % (n+1))
    # should allow 9-letter word, quit, and empty string
    return guess

def read9(filename):
    """read all 9-letter words from file, shuffle the order, return list"""
    words = ["aaaaaaaaa","bbbbbbbbb","ccccccccc"]
    return words

main()

Notice that main() is completely written! And the goal is that it shouldn’t need to change much, as I implement the remaining functions.

Also, the program runs, but doesn’t really do much yet (since I haven’t really written all of the functions). Here’s what it looks like so far:

$ python3 design-squarewords.py
aaaaaaaaa
word 1? 123
Incorrect...word was: aaaaaaaaa   Score = -10
bbbbbbbbb
word 2? bbbbbbbbb
Correct! Score = 0
ccccccccc
word 3? quit

Now, since I have a working program (it runs without syntax errors!), I can attack each function separately. I want to write a function and thoroughly test it before I move on to the next function.

write the getInput(n) function

Can you write the getInput(n) function? It should keep asking the user for input until it gets a valid string: either a 9-letter word, "quit", or the empty string.

a|r|y
m|a|x
i|l|l

word 1? hello
Please enter a 9-letter word!

word 1? 123456789
Please enter a 9-letter word!

word 1? wwww eeee
Please enter a 9-letter word!

word 1? abcdefghi
Incorrect...word was: maxillary   Score = -10

Friday

review of squarewords functions

Here’s one way to write the getInput(n) function from the squarewords.py file:

def getInput(n):
    """get user's guess, make sure it's valid, return valid guess"""
    while True:
        guess = input("word %d? " % (n+1))
        # should allow 9-letter word, quit, and empty string
        if guess == "" or guess == "quit":
            return guess
        elif len(guess)==9 and guess.isalpha()==True:
            return guess
        else:
            print("Please enter a 9 letter word!!")

The while True is just an infinite loop. I use it here to get the loop going, and not have to worry about a specific condition. Because it’s an infinite loop, I need to make sure there’s a way out. That’s what the return guess lines do: return the user’s guess back to main(), so both the while loop and the function are done. The only way out of this infinite loop is if we get valid input from the user. Otherwise we print the error message ("Please enter a 9 letter word!!") and loop back to the input() call at the top of the loop.

And here’s how to read in the data from the word file (one word per line), and only select the 9-letter, lowercase, all-alphabetic words:

def read9(filename):
    """read all 9-letter words from file, shuffle the order, return list"""
    words = []
    inf = open(filename, "r")
    data = inf.readlines()
    inf.close()
    # get 9-letter words from data, add to words
    for word in data:
        word = word.strip()
        if len(word)==9 and word.islower() and word.isalpha():
            words.append(word)
    shuffle(words)   # shuffle() comes from the "from random import *" at the top of the file
    return words

Note how this one uses word.islower() and not word.islower()==True. Either would work, but word.islower() is already a boolean (True or False), so I don’t need to compare it to anything. I can just use it as is: if the word is 9 characters AND they are all lowercase AND they are all alphabetic characters (abcdefg...), add the word to the list of words.
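The remaining squarewords function is display(word). Here is one possible sketch, not necessarily how it was written in class. It assumes the layout rule described earlier: the word's first letter goes in a random cell, the letters continue across the rows (or down the columns) and wrap around to the first cell. The helper name gridRows() is made up for this example; it is split out so that the non-random part can be tested by itself.

```python
from random import randrange, choice

def gridRows(word, start, vertical):
    """return the 3 rows of the grid as strings like 'l|l|e'"""
    # seq[i] is the letter in the i-th cell visited; the word's first
    # letter lands in cell number start, and the word wraps around
    seq = []
    for i in range(9):
        seq.append(word[(i - start) % 9])
    rows = []
    for r in range(3):
        if vertical:
            # cells are visited down each column, left to right
            letters = [seq[r], seq[r + 3], seq[r + 6]]
        else:
            # cells are visited across each row, top to bottom
            letters = seq[3 * r : 3 * r + 3]
        rows.append("|".join(letters))
    return rows

def display(word):
    """display word in 3x3 grid with random start position"""
    rows = gridRows(word, randrange(9), choice([True, False]))
    for row in rows:
        print(row)
    print()

display("corralled")
```

With start=4 and vertical=False, gridRows("corralled", 4, False) gives the rows l|l|e, d|c|o, and r|r|a, which matches the first grid in the sample run.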

top-down-design on flashcards

Here’s an example of the program I want: read in some flashcards from a file, ask the user each card, keep track of how many they get correct, print an appropriate message, ask if they want to go again.

$ python3 flashcards.py
Flashcards file? german.txt
==============================
essen: to eat
Correct!
- -- -- -- -- -- -- -- -- -- -
kaufen: to buy
Correct!
- -- -- -- -- -- -- -- -- -- -
besuchen: to visit
Correct!
- -- -- -- -- -- -- -- -- -- -
fahren: to travel
Correct!
- -- -- -- -- -- -- -- -- -- -
lieben: to jump
Nope....lieben = to love
- -- -- -- -- -- -- -- -- -- -
schlafen: to sleep
Correct!
- -- -- -- -- -- -- -- -- -- -
spielen: to run
Nope....spielen = to play
- -- -- -- -- -- -- -- -- -- -
trinken: to drink
Correct!
- -- -- -- -- -- -- -- -- -- -
verstehen: to keep
Nope....verstehen = to understand
- -- -- -- -- -- -- -- -- -- -
OK...not terrible.
Go again? (y/n) n
Bye!

And here’s a sample flashcards data file:

$ cat german.txt
essen:to eat
kaufen:to buy
besuchen:to visit
fahren:to travel
lieben:to love
schlafen:to sleep
spielen:to play
trinken:to drink
verstehen:to understand

So again, the goal of the top-down design process is:

  • main() all written out

  • function stubs written (def with params, function comment, dummy return value)

  • data structures clearly defined (store data in a list? an object? a list of objects? something else?)

  • the design should run without any syntax errors

  • running the design should show how the program will eventually work

In class we did this part together. Here’s the design after we typed it all in:

"""
flashcards program

J. Knerr
Fall 2019
"""

from random import *

def main():
    filename = input("Flashcards file? ")
    cards = readFile(filename)
    done = False
    while not done:
        shuffle(cards)
        # ask questions
        ncorrect = flash(cards)
        # print message
        message(ncorrect, len(cards))
        # ask if they want to go again
        done = quit()

def quit():
    """return True if they want to quit"""
    result = input("Go again? (y/n) ")
    if result != "y":
        return True
    else:
        return False

def message(ncorrect, nprobs):
    """print message to user based on percent correct"""
    print("Good work!")


def flash(cards):
    """given the cards, ask questions, return how many correct"""
    return 3


def readFile(filename):
    """open file, read in all data, return list-of-lists"""
    cards = [["q1","a1"], ["q2","a2"], ["q3","a3"]]
    return cards

main()

list-of-lists

Note the use of a "list of lists" in the readFile() function. I want to read in each card and make a list, like this: ["essen", "to eat"]. Then I want to store all cards (which are lists) in a list. So the final data structure will look like this:

cards = [['essen', 'to eat'], ['kaufen', 'to buy'], ['besuchen', 'to visit'],
       ['fahren', 'to travel'], ['lieben', 'to love'], ['schlafen', 'to sleep'],
       ['spielen', 'to play'], ['trinken', 'to drink'], ['verstehen', 'to understand']]

For the above, what is cards[1]? And what is cards[1][0]?

>>> cards = [['essen', 'to eat'], ['kaufen', 'to buy'], ['besuchen', 'to visit'], ...]
>>> cards[1]
['kaufen', 'to buy']
>>> cards[1][0]
'kaufen'
>>> cards[1][1]
'to buy'

implement the readFile() function

Here’s one way to write the readFile() function to read in the data and return it as a list of lists.

def readFile(filename):
    """open file, read in all data, return list-of-lists"""
    inf = open(filename, "r")
    lines = inf.readlines()
    inf.close()
    cards = []
    for line in lines:
        # "essen:to eat\n" becomes ["essen", "to eat"]
        card = line.strip().split(":")
        cards.append(card)
    return cards
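With the cards stored as a list of lists, the flash() stub can be filled in next. The version below is only a sketch of one way to do it. The extra ask parameter is an assumption added for this example so the function can be tested without typing answers (by default it is just input), and the 30-character separator lines are a guess at the format in the sample run.

```python
def flash(cards, ask=input):
    """given the cards, ask questions, return how many correct"""
    ncorrect = 0
    print("=" * 30)
    for card in cards:
        question = card[0]      # e.g. "essen"
        answer = card[1]        # e.g. "to eat"
        guess = ask(question + ": ")
        if guess == answer:
            print("Correct!")
            ncorrect += 1
        else:
            print("Nope....%s = %s" % (question, answer))
        print("-" * 30)
    return ncorrect

# Try it with scripted answers instead of a person at the keyboard.
cards = [["essen", "to eat"], ["lieben", "to love"]]
scripted = ["to eat", "to jump"]

def fakeAsk(prompt):
    """stand-in for input(): echo the prompt and return the next scripted answer"""
    print(prompt + scripted[0])
    return scripted.pop(0)

print(flash(cards, fakeAsk))
```

In the real program, main() would call flash(cards) with no second argument, so the questions go to the user through input().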