Week 7

Top-down Design

Top-down design is a problem-solving technique.

Most problems are too complex to "solve" as a whole. The premise of TDD is to start at the high level, think of overview, then add levels of detail.

It is not coding, but rather a thinking/designing process. The basic idea is to start with the problem and try to express the solution in terms of smaller problems. Then each smaller problem is handled in the same way until eventually the problems get so small that they are trivial to solve. Then you write each piece, test it, and join it in the with the solution so far.

Advantages:

  • Clarifies what we need to do
  • Breaks up problems into smaller parts (which are easy to solve)
  • Encourages the use of reusable pieces
  • Makes our program more modular, which is good for testing.

Kind of like writing a paper -- break it up into sections, then subsections, then write the body.

Process

  1. Overview of the problem. We first formulate general goals of what we're trying to solve.
  2. Break down the problem into large subparts.
  • Identify the high-level steps. Each step should describe about one function.
  • Kind of like an "outline" of the program
  • main() should look like this outline
  1. Start prototyping functions from Step 2
  • Develop an interface for each of the functions. What information (parameters) does the function need? WHat should it return?
  • Stub out each function (i.e. write the definition), explaining its purpose in a comment.
  • In each function, put one programming statement like print "in function <name>" to see the flow of the program
  • Note any things you aren't sure of (e.g. what should I return?)
  1. Iterative refinement
  • Break subparts into smaller parts
  • Repeat Steps 2 and 3 for any functions that are too complex.
  1. Run your program
  • Should be getting only print statements, but it should work

When done with Steps 1-5, then stat writing code

Implementation: Start from the bottom (where the simplest parts are!) - Write code for smallest subparts (one part = one function) - Unit test each function to be sure it's working - Move up to larger parts

Example

Problem: Simulate a given number of rounds of a craps game and report the probability that the user would win.

Rules of the game:

  • Roll two dice to get the initial roll (sum of dice)
  • If initial roll equals 2, 3, or 12, lose immediately; if initial roll equals 7 or 11, win immediately.
  • Otherwise, "roll for point" -- keep on rolling till either (1) we get the initial roll again, or (2) we roll a 7.
  • If we roll the initial roll again (without seeing a 7 first), we win.
  • If we roll a 7 before seeing the initial roll again, we lose.

Specifications of the program:

  • Ask the user how many games to simulate.
  • Calculate the probability that the player wins.

Top-down design

Top Level -- build the outline

  1. printIntro() -- introduces the program to the user
  2. getNumberOfGames() -- asks the user for the number of games, returns as an integer
  3. simulateNGames() -- simulates the required number of games, takes the user-specified number of games as a parameter, returns percentage that player won
  4. printSummary() -- prints the result for the user

This is essentially an outline of the program, and will end up being our main() function.

Second Level -- iterative refinement

Rule of thumb: If it takes you more than 5 minutes to figure it out, you probably should break it into functions.

  • printIntro() -- simple enough, don't need to break this down
  • getNumberOfGames() -- same, don't need to break this down
  • simulateNGames() -- should break this down:
  • Repeat for number of games specified:
    • simulateOneGame() -- returns True if player won, False otherwise
    • count number of times player won, returns as a fraction
  • printSummary() -- don't need to break this down.

Third Level -- repeat refinement - simulateOneGame(): - rollTwoDice() -- returns the result of throwing two dice, gets stored as the initial roll - Check if player has won or lost immediately. If so, return - rollForPoint() -- receives the initial roll as a parameter, returns True if player wins the roll for point, False otherwise.

Fourth Level -- repeat refinement

  • rollTwoDice() -- no need to break this down
  • rollForPoint() -- also simple enough, no need to break this down

Now we can start writing the function stubs. A function stub consists of:

  • The function definition (the def line). This also gives us the parameters
  • Comments that tell us what the function does, what parameters it takes, and what information it returns.
  • If the function returns something, a placeholder return value is included.
  • We often also put one single print statement ("inside functionname()") so we can test the flow of the program
In [1]:
"""
Grace Ngai

This program simulates the game of craps and calculates the expected winning probability.
"""
import random

"""
This is the main function
"""
def main():
    printIntro()
    num_games = getUserInputs()
    winning_prob = simulateNGames(num_games)
    printSummary(winning_prob)
    
"""
This function prints an intro for the user
Parameters: None
Return Value: None
"""
def printIntro():
    print "Inside printIntro()"

"""
This function gets the number of games from the user
Parameters: None
Return Value: integer representing the number of games to simulate
"""
def getUserInputs():
    print "Inside getUserInputs()"
    
    return 10  # just a placeholder value

"""
This function simulates the requested number of games
Parameters: an integer representing the number of games to simulate
Return Value: The probability of winning
"""
def simulateNGames(num_games):
    print "Inside simulateNGames()"
    
    return 0.0  # just a placeholder value

"""
This function prints out the result
Parameters: A float representing the winning probability
Return Value: None
"""
def printSummary(winning_prob):
    print "Inside printSummary()"
    
"""
This function simulates rolling two dice
Parameters: None
Return Value: An integer representing the result of rolling two dice
"""
def rollTwoDice():
    print "Inside rollTwoDice()"
    
    return 11

"""
This function simulates one game of craps
Parameters: None
Return Value: True if player won, False otherwise
"""
def simulateOneGame():
    print "Inside simulateOneGame()"
    
    return True

"""
This function simulates rolling for point
Parameters: The initial roll
Return Value: True if player won, False otherwise
"""
def rollForPoint(initial_roll):
    print "Inside rollForPoint()"
    
    return True

main()
Inside printIntro()
Inside getUserInputs()
Inside simulateNGames()
Inside printSummary()

As seen, we can already test the flow of our program. This looks correct. So now, we start the implementation. With implementation, we always start from the bottom up. This is called "bottom-up implementation".

As each function is implemented, we put in some programming code into main() to test it out. This is called "unit testing".

In [2]:
"""
This function simulates rolling two dice
Parameters: None
Return Value: An integer representing the result of rolling two dice
"""
def rollTwoDice():
    dice1 = random.randrange(1, 7)
    dice2 = random.randrange(1,7)
    
    return dice1+dice2

# Inside main, we add some programming statements to test rollTwoDice()
def main():
    """
    printIntro()
    num_games = getUserInputs()
    winning_prob = simulateNGames(num_games)
    printSummary(winning_prob)
    """
    
    print "Trial 1:", rollTwoDice()
    print "Trial 2:", rollTwoDice()
    print "Trial 3:", rollTwoDice()
    
main()
Trial 1: 6
Trial 2: 4
Trial 3: 12

This looks ok, so we move to the next function: rollForPoint(). This function will need parameters, so when we do unit testing, we will just give it some dummy parameters just to see whether things make sense.

In [3]:
"""
This function simulates rolling for point
Parameters: The initial roll
Return Value: True if player won, False otherwise
"""
def rollForPoint(initial_roll):
    while True:
        roll = rollTwoDice()
        if roll == initial_roll:
            return True  # player has won
        elif roll == 7:
            return False  # player loses
    
# Unit Testing in main()
def main():
    """
    printIntro()
    num_games = getUserInputs()
    winning_prob = simulateNGames(num_games)
    printSummary(winning_prob)
    """
    
    print "Trial 1:", rollForPoint(11)
    print "Trial 2:", rollForPoint(10)  
    print "Trial 3:", rollForPoint(3)  
    print "Trial 4:", rollForPoint(7)    
    
main()
Trial 1: False
Trial 2: False
Trial 3: False
Trial 4: True

Next up: simulateOneGame()

In [7]:
"""
This function simulates one game of craps
Parameters: None
Return Value: True if player won, False otherwise
"""
def simulateOneGame():
    initial_roll = rollTwoDice()
    if initial_roll == 7 or initial_roll == 11:  # player wins immediately
        return True 
    elif initial_roll == 2 or initial_roll == 3 or initial_roll == 12: # player loses immediately
        return False 
    else:
        return rollForPoint(initial_roll)
    
# Unit Testing in main()
def main():
    """
    printIntro()
    num_games = getUserInputs()
    winning_prob = simulateNGames(num_games)
    printSummary(winning_prob)
    """
    
    print "Trial 1:", simulateOneGame()
    print "Trial 2:", simulateOneGame()  
    print "Trial 3:", simulateOneGame()  
    print "Trial 4:", simulateOneGame()   
    
main()
Trial 1: False
Trial 2: True
Trial 3: True
Trial 4: False

Almost there -- if we can simulate one game, we can simulate however many the user wants.

Since ultimately we want to return the probability of winning the game, we need an accumulator to keep track of how many times the player won the game. That number divided by the number of games will give us our probability.

In [8]:
"""
This function simulates the requested number of games
Parameters: an integer representing the number of games to simulate
Return Value: The probability of winning
"""
def simulateNGames(num_games):
    games_won = 0
    # play n games
    for i in range(num_games):
        if simulateOneGame() == True:
            games_won = games_won + 1
    
    # Calculate and return probability
    # Remember floating point division!
    return float(games_won)/num_games

# Unit Testing in main()
def main():
    """
    printIntro()
    num_games = getUserInputs()
    winning_prob = simulateNGames(num_games)
    printSummary(winning_prob)
    """
    
    print "Trial 1:", simulateNGames(100)
    print "Trial 2:", simulateNGames(5000)  

main()
Trial 1: 0.47
Trial 2: 0.4878

So the probability of winning at craps ia about 50%. Not too bad really.

Now we can finish up the rest of the functions. They are very easy, so we'll do them all in one go.

In [10]:
"""
This function prints an intro for the user
Parameters: None
Return Value: None
"""
def printIntro():
    print "This program calculates the probability of winning at craps"
    print "We need the number of games to simulate from the user"

"""
This function gets the number of games from the user
Parameters: None
Return Value: integer representing the number of games to simulate
"""
def getUserInputs():
    num_games = raw_input("How many games should we simulate?")
    num_games = int(num_games)
    
    return num_games


"""
This function prints out the result
Parameters: A float representing the winning probability
Return Value: None
"""
def printSummary(winning_prob):
    print "You have a", winning_prob, "chance of winning at craps"
    

"""
This is the main function
"""
def main():
    printIntro()
    num_games = getUserInputs()
    winning_prob = simulateNGames(num_games)
    printSummary(winning_prob)  
    
main()
This program calculates the probability of winning at craps
We need the number of games to simulate from the user
How many games should we simulate?1000
You have a 0.48 chance of winning at craps

Things to note in this exercise:

  1. Note how our functions are very short and very modular. In other words, each function does one and only one thing. This makes it very easy to test whether they are doing the right thing.
  2. Note how we tested each function as we wrote them. It is usually good practice to test incrementally, as you are writing out the program. That way, you know where the errors are.
  3. The function rollTwoDice() is called by two functions: simulateOneGame() and rollForPoint(). This is a demonstration of the reusability of code. If our functions are modular enough, we should be able to reuse them.
  4. Note how main() ends up being essentially an outline of the whole program. It illustrates the flow of the program.

File Input/Output

So far, all of our programs have taken input directly from the user (from the keyboard), and printed out the results directly to the monitor. This is fine for small programs, but for larger and more complex data, we often like to store them in files for easy processing.

In computer science, files are just sequences of characters that are stored in a persistent manner on disk. Persistence here means that the files do not "go away" when the computer is turned off. Another term is non-volatile.

All files are constructed from sequences of characters; however, some characters may not be visible to the human eye (or the human eye has learned to not perceive them.) These are the characters that make up the empty spaces: space (" "), tab (""), newline (""), etc.

Another kind of file is the binary file, which is used to store media such as images, videos, music, and executable programs. We will not consider these files in CS21.

In this class, we will demonstrate file input/output using the roster.txt file. Let's look at the beginning few lines of this file:

amandel1,Mandel,Alexander Joshua,2018
aparikh2,Parikh,Amir,2015
bhsiung1,Hsiung,Benjamin Zhi-Zhong,2018
btang1,Tang,Becky,2018
chanlon1,Hanlon,Clare Catherine,2018
dransho1,Ranshous,David Joseph,2016
eabraha1,Abraham,Elsher,2018

This is an example of a csv, or commas-separated-values file. It is a very common format that was used in programs like Microsoft Excel for a while.

The format of this file should be quite obvious: each line represents a student (actually a student in CS21). The fields are the Swarthmore email ID, the last name, other names, and the year of graduation. The fields are separated by commas. At the end of every line is the newline character ("").

Working with Files

For almost all programming languages, working with files consists of three things:

  1. Open the file
  2. Manipulate the file
  3. Close the file

Opening a file

To do anything with a file, we first need to open it. The syntax is:

<filehandle_var> = open(<filename>, <mode>)

filename is clearly the name of the file, as a string. If the file is in the same directory as the program, we can just use the filename. Otherwise, we need to give it the directory path so Python knows where to look for it.

mode tells Python what we want to do with the file. Python supports three modes:

  • "r" : open for reading
  • "w" : open for writing (will create a new file, but if the file already exists, it will overwrite the file)
  • "w+" : open for appending (if the file already exists, it adds the new information to the end rather than overwriting)

filehandle_var is an object that is created by Python when we open the file. It contains all the information that we need in order to manipulate the file, for example, where the file lives on the hard drive, whether it's open for reading or writing, etc etc. Just like all objects, we store it in a variable. And just like all objects, it has a set of methods that we can use to manipulate the file.

Manipulating a file

All programming languages support the following:

  1. Reading to a file
  2. Writing to a file
  3. Going to a particular location in a file

Closing a file

Opened files consume resources on the computer (e.g. memory, and with some older machines, there is a limit on how many files can be opened at the same time). Therefore, it is good practice to close a file when we're done with it.

Syntax:

<filehandle_var>.close()

Let's try to play around with the roster.txt file

In [16]:
infile = open("roster.txt", "r")  # opens the file for reading
print "Reading one character:", infile.read(1)    
print "Reading another character:", infile.read(1)
print "Reading two characters:", infile.read(2)
print "Reading the rest of the line:", infile.readline()
print "Closing the file"
infile.close()  # closes the file and frees up the filehandle
Reading one character: a
Reading another character: m
Reading two characters: an
Reading the rest of the line: del1,Mandel,Alexander Joshua,2018

Closing the file

From this, we can see:

  1. Files are read in sequentially from the disk.
  2. Each read() operation reads in a specified number of characters (and not the nth character)
  3. Each successive read() operation reads in the next n characters
  4. readline() will read in an entire line (defined as everything up to and including the first newline character after the current position)

So each file handle carries around with it a pointer that denotes the next character that's "in line" to be read. When we get to the end of the file, there are no more characters to be read; in this case, any read() or readline() operations will return the empty string "".

Reading files, the Python way

Most programming languages would require that we read in an entire file using a while loop, and breaking when we see the empty string:

In [18]:
num_lines = 0
infile = open("roster.txt", "r")  # opens the file for reading
while True:
    line = infile.readline()
    if line == "":
        break
    else:
        num_lines = num_lines + 1
infile.close()  # closes the file and frees up the filehandle
print "The file has", num_lines, "lines"
The file has 32 lines

Python, however, makes things much easier for us. We can use a for loop to go over the lines:

In [20]:
num_lines = 0
infile = open("roster.txt", "r")  # opens the file for reading
for line in infile:  # This will automatically check for the end of file empty string
    num_lines = num_lines + 1
infile.close()  # closes the file and frees up the filehandle
print "The file has", num_lines, "lines"
The file has 32 lines

Now let's play around with the file. This file contains the roster of CS21. We want to read this into a list of lists.

A list of lists is somewhat like a table. There are rows and columns. We can think of each row as being a list. Each column in that row would then be an element in that list. So when we put all the rows together, then we have a list of lists.

For illustration, let's read in the file into a list of student information. Each element in the top-level list (the list of rows) will correspond to one student. Each element in the second-level list (each column in each row) will store the information for that particular student.

So we'd like to have something like:

|----------|--------|--------------------|------|
| amandel1 | Mandel | Alexander Joshua   | 2018 | 
|----------|--------|--------------------|------|
| aparikh2 | Parikh | Amir               | 2015 |
|----------|--------|--------------------|------|
| bhsiung1 | Hsiung | Benjamin Zhi-Zhong | 2018 |
|----------|--------|--------------------|------|
| btang1   | Tang   | Becky              | 2018 |
|----------|--------|--------------------|------|

Our program would need to go something like this:

  1. Open the file
  2. Start an empty list (this will be our big list -- the list of rows), call it roster
  3. For each line in the file
  4. Turn that line into a list, with each element holding a field (e.g. login, lastname, etc)
  5. Append this list to the roster list
  6. Close the file
  7. Do whatever we need to with the roster list
In [21]:
def main():
    infile = open("roster.txt", "r")  # opens the file for reading
    roster = []  # This is our big list of lists
    for line in infile:  # This will automatically check for the end of file empty string
        line = line.strip()  # remove the newline character from the end of the line
        student = line.split(",") # separate out the line into a list of fields by splitting it at the commas
        roster.append(student) # add this small list to the end of the list of lists
    infile.close()  # closes the file and frees up the filehandle
    
    # sanity check
    print "Number of rows (also number of students):", len(roster)
    # look at the first student's data
    print "Login of first student:", roster[0][0]  # Note the double indexing
    print "Year of graduation of first student:", roster[0][3]
    
    # We can also assign a particular row to a variable
    second_student = roster[1]
    print "Login of second student:", second_student[0]  # no double indexing needed now
    print "Year of graduation of second student:", second_student[3]    

main()
Number of rows (also number of students): 32
Login of first student: amandel1
Year of graduation of first student: 2018
Login of second student: aparikh2
Year of graduation of second student: 2015

Note that:

  1. len(roster) gives us the number of rows
  2. len(roster[0]) gives us the number of columns in row 0
  3. len(roster[0][0]) gives us the length of the string stored in roster[0][0]

(This implies that Python list of lists can have different number of columns in each row.)

In [25]:
def main():
    infile = open("roster.txt", "r")  # opens the file for reading
    roster = []  # This is our big list of lists
    for line in infile:  # This will automatically check for the end of file empty string
        line = line.strip()  # remove the newline character from the end of the line
        student = line.split(",") # separate out the line into a list of fields by splitting it at the commas
        student[3] = int(student[3])  # the year of graduation should be an int
        roster.append(student) # add this small list to the end of the list of lists
    infile.close()  # closes the file and frees up the filehandle
    
    # Find out the number of students who graduated in a particular year
    year = 2018
    print "Students graduating in %d"%(year)
    
    # we have two ways of doing this. The first is the "all other languages" way
    print "Method 1"
    for i in range(len(roster)):
        if (roster[i][3] == year):
            print roster[i][2] + " " + roster[i][1]
            
    # This is the "pythonic" way
    print "\nMethod 2"
    for student in roster:  # roster is a list of lists, and each item in roster is a list.
        if student[3] == year:
            print student[2]+" "+student[1]
            
            
    

main()
Students graduating in 2018
Method 1
Alexander Joshua Mandel
Benjamin Zhi-Zhong Hsiung
Becky Tang
Clare Catherine Hanlon
Elsher Abraham
Duran Esteban Cabrera
John Vito Calia
Julian Journigan Dasan Turner
Katherine Chiouhwa Huang
Kyungchan Min
Muhaiminul Haque
Max Brandes Kassan
Nicholas J Patel
Rajnish Yadav
Sarah Kay Fischmann
Shua-Kym M McLean
Tyrone Dominick Clay
Tuan Anh Nguyen
Zhiyuan Liu

Method 2
Alexander Joshua Mandel
Benjamin Zhi-Zhong Hsiung
Becky Tang
Clare Catherine Hanlon
Elsher Abraham
Duran Esteban Cabrera
John Vito Calia
Julian Journigan Dasan Turner
Katherine Chiouhwa Huang
Kyungchan Min
Muhaiminul Haque
Max Brandes Kassan
Nicholas J Patel
Rajnish Yadav
Sarah Kay Fischmann
Shua-Kym M McLean
Tyrone Dominick Clay
Tuan Anh Nguyen
Zhiyuan Liu

In [28]:
def main():
    infile = open("roster.txt", "r")  # opens the file for reading
    roster = []  # This is our big list of lists
    for line in infile:  # This will automatically check for the end of file empty string
        line = line.strip()  # remove the newline character from the end of the line
        student = line.split(",") # separate out the line into a list of fields by splitting it at the commas
        student[3] = int(student[3])  # the year of graduation should be an int
        roster.append(student) # add this small list to the end of the list of lists
    infile.close()  # closes the file and frees up the filehandle
    
    # print out the students graduating in each year, sorted by year of graduation
    # Note: This involves a double for loop
    for year in range(2015,2019,1):
        print "Students graduating in %d"%(year)
        for student in roster:  # roster is a list of lists, and each item in roster is a list.
            if student[3] == year:
                print student[2]+" "+student[1]
        print # leave a blank line 
    
main()
Students graduating in 2015
Amir Parikh
Jameson Fremont Lisak
Lucy Peng
Nadeem Hamza
Pooja Kumar

Students graduating in 2016
David Joseph Ranshous
Gibson Reed Cook

Students graduating in 2017
Emmy Liu
Justine Brooke Albers
Julian Andrew Segert
Katherine Lacey Hannah
Meghana Ilsa Ranganathan
Mason Li Yu

Students graduating in 2018
Alexander Joshua Mandel
Benjamin Zhi-Zhong Hsiung
Becky Tang
Clare Catherine Hanlon
Elsher Abraham
Duran Esteban Cabrera
John Vito Calia
Julian Journigan Dasan Turner
Katherine Chiouhwa Huang
Kyungchan Min
Muhaiminul Haque
Max Brandes Kassan
Nicholas J Patel
Rajnish Yadav
Sarah Kay Fischmann
Shua-Kym M McLean
Tyrone Dominick Clay
Tuan Anh Nguyen
Zhiyuan Liu