Class Notes Week 8

Announcements


Week 8 Topics



File Input/Output

A file is a sequence of data that is stored on your computer. For many tasks, especially tasks that use large amounts of data, input data will come from one or more files, and you will write output to a file instead of to the screen.

A file is typically made of many lines. There is a special newline character that is stored at the end of each line in a file: "\n". However, it is not visible when you look at the file.
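To see the newline character for yourself, here is a small sketch. The string below is made up for illustration (it is not read from a file), and repr() is a built-in function that shows a string with its special characters spelled out.

```python
# A made-up two-line string; "\n" marks the end of each line.
text = "2018,Whippet,Whiskey\n2017,Brussels Griffon,Newton\n"

print(text)        # the newlines show up as line breaks, not as visible characters
print(repr(text))  # repr() shows each newline explicitly as \n
print(len("\n"))   # "\n" is typed with two symbols but counts as one character
```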

Today, we'll work through a running example that uses a file named bestInShow.txt. First, let's look at this file in Atom.

2018,Whippet,Whiskey
2017,Brussels Griffon,Newton
2016,Greyhound,Gia
2015,Skye Terrier,Charlie
2014,Bloodhound,Nathan
2013,American Foxhound,Jewel

In this file, data about each winner is stored on a separate line. Each line has the same format:

Year,Breed,Name

To open a file, you do: <filevar> = open(<filename>, <mode>)

To close a file, you do: <filevar>.close()

There are a couple of ways that you can read data from a file. Here is perhaps the simplest:

infile = open("myfile", "r")
for line in infile:
  # process one line of the file
  pass  # placeholder so the loop body is valid Python
infile.close()

You can also read all the lines in at once:

infile = open("myfile", "r")
# lines is now a list of strings, one for each line of the file
lines = infile.readlines()
for line in lines:
  # process one line at a time
  pass  # placeholder so the loop body is valid Python
infile.close()

Let's start by just trying to see the contents of the file:

infile = open("bestInShow.txt", "r")
for line in infile:
  print(line)
infile.close()

Is the output of this code snippet what you expected? How is it different from the input file?

The first thing you'll probably want to do with a file is remove the newline character from the end of the line. You can do this with the strip() method.

For files with multiple pieces of data per line, you'll want to break the line up into those individual pieces. Do this with the split() method, which takes a single string and returns a list of "words". By default it treats whitespace as the divider between pieces of data; to divide on a different character, pass that character as an argument, as in split(",").
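Here is a short sketch of strip() and split() working on a single line. The string is typed in by hand to stand in for one line read from bestInShow.txt, and the variable names clean and fields are just for this illustration.

```python
# One line as it would come from bestInShow.txt, including the trailing newline.
line = "2018,Whippet,Whiskey\n"

clean = line.strip()   # removes whitespace (including "\n") from both ends
print(repr(clean))

# With no argument, split() divides on whitespace, so this line stays in one piece.
print(clean.split())

# Passing "," divides on commas instead, giving one item per field.
fields = clean.split(",")
print(fields)
```

Because the fields in our file are separated by commas rather than spaces, split(",") is the form we want for this example.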

Common String methods for file I/O

List of lists

We've talked over the semester about how a list can contain any type of data. Lists can even contain other lists. Lists of lists are common when doing file I/O. If each line is a list of data, then it's common to load the entire contents of the file into one big list, where each element is itself a list representing the data from one line of the file.

Let's modify our program to store our information in a list of lists called winners.

Each winner is a list containing year, breed, and name.

winners = [ ['2018', 'Whippet', 'Whiskey'],
            ['2017', 'Brussels Griffon', 'Newton'],
            ['2016', 'Greyhound', 'Gia'],
            ... ]

What would winners[0] give? What about winners[1]?

You can use double indexing to get at the data inside the winners list.
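As a rough sketch of double indexing, the snippet below builds a three-winner list by hand (rather than reading the file) and pulls out individual fields. The first index picks a winner and the second index picks a field within that winner.

```python
winners = [['2018', 'Whippet', 'Whiskey'],
           ['2017', 'Brussels Griffon', 'Newton'],
           ['2016', 'Greyhound', 'Gia']]

print(winners[0][1])   # breed of the first winner
print(winners[1][2])   # name of the second winner

# Inside a loop, each winner is one inner list, so a single index is enough.
for winner in winners:
    print("%s: %s the %s" % (winner[0], winner[2], winner[1]))
```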

Let's go back to our program and modify it to store the winners information in a list of lists:

infile = open("bestInShow.txt", "r")
winners = []
for line in infile:
  winner = line.strip().split(",")
  winners.append(winner)
infile.close()
print(winners)
print("There are %d winners in the list." % (len(winners)))

Exercise:

Write a function getWinners() that takes a list of winners and a breed and prints out any winners of that breed.

Write a main function that reads in a list of winners from a file, asks the user for a breed, and uses getWinners() to print out all winners from the given breed.

Refactoring

Sometimes our initial design needs later improvements or changes. As our program has expanded, main() is becoming a bit long. Perhaps we can take the portion of main that reads in the file and make it a separate function. This technique is commonly referred to as refactoring code. Let's write a function together called loadWinners(fname) that takes a file name (string) referring to a file containing a list of winners and returns a list of lists. We can then replace the code in main() with winners=loadWinners("bestInShow.txt")
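One possible version of loadWinners(fname) is sketched below. It simply moves the file-reading loop from earlier into a function and returns the result; the version we write together in class may differ in its details.

```python
def loadWinners(fname):
    """Read the file named fname and return a list of [year, breed, name] lists."""
    winners = []
    infile = open(fname, "r")
    for line in infile:
        # drop the trailing newline, then break the line apart at the commas
        winner = line.strip().split(",")
        winners.append(winner)
    infile.close()
    return winners
```

A nice side effect of this design is that loadWinners() can be tried out on any file with the same Year,Breed,Name format, not just bestInShow.txt.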

File I/O: writing to files

Opening a file for writing is similar to opening a file for reading --- just change the mode to "w".

Writing to a file is similar to printing to the screen. To write to a file, use the write() method.

outfile = open("helloworld.txt", "w")
outfile.write("hello world!!")
outfile.write("Corg hard! Sleep hard!")
outfile.close()
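One difference from print() is that write() does not add a newline for you, so each "\n" has to be written explicitly. The sketch below writes a small list of lists back out in the Year,Breed,Name format. The file name sampleOut.txt is made up for this illustration, and join() and read() are standard Python methods that these notes have not introduced.

```python
# A short list of lists in the same [year, breed, name] layout used above.
winners = [['2018', 'Whippet', 'Whiskey'],
           ['2017', 'Brussels Griffon', 'Newton']]

# "sampleOut.txt" is a made-up file name for this illustration.
outfile = open("sampleOut.txt", "w")
for winner in winners:
    # join() glues the fields back together with commas; write() adds no
    # newline of its own, so "\n" is appended by hand.
    outfile.write(",".join(winner) + "\n")
outfile.close()

# Read the file back to check what was written.
infile = open("sampleOut.txt", "r")
contents = infile.read()
infile.close()
print(repr(contents))
```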

Exercise:

Modify bestInShow.py to add some additional winning dogs to your list dogs:

Then, write the entire list of winners to a file called moreBestInShow.txt. How do the contents compare to bestInShow.txt?

Unit Testing

Once you have created a top-down design of your program, you should start incrementally developing and testing your solution. The process of implementing and testing individual components in isolation is called *unit testing*. It has several advantages:

Unit Testing in Python.

There are two main ways of unit testing in Python. First, you can test in some other function (e.g., in your main() function, or in a special test() function). Second, you can use the Python interpreter. Testing in the interpreter requires a little setup:

  1. Add the following around the call to your main function:
if __name__ == "__main__":
  # call main only if you're running as a program
  main()
  2. Inside the Python interpreter, import your program:
>>> from winners import *
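For the first approach (testing inside a special test() function), a minimal sketch might look like the following. The helper parseLine() is hypothetical and is not part of the program we wrote above; it stands in for whichever small function you want to check. The assert statement is one common way to state what a function should return.

```python
def parseLine(line):
    """Hypothetical helper: turn one line of the file into a list of fields."""
    return line.strip().split(",")

def test():
    # Each assert checks one small piece of behavior; if a check fails,
    # the program stops with an AssertionError.
    assert parseLine("2018,Whippet,Whiskey\n") == ['2018', 'Whippet', 'Whiskey']
    assert parseLine("2016,Greyhound,Gia") == ['2016', 'Greyhound', 'Gia']
    print("All tests passed.")

if __name__ == "__main__":
    # run the tests only when this file is run as a program, not when imported
    test()
```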