CS21 Week 9: File I/O, Top-Down Design

Week 9 Topics

  • File I/O (reading from files)

  • Top-Down Design

  • More Nested Loop Practice

Monday

Lab 8

Some details about Lab 8, the 2-week Top-Down Design lab.

File Input/Output

For many tasks, especially tasks that use large amounts of data, a program reads its input data from one or more input files. Program results may also need to be stored and saved, so a program may write its results to one or more output files instead of to the terminal window.

We are going to focus on reading from files this week. Writing to a file works in a very similar way, but we won’t write to files until later in the semester.

Files

A file is a sequence of data that is stored on your computer. A file typically consists of many lines, and a special newline character, "\n", is stored at the end of each line. However, this character is not visible when you look at the file.

Today we’ll use a running example: a file named words.txt.

We can view the file contents by opening the file in vim, or by cat’ing the file contents out to the terminal window:

  • vim words.txt

  • cat words.txt

    computer
    science
    Python
    Ninjas
    CS21
    Intro
    Think
    Fun

Reading from a file

To read from (or write to) a file, the programmer must follow these three steps:

  1. Open the file. To do this, call the open function, passing in the file name string and the mode in which to open it ("r" for reading):

      infile = open("words.txt", "r")

    If the call to open is successful, it returns a new file object that we can use to read from the file.

  2. Read the contents of the file using methods of the file object. One example is the readlines method, which reads each line of the file into a list of strings:

      # lines is a list of strings, one string element for each line in the file
      lines = infile.readlines()

    Note: there are many other ways to read in values from a file, since a program may not want to read in all the values, or all of them at one time (reading everything at once can be a particular problem for huge files). This is just one way to read in the file contents.

  3. Close the file when done reading from it:

      infile.close()
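As a minimal sketch (not the course’s readwords.py), the three steps together look like the code below. To run on its own, it first creates a small hypothetical file, sample_words.txt, holding the first three lines of words.txt; printing each string with repr makes the otherwise invisible "\n" at the end of every line show up.

```python
# Assumption: create a small hypothetical sample file so this sketch runs
# on its own (a copy of the first three lines of words.txt).
sample = open("sample_words.txt", "w")
sample.write("computer\nscience\nPython\n")
sample.close()

# Step 1: open the file for reading
infile = open("sample_words.txt", "r")

# Step 2: read every line into a list of strings
lines = infile.readlines()

# Step 3: close the file when done reading
infile.close()

# repr() shows the newline character stored at the end of each line
for line in lines:
    print(repr(line))   # 'computer\n', then 'science\n', then 'Python\n'
```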

A file is accessed sequentially. When you open a file to read (or write), reads (or writes) start at the current position in the file, which is initially the beginning of the file. As data are read (or written), the current position moves to immediately after the last value read (or written), so the next read (or write) starts from that new position and moves the current position again. We call this sequential access because, in order to read the nth item in the file, the previous n-1 items must be read first.

For our example file, words.txt, the very first line read in from the file is:

computer

the next line read in is:

science

and so on.
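A small sketch of sequential access, assuming a hypothetical sample_words.txt that the code creates first so it runs standalone. Each call to the file object’s readline method returns the next line, starting from wherever the previous read left off, and returns an empty string once the end of the file is reached.

```python
# Assumption: a small hypothetical sample file, created so the sketch runs
sample = open("sample_words.txt", "w")
sample.write("computer\nscience\nPython\n")
sample.close()

infile = open("sample_words.txt", "r")
first = infile.readline()      # starts at the beginning: "computer\n"
second = infile.readline()     # continues where the last read stopped: "science\n"
third = infile.readline()      # "Python\n"
after_end = infile.readline()  # at the end of the file, readline returns ""
infile.close()

print(repr(first), repr(second), repr(third), repr(after_end))
```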

There are ways to move the current file position around in the file other than by reading (or writing) from the start to the desired spot. For now, however, we will read files in their default sequential order.
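One such way, shown here only as an illustration, is the file object’s seek method: seek(0) moves the current position back to the beginning of the file, so the first line can be read again. The file name is a hypothetical sample that the sketch creates itself.

```python
# Assumption: a small hypothetical sample file, created so the sketch runs
sample = open("sample_words.txt", "w")
sample.write("computer\nscience\n")
sample.close()

infile = open("sample_words.txt", "r")
first = infile.readline()   # "computer\n"; current position is now at line 2
infile.seek(0)              # move the current position back to the start
again = infile.readline()   # reads "computer\n" a second time
infile.close()

print(repr(first), repr(again))
```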

Example program that reads from a file

Let’s look at the program readwords.py, which reads the contents of the words.txt file into a list of strings, one string per line in the file.

In this file, there is a single word per line. We are going to look at the main program, which has all three steps for reading in values from the file.

The program calls the function print_list to print out the resulting list of strings.

  • Is the output what you expected?

  • How is it different than the input file?

  • Do you have an idea of what is going on here?

We are going to implement the function fix_list, which uses the strip() string method to fix up the list. Then let’s uncomment the calls to fix_list and print_list at the end of main to see if it worked.
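One possible implementation of fix_list and print_list, offered as a sketch: both bodies are hypothetical (the course’s readwords.py may differ), and this fix_list is assumed to modify the list in place rather than return a new one. A likely reason the earlier output looked odd is that each string still ends with "\n", and print adds another newline of its own, producing blank lines.

```python
def fix_list(lst):
    """Strip leading/trailing whitespace (including the newline) from each
    string in lst. Assumption: modifies lst in place and returns nothing."""
    for i in range(len(lst)):
        lst[i] = lst[i].strip()


def print_list(lst):
    """Print each element of lst on its own line (hypothetical version)."""
    for item in lst:
        print(item)


words = ["computer\n", "science\n", "Python\n"]  # as readlines returns them
fix_list(words)
print(words)      # ['computer', 'science', 'Python']
print_list(words)
```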

More ways to read

We are not going to look at this together, and you don’t need to know more than one way to read values from a file for now. If you are curious, though, the file filetest.py contains a set of functions that read the contents of a file in several different ways (all at once into a list, one line at a time, and one character at a time). We will generally use the first way in this class because it is super handy, but if you use Python to process large files later, the other ways might be useful.
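The sketch below is not the contents of filetest.py; it only illustrates two of the other reading styles using standard file-object features: looping over the file object to get one line at a time, and read(1) to get one character at a time. The file name sample_words.txt is hypothetical, and the sketch creates it first so it runs on its own.

```python
# Assumption: a small hypothetical sample file, created so the sketch runs
sample = open("sample_words.txt", "w")
sample.write("CS21\nFun\n")
sample.close()

# One line at a time: looping over a file object yields its lines in order
infile = open("sample_words.txt", "r")
line_by_line = []
for line in infile:
    line_by_line.append(line.strip())
infile.close()

# One character at a time: read(1) returns "" at the end of the file
infile = open("sample_words.txt", "r")
chars = []
ch = infile.read(1)
while ch != "":
    chars.append(ch)
    ch = infile.read(1)
infile.close()

print(line_by_line)  # ['CS21', 'Fun']
print(len(chars))    # 9 (the two newline characters count too)
```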

Top-Down Design

Continue with Top-Down Design from last week.

Wednesday

We are going to continue with Top-Down Design today.

Friday

More File Input/Output

Today we are going to look at some other programs that read in values from a file and manipulate the values in different ways.

Files with numeric data

We are going to look at readnumbers.py, which reads in a set of numbers from an input file. You can specify one of two files to read from: numbers.txt in your current directory, or a file in a professor’s directory, given by its full path name /home/newhall/public/cs21/lotsofnums.txt.

We are going to look at the main program, which already has all three steps for reading the file contents into a list of strings, one per line, and then we will:

  • implement the function convert_to_int, which converts the list of strings to their corresponding int values

  • implement the get_average function, which takes this list of ints and computes their average, to test our converted list

Try out your solution with both files, starting with the smaller one, numbers.txt.
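A sketch of convert_to_int and get_average. These bodies are hypothetical (readnumbers.py may be organized differently); convert_to_int is assumed to modify the list in place, and get_average returning 0.0 for an empty list is a choice made here to avoid dividing by zero. The sample values stand in for lines read from a numbers file.

```python
def convert_to_int(lst):
    """Convert each numeric string in lst to an int.
    Assumption: modifies lst in place (one possible design)."""
    for i in range(len(lst)):
        lst[i] = int(lst[i].strip())


def get_average(nums):
    """Return the average of a list of ints (0.0 for an empty list)."""
    if len(nums) == 0:
        return 0.0
    total = 0
    for n in nums:
        total = total + n
    return total / len(nums)


values = ["10\n", "20\n", "30\n"]  # hypothetical lines read from a numbers file
convert_to_int(values)
print(values)               # [10, 20, 30]
print(get_average(values))  # 20.0
```

Calling int on the raw strings would also work, since int ignores surrounding whitespace, but stripping first makes the intent explicit.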

Files with multiple elements per line

Finally, sometimes file data has multiple values per line. data.txt is an example file containing lines of student data, where each line holds four pieces of data about one student (a name, an age, a major, and a GPA), separated by commas.

cat data.txt

Alain,22,CS,3.1
Ali,19,Math,2.9
Anastassia,20,Math,3.0
...

We are going to look at a program, readdata.py, that reads the file contents into a list of strings and then fixes up the list by calling fix_list, which uses strip to remove leading and trailing whitespace (including the newline).

We are then going to write a function called separate_list that takes the list and converts each element of the list from a string to a list of strings, such that these inner lists contain a string value for each of the 4 pieces of data associated with each student.

This is an example of breaking up a line that holds multiple data values to access the individual items, using the split string method. split takes the string to split on (in this example a comma, split(",")) and returns a list of the substrings that lie between occurrences of that string in the original; in other words, it treats commas as dividers between different pieces of data.

A call to split() that does not pass a string to split on splits the string at whitespace characters instead; it treats spaces (and tabs and newlines) as dividers between different pieces of data.
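One possible separate_list, as a sketch: the body is hypothetical (readdata.py may differ), it assumes the strings have already been cleaned up by fix_list, and it modifies the list in place. The last line shows split() with no argument dividing on whitespace.

```python
def separate_list(lst):
    """Replace each "name,age,major,gpa" string in lst with a list of its
    four fields. Assumption: modifies lst in place."""
    for i in range(len(lst)):
        lst[i] = lst[i].split(",")


students = ["Alain,22,CS,3.1", "Ali,19,Math,2.9"]  # already stripped by fix_list
separate_list(students)
print(students)        # [['Alain', '22', 'CS', '3.1'], ['Ali', '19', 'Math', '2.9']]
print(students[1][2])  # Math

# With no argument, split() divides on runs of whitespace instead
print("Intro  to\tCS".split())  # ['Intro', 'to', 'CS']
```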

Common string methods for file I/O

  • line.strip(): remove leading and trailing whitespace (e.g., spaces, tabs "\t", newlines "\n") and return the result

  • line.split(): split line into a list of strings separated by whitespace and return that list

  • line.split(<pattern>): like line.split(), but split on each occurrence of <pattern> instead of on whitespace
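A quick sketch of these methods applied to one raw line, as readlines might return it. The leading spaces in the sample line are an assumption added to show the difference between strip and rstrip (a related string method not listed above that removes only trailing whitespace).

```python
line = "  Anastassia,20,Math,3.0\n"  # hypothetical raw line with extra spaces

print(repr(line.strip()))       # 'Anastassia,20,Math,3.0' (both ends cleaned)
print(repr(line.rstrip()))      # '  Anastassia,20,Math,3.0' (trailing only)
print(line.strip().split(","))  # ['Anastassia', '20', 'Math', '3.0']
print(line.split())             # ['Anastassia,20,Math,3.0'] (no whitespace inside)
```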

Top-Down Design

Continue with Top-Down Design.