Week 7, Friday: More File IO



reading in large amounts of data

Take a look at the file called /scratch/knerr/itunesData-Feb2016.csv. You can use less or cat or head to view the contents of the file:

$ head /scratch/knerr/itunesData-Feb2016.csv 
The A Team,4:19,Ed Sheeran,9/9/12; 12:34 PM,16
A-Punk,2:18,Vampire Weekend,9/16/13; 12:56 PM,27
A.M. Radio,3:57,Everclear,8/21/09; 5:37 PM,52
A.M. Radio,3:59,Everclear,12/11/09; 9:31 AM,17
About to Break,3:56,Third Eye Blind,10/11/10; 10:57 AM,22
Abracadabra,5:08,Steve Miller Band,4/13/12; 5:43 PM,4
Abraham,5:12,Mark Erelli,4/5/09; 3:00 PM,2
Absence Makes The Heart Grow Fonder,2:28,Loudon Wainwright III,1/8/07; 11:02 AM,28
Accelerate,3:34,R.E.M.,7/20/09; 3:52 PM,43
Accident Waiting To Happen,4:03,Billy Bragg,1/8/15; 3:09 PM,11

Each line in the file represents one song in my iTunes library, and the data for each song consists of: song title, song length/time, artist, date of purchase, and number of plays. Since this is a csv (comma-separated values) file, we can easily pull all of this info into Python using file IO and str methods like split().

Before we pull in all of this data, how shall we store it? One way is to make a list for each song ([title,time,artist,date,plays]), and then store all of those song lists in another list. So we will have a list-of-lists!

Here's a simple example of a Python list-of-lists:

>>> L1 = list("abc")
>>> L2 = list(range(1,4))
>>> L3 = list("XYZ")
>>> LOL = [L1,L2,L3]
>>> print(LOL)
[['a', 'b', 'c'], [1, 2, 3], ['X', 'Y', 'Z']]
>>> print(LOL[0])
['a', 'b', 'c']
>>> print(LOL[0][2])
c

Notice how, by indexing twice ([0][2]), we can select items from the sub-lists: the first index picks the sub-list and the second picks an item within it.

implement a readFile(filename) function:

Given the name of the data file, read in all lines, store them in a list, and return them to main(). For example, in main(), I would like to call this function like this:

 data = readFile("/scratch/knerr/itunesData-Feb2016.csv")

and the readFile() function would take care of opening the file, reading all of the lines into a Python list, closing the file, and returning that list.
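One possible sketch of such a function is shown here. It is only one way to do it, not the required solution: it assumes the file is plain text with one song per line, and it strips the trailing newline from each line before storing it.

```python
def readFile(filename):
    """Read every line of the given file and return them as a list of strings."""
    lines = []
    infile = open(filename, "r")
    for line in infile:
        # strip() removes leading/trailing whitespace, including the newline
        lines.append(line.strip())
    infile.close()
    return lines
```

With this version, data[0] would be the first line of the file as a single string, still unsplit.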

Start by just trying to read in all of the lines. Once you get that working, try to split() a line into a sub-list:

>>> line = "Abraham,5:12,Mark Erelli,4/5/09; 3:00 PM,2"
>>> data = line.split(",")
>>> print(data)
['Abraham', '5:12', 'Mark Erelli', '4/5/09; 3:00 PM', '2']
>>> data[4] = int(data[4])
>>> print(data)
['Abraham', '5:12', 'Mark Erelli', '4/5/09; 3:00 PM', 2]

Once you have all of the data stored in a list-of-lists, you can start asking some interesting questions!
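As one example of such a question, here is a small sketch that finds the most-played song. It uses a hand-built list of three songs from the file rather than the full data set, and the variable names are just for illustration.

```python
# a tiny list-of-lists in the [title,time,artist,date,plays] format
songs = [
    ["The A Team", "4:19", "Ed Sheeran", "9/9/12; 12:34 PM", 16],
    ["A-Punk", "2:18", "Vampire Weekend", "9/16/13; 12:56 PM", 27],
    ["A.M. Radio", "3:57", "Everclear", "8/21/09; 5:37 PM", 52],
]

# start by assuming the first song is the most played, then check the rest
mostPlayed = songs[0]
for song in songs:
    if song[4] > mostPlayed[4]:
        mostPlayed = song

print(mostPlayed[0], "by", mostPlayed[2], "with", mostPlayed[4], "plays")
```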