CS21 Lab8: Sorting and Searching

Due 11:59pm Tuesday, November 10, 2009

For this program you may work with one partner (you may not work in groups larger than two).

If you work with a partner:

The best way to work with a partner is to work together in the lab on all parts of the program together. You can work together on a copy of the code in one partner's cs21/labs/08 subdirectory and then email the file as an attachment to the other partner so that he/she can save a copy of your joint solution in his/her cs21/labs/08/ subdirectory too.

Run update21, if you haven't already, to create the cs21/labs/08 directory. Then cd into your cs21/labs/08 directory and create the python programs for lab 08 in this directory (handin21 looks for your lab 08 assignments in your cs21/labs/08 directory):

$ update21

$ cd cs21/labs/08
$ vim zipcodes.py
ZIP Code Database
In this lab, you will extend your previous searching and sorting functions from in class to work with more complex data, specifically ZIP code records. The file /usr/local/doc/bigzips.txt contains a list of records of 27 big cities with a population over 500,000. A portion of the file is shown below (you can browse the full file using vim /usr/local/doc/bigzips.txt).
28201,35.260,-80.804,Charlotte,Mecklenburg,NC,540828
02101,42.371,-71.027,Boston,Suffolk,MA,589141
21201,39.297,-76.623,Baltimore,Baltimore City,MD,651154
19019,40.002,-75.118,Philadelphia,Philadelphia,PA,1517550
98101,47.432,-121.803,Seattle,King,WA,563374
85001,33.704,-112.352,Phoenix,Maricopa,AZ,1321045
43085,40.040,-82.897,Columbus,Franklin,OH,711470

Each line of the file contains seven fields separated by commas representing the ZIP code, latitude, longitude, city name, county name, state, and population, respectively. The entry for Philadelphia is shown below:

19019,40.002,-75.118,Philadelphia,Philadelphia,PA,1517550

When you read in a line from this file in python you can use line.split(",") to separate the line into a list of strings. For this lab, complete the functions started for you in zipcodes.py and described below. You will be using these functions again in lab 09, so please follow the instructions carefully and do not modify the parameters, return types, or behavior of the functions described.

readZips
Write a function called readZips that opens a zipcode file, reads each line, and generates a list of zipcode records. Each zipcode record consists of a single list containing the fields [zip, latitude, longitude, city, county, state, population]. In other words, you will be creating a list of lists! Your function should take a single string parameter that provides the filename from which to read the data. For this lab, that parameter will be "/usr/local/doc/bigzips.txt" when readZips is called from main, but it may change in lab 09. Your function should return a newly created list of zip code records.

For each location, store all the information for that location in a list. ZIP, city, county and state should be stored as strings. Latitude and Longitude should be stored as floating points, and the population should be stored as an integer. For example, the variable zipinfo displays the proper format for Seattle

zipinfo=['98101', 47.432, -121.803, 'Seattle', 'King', 'WA', 563374]
Note that some elements in this list have quotes to indicate they are strings while numeric elements do not have quotes. When you use line.split(",") when reading an input line from the file, latitude, longitude, and population will not be automatically converted to numeric types. You must modify the result of split(",") to convert some of the types

Store information about all locations in a single list, where each element in the list is a list describing one location as described above. You are thus creating a lists of lists. Using the example zipinfo above, the following code creates a list containing one location, and retrieves the name Seattle

allZips = [] #a list
allZips.append(zipinfo) #add a list as the first item to allZips
print len(allZips) #what is this? 7? 1? 2? something else?
location = allZips[0] #a list with zip, lat, long, ....
print location[3] #prints Seattle
Once your function readZips is implemented, add some test code to main to check that readZips function works for the file /usr/local/doc/bigzips.txt
Selection sort for lists
Implement the function selectSort(ls, idx). Notice that unlike the sort described in class, this version has two parameters. The first parameter ls must be a list of zip code records, (a list of lists). The second parameter is a non-negative integer between 0 and 6, inclusive. This parameter tells the function on which field in a zip record to sort. Thus selectSort(allzips, 0) should sort by increasing zip codes, while selectSort(allzips,1) should sort by increasing latitude (the field at index 1 in each zipcode record). The modifications needed to the standard selection sort algorithm described in class are fairly minor, but you should think about the modifications needed before implementing your solution.

Your sort should modify the list ls in place. Thus, it does not need to return a new list.

Once your sort is implemented, modify main to sort the zipcodes in /usr/local/doc/bigzips.txt by zipcode and print the result with one zipcode record per line.

zipSearch
Now write a function zipSearch that searches a list ls of zip code records sorted by zip for a specific zipcode. This function should have two parameters: ls (a list of lists) and zipcode (a string). Because the list is sorted by zip code, you can and must use binary search to find the zipcode specified by the second parameter. You will need to make some slight modifications to the standard binary search algorithm from in class to make this work. Hint: think about the types of any items you compare. Your function should return the integer index of the location in the list ls where the specified zipcode is found, or return -1 if no matching record is found.

Once your function is implemented, write a small test in main to verify that your function works.

citySearch
The function citySearch should allow users to search for a specific city and state e.g., Philadelphia,PA. This function has three parameters:
  1. ls: A list of zip code entries
  2. city: A string describing the city name
  3. state: Two letter upper case string for the state abbreviation

Your function should return the integer index of the location in the list ls where the specified city, state is found, or return -1 if no matching record is found.

This function should use linear search. Again, slight modifications to the algorithm discussed in class are needed. Again, you should test your implementation in main once your function is finished.

getValidZip
Implement the function getValidZip which prompts the user to enter a valid zip code. The function should take a single parameter containing a list of zipcode records sorted by zip code. The function should return the list containing all the zip info for a valid location. If the user enters an invalid location,the function should repeatedly ask for a valid location. For this lab, a zipcode is valid if it can be found in the list of zipcodes provided. This a is a very limited set of valid zip codes. See the example below. The restrictions on valid zipcodes will be relaxed a bit in lab 09 when we provide a much larger zipcode file.
Enter a zipcode: test
Sorry, nothing found for zipcode test. Try again
Enter a zipcode: 19081
Sorry, nothing found for zipcode 19081. Try again
Enter a zipcode: 19019
19019,40.002,-75.118,Philadelphia,Philadelphia,PA,1517550
The last line above displaying the zip info was generated by main not by getValidZip. The function getValidZip should simply return an appropriate list.

main
To demonstrate that your functions work, implement a main function that does the following:
  1. reads all the zip code info in "/usr/local/doc/bigzips.txt"
  2. sorts the info by zip code
  3. prompts the user for a valid zip code and displays the info for that location
  4. prompts the user for a city,state pair and displays the info for that location if found, or an error message if no such city state is found (you do not need to repeatedly prompt the user in this case)
I have included some sample output of the main function to illustrate that other functions work properly.
Tip: ZIPs as Strings
While you will be doing actual numerical computations with latitudes, longitudes, and populations in lab 09, you do not need to numerically compare ZIP codes (It is OK to do lexicographic string comparisons of ZIPs in the binary search). Furthermore, ZIP codes beginning with zeroes will be truncated if interpreted as integers, e.g., 01238 will be interpreted as 1238. Store and print ZIP codes as strings instead of numbers and you will save yourself a number of headaches.


Submit

Once you are satisfied with your program, hand it in by typing handin21 in a terminal window.