CS21 Lab10: ZIP Codes

Due 11:59pm Tuesday, November 18, 2008

You may work with one partner on this assignment. If you work with a partner, put your name and the name of your partner in the list of authors at the top of your program. Only one partner needs to run handin21 to submit the files for the group. Both partners will receive the same grade.

Some of the problems will have optional components that allow you to further practice your skills in Python. Optional portions will not be graded, but may be interesting for those wanting some extra challenges.

Run update21, if you haven't already, to create the cs21/labs/10 directory. Then cd into your cs21/labs/10 directory and create the python programs for lab 10 in this directory (handin21 looks for your lab 10 assignments in your cs21/labs/10 directory):

$ update21

$ cd cs21/labs/10
$ vim zipcode.py
ZIP Code Database
What US city has the ZIP code 12345? What is the ZIP code for Truth or Consequences, NM? How far is it from Fairbanks, AK to Miami, FL? What US city with a population over 100,000 is closest to Minot, ND? Your assignment is to edit the file zipcode.py to create a program that answers these types of questions.

The file /usr/local/doc/zipcodes.txt contains the data you need to write your program. Each line of the file contains seven fields separated by commas representing the ZIP code, latitude, longitude, city name, county name, state, and population, respectively. The entry for Swarthmore is shown below:


The full file has entries for 41824 unique ZIP codes. You can open up the file in vim or any other text editor if you would like to look at the file in more detail. When you read in a line from this file in python you can use line.split(",") to separate the line into a list of strings.

Your program should prompt the user for two cities. The user can lookup a city by ZIP code or by a city name and its two letter state abbreviation. For each city, display the following information:

In addition to the information above, display the distance between the two cities the user typed.

A sample run is shown below:

$ python zipcodes.py

 This program creates a database of information about cities
   A user can then enter information about two cities (either
   by zipcode or by city and state) and get the distance
   between them.

City #1:
Enter a zipcode or a city,state: 53703
  City, State, Zip: Madison,WI, 53703
  County: Dane
  Population: 249307
  Lat/Long: 43.08/-89.38
 The closest large city is Rockford,IL (pop: 188073, dist: 53.15 miles)

City #2:
Enter a zipcode or a city,state: Swarthmore,PA
  City, State, Zip: Swarthmore,PA, 19081
  County: Delaware
  Population: 9907
  Lat/Long: 39.90/-75.35
 The closest large city is Philadelphia,PA (pop: 1517913, dist: 6.07 miles)

 Distance between Madison,WI and Swarthmore,PA is 758.47 miles

Check more locations? (yes,no): yes

City #1:
Enter a zipcode or a city,state: 19081
  City, State, Zip: Swarthmore,PA, 19081
  County: Delaware
  Population: 9907
  Lat/Long: 39.90/-75.35
 The closest large city is Philadelphia,PA (pop: 1517913, dist: 6.07 miles)

City #2:
Enter a zipcode or a city,state: 59453
  City, State, Zip: Judith Gap,MT, 59453
  County: Wheatland
  Population: 307
  Lat/Long: 46.68/-109.64
 The closest large city is Billings,MT (pop: 112078, dist: 77.76 miles)

 Distance between Swarthmore,PA and Judith Gap,MT is 1773.76 miles

Check more locations? (yes,no): yes

City #1:
Enter a zipcode or a city,state: San Francisco,CA
  City, State, Zip: San Francisco,CA, 94101
  County: San Francisco
  Population: 776733
  Lat/Long: 37.78/-122.73
 The closest large city is Daly City,CA (pop: 110768, dist: 15.19 miles)

City #2:
Enter a zipcode or a city,state: Boston,MA
  City, State, Zip: Boston,MA, 02101
  County: Suffolk
  Population: 359585
  Lat/Long: 42.37/-71.03
 The closest large city is Cambridge,MA (pop: 101391, dist: 2.94 miles)

 Distance between San Francisco,CA and Boston,MA is 2712.36 miles

Check more locations? (yes,no): no
okay, check you later

If you cannot find an entry for a particular city or ZIP code, inform the user that you cannot find that location and prompt them to enter another location.

You should practice good top-down design, incrementally implement and test your solution, and document your code with comments. While much of the design is up to you, the requirements below are designed to avoid some headaches in the initial design.
  1. For each location, store all the information for that location in a list. ZIP, city, county and state should be stored as strings. Latitude and Longitude should be stored as floating points, and the population should be stored as an integer. For example, the variable zipinfo displays the proper format for Swarthmore
    zipinfo = ["19081", 39.897562, -75.346584, "Swarthmore", "Delaware", "PA", 9907]
    Note that some elements in this list have quotes to indicate they are strings while numeric elements do not have quotes. When you use line.split(",") when reading an input line from the file, latitude, longitude, and population will not be automatically converted to numeric types. Instead, you need to process the split list to convert some buckets from strings to other types. To convert the string in bucket i to a float you can do something like this:
    zipinfo[i] = float(zipinfo[i])
  2. Store information about all locations in a single list, where each element in the list is a list describing one location as described above. You are thus creating a lists of lists. Using the example zipinfo above, the following code creates a list containing one location, and retrieves the name Swarthmore
    allZips = []            # a list
    allZips.append(zipinfo) # add a list as the first item to allZips
    print len(allZips)      # what is this?  7? 1? 2?  something else?    
    location = allZips[0]   # a list with zip, lat, long, ....
    print location[3]       # prints Swarthmore
    If you read the file zipcodes.txt sequentially, the lines are sorted by increasing zipcode. You must preserve this order in you list of all locations. The resulting list of lists might look like this (allZips list is drawn on its side):
      0 |  --|---> ["00501",40.922326,-072.637078,"Holtsville","Suffolk","NY",12119]
      1 |  --|---> ["00544",40.922326,-072.637078,"Holtsville","Suffolk","NY",12119]
      2 |  --|---> ["01001",42.070610,-072.620290,"Agawam ","Hampden","MA",16576]
        | .  |
        | .  |
        | .  |
    n-2 |  --|---> ["99929",56.409507,-132.338228,"Wrangell","Wrangell Petersburg","AK",2424]
    n-1 |  --|---> ["99950",55.875767,-131.466339,"Ketchikan","Ketchikan Gateway","AK",14066]
  3. When searching for a location by ZIP code, you must use binary search to find the location in the list of all locations. This binary search should return the index into the allZips list with the matching entry or -1 if no match. The database file is sorted by zipcode, so when you create the allZips list, it is already in sorted order by zipcodes. When you do the comparison in binary search, make sure you are comparing the search value to correct bucket of an entry in the allZips list (allZips[i] is a list, and in that the zipcode value is in bucket 0). In the week10 in-class file, mergesort.py, is code for binary search on a list of ints, you can use this as a starting point for your implemention of binary search for the zipcode list.

    When searching on city and state, you will need to do a linear search and remember that you need to compare two bucket values in each list entry to find a match.

In addition to the requirements above, we have included some tips and problems to watch out for below.

ZIPs as Strings
While you will be doing actual numerical computations with latitudes, longitudes, and populations, you do not need to numerically compare ZIP codes (It is OK to do lexicographic string comparisions of ZIPs in the binary search). Furthermore, ZIP codes beginning with zeroes will be truncated if interpreted as integers, e.g., 01238 will be interpreted as 1238. Store and print ZIP codes as strings instead of numbers and you will save yourself a number of headaches.

Reporting ZIP codes

Some large cities (Philadelphia, PA, for example) have multiple ZIP codes. When asked to report a ZIP code for a city, you can decide how to handle this. Possible options include reporting all ZIP codes for a particular city, reporting the first ZIP code you find for that city, reporting one example ZIP code and stating there are other possibilities, or designing a method of your own choosing. You must report at least one ZIP code for a city if it is in the list of known cities.

Reporting Closest Cities

Be careful when reporting the closest city to a given city. Do not report the same city. For example, the city with a population over 100,000 closest to Philadelphia, PA, should not be Philadelphia, PA, but some other city. This is further complicated by the fact that many big cities have multiple ZIP codes, when reporting the closest big city, ensure that the city names are in fact different.

Computing Distances

You should use the "great circle" distance formula to compute the distance between two cities. If the first city is at a geographic location (lat1, long1), and the second city is at (lat2, long2), then the distance between the two cities is given by the formula:

   D = R acos( sin(lat1)sin(lat2) + cos(lat1)cos(lat2)cos(long2-long1)) 

Where R is the radius of the Earth (6371 km or 3963 miles), and acos is the inverse cosine. You can get this function by importing the math module in python. Note that each lat/long must be converted from degrees to radians before computing a sine or cosine. The formula for this conversion is

   rad = deg*pi/180 
or use the python function radians imported from the math library.
Missing Data

This data file we have provided you is by no means complete. Some ZIP codes were missing or did not have lat/long data and were removed. For some cities, we did not have population data, so we set the population arbitrarily to 0. If your favorite US city or hometown in the US is missing and you know all the info (ZIP, city name, county name, state, lat, long, and population), let us know and we will be happy to add a few cities, but we are not trying to maintain a comprehensive list.

Optional Components
Some ideas for optional extentions are here. Do not try these before you have solved and completely tested the required parts of the assignment.

Once you are satisfied with your program, hand it in by typing handin21 in a terminal window.