CS 21: Algorithmic Problem Solving

 

HW #7: ZIP Code locations

due 11:59pm Tuesday, March 27

Remember to run update21 to get the files needed for this assignment. The program handin21 will only submit files in the cs21/homework/7 directory.

Your programs are graded on both correctness and style. Please review the comments regarding programming style on the main page.

Some of the problems will have optional components that allow you to further practice your skills in Python. Optional portions will not be graded, but may be interesting for those wanting some extra challenges.

ZIP Code Database
What US city has the ZIP code 12345? What is the ZIP code for Truth or Consequences, NM? How far is it from Fairbanks, AK to Miami, FL? What US city with a population over 100,000 is closest to Minot, ND? Your assignment is to write a program that answers these types of questions.

The file /usr/local/doc/zipcodes.txt contains the data you need to write your program. Each line of the file contains seven fields separated by commas representing the ZIP code, latitude, longitude, city name, county name, state, and population, respectively. The entry for Swarthmore is shown below:

19081,39.897162,-075.344083,Swarthmore,Delaware,PA,6170

The full file has entries for 41824 unique ZIP codes. You can open the file in IDLE or any other text editor if you would like to look at it in more detail.
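One way to split a line of the file into its seven fields is sketched below. It is only an illustration, written for Python 3: the function name parse_record and the dictionary keys are made up for this example, and it assumes that no field contains a comma.

```python
def parse_record(line):
    """Split one comma-separated line into a dictionary of typed fields."""
    zipcode, lat, lon, city, county, state, pop = line.strip().split(",")
    return {
        "zip": zipcode,            # kept as a string on purpose
        "lat": float(lat),
        "long": float(lon),        # float() accepts the leading zero in -075.344083
        "city": city,
        "county": county,
        "state": state,
        "population": int(pop),
    }

# The Swarthmore entry shown above.
record = parse_record("19081,39.897162,-075.344083,Swarthmore,Delaware,PA,6170\n")
print(record["city"], record["state"], record["population"])
```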

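Your program should prompt the user for two cities. The user can look up a city by ZIP code or by a city name and its two-letter state abbreviation. For each city, display the following information: the city, state, and ZIP code; the county; the population; the latitude and longitude; and the closest other city with a population over 100,000, along with its distance.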

In addition to the information above, display the distance between the two cities the user typed.

A sample run is shown below:

Enter zip code OR city,state: Swarthmore, PA
 City, State, ZIP: Swarthmore, PA, 19081
           County: Delaware
       Population: 6170
       Lat / Long: 39.90 / -75.34
Next closest city: Philadelphia, PA (14.0 miles)

Enter zip code OR city,state: 90210
 City, State, ZIP: Beverly Hills, CA, 90210
           County: Los Angeles
       Population: 33784
       Lat / Long: 33.79 / -118.30
Next closest city: Burbank, CA (0.0 miles)

Distance between Swarthmore, PA and Beverly Hills, CA = 2389.0 miles

If you cannot find an entry for a particular city or ZIP code, inform the user that you cannot find that location and prompt them to enter another location.
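One possible way to tell the two kinds of input apart is sketched below. The function name parse_query and its return format are assumptions made for this example, not a required design. It treats any five-digit string as a ZIP code and anything of the form "name, XX" as a city and state, and returns None for everything else so the caller can prompt again.

```python
def parse_query(text):
    """Classify user input as a ZIP code, a (city, state) pair, or neither."""
    text = text.strip()
    if len(text) == 5 and text.isdigit():
        return ("zip", text)
    if "," in text:
        # Split on the last comma so the state is always the final piece.
        city, state = text.rsplit(",", 1)
        city = city.strip()
        state = state.strip().upper()
        if city and len(state) == 2:
            return ("city", (city, state))
    return None   # unrecognized input

zip_query = parse_query("90210")
city_query = parse_query("Swarthmore, PA")
bad_query = parse_query("somewhere")
```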

You are required to use at least one dictionary to store the records for each city. This dictionary must be indexed on ZIP code. For example, if the dictionary is called zip_dict then zip_dict['19081'] should return all the necessary information about Swarthmore, PA. You may use other dictionaries and lists as needed, but your implementation must include this specific dictionary.
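A minimal sketch of such a dictionary follows. It assumes each record has already been split into a dictionary of fields; the helper name build_zip_dict and the field names are illustrative choices, and you are free to store each record as a list or some other structure instead.

```python
def build_zip_dict(records):
    """Index a sequence of record dictionaries by their ZIP code string."""
    zip_dict = {}
    for record in records:
        zip_dict[record["zip"]] = record
    return zip_dict

# The Swarthmore entry from the data file, already split into fields.
swarthmore = {
    "zip": "19081", "lat": 39.897162, "long": -75.344083,
    "city": "Swarthmore", "county": "Delaware", "state": "PA",
    "population": 6170,
}
zip_dict = build_zip_dict([swarthmore])

print(zip_dict["19081"]["city"])
missing = zip_dict.get("00000")   # None, because this ZIP is not in the dictionary
```

Using get rather than square brackets for user-supplied ZIP codes avoids a KeyError when the ZIP code is not in the dictionary.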

How you design your program is very much up to you. You should sketch out a basic outline of your program on paper before you begin coding.

We have included some tips and problems to watch out for below.

ZIPs as Strings
While you will be doing actual numerical computations with latitudes, longitudes, and populations, you do not need to numerically compare ZIP codes. Furthermore, ZIP codes beginning with zeroes will be truncated if interpreted as integers, e.g., 01238 will be interpreted as 1238. Store and print ZIP codes as strings instead of numbers and you will save yourself a number of headaches.
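The short example below illustrates the problem using the ZIP code mentioned above. The variable names are arbitrary.

```python
zip_text = "01238"

as_number = int(zip_text)
print(as_number)    # the leading zero is lost: 1238

# Keeping the original string avoids the problem entirely.
print(zip_text)     # 01238
```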

Reporting ZIP codes

Some large cities (Philadelphia, PA, for example) have multiple ZIP codes. When asked to report a ZIP code for a city, you can decide how to handle this. Possible options include reporting all ZIP codes for a particular city, reporting the first ZIP code you find for that city, reporting one example ZIP code and stating there are other possibilities, or designing a method of your own choosing. You must report at least one ZIP code for a city if it is in the list of known cities.
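One way to support any of these options is a second dictionary that maps each city and state to the list of its ZIP codes. The sketch below is one possible design, not a requirement. The name build_city_index is hypothetical, and the records are made-up placeholders (Exampleville, ZZ), not entries from the data file.

```python
def build_city_index(zip_dict):
    """Map (lowercase city, uppercase state) to a list of ZIP code strings."""
    city_index = {}
    for zipcode, record in zip_dict.items():
        key = (record["city"].lower(), record["state"].upper())
        city_index.setdefault(key, []).append(zipcode)
    return city_index

# Made-up records for illustration only.
zip_dict = {
    "00001": {"city": "Exampleville", "state": "ZZ"},
    "00002": {"city": "Exampleville", "state": "ZZ"},
    "00003": {"city": "Sampleton", "state": "ZZ"},
}
city_index = build_city_index(zip_dict)

zips = sorted(city_index[("exampleville", "ZZ")])
print(zips[0], "and", len(zips) - 1, "other ZIP code(s)")
```

Lowercasing the city name in the key lets a lookup for "exampleville" or "EXAMPLEVILLE" find the same entry, provided the user input is normalized the same way.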

Reporting Closest Cities

Be careful when reporting the closest city to a given city. Do not report the same city. For example, the city with a population over 100,000 closest to Philadelphia, PA, should not be Philadelphia, PA, but some other city. This is further complicated by the fact that many big cities have multiple ZIP codes. When reporting the closest big city, ensure that the city names are in fact different.
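The check described above could be sketched as follows. This is one possible approach under stated assumptions: the function name closest_other_big_city is hypothetical, the distances are assumed to be computed elsewhere and passed in alongside each record, and the sample records are made-up placeholders.

```python
def closest_other_big_city(origin, candidates, threshold=100000):
    """Return the (miles, record) pair for the nearest qualifying city, or None.

    candidates is a list of (miles, record) pairs, where miles is the
    distance from origin to that record's location.
    """
    origin_name = (origin["city"], origin["state"])
    best = None
    for miles, record in candidates:
        if record["population"] <= threshold:
            continue    # not a big city
        if (record["city"], record["state"]) == origin_name:
            continue    # the same city, possibly under another ZIP code
        if best is None or miles < best[0]:
            best = (miles, record)
    return best

# Made-up records and distances for illustration only.
origin = {"city": "Alpha", "state": "ZZ", "population": 200000}
candidates = [
    (0.0, origin),
    (3.5, {"city": "Alpha", "state": "ZZ", "population": 200000}),
    (12.0, {"city": "Gamma", "state": "ZZ", "population": 500}),
    (40.0, {"city": "Beta", "state": "ZZ", "population": 150000}),
    (75.0, {"city": "Delta", "state": "ZZ", "population": 300000}),
]
result = closest_other_big_city(origin, candidates)
print(result[1]["city"], result[0])
```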

Computing Distances

You should use the "great circle" distance formula to compute the distance between two cities. If the first city is at a geographic location (lat1, long1), and the second city is at (lat2, long2), then the distance between the two cities is given by the formula:

D = R acos( sin(lat1)sin(lat2) + cos(lat1)cos(lat2)cos(long2-long1))

where R is the radius of the Earth (6371 km or 3963 miles) and acos is the inverse cosine. The acos function is available in Python's math module. Note that each latitude and longitude must be converted from degrees to radians before computing a sine or cosine. The formula for this conversion is:

rad = deg*pi/180 
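The formula can be written in Python roughly as shown below. This is a sketch rather than a reference solution. The function and constant names are made up for this example, and math.radians is used as the built-in equivalent of deg*pi/180. The clamp to the range [-1, 1] is an addition not mentioned above: it guards against tiny rounding errors that could otherwise push the argument of acos slightly outside its valid range when the two points are identical.

```python
import math

EARTH_RADIUS_MILES = 3963.0   # value given in the assignment

def great_circle_miles(lat1, long1, lat2, long2):
    """Great circle distance in miles between two points given in degrees."""
    lat1 = math.radians(lat1)
    long1 = math.radians(long1)
    lat2 = math.radians(lat2)
    long2 = math.radians(long2)
    cos_angle = (math.sin(lat1) * math.sin(lat2)
                 + math.cos(lat1) * math.cos(lat2) * math.cos(long2 - long1))
    cos_angle = max(-1.0, min(1.0, cos_angle))   # guard against rounding error
    return EARTH_RADIUS_MILES * math.acos(cos_angle)

# A quarter of the way around the equator: R * pi / 2.
quarter = great_circle_miles(0.0, 0.0, 0.0, 90.0)
print(round(quarter, 1))
```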


Missing Data

The data file we have provided is by no means complete. Some ZIP codes were missing or did not have lat/long data and were removed. For some cities, we did not have population data, so we set the population arbitrarily to 0. If your favorite US city or hometown in the US is missing and you know all the info (ZIP, city name, county name, state, lat, long, and population), let us know and we will be happy to add a few cities, but we are not trying to maintain a comprehensive list.

Optional Components
As noted above, these questions are NOT required to receive full credit. There are many extensions you could add to this program. Here are a few we thought might be interesting.

k-closest cities

Instead of printing the closest city with a population over 100,000 (or some other threshold), find the k-closest large cities. Make sure that your cities are distinct. For example, if the user asks for the five closest large cities to Swarthmore, do not report five separate ZIP codes that are all located in Philadelphia.
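One way to keep the results distinct is to record only the smallest distance seen for each city and state, then sort. The sketch below assumes the distances have already been computed, one per ZIP code. The function name k_closest_distinct and the sample data are made up for illustration.

```python
def k_closest_distinct(candidates, k):
    """Return up to k (city, state, miles) tuples, nearest first, one per city.

    candidates is an iterable of (miles, city, state) tuples, one per ZIP code.
    """
    nearest = {}
    for miles, city, state in candidates:
        key = (city, state)
        if key not in nearest or miles < nearest[key]:
            nearest[key] = miles
    ranked = sorted(nearest.items(), key=lambda item: item[1])
    return [(city, state, miles) for (city, state), miles in ranked[:k]]

# Made-up distances for illustration only; Alpha and Beta each appear twice.
candidates = [
    (14.0, "Alpha", "ZZ"),
    (15.5, "Alpha", "ZZ"),
    (30.0, "Beta", "ZZ"),
    (22.0, "Gamma", "ZZ"),
    (90.0, "Beta", "ZZ"),
]
top_two = k_closest_distinct(candidates, 2)
print(top_two)
```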

Most isolated city

Find the city whose population is less than 100,000 (or some other threshold) and whose closest city with a population over 100,000 is farthest away. What happens as you change the population threshold? Note that this computation could be very time-consuming. It might help to first build a separate list or dictionary of only the cities with a population over 100,000. There are only 224 such cities in our list, but these cities have over 6800 ZIP codes. On my desktop, computing the closest city out of all possible cities for 10 separate cities took just under two seconds. For 1000 separate cities, the calculation takes just over two minutes. Computing the distance between every pair of cities in the data set would take about an hour and a half. I don't recommend waiting that long for this assignment.
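The filtering step suggested above might look like the following sketch. The function name big_city_records is hypothetical, the records are made-up placeholders, and the dictionary is assumed to be indexed by ZIP code with a numeric population in each record.

```python
def big_city_records(zip_dict, threshold=100000):
    """Return only the records whose population exceeds the threshold."""
    return [rec for rec in zip_dict.values() if rec["population"] > threshold]

# Made-up records for illustration only.
zip_dict = {
    "00001": {"city": "Alpha", "state": "ZZ", "population": 200000},
    "00002": {"city": "Gamma", "state": "ZZ", "population": 500},
    "00003": {"city": "Omega", "state": "ZZ", "population": 0},
}
big = big_city_records(zip_dict)
print(len(big), "of", len(zip_dict), "records kept")
```

Building this list once, before searching, means each closest-city search only scans the large cities rather than every ZIP code. Records with a population of 0 are excluded automatically.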

Plot the cities

Plot a map of the locations of all the ZIP codes in a particular numerical range, a particular state, or a particular county. We haven't tried plotting all the ZIP codes, so this is another option where you might want to start small and see how many ZIP codes you can plot. If you get really fancy, you can draw a circle for each ZIP code with a size proportional to the population of the city. You could end up with something cool or a big messy blob. Excluding Alaska and Hawaii helps with the drawing, but makes Alaskans and Hawaiians mad. One cool version of this idea is the zipdecode project.