CS21 Lab 8: Searching a Dataset

Due Saturday, November 16, before midnight


Lab Content Warning - READ THIS FIRST

This lab asks students to work with sensitive data on the topic of gun violence in the U.S. The dataset we’ll be searching contains records of fatal police shootings as recorded by the Washington Post.

This dataset is disturbing, so we encourage you to prepare yourself before proceeding. If you believe that working on this lab will be personally traumatizing, please contact your CS 21 instructor to discuss the lab as soon as possible (by Monday, November 11).

For context, we have chosen to use this dataset because it contains real data surrounding an important societal issue. As you progress in computer science, we believe that it’s important to make connections between the technical content we’re discussing in CS classes and the ways in which it might be used to affect the world. This lab is an attempt to do that.

Please be respectful when talking to others about the lab’s content. We want CS 21 to be an open space that invites a civil exchange of ideas.


Goals

  • Write a program that uses linear search

  • Write a program that uses binary search

  • Continue practicing top-down design (TDD)

  • Connect CS topics with social issues

Notes

Please read through the entire lab before starting!

This is a one week lab. For the previous lab, you used a full week to practice the TDD component and a second week to implement your solution. Since the design process should be similar to the previous lab, you will be doing the design and implementation for lab 8 in a single week. If you found the open-ended nature of the last lab challenging, you will want to start this lab early.

Police Shooting Data

Since January 1, 2015, the Washington Post has been compiling a database of every fatal shooting in the United States by a police officer in the line of duty. The Post is documenting only those shootings in which a police officer, in the line of duty, shot and killed a civilian — the circumstances that most closely parallel the 2014 killing of Michael Brown in Ferguson, Mo., which began the protest movement culminating in Black Lives Matter and an increased focus on police accountability nationwide. For this lab, we’ll be using a simplified version of this real dataset that comes from the CORGIS educational dataset archive.

While you may find this topic to be morbid, we hope that this lab illustrates how the skills you’re learning in computer science can be applied to other disciplines, including relevant societal issues.

Searching the Dataset

Your task is to write a program that:

  • Reads in the dataset from a file, storing a list of records. Each record contains details about the incident, including the name and demographic information of the deceased, the date of the incident, and the location where it occurred.

    The data is stored in the file /usr/local/doc/shooting-data.txt (don’t copy this file to your 08 directory, just use "/usr/local/doc/shooting-data.txt" as the file name in your python program). Each line of the file contains the following fields separated by commas:

    Name,Age,Sex,Race,Year,Month,Day,City,State

    Here’s one example:

    Andrew Jospeh Todd,20,Male,White,2015,12,12,East Stroudsburg,PA
  • Prompts the user with a menu of four choices:

    Please select one of the following choices:
    1. Find by name
    2. Find by date
    3. Map state
    4. Quit
    
    Choice?

    Depending on the user’s selection, you will prompt them for additional information and then do one of:

    1. Search the database by name, using binary search.

    2. Search the database by date, using linear search.

    3. Produce a graphical state map to display the locations of police shootings by state.

    4. Exit the program.

You program should continue prompting the user until they enter a valid choice and then perform the following queries listed below for each menu option. The program should exit if the user selects option 4 (Quit).

Find by name

For option 1 (find by name), you must:

  • Prompt the user to enter a name.

  • Use binary search to find the name in your list of records. The data file provided lists each incident in increasing order by name.

  • Print the information for the record matching the user-entered name, if there is one. See the note about printing records in the tips section below.

  • If the name was not in the list, let the user know you couldn’t find it.

You may assume that the names in the dataset are unique, and when grading, we will only query for names that are unique.

In reality, there are a few duplicate names (e.g., Robert Martinez) and several entries for which the name is missing (appearing as "TK TK"). These issues are some of the challenges of working with real data.

Here are some examples:

$ python3 police_shootings.py

Please select one of the following choices:
1. Find by name
2. Find by date
3. Map state
4. Quit

Choice? 1
Name to search for? Adam Smith

Adam Smith, 33, Male, White, 2016, 7, 7, Clovis, CA

Please select one of the following choices:
1. Find by name
2. Find by date
3. Map state
4. Quit

Choice? 1
Name to search for? Jane Fakename

Could not find any records matching name: Jane Fakename

Please select one of the following choices:
1. Find by name
2. Find by date
3. Map state
4. Quit

Choice? 4

Find by date

For option 2 (find by date), you must:

  • Prompt the user for a year, then prompt for a month, then prompt for a day. You may assume that the user will type in strings that can be converted to an integer with int().

  • Use linear search to find all the records that match the entered date. You should expect that, for most dates, there will be more than one matching record. Therefore, the search function you write should return a list of matching records.

  • Print the information for all matching records, one per line, if there were any matches. See the note about printing records in the tips section below.

  • If no records match the entered date, let the user know that you couldn’t find any.

Here are some examples:

$ python3 police_shootings.py

Please select one of the following choices:
1. Find by name
2. Find by date
3. Map state
4. Quit

Choice? 2
Year? 2016
Month? 8
Day? 19

Jorge Ceniceros, 22, Male, Hispanic, 2016, 8, 19, Phoenix, AZ
Kole B. Knight, 31, Male, White, 2016, 8, 19, New London, WI
Kyle Zimbelman, 29, Male, White, 2016, 8, 19, Reno, NV
Marcelo Luna, 47, Male, Hispanic, 2016, 8, 19, East Hollywood, CA
Todd P. Browning, 54, Male, White, 2016, 8, 19, East Ridge, TN

Please select one of the following choices:
1. Find by name
2. Find by date
3. Map state
4. Quit

Choice? 2
Year? 2025
Month? 10
Day? 22

Could not find any records matching that date.

Please select one of the following choices:
1. Find by name
2. Find by date
3. Map state
4. Quit

Choice? 4

Map State

For option 3 (map state), you must:

  • Prompt the use for a state abbreviation (e.g., PA).

  • If the state abbreviation is valid, you should draw the boundary of the given state (with help from a provided library) and draw a Point at the location of each shooting recorded in that state.

  • If the state abbreviation is not valid, print a suitable error message.

Here are some examples:

$ python3 police_shootings.py

Please select one of the following choices:
1. Find by name
2. Find by date
3. Map state
4. Quit

Choice? 3
Which state? PB
No state found for PB

Unknown state: PB

Please select one of the following choices:
1. Find by name
2. Find by date
3. Map state
4. Quit

Choice? 3
Which state? PA
Map of Pennsylvania with dots to indicate fatal police shooting locations.

Provided Library

The police_shootings_lab library has some helpful tools for managing shooting records in your program.

Shooting Class

The library provides a Shooting class, which encapsulates all the information you’ll need to keep track of one record. You can create an instance object of this class by calling the Shooting() constructor with each piece of information from one row of the shooting-data.txt file passed in as a separate parameter. Each number (e.g., age, year) is an integer, and all other fields are strings.

You can call the following methods on a Shooting object to retrieve information about it:

  • get_name(): Retrieve the name string.

  • get_age(): Retrieve the age integer.

  • get_sex(): Retrieve the sex string.

  • get_race(): Retrieve the race string.

  • get_year(): Retrieve the year portion of the incident date.

  • get_month(): Retrieve the month portion of the incident date.

  • get_day(): Retrieve the day portion of the incident date.

  • get_city(): Retrieve the city string.

  • get_state(): Retrieve the state string (abbreviated, e.g., PA).

For example:

>>> from police_shootings_lab import *
>>> record = Shooting("John Doe", 40, "Male", "White", 2019, 10, 22, "Nowhere", "PA")
>>> print(record.get_year())
2019
>>> print(record.get_state())
PA
>>> print(record)
John Doe, 40, Male, White, 2019, 10, 22, Nowhere, PA

To create a list of all the shooting records from a file, you could use a sample function like the one below:

def read_input(filename):
    shootings = []
    f = open(filename, 'r')
    for line in f:
        line = line.strip().split(",")
        shooting = Shooting(line[0], int(line[1]), line[2], line[3],
                            int(line[4]), int(line[5]), int(line[6]),
                            line[7], line[8])
        shootings.append(shooting)
    f.close()
    return shootings

Mapping a State

The library also provides a get_state_graph_window(state) function. If you pass this function a valid two-letter state abbreviation, the function will create and return a GraphWin object with the boundary of the state shown in black and the coordinate of the window set to be the minimum and maximum longitudes and latitudes of the state using setCoords. If the provided state string is not a valid abbreviation, a special None object is returned. The None type is not a GraphWin object and you cannot do anything with it besides check if it is None.

Here’s an example of this function:

state = "PA"
win = get_state_graph_window(state)
if win == None:
    print("Unknown state!")
    return

# Window is OK. You can do drawing here

# Wait for mouse click before closing
win.getMouse()
win.close()

City Coordinates

Finally, the library provides a get_city_coordinates(city, state) function. If you pass this function a valid city name and two-letter state abbreviation, it will return a list containing the [latitude, longitude] coordinates of the city. Note: this function only knows about coordinates for the cities listed in the shooting-data.txt dataset. If you pass in a city/state pair that isn’t in the dataset, it will return None:

>>> from police_shootings_lab import *

>>> print(get_city_coordinates("Philadelphia", "PA"))
[39.9527237, -75.1635262]

>>> print(get_city_coordinates("Swarthmore", "PA"))
None

Requirements, Hints, and Tips

  • You should practice good top-down design, incrementally implement and test your solution, and document your code with comments. Start with the main menu and get a skeleton of your code working first. Then add each option from the menu one at a time (start with quit as it is easy and will allow you to exit your program when you are testing it). While much of the design is up to you, the requirements below are designed to avoid some headaches in the initial design.

  • When reading the input data file, process the file once and store all the records as a list of Shooting objects. Reading the file once will make it faster to process later. Using the Shooting list will avoid some list of list headaches.

  • To print an instance object of the Shooting class, you can call print() directly on the object:

    record = Shooting("John Doe", 40, "Male", "White", 2019, 10, 22, "Nowhere", "PA")
    print(record)

    The class knows how to format itself into a string, so you can avoid writing a nasty formatting string.

  • Validate menu selections. If the user doesn’t enter a valid number (1-4), let the user know it wasn’t a valid selection and prompt them again for a new menu item.

  • When reading file input, split() only generates a list of strings. Some of the strings may need to be converted to integers.

  • Latitude is a y-like coordinate, and Longitude is x-like. Mixing these up will likely cause you to not see any cities on your map if everything else is working correctly:

    from police_shootings_lab import *
    
    win = get_state_graph_window("PA")
    
    #Only one of these points will display on the PA map
    pt1 = Point(-76, 40)
    pt2 = Point(40, -76)
    
    pt1.setFill("black")
    pt2.setFill("gold")
    
    pt1.draw(win)
    pt2.draw(win)
    
    win.getMouse()
    win.close()

Extra Challenge

This will not affect your grade or gain extra points. It is included just as an extra challenge, if you are interested (and have already finished the entire assignment as described above).

The mapping portion of the assignment asks you to draw Point objects for each city with a recorded shooting, but it doesn’t account for multiple incidents in the same city — the repeated cities just draw additional points on top of each other. Instead, modify the lab to draw circles whose radius increases as a city appears more frequently.

For example, here’s what a map of PA might look like using Circle objects, centered at the latitude and longitude of the city, whose radius is .01 * num_city_appearances:

Map of Pennsylvania with circles to indicate fatal police shooting locations.  Larger circles represent more shootings at that location.

Answer the Questionnaire

Please edit the Questions-08.txt file in your cs21/labs/08 directory and answer the questions in that file.

Once you’re done with that, run handin21 again.

Turning in your labs…​.

Remember to run handin21 to turn in your lab files! You may run handin21 as many times as you want. Each time it will turn in any new work. We recommend running handin21 after you complete each program or after you complete significant work on any one program.