CS21 Lab 7: Searching

Due Saturday, November 7th, before midnight

Read through this entire lab before you begin coding!

Content warning: This lab deals with data about hospitals and the costs associated with certain medical issues (heart procedure, pneumonia, and knee/hip replacement). If dealing with this subject matter would make it difficult for you to work on the assignment, please contact an instructor as soon as possible to discuss alternatives.

Goals

  • Continue to gain proficiency at top-down design and bottom-up implementation.

  • Practice reading data from a file.

  • Practice using a list of lists to store data.

  • Understand how to adapt linear search and binary search to real-world data.

Hospital dataset

For this assignment, we will be working with a real-world dataset from CORGIS (which is an acronym for: The Collection of Really Great, Interesting, Situated Datasets). We will be using a dataset that contains information about hospitals throughout the US with the goal of helping consumers make informed choices. This dataset contains many features, but we will be focusing on a small subset of these features.

The hospital data is stored in CSV format and is located at: /data/cs21/hospital/hospitals.csv

Here are the first five lines and last five lines from this file:

Abbeville Area Medical Center,Abbeville,SC,4,0,14548,24265
Abbeville General Hospital,Abbeville,LA,4,0,16666,23508
Abbott Northwestern Hospital,Minneapolis,MN,4,21776,17981,19259
Abilene Regional Medical Center,Abilene,TX,2,23627,18312,25205
Abington Memorial Hospital,Abington,PA,2,23358,16744,20167
...
Yukon Kuskokwim Delta Reg Hospital,Bethel,AK,3,0,11547,0
Yuma District Hospital,Yuma,CO,3,0,16974,0
Yuma Regional Medical Center,Yuma,AZ,3,24631,18413,22354
Zuckerberg San Francisco General Hosp & Trauma Ctr,San Francisco,CA,1,20738,15537,24723
Zuni Comprehensive Community Health Center,Zuni,NM,-1,0,14541,0

Notice that the file is sorted by hospital name.

Each line contains seven features. The following table provides more detailed information about each of these features.

Table 1. Features of Hospital Data

Feature

Type

Description

Notes

0

string

Hospital name

1

string

City

2

string

State

two-letter mail code

3

integer

Overall rating 1-5

-1 means not rated

4

integer

Avg cost of heart procedure

0 means not reported

5

integer

Avg cost of pneumonia

0 means not reported

6

integer

Avg cost of hip/knee replacement

0 means not reported

In this lab we will focus on features 0-3. In the next lab, which adds sorting, we will incorporate more functionality for features 4-6.

Using search to explore hospital data

You will implement a program called search-hospitals.py, which will allow the user to search through this dataset and learn interesting information.

Users may search by location:

Or users may search by hospital name:

Program Requirements

  • All numeric data in the file should be type cast to the correct type when stored in the program.

  • Repeatedly presents a menu to the user with three choices: to search by city and state, to search by name, or to quit.

  • Ends when the user chooses to quit.

  • Ensures that the user provides a valid menu choice.

  • Uses formatted printing to show search results in a table format that fits in an 80-character wide terminal window.

  • Summarizes the results of each search with a count and average overall rating (excluding ratings that were not reported).

  • Uses linear search to find exact matches by city and state.

  • Uses binary search to find matches by the prefix of a hospital name.

Tips

1. Continue to use TDD

We strongly recommend that you create a top-down design with a fully fleshed out main and stubs for all other functions. However, you may not work with a partner on this lab. Also you don’t need to get approval from an instuctor for your design, though we’d be glad to review it with you in lab, office hours, or ninja sessions. You should only begin implementing the other functions after you have a clear plan in place for the structure of main.

2. Create modular, reusable functions

Avoid duplicating similar code in multiple places throughout the program. Think about how you could re-use the same function. For example, we are doing two types of searchs (by location and by name), but the results of those searches are displayed in exactly the same way. Consider having a function that takes in a list of search results and displays them. Then you could call this function for either type of search.

3. Displaying matches

We want the results of our searches to be nicely displayed in the terminal window. However, some hospital names (and city names) are quite long. We can use slicing to truncate these strings to a fixed maximum length, allowing us to create an easy to read table format. This section of the online textbook explains how to slice a string.

The version of linear search that we discussed in class stops as soon as it finds the item it is looking for, and returns the index of that item or -1 if not found. For this problem, your linear search will need to accumulate a list of matching results and return that.

In addition, the basic version of linear search assumes that you are searching a flat list. We will be searching a list of lists, so will need to adapt linear search to use double indexing to get at the data within each sublist.

Because the hospital data is sorted by hospital name we can use binary search to very quickly locate a particular name. However, hospital names are quite long, and users may not know the exact name. So instead of searching for the full name, we will allow users to search using a prefix.

To see if a string begins with a particular prefix, the syntax is: <string>.startswith(<prefix>) which will return True or False.

For example, there are quite a few hospitals that start with "Mary" as shown below:

...
Martin Medical Center            <- this has a different prefix
Mary Black Health System Gaffney
Mary Black Health System Spartanburg
Mary Breckinridge Arh Hospital
Mary Bridge Children's Hospital
Mary Greeley Medical Center
Mary Hitchcock Memorial Hospital
Mary Hurley Hospital
Mary Immaculate Hospital
Mary Imogene Bassett Hospital
Mary Lanning Healthcare          <- suppose binary search finds this
Mary Rutan Hospital
Mary Washington Hospital Inc
Marymount Hospital
Mason District Hospital          <- this has a different prefix
...

You should modify binary search so that it uses startswith rather than == to find name matches. However, things get interesting once you find a match. Binary search will inform you of the index of the first match it finds, but there might be more matches just before that index and just after that index. So you’ll need to search in the vicinity of that location to find all of the other potential matches.

For example, suppose that the first match it finds for "Mary" is the one indicated above. You’ll need to search backward from there until you find a name that does not begin with that prefix and you’ll need to search forward from there until you find a name that doesn’t begin with that prefix. You should accumulate all of those hospitals and display their information.

Extra Challenge

This is an optional extension and is not required.

After completing a search by city and state, inform the user of which hospitals at that location offer the minimum cost for each type of procedure. Be sure to ignore hospitals with a cost of 0 as this simply means that they didn’t report any cost. Also there is no need to find the minimum costs if there is only a single match.

Here is an example run showing how this extra challenge should work:

Answer the Questionnaire

Please edit the Questions-07.txt file in your cs21/labs/07 directory and answer the questions in that file.

Once you’re done with that, run handin21 again.

Turning in your labs…​.

Remember to run handin21 to turn in your lab files! You may run handin21 as many times as you want. Each time it will turn in any new work. We recommend running handin21 after you complete each program or after you complete significant work on any one program.