CS21 Lab 8: Sorting

Due Saturday, November 14th, before midnight

Read through this entire lab before you begin coding!

Content warning: This lab deals with data about hospitals and the costs associated with certain medical issues (heart procedure, pneumonia, and knee/hip replacement). If dealing with this subject matter would make it difficult for you to work on the assignment, please contact an instructor as soon as possible to discuss alternatives.

Checkpoint

At the end of your lab session, be sure to run handin21.

Your lab instructor will check your files, which should reflect that you have made non-trivial progress (in python or pseudocode) towards the solution. Note that if you have not made much progress towards your solution, we expect that you would have been actively seeking help during your lab over Slack.

A portion of your final lab grade is dependent on this checkpoint.

If there are circumstances which prevent you from making substantial progress on this lab, please contact your lab instructor as soon as possible.

Goals

  • Continue to gain proficiency at top-down design and bottom-up implementation.

  • Continue practicing using a list of lists to store data.

  • Understand how to adapt selection sort to real-world data.

  • Understand how to adapt selection sort to sort from highest to lowest instead of lowest to highest.

Hospital dataset

For this assignment, we will continue to work with the hospital data we used in the previous lab. The hospital data is stored in CSV format and is located at: /data/cs21/hospital/hospitals.csv

As a reminder, here are some details about the data set, beginning with the first five lines and last five lines from this file:

Abbeville Area Medical Center,Abbeville,SC,4,0,14548,24265
Abbeville General Hospital,Abbeville,LA,4,0,16666,23508
Abbott Northwestern Hospital,Minneapolis,MN,4,21776,17981,19259
Abilene Regional Medical Center,Abilene,TX,2,23627,18312,25205
Abington Memorial Hospital,Abington,PA,2,23358,16744,20167
...
Yukon Kuskokwim Delta Reg Hospital,Bethel,AK,3,0,11547,0
Yuma District Hospital,Yuma,CO,3,0,16974,0
Yuma Regional Medical Center,Yuma,AZ,3,24631,18413,22354
Zuckerberg San Francisco General Hosp & Trauma Ctr,San Francisco,CA,1,20738,15537,24723
Zuni Comprehensive Community Health Center,Zuni,NM,-1,0,14541,0

The file is sorted by hospital name.

Each line contains seven features. The following table provides more detailed information about each of these features.

Table 1. Features of Hospital Data

Feature

Type

Description

Notes

0

string

Hospital name

1

string

City

2

string

State

two-letter mail code

3

integer

Overall rating 1-5

-1 means not rated

4

integer

Avg cost of heart procedure

0 means not reported

5

integer

Avg cost of pneumonia

0 means not reported

6

integer

Avg cost of hip/knee replacement

0 means not reported

In this lab we will be using all of the features in the data set, not just the subset we used last lab.

Using sorting to investigate hospital data

You will implement a program called sort-hospitals.py, which will allow the user to search through this dataset and learn interesting information. To start, copy your Lab 07 search-hospitals.py file into a new sort-hospitals.py that you will use for Lab 08:

cp ~/cs21/labs/07/search-hospitals.py ~/cs21/labs/08/sort-hospitals.py

Users may get a list of all the hospitals in a location sorted from highest ranked to lowest ranked.

Users may find the ten most affordable options in their state for different treatments.

Note: If you inspect the data, you will notice that there are actually 56 "states": the 50 US states, plus DC (Washington, D.C.), PR (Puerto Rico), GU (Guam), VI (Virgin Islands), MP (Northern Mariana Islands), and AS (American Samoa). You shouldn’t need to change your code at all to handle these extra "states", but you might want to test the affordable treatment option on GU, VI, MP, or AS since they each have fewer than 10 hospitals in the dataset.

Program Requirements

  • All numeric data in the file should be type cast to the correct type when stored in the program.

  • Repeatedly presents a menu to the user with five choices: to search by city and state, to search by name, to find the highest ranked hospitals by city and state, to find the most affordable treatment option by state, or to quit.

  • Ends when the user chooses to quit.

  • Ensures that the user provides a valid menu choice.

  • Uses formatted printing to show search results in a table format that fits in an 80-character wide terminal window.

  • Summarizes the results of each search with a count and average overall rating (excluding ratings that were not reported).

  • Uses linear search to find exact matches by city and state.

  • Uses binary search to find matches by the prefix of a hospital name.

  • Uses selection sort to sort results.

  • You do not need to verify the user typed in a valid city or state. If the user types in a city or state that doesn’t exist, your program will report that no results were found.

Tips

1. Sorting results

In order to display the highest ranked hospitals or the most affordable hospitals, you should first use a search method to narrow down the amount of data you need to sort. You already have search functions written from Lab 7, so reuse your code if you can. For example, to display the highest ranked hospitals by location, first find all the hospitals in that location (you already did this for Lab 7) and then sort the resulting list by the ranking.

To find the most affordable hospitals in each state, you may choose to write a new search function or find a way to modify your existing search function. Once that is done, use this search function to find all of the hospitals in the state and then sort the resulting list by the treatment that the user is interested in.

If two (or more) hospitals have the same ranking or the same cost of treatment, it does not matter which order you display them. You don’t need to make sure the hospitals remain in alphabetical order.

2. Displaying treatment costs

Some hospitals do not report the costs of all of the procedures in the data set. To indicate this, the dataset lists the cost of a treatment at 0. When the user wants to display the hospitals sorted by a particular treatment cost, be careful not to list these hospitals as having the lowest cost. After finding the list of hospitals for a particular state, remove all of the hospitals where the treatment cost for the treatment you are looking for is 0. It’s OK if those hospitals list 0 as the treatment cost for other treatments, just not for the treatment cost you are sorting on. Now, when you sort the data by the user-specified treatment cost, the lowest cost hospitals will be first ones in your list.

3. Sorting from largest to smallest

Your selection sort algorithm finds the smallest item and puts it first. Then it finds the second smallest item and puts it second, etc. This gives you a list that is sorted from smallest to largest. If you want to build a list that is sorted from largest to smallest, you have two options:

  • You can change the selection sort algorithm so that it finds the largest element and puts it first, then finds the second largest element and puts it second, etc.

  • You can use your existing selection sort algorithm, then use the reverse() method on lists to reverse the order of the elements:

lst = [1, 2, 3]
lst.reverse()
print(lst)  # prints [3, 2, 1]

Extra Challenge

This is an optional extension and is not required.

In what state would you pay the least for a particular treatment? In what state would you pay the most? You can answer this question by finding the lowest cost hospital in each state for a particular treatment and then sorting those results.

Here is an example run showing how this extra challenge should work:

Answer the Questionnaire

Please edit the Questions-08.txt file in your cs21/labs/08 directory and answer the questions in that file.

Once you’re done with that, run handin21 again.

Turning in your labs…​.

Remember to run handin21 to turn in your lab files! You may run handin21 as many times as you want. Each time it will turn in any new work. We recommend running handin21 after you complete each program or after you complete significant work on any one program.