CS21 Lab 10: Sorting

Due Saturday, April 16, by 11:59pm


Please read through the entire lab before starting!

In this lab you will continue using the Hyrdoelectric Dam data set from Lab 9, but support filtering and sorting operations instead of searching.

Programming Tips

As you write programs, use good programming practices:

  • Use a comment at the top of the file to describe the purpose of the program (see example).

  • All programs should have a main() function (see example).

  • Use variable names that describe the contents of the variables.

  • Write your programs incrementally and test them as you go. This is really crucial to success: don’t write lots of code and then test it all at once! Write a little code, make sure it works, then add some more and test it again.

  • Don’t assume that if your program passes the sample tests we provide that it is completely correct. Come up with your own test cases and verify that the program is producing the right output on them.

  • Avoid writing any lines of code that exceed 80 columns.

    • Always work in a terminal window that is 80 characters wide (resize it to be this wide)

    • In vim, at the bottom left in the window, there is an indication of both the line and the column of the cursor.

Are your files in the correct place?

Make sure all programs are saved to your cs21/labs/10 directory! Files outside that directory will not be graded.

$ update21
$ cd ~/cs21/labs/10
$ pwd
/home/username/cs21/labs/10
$ ls
Questions-10.txt
(should see your program files here)

Goals

  • Write a program that filters data by different criteria

  • Write a program that sorts data by different fields

  • Continue practicing Top-Down Design (TDD)

  • Connect CS topics with real-world data

1. Filtering the Hydroelectric Dam Data

1.1. Hydroelectric Dam Data Set

Refer to Lab 9 for a overview of the Hydroelectric Dam Dataset.

Recall that each line contains 7 features. The following table provides information about these features.

Table 1. Features of Hydroelectric Dam Data

Feature index

Type

Description

0

string

Dam name

1

string

Watercourse name

2

string

County

3

string

State

4

float

Crest Elevation

5

float

Crest Length

6

int

Year

In this lab we will focus on using features 3 and 6.

You will implement a program called sort-dams.py, which will allow the user to filter this data set and sort by different fields.

1.2. Program Requirements

  • Reads in stored data from a file, storing a list of records. The data is stored in the file:

    /data/cs21/hydropower/hydropower.csv

    Note: don’t copy this file to your labs/10 directory, just use "/data/cs21/hydropower/hydropower.csv" as the file name in your python program.

    As you read in the data, each line of the file corresponds to one record, which your program should represent using a HydroDam object and store in a list.

    This is likely the same as your solution to lab09 and you may copy over the code to read this data over to your lab10 solution.

  • Repeatedly presents a menu to the user with six choices shown below:

    Please select one of the following choices:
    1. Filter dams by state name
    2. Filter dams by year
    3. Sort by state name
    4. Sort by year
    5. Reset list to all dams
    6. Quit
    
    Choice?

    Depending on the user’s selection, prompt them for additional information and then do one of:

    1. Filter the data set by state name.

    2. Filter the data set by min and max year

    3. Sort the current set of records by state name

    4. Sort the current set of records by year

    5. Reset the current set of records to the original data set.

    6. Exit the program.

  • When filtering, your program should prompt for the filter criteria (state name or min/max year) and return a new list of records that match only the filter criteria.

  • Ensures that the user provides valid input (menu choices and numerical years).

  • After a filter or sort step, print all current records in the appropriate order

  • Uses formatted printing to show search results in a table format that fits in an 80-character wide terminal window.

1.3. Example Output

1.4. Tips

1.4.1. Continue to use TDD

We strongly recommend that you create a top-down design with a fully fleshed out main and stubs for all other functions.

1.4.2. Create modular, reusable functions

Avoid duplicating similar code in multiple places throughout the program. Think about how you could re-use the same function. For example, we are doing two types of filters (by state and by year), but the results of those searches are displayed in exactly the same way. Consider having a function that takes in a list of results and displays them. Then you could call this function for either type of filter or sort.

Likewise, sorting by year and sorting by state can re-use the same swap function. You may need to copy/modify the inclass examples to sort objects by different fields (year/state name). Make sure when doing a comparison to determine which item comes first, you only compare the appropriate field/subpart of each object.

1.4.3. Displaying matches

You can re-use the same functions to display matches as you did in lab 09. Refer to Lab 09 section 1.4.3 for string slicing and display tips.

1.4.4. Applying filters

The initial set of records should be all dams. However, anytime you apply a filter operation, the filter operation replaces the current set of records with the records matching the filter. Thus if you filter by the state name "Montana" and then sort by year, you should only display the dams in Montana sorted by year. The reset option restores the current set of records to be all records.

A sample non trivial workflow might be to filter by a state (e.g., "Utah"), sort by year, then filter by year (e.g. 1970-2000). The current list of records after this sequence would be dams in Utah built between 1970 and 2000.

When searching for matching records and applying the filter, you do not know how the records are sorted (they could change after a sort operation). For this reason, you can use a linear search strategy for all your filters.

When filtering by year, you just need to check if the year entered by the user is a non-negative integer. Years of 500, or 2045 are fine inputs, even if there are no dams in the data set from these years.

1.4.5. Adapting sort

You should use one of the quadratic sorts discussed in class. However, you will need to adapt your algorithms to sort objects instead of just a list of ints. You will likely need to have two different but similar sort functions to sort by year and sort by state. The basic structure of both algorithms will be the same. Only the way you do the comparisons would be different.

2. Provided Library

The lab comes with a hydrodam_lab library that gives you several helpful tools intended to make this lab easier:

2.1. HydroDam Record Class

The library provides a HydroDam class, which encapsulates all the features needed to store one "record." You can create an object of this class by calling the HydroDam() constructor with each field of information passed in as separate parameters. The types of the fields are specified in the table below.

Once you have a HydroDam object, you can use methods to retrieve information about it:

Table 2. Features of Hydroelectric Dam Data with Getter Methods

Parameter

Type

Description

Getter Method

0

string

Dam name

get_name()

1

string

Watercourse name

get_watercourse()

2

string

County

get_county()

3

string

State

get_state()

4

float

Crest Elevation

get_height()

5

float

Crest Length

get_length()

6

int

Year

get_year()

You’ll need to get the various fields from the CSV file, convert them to the correct types, and then pass them to the constructor. You can use the string method .split() in order to turn one long string into a list of shorter strings. This method takes an argument to let you specify what to use as the "separator;" in this case, you’ll want to split based on commas, using a command like:

s.split(",")

The constructor creates an object from some parameters:

myDam = HydroDam("My Dam", "Some River", "Some County", "A State", 100.0, 500.0, 2022)

3. Extra Challenges

Here are TWO optional extensions! Neither is required, but either or both might be fun and rewarding.

3.1. Optional: Add more filters

As the lab shows, you can filter on different criteria. Consider adding additional filters of your choosing, or extending your records by adding an e.g., extend by state option that queries all records and expands the current set of records by the given state. For example, a filter by "Montana" followed by an extend by "Pennsylvania" would result in a current list containing dams in either "Montana" or "Pennsylvania".

3.2. Optional: Find the first dam in every state

Print a list of the earliest dam in each state. You will likely want to sort the entire data set by year first. Then when filtering, only add the next record if the state of the corresponding record has not been added already. You may want to make a temporary list of previously seen state name to help determine if you already have the earliest dam for a given state

Answer the Questionnaire

Each lab will have a short questionnaire at the end. Please edit the Questions-10.txt file in your cs21/labs/10 directory and answer the questions in that file.

Once you’re done with that, you should run handin21 again.

Submitting lab assignments

Remember to run handin21 to turn in your lab files! You may run handin21 as many times as you want. Each time it will turn in any new work. We recommend running handin21 after you complete each program or after you complete significant work on any one program.

Logging out

When you’re done working in the lab, you should log out of the computer you’re using.

When Remotely logged in

When you are ssh’ed into the CS labs, first quit any applications you are running, like vim, then simply type exit at the prompt in your terminal window to disconnect.

When Physically logged in

When you are in a CS lab logged into a CS machine. First quit any applications you are running, like the browser and the terminal. Then click on the logout icon (logout icon or other logout icon) and choose "log out".

If you plan to leave the lab for just a few minutes, you do not need to log out. It is, however, a good idea to lock your machine while you are gone. You can lock your screen by clicking on the lock xlock icon. PLEASE do not leave a session locked for a long period of time. Power may go out, someone might reboot the machine, etc. You don’t want to lose any work!