CS21 Lab 9: Sorting

Due Saturday, April 8th, by 11:59pm


Please read through the entire lab before starting!

In this lab you will continue using the County Data data set from Lab 8, but support filtering and sorting operations instead of just searching.

Additionally, all the "standard" requirements from previous labs such as programming tips, function comments, etc. still apply; feel free to refer to previous lab pages as a reminder.

Goals

  • Write a program that filters data by different criteria

  • Write a program that sorts data by different fields

  • Continue practicing Top-Down Design (TDD)

  • Connect CS topics with real-world data

1. Filtering the County Data Data

In this lab, you will implement a program called sort_demographics.py, which will allow the user to filter this data set and sort by different fields. This will be an extension of work done for the previous lab, so you are encouraged to use portions of your previous program (you don’t need to don’t re- implement functions you’ve already written).

1.1. Program Requirements

  • Reads in stored data from a file, storing a list of records. This is likely the same as your solution to lab 8 and you may copy over the code to read this data over to your lab 9 solution.

    As before, the data is stored in the file:

    /usr/local/doc/county_data.csv

    once again, please don’t copy this file to your labs/09 directory, just use "/usr/local/doc/county_data.csv" as the file name in your python program.

    As you read in the data, each line of the file corresponds to one record, which your program should represent using a CountyDataRecord object and store in a list.

Note that there are actually two versions of the data file; the complete one is: /usr/local/doc/county_data.csv but there’s also a copy that’s had most of the lines removed, which may be easier to use for testing your program, particularly in the early stages: /usr/local/doc/county_data_small.csv
  • Repeatedly presents a menu to the user with six choices shown below:

    Please select one of the following choices:
    1. Filter records by state name
    2. Filter records by total population
    3. Sort by state name
    4. Sort by total population
    5. Reset list to all records
    0. Quit
    
    Enter selection:

    Depending on the user’s selection, prompt them for additional information and then do one of:

    1. Filter the data set by state name.

    2. Filter the data set by min and max total population

    3. Sort the current set of records by state name

    4. Sort the current set of records by total population

    5. Reset the current set of records to the original data set.

    6. Exit the program.

  • Sorting

    • When sorting, your program should perform the sort in place, meaning it should modify the list passed in to the sorting function.

    • You must use different sorting algorithms for the two search operations (e.g. Selection Sort for one and Bubble Sort for the other)

  • Filtering

    • When filtering, your program should prompt for the filter criteria (state name or min/max population) and return a new list of records that match only the filter criteria

      You do not need to modify the list in place when filtering. Instead, you can create, and return a new list of records as stated above.
  • Input Validation

    • Ensure that the user provides valid input (menu choices (between 0 and 5) and numerical populations (no negative numbers or strings)).

    • If the user enters an invalid state or county you can treat this as a failed search, you do not have to validate state and county inputs.

  • Printing

    • After a filter or sort step, print all current records in the appropriate order.

    • Use formatted printing to show search results in a table format that fits in an 80-character wide terminal window like you did for lab 8.

Once you have filtered (by state or population) on the dataset subsequent filters and searches are on this modified dataset. To reset the dataset, the user will have to select option 5 in the menu list "Reset list to all records".

1.3. County Data Data Set

Refer to Lab 8 for a overview of the County Data Dataset.

Recall that each line of the file contains several features, which can be easily accessed using the provided library.

1.4. Tips

1.4.1. Continue to use TDD and incremental development

We strongly recommend that you create a top-down design with a fully fleshed out main and stubs for all other functions. Likewise, you should continue to implement and test one function at a time, making sure it works correctly before moving on to the next one.

1.4.2. Re-use code from Lab 8

Since this lab works with the same data as we used for Lab 8 and performs a number of similar operations, you should try to re-use functions you wrote for Lab 8 when completing this lab, rather than re-implementing them from scratch. Simply copy/paste these functions into your new python file, and then modify them if necessary.

It’s fine to copy your own code; however, be sure that you’re only re-using code that you personally wrote. Using code from other students, the internet, etc. is still the same violation of academic ethics that it has been all semester.

1.4.3. Create modular, reusable functions

Avoid duplicating similar code in multiple places throughout the program. Think about how you could re-use the same function.

  • For example, we are doing two types of filters (by state and by population), but the results of those searches are displayed in exactly the same way. Consider having a function that takes in a list of results and displays them. Then you could call this function for either type of filter or sort.

  • Note also that you may be able to re-use functions you wrote for the previous lab; you should always re-use code that you wrote in the past to make your current life easier!

  • Likewise, sorting by state and sorting by population can re-use the same swap function. You may need to copy/modify the inclass examples to sort objects by different fields (population/state name). Make sure when doing a comparison to determine which item comes first, you only compare the appropriate field/subpart of each CountyRecord object.

1.4.4. Displaying matches

You can probably re-use the same functions to display matches as you did in lab 08, but you can refer back to Lab 08 section 1.4 for string slicing and display tips.

1.4.5. Applying filters

The initial set of records should be the every record in the data file. However, anytime you apply a filter operation, the filter operation replaces the current set of records with the records matching the filter.

  • Thus if you filter by the state name "Montana" and then sort by population, you should only display the records from Montana sorted by population.

  • The reset option restores the current set of records to be all records. You will therefore need separate variables to keep track of the "original" data set and the "current working set".

Note that filtering is algorithmically very close to the searches you did in Lab 8 that returned multiple matches (though remember you can’t rely on data to be sorted here, so you can’t use binary search).

A sample non-trivial workflow might be to filter by a state (e.g., "Utah"), sort by population, then filter by population (e.g. 5000-20000). The current list of records after this sequence would be counties in Utah with a total population between 5000 and 20000.

  • When searching for matching records and applying the filter, you do not know how the records are sorted (they could change after a sort operation). For this reason, you should use a linear search strategy for all your filters.

  • When getting min and max values to filter by population, you just need to check if the value entered by the user is a non-negative integer. Populations of 5, or 20000000000 are fine inputs, even if there are no records in the data set that have values this extreme.

1.4.6. Copying a list

Normally, assigning two variables to refer to the same list is what you want to do, since copying a list is computationally more expensive. However, in this program you’ll need to modify a list while still maintaining a copy of the original list (so you can implement the "reset" functionality).

In Python, the easiest way to do this is to use the .copy() method of the list class. To see how it works, try running code like the following:

a = [3, 7, 2, 4]
b = a.copy()
a[0] = 9
print(a)
print(b)

1.4.7. Adapting sort

You should use two of the quadratic sorts discussed in class.

  • You will however need to adapt your algorithms to sort objects instead of just a list of ints.

  • You will likely need to have different but similar code to handle sorting by population and sorting by state; the way you do the comparisons will be different, since you’ll need to access the correct field from the object.

2. Provided Library

Remember that this lab comes with a county_demographics library that gives you several helpful tools intended to make this lab easier. See the appropriate section of lab 8 for full details.

3. Extra Challenges

Here are some optional extensions! These are not required, but they might be fun and rewarding.

If you implement extensions, please do so in a copy of your program called sort_demographics_ext.py

3.1. Optional: Add to the working set

As the lab shows, you can filter on different criteria. Consider adding additional filters of your choosing, or extending your records by adding an e.g., 'extend by state' option that queries all records and expands the current set of records by the given state. For example, a filter by "Montana" followed by an extend by "Pennsylvania" would result in a current list containing the records from both "Montana" and "Pennsylvania".

3.2. Optional: Sort or filter on other fields

The lab asks you to sort and filter on two fields, state and population. However, you can apply the same process to any of the fields accessible through the object. Choose one (or more) additional fields to add to the set of things the program can deal with.

3.3. Optional: Find the greatest disparity in education

Print the greatest ratio of education inequality from the records in the working set. You will need to calculate this ratio for each county as the total number of people with low County Data divided by the total population of. You will need to search the working set for the county with the largest value for this ratio.

Answer the Questionnaire

Each lab will have a short questionnaire at the end. Please edit the Questions-09.txt file in your cs21/labs/09 directory and answer the questions in that file.

Once you’re done with that, you should run handin21 again.

Submitting lab assignments

Remember to run handin21 to turn in your lab files! You may run handin21 as many times as you want. Each time it will turn in any new work. We recommend running handin21 after you complete each program or after you complete significant work on any one program.

Logging out

When you’re done working in the lab, you should log out of the computer you’re using.

First quit any applications you are running, like the browser and the terminal. Then click on the logout icon (logout icon or other logout icon) and choose "log out".

If you plan to leave the lab for just a few minutes, you do not need to log out. It is, however, a good idea to lock your machine while you are gone. You can lock your screen by clicking on the lock xlock icon. PLEASE do not leave a session locked for a long period of time. Power may go out, someone might reboot the machine, etc. You don’t want to lose any work!