CS21 Lab 9: Sorting
Due Saturday, November 22, by 11:59pm
Please read through the entire lab before starting!
In this lab you will continue using the Food Access data set from Lab 8, this time supporting filtering and sorting operations instead of just searching.
Additionally, all the "standard" requirements from previous labs such as programming tips, function comments, etc. still apply.
As you write programs, use good programming practices:
-
Use a comment at the top of the file to describe the purpose of the program (see example).
-
All programs should have a
main()function (see example). -
Use variable names that describe the contents of the variables.
-
Write your programs incrementally and test them as you go. This is really crucial to success: don’t write lots of code and then test it all at once! Write a little code, make sure it works, then add some more and test it again.
-
Don’t assume that if your program passes the sample tests we provide that it is completely correct. Come up with your own test cases and verify that the program is producing the right output on them.
-
Avoid writing any lines of code that exceed 80 columns.
-
Always work in a terminal window that is 80 characters wide (resize it to be this wide)
-
In
vscode, at the bottom right in the window, there is an indication of both the line and the column of the cursor.
-
Function Comments
All functions should include a comment explaining their purpose, parameters, return value, and describe any side effects. Please see our function example page if you are confused about writing function comments.
Goals
-
Write a program that filters data by different criteria
-
Write a program that sorts data by different fields
-
Continue practicing Top-Down Design (TDD)
-
Connect CS topics with real-world data
1. Provided Library
Remember that this lab comes with a food_access_lab library that gives you
several helpful tools intended to make this lab easier. See
the appropriate section of lab 8 for full
details.
2. Filtering the Food Access Data
In this lab, you will implement a program called sort_food_access.py, which will allow the user
to filter this data set and sort by different fields. This will be an
extension of work done for the previous lab, so you are encouraged to use
portions of your previous program (you don’t need to don’t re-implement
functions you’ve already written).
2.1. Program Behavior
-
Your program should read in data from a file, storing a list of records. This is likely the same as your solution to lab 8 and you may copy over the code to read this data over to your lab 9 solution.
As before, the data is stored in the file:
/data/cs21/food_access/food_access.csvNote: once again, please don’t copy this file to your
labs/09directory, just use"/data/cs21/food_access/food_access.csv"as the file name in your python program.As you read in the data, each line of the file corresponds to one record, which your program should represent using a FoodAccessRecord object and store in a list.
Note that there are actually two versions of the data file; the complete one is:
/data/cs21/food_access/food_access.csv
but there’s also a copy that’s had most of the lines removed, which may be
easier to use for testing your program, particularly in the early stages:
/data/cs21/food_access/food_access_small.csv
|
-
After reading the data, your program should repeatedly present a menu to the user with the six choices shown below:
Please select one of the following choices: 1. Filter records by state name 2. Filter records by total population 3. Sort by state name 4. Sort by total population 5. Reset list to all records 0. Quit Choice?
Depending on the user’s selection, prompt them for additional information and then do one of:
-
Filter the data set by state name.
-
Filter the data set by min and max total population.
-
Sort the current set of records by state name.
-
Sort the current set of records by total population.
-
Reset the current set of records to the original data set.
-
Exit the program.
-
-
When filtering, your program should prompt for the filter criteria (state name or min/max population) and return a new list of records that match only the filter criteria. For this program, your program can continue to implement prefix matching like we did in lab 8, or you can use exact string matching. The examples below show exact match filtering, but we’ll accept either one.
-
When sorting, your program should perform the sort in place, meaning it should modify the list passed in to the sorting function.
-
You must use different sorting algorithms for the two sort operations (e.g. Selection Sort for one and Bubble Sort for the other)
-
Ensure that the user provides valid input (menu choices and numerical populations).
-
After a filter or sort step, print all current records in the appropriate order
-
Use formatted printing to show search results in a table format that fits in an 80-character wide terminal window like you did for lab 8.
2.2. Example Output
-
Here are some examples of how the program might look in operation:
2.3. Requirements
-
Your program should read data from the
food_access.csvdata set and store the data in a list ofFoodAccessRecordobjects. -
Your program must repeatedly prompt the user for menu options as described above. It must validate all user inputs. In particular, when entering a menu selections or population filtering thresholds, the program should validate that the user enters a positive integer.
-
When the user selects filtering by state name, the program should subsequently ONLY show records from the selected state until the user selects option 5 (reset list to all records).
-
When the user selects filtering by total population, the program should subsequently ONLY show records within the selected population range until the user selects option 5 (reset list to all records).
|
Filtering Details
The initial set of records should be every record in the data file. However, anytime you apply a filter operation, the filter operation replaces the current set of records with the records matching the filter. Thus if you filter by the state name "Montana" and then sort by population, you should only display the records from Montana sorted by population. The reset option restores the current set of records to be all records. You will therefore need separate variables to keep track of the "original" data set and the "current working set". Note that filtering is algorithmically very close to the searches you did in Lab 8 that returned multiple matches (though remember you can’t rely on data to be sorted here, so you can’t use binary search). A sample non-trivial workflow might be to filter by a state (e.g., "Utah"), sort by population, then filter by population (e.g. 5000-20000). The current list of records after this sequence would be counties in Utah with a total population between 5000 and 20000. When searching for matching records and applying the filter, you do not know how the records are sorted (they could change after a sort operation). For this reason, you should use a linear search strategy for all your filters. When getting min and max values to filter by population, you just need to check if the value entered by the user is a non-negative integer. Populations of 5, or 20000000000 are fine inputs, even if there are no records in the data set that have values this extreme. |
-
When the user select sorting by state name, you should sort the current set of records by state and print the results. The records should remain sorted by state until the user requests sorting them by population or opts to reset the records.
-
When the user select sorting by total population, you should sort the current set of records by population and print the results. The records should remain sorted by population until the user requests sorting them by state or opts to reset the records.
|
For the two sorting options, you must use two different sorting algorithms (e.g., bubble sort for one and selection sort for the other). |
2.4. Tips
2.4.1. Continue to use TDD and incremental development
We strongly recommend that you create a top-down design with a fully fleshed
out main and stubs for all other functions. Likewise, you should continue to
implement and test one function at a time, making sure it works correctly
before moving on to the next one.
2.4.2. Re-use code from Lab 8
Since this lab works with the same data as we used for Lab 8 and performs a number of similar operations, you should try to re-use functions you wrote for Lab 8 when completing this lab, rather than re-implementing them from scratch. Simply copy/paste these functions into your new python file, and then modify them if necessary.
It’s fine to copy your own code; however, be sure that you’re only re-using code that you personally wrote. Using code from other students, the internet, etc. is still the same violation of academic ethics that it has been all semester.
2.4.3. Create modular, reusable functions
Avoid duplicating similar code in multiple places throughout the program. Think about how you could re-use the same function. For example, we are doing two types of filters (by state and by population), but the results of those searches are displayed in exactly the same way. Consider having a function that takes in a list of results and displays them. Then you could call this function for either type of filter or sort.
Note also that you may be able to re-use functions you wrote for the previous lab; you should always re-use code that you wrote in the past to make your current life easier!
Likewise, sorting by state and sorting by population can re-use the same swap function. You may need to copy/modify the inclass examples to sort objects by different fields (population/state name). Make sure when doing a comparison to determine which item comes first, you only compare the appropriate field/subpart of each object.
2.4.4. Displaying matches
You can probably re-use the same functions to display matches as you did in lab 08, but you can refer back to Lab 08 section 5 for string slicing and display tips.
2.4.5. Copying a list
Normally, assigning two variables to refer to the same list is what you want to do, since copying a list is computationally more expensive. However, in this program you’ll need to modify a list while still maintaining a copy of the original list (so you can implement the "reset" functionality).
In Python, the easiest way to do this is to use the .copy() method of the
list class. To see how it works, try running code like the following:
a = [3, 7, 2, 4]
b = a.copy()
a[0] = 9
print(a)
print(b)
2.4.6. Adapting sort
You should use two of the quadratic sorts discussed in class. However, you will need to adapt your algorithms to sort objects instead of just a list of ints. You will likely need to have different but similar code to handle sorting by population and sorting by state; the way you do the comparisons will be different, since you’ll need to access the correct field from the object.
3. Extra Challenges
Here are some optional extensions! These are not required, but they might be fun and rewarding.
If you implement extensions, please do so in a copy of your program called sort_food_access_ext.py
3.1. Optional: Add to the working set
As the lab shows, you can filter on different criteria. Consider adding additional filters of your choosing, or extending your records by adding an e.g., 'extend by state' option that queries all records and expands the current set of records by the given state. For example, a filter by "Montana" followed by an extend by "Pennsylvania" would result in a current list containing the records from both "Montana" and "Pennsylvania".
3.2. Optional: Sort or filter on other fields
The lab asks you to sort and filter on two fields, state and population. However, you can apply the same process to any of the fields accessible through the object. Choose one (or more) additional fields to add to the set of things the program can deal with.
3.3. Optional: Find the greatest food insecurity ratio
Print the greatest ratio of food insecurity from the records in the working set. You will need to calculate this ratio for each county as the total number of people with low food access divided by the total population of the county. You will need to search the working set for the county with the largest value for this ratio.
3.4. Optional: Incorporate binary search for filtering
Keep track of the most recently chosen sort option, so that the program always knows if the records are sorted by state name or population, and perform binary search when it’s possible to do so (i.e., when the records are sorted by the field being filtered on).
Answer the Questionnaire
After each lab, please complete the short Google Forms questionnaire. Please select the right lab number (Lab 09) from the dropdown menu on the first question.
Once you’re done with that, you should run handin21 again.
Submitting lab assignments
Remember to run handin21 to turn in your lab files! You may run handin21
as many times as you want. Each time it will turn in any new work. We
recommend running handin21 after you complete each program or after you
complete significant work on any one program.
Logging out
When you’re done working in the lab, you should log out of the computer you’re using.
First quit any applications you are running, including your vscode editor, the browser and the
terminal. Then click on the logout icon (
or
) and choose "log out".
If you plan to leave the lab for just a few minutes, you do not need to log
out. It is, however, a good idea to lock your machine while you are gone. You
can lock your screen by clicking on the lock
icon.
PLEASE do not leave a session locked for a long period of time. Power may go
out, someone might reboot the machine, etc. You don’t want to lose any work!