CS21 Lab 9: Sorting a CO2 Database
Due Monday, April 20, by 11:59pm
Please read through the entire lab before starting!
In this lab you will continue using the CO2 Emissions data set from Lab 8, this time supporting a set of menu options that requires sorting the list on different features as well as searching.
Additionally, all the "standard" requirements from previous labs such as programming tips, function comments, etc. still apply.
Goals
-
Practice with sorting by different data fields
-
More practice working with a list of objects
-
More practice using binary search
-
More practice programming with file I/O and menu-driven programs
-
More practice with Top-Down Design (TDD), functions, and incremental implementation and testing
-
Connect CS topics with real-world data
1. Lab09 Overview
You will write a menu-driven program that computes the results of some queries on a set of historical CO2 emission data from countries around the world. We will be working with a real-world dataset from Climate Watch. The professors cleaned up the raw data from this source for you, and created a file consisting of some historical annual carbon dioxide ("CO2") total emission amounts (in Megatons), gross domestic product (or GDP, in billions of US dollars), and population (in millions of people) data for most countries in the world. However, like most real world data, there are some missing values in this data set, and your program will need to appropriately ignore missing values when computing the result of a user’s query on the data.
This is the second assignment using CO2 emissions data. Much of the code you wrote for Lab 8 can be re-used for this assignment, leaving you to focus on implementing new menu options for queries on the data that require sorting on different field values.
You will implement your program in a file named co2_ordering.py that you
will create from a copy of your Lab 8 solution
(instructions in Section 1.2).
1.1. Details
In this lab you will focus on sorting and searching a list of CO2Data
objects, one object for each country read in from the file.
After reading in the data, your program will answer queries specified by the
user that require sorting the data on different fields, some of which
also use binary search to find the answers.
At a high level, your program will do the following (each of these steps is described in more detail below):
-
Read in the data file (again, you can use the
small.csv data file for testing).
-
Present the user with a menu of options, specifically:
1. n countries w/highest CO2 in a given year (high to low)
2. n countries w/highest Population in a given year (high to low)
3. n countries w/highest CO2/Capita in a given year (high to low)
4. print a country's info
5. quit
-
Repeatedly get an option from the user, and sometimes some other information that the specific option needs, and search the data based on the selected option. Each option requires that the list is first sorted using one or more fields from the
CO2Data objects, and option 4 additionally uses binary search. Note that menu options 4 and 5 are identical to those in Lab 8.
-
Exit the program once the user selects the "quit" option
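The input checking implied by the steps above (re-prompting when a value is out of range) can be sketched as a small helper. This is only an illustration: the name parse_choice and its parameters are hypothetical, and your Lab 8 input functions may already do the same job.

```python
def parse_choice(text, low, high):
    """Return text as an int if it is a whole number in [low, high], else None."""
    try:
        value = int(text)
    except ValueError:
        # text was not a whole number (for example "abc" or "2.5")
        return None
    if low <= value <= high:
        return value
    return None


print(parse_choice("3", 1, 5))         # 3
print(parse_choice("3000", 1, 193))    # None
```

A caller would keep prompting until parse_choice returns something other than None.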
1.2. Getting your starting point file
You are going to use your solution to Lab 8 as a starting point for Lab 09. Perform the following steps to get your starting point file (ninjas and instructors can help you with these steps):
-
After running update21, cd into your cs21/labs/09 directory:
$ update21
$ cd ~/cs21/labs/09
-
Next, use the cp command to make a copy of your Lab 8 solution into your cs21/labs/09 directory in a file named co2_ordering.py:
$ cp ../08/co2.py co2_ordering.py
-
Finally, open this file in code and edit it to implement your Lab 09 solution.
$ pwd
/home/you/cs21/labs/09/
$ code co2_ordering.py
Specifically, you will:
-
remove the functions that implement menu options 1-3 and replace them with new ones that implement the new menu options 1-3 for this lab assignment.
-
edit the menu and the main control flow to get the required input from the user for each option, and to call your new functions to handle these options.
If you used good generic function design, most of the other functions from the previous lab will be used in this lab as they are, not requiring any changes.
1.3. Reading in the data
The code you wrote for Lab 8 to read file data into a list of CO2Data
objects is identical in this lab, and can be used unchanged. Refer to
Lab 8 for more details about this step, including the file format.
Remember that there are missing data for some fields of some countries.
Missing values are encoded as -1 in the file. Any field that stores
a numeric value could have a missing value (i.e., any CO2 value, population
value, or GDP value could be -1 in the input file, and -1.0 as the float
field value in the CO2Data object). A country’s name field will never be
missing: no country has a name of -1.
Your program needs to handle any missing values correctly:
do not do arithmetic using -1 values, but skip over these values instead.
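As a minimal sketch of skipping missing values, the check can live in one small helper so that every menu option treats -1 the same way. The names MISSING, is_missing, and count_valid are hypothetical and are not part of the CO2Data class.

```python
MISSING = -1.0   # sentinel the lab uses for a missing numeric field


def is_missing(value):
    """Return True when value is the missing-data sentinel."""
    return value == MISSING


def count_valid(values):
    """Count the entries in a list of floats that hold real data."""
    count = 0
    for v in values:
        if not is_missing(v):
            count += 1
    return count


co2_values = [2897.3153, -1.0, 0.0, 884.5549, -1.0]
print(count_valid(co2_values))   # 3
```

Note that 0.0 counts as real data here; only -1 marks a missing value.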
1.4. CO2Data Class
You are using the CO2Data class in this lab. Refer
to CO2Data class documentation from the Lab 8 page for details about
the CO2Data class and its methods.
1.5. Sample Output
Here are some sample runs of a working program. Note the output for each menu option and the error handling for bad input:
Also note that sample output for each individual menu item is shown in the "Menu Option" sections below.
2. Menu Option 1
For this menu option you will:
-
prompt the user to enter two values:
-
the number of countries they want to list
-
the year of CO2 emissions data they want to order them by, one of 1960, 1980, 2000, 2020, or 2022.
-
sort the list of CO2Data objects by their CO2 emissions values from the specified year (sort in decreasing order, from highest to lowest).
-
print out the first n countries in the list in tabular format, with a header. Your output should be in tabular format similar to our example. For each country, print:
-
its position in the ordered list
-
its name
-
its CO2 level for the given year
-
If there is not valid CO2 level data for the number of countries the user wants listed, you should also print out a message saying how many of the countries have no data (note: these should all be at the end of your sorted list since their CO2 values are -1!)
Below is example program output for this option. Note:
-
we are not including the full output from the first query due to space; your program should list all 178 countries with data in order for this example
-
countries with identical values (e.g. 0.0000) can be listed in any order in your output (the exact ordering of objects in the list with duplicate values just depends on your particular sorting algorithm).
% python co2_ordering.py
---------- Menu Options -----------
1. n countries w/highest CO2 in a given year (listed high to low)
2. n countries w/highest Population in a given year (high to low)
3. n countries w/highest CO2/Capita in a given year (high to low)
4. print a country's info
5. quit
Select a menu option
Enter a value between 1 and 5: 1
Enter a value for n:
Enter a value between 1 and 193: 3000
3000 is not a valid choice, try again
Enter a value between 1 and 193: 193
Enter a year (one of 1960, 1980, 2000, 2020, 2022): 1960
The 193 largest CO2 emitters of 1960
Country Mt of CO2
----------------------------------------------------------
1. United States of America 2897.3153
2. European Union (27) 2099.3709
3. Russia 884.5549
4. Germany 813.9502
5. China 798.7999
6. United Kingdom 584.0200
... < skipping some output for space > ...
174. Dominica 0.0110
175. Kiribati 0.0000
176. Vanuatu 0.0000
177. Botswana 0.0000
178. Seychelles 0.0000
15 of 193 countries are missing CO2 data for 1960
---------- Menu Options -----------
1. n countries w/highest CO2 in a given year (listed high to low)
2. n countries w/highest Population in a given year (high to low)
3. n countries w/highest CO2/Capita in a given year (high to low)
4. print a country's info
5. quit
Select a menu option
Enter a value between 1 and 5: 1
Enter a value for n:
Enter a value between 1 and 193: 5
Enter a year (one of 1960, 1980, 2000, 2020, 2022): 2022
The 5 largest CO2 emitters of 2022
Country Mt of CO2
----------------------------------------------------------
1. China 11396.7774
2. United States of America 5057.3038
3. India 2829.6442
4. European Union (27) 2761.9071
5. Russia 1652.1773
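Rows like the ones in the sample tables above can be built with the % formatting operator. The sketch below is only an illustration: format_row is a hypothetical helper, and the column widths (30 and 12) are guesses rather than the exact widths used in the sample output.

```python
def format_row(position, name, value):
    """Build one table row: position, left-justified name, value with 4 decimals."""
    return "%3d. %-30s %12.4f" % (position, name, value)


print(format_row(1, "China", 11396.7774))
print(format_row(2, "United States of America", 5057.3038))
```

Using a fixed width for every field is what keeps the columns lined up regardless of how long each country name is.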
2.1. Required Features
-
This option must be implemented in a separate function: don’t include its code in your function that gets the menu option or in your main function. Instead, your code should make a call to your function that implements this feature when the user selects menu option 1.
-
You should implement a helper function that does the sorting part. It should sort the list on CO2 level for a given year. Think about how to design this function so that a single CO2 sorting function can sort data on CO2 levels for any of the 5 year options.
Be sure that you are sorting data in decreasing order (from highest to lowest). This will make printing out the results much easier.
-
You should use Selection Sort to sort the list by CO2 level for the given year.
-
Use methods of the CO2Data class to access appropriate values from each country’s data (see Section 1.4).
-
Print out each matching country’s position in the ordering, name, and CO2 emissions for the specified year in tabular format, with a heading that is identical to ours. Use formatted print to line up output in columns. For example, to print out the position you could use %3d. (the . is just a period character after the int value). CO2 values should be printed with 4 places beyond the decimal point.
-
You should handle missing values (ignore them in your output). Missing values are represented with -1.
-
If there are fewer countries with valid CO2 data for the given year than the number specified by the user, print out a message at the end of the table saying how many of the values do not have data for the given year. See the output above for an example of this.
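To make the sorting requirement concrete, here is a minimal sketch of Selection Sort in decreasing order with the year passed as a parameter. The Country class and its get_co2 method are hypothetical stand-ins for illustration only; use the actual CO2Data methods documented on the Lab 8 page.

```python
class Country:
    """Hypothetical stand-in for CO2Data, for illustration only."""

    def __init__(self, name, co2_by_year):
        self.name = name
        self.co2_by_year = co2_by_year   # dict: year -> Mt of CO2 (-1.0 if missing)

    def get_co2(self, year):             # hypothetical accessor name
        return self.co2_by_year[year]


def selection_sort_by_co2(countries, year):
    """Sort countries in place by CO2 for the given year, highest first."""
    n = len(countries)
    for i in range(n - 1):
        largest = i
        # find the largest remaining value in positions i..n-1
        for j in range(i + 1, n):
            if countries[j].get_co2(year) > countries[largest].get_co2(year):
                largest = j
        # move it into position i
        countries[i], countries[largest] = countries[largest], countries[i]


data = [Country("A", {1960: 5.0}),
        Country("B", {1960: -1.0}),
        Country("C", {1960: 9.5})]
selection_sort_by_co2(data, 1960)
print([c.name for c in data])   # ['C', 'A', 'B']
```

Because the missing value -1 is smaller than any real emission value, countries without data end up at the end of the list.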
2.2. Hints/Tips
-
Use your functions from Lab 8 to get input values from the user for the number of countries to print and for the year.
-
Try running your program on the smaller file to help you debug, then comment out this call from main and try on the bigger file. Be sure that your submission is reading from the larger file (/usr/local/doc/co2data.csv).
-
You can add debug print statements to print out values to see if you are finding the right answer. (Be sure to remove all debug print statements from the code you submit.)
-
Test the sorting function independently of the full option functionality (the smaller file will be easier to use for this testing).
-
Refer to the CO2Data class documentation for methods that might be helpful for implementing this menu option. (this is a link to documentation on the Lab 8 page)
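One way to follow the hint about testing the sorting function on its own is to pull the sorted field values into a plain list and check the ordering. The helper below is a hypothetical sketch, not something the lab requires.

```python
def is_sorted_descending(values):
    """Return True if each value is >= the one after it."""
    for i in range(len(values) - 1):
        if values[i] < values[i + 1]:
            return False
    return True


print(is_sorted_descending([9.5, 5.0, 5.0, -1.0]))   # True
print(is_sorted_descending([5.0, 9.5, -1.0]))        # False
```

Remember to remove any calls like this, along with other debug output, before you submit.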
3. Menu Option 2
For this menu option you will:
-
prompt the user to enter two values:
-
the number of countries they want to list
-
the year of Population data that they want to order them by, either 1960 or 2020.
-
sort the list of CO2Data objects by their Population value of the specified year (sort in decreasing order, from highest to lowest).
-
print out the first n countries in the list in tabular format, with a header. Your output should be in tabular format similar to our example. For each country, print:
-
its position in the ordered list
-
its name
-
its Population in the given year.
-
If there is not valid Population data for the number of countries the user wants listed, you should also print out a message saying how many of the countries have no data (note: these should all be at the end of your sorted list since their Population values are -1!)
Below is example program output for this option.
Note:
-
we are not including the full output from the first query due to space; your program should list all 192 countries with data in the correct order for this example.
-
countries with identical values (e.g. 0.0000) can be listed in any order in your output (the exact ordering of objects in the list with duplicate values just depends on your particular sorting algorithm).
---------- Menu Options -----------
1. n countries w/highest CO2 in a given year (listed high to low)
2. n countries w/highest Population in a given year (high to low)
3. n countries w/highest CO2/Capita in a given year (high to low)
4. print a country's info
5. quit
Select a menu option
Enter a value between 1 and 5: 2
Enter a value for n:
Enter a value between 1 and 193: 193
Enter 1960 or 2020 for the year: 1960
The 193 Most Populous Countries of 1960
Country population (in millions)
-----------------------------------------------------------------
1. China 667.0700
2. India 450.5477
3. European Union (27) 356.9061
4. United States of America 180.6710
5. Russia 119.8970
6. Japan 93.2160
... < skipping some output for space > ...
191. Tuvalu 0.0053
192. Nauru 0.0044
1 of 193 countries are missing Population data for 1960
---------- Menu Options -----------
1. n countries w/highest CO2 in a given year (listed high to low)
2. n countries w/highest Population in a given year (high to low)
3. n countries w/highest CO2/Capita in a given year (high to low)
4. print a country's info
5. quit
Select a menu option
Enter a value between 1 and 5: 2
Enter a value for n:
Enter a value between 1 and 193: 10
Enter 1960 or 2020 for the year: 2020
The 10 Most Populous Countries of 2020
Country population (in millions)
-----------------------------------------------------------------
1. China 1411.1000
2. India 1380.0044
3. European Union (27) 447.4795
4. United States of America 331.5011
5. Indonesia 273.5236
6. Pakistan 220.8923
7. Brazil 212.5594
8. Nigeria 206.1396
9. Bangladesh 164.6894
10. Russia 144.0731
3.1. Required Features
-
This option must be implemented in a separate function: don’t include its code in your function that gets the menu option or in your main function. Instead, your code should make a call to your function that implements this feature when the user selects menu option 2.
-
You should implement a helper function that does the sorting part. It should sort the list on Population level for a given year. Think about how to design this function so that a single Population sorting function can sort population data for any of the year options.
Be sure that you are sorting data in decreasing order (from highest to lowest). This will make printing out the results much easier.
-
You should use Bubble Sort to sort the list by Population in the specified year.
-
Use methods of the CO2Data class to access appropriate values from each country’s data (see Section 1.4).
-
Print out each matching country’s information in tabular format, with a heading that is identical to ours.
-
You should handle missing values (ignore them in your output). Missing values are represented with -1.
-
If there are fewer countries with valid population data for the given year than the number specified by the user, print out a message at the end of the table saying how many of the values do not have data for the given year.
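For comparison with Selection Sort, here is a minimal sketch of Bubble Sort in decreasing order. As before, the Country class and get_population are hypothetical stand-ins for illustration; the real accessors are in the CO2Data documentation on the Lab 8 page.

```python
class Country:
    """Hypothetical stand-in for CO2Data, for illustration only."""

    def __init__(self, name, pop_by_year):
        self.name = name
        self.pop_by_year = pop_by_year   # dict: year -> millions (-1.0 if missing)

    def get_population(self, year):      # hypothetical accessor name
        return self.pop_by_year[year]


def bubble_sort_by_population(countries, year):
    """Sort countries in place by population for the given year, highest first."""
    n = len(countries)
    for end in range(n - 1, 0, -1):
        swapped = False
        # each pass pushes the smallest remaining value toward position end
        for i in range(end):
            if countries[i].get_population(year) < countries[i + 1].get_population(year):
                countries[i], countries[i + 1] = countries[i + 1], countries[i]
                swapped = True
        if not swapped:
            return   # no swaps in a full pass means the list is already sorted


data = [Country("X", {2020: 1.0}),
        Country("Y", {2020: -1.0}),
        Country("Z", {2020: 3.0})]
bubble_sort_by_population(data, 2020)
print([c.name for c in data])   # ['Z', 'X', 'Y']
```

The early return when a pass makes no swaps is an optional optimization; the sort is correct without it.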
3.2. Hints/Tips
-
Use your functions from Lab 8 to get input values from the user for the number of countries to print and for the year.
-
Try running your program on the smaller file to help you debug, then comment out this call from main and try on the bigger file. Be sure that your submission is reading from the larger file (/usr/local/doc/co2data.csv).
-
You can add debug print statements to print out values to see if you are finding the right answer. (Be sure to remove all debug print statements from the code you submit.)
-
Test the sorting function independently of the full option functionality (the smaller file will be easier to use for this testing).
-
Refer to the CO2Data class documentation for methods that might be helpful for implementing this menu option. (this is a link to documentation on the Lab 8 page)
4. Menu Option 3
For this menu option you will:
-
prompt the user to enter two values:
-
the number of countries they want to list
-
the year for which they want CO2/capita data, either 1960 or 2020.
-
sort the list of CO2Data objects by their CO2/capita values of the specified year (sort in decreasing order, from highest to lowest).
-
print out the first n countries in the list in tabular format, with a header. Your output should be in tabular format similar to our example. For each country, print:
-
its position in the ordered list
-
its name
-
its CO2/capita value for the given year
-
its CO2 value and its population in parentheses
-
If there is not valid data for the number of countries the user wants listed, you should also print out a message saying how many of the countries have no data (note: these should all be at the end of your sorted list if you treat a missing CO2/capita value as -1!)
Below is example program output for this option. Note:
-
we are not including the full output from the first query due to space; your program should list all 178 countries with data in the correct order for this example.
-
countries with identical values (e.g. 0.0000) can be listed in any order in your output (the exact ordering of objects in the list with duplicate values just depends on your particular sorting algorithm).
---------- Menu Options -----------
1. n countries w/highest CO2 in a given year (listed high to low)
2. n countries w/highest Population in a given year (high to low)
3. n countries w/highest CO2/Capita in a given year (high to low)
4. print a country's info
5. quit
Select a menu option
Enter a value between 1 and 5: 3
Enter a value for n:
Enter a value between 1 and 193: 193
Enter 1960 or 2020 for the year: 1960
The 193 countries with the largest CO2/capita in 1960
Country CO2/capita ( Mtons / millions )
---------------------------------------------------------------------------
1. Luxembourg 36.6525 ( 11.5078 / 0.3140)
2. Kuwait 28.9823 ( 7.7970 / 0.2690)
3. United States of America 16.0364 (2897.3153 / 180.6710)
... < skipping some output for space > ...
174. Nepal 0.0080 (0.080608 / 10.1051)
175. Kiribati 0.0000 (0.000000 / 0.0412)
176. Vanuatu 0.0000 (0.000000 / 0.0637)
177. Botswana 0.0000 (0.000000 / 0.5027)
178. Seychelles 0.0000 (0.000000 / 0.0417)
15 of 193 countries are missing data for 1960
---------- Menu Options -----------
1. n countries w/highest CO2 in a given year (listed high to low)
2. n countries w/highest Population in a given year (high to low)
3. n countries w/highest CO2/Capita in a given year (high to low)
4. print a country's info
5. quit
Select a menu option
Enter a value between 1 and 5: 3
Enter a value for n:
Enter a value between 1 and 193: 5
Enter 1960 or 2020 for the year: 2020
The 5 countries with the largest CO2/capita in 2020
Country CO2/capita ( Mtons / millions )
---------------------------------------------------------------------------
1. Qatar 35.5776 ( 102.5012 / 2.8811)
2. Brunei 25.3768 ( 11.1019 / 0.4375)
3. Trinidad and Tobago 25.0303 ( 35.0297 / 1.3995)
4. Kuwait 22.8804 ( 97.7124 / 4.2706)
5. Bahrain 21.9771 ( 37.3959 / 1.7016)
4.1. Required Features
-
This option must be implemented in a separate function: don’t include its code in your function that gets the menu option or in your main function. Instead, your code should make a call to your function that implements this feature when the user selects menu option 3.
-
You should implement a helper function that does the sorting part. It should sort the list by CO2/population for the given year. Think about how to design this function so that a single CO2 per capita sorting function can sort CO2/capita data for any of the year options.
Be sure that you are sorting data in decreasing order (from highest to lowest). This will make printing out the results much easier.
-
You may use either Selection Sort or Bubble Sort to sort the list.
-
Use methods of the CO2Data class to access appropriate values from each country’s data (see Section 1.4).
-
Print out each matching country’s information in tabular format, with a heading that is identical to ours.
-
You should handle missing values (ignore them in your output). Missing values are represented with -1.
-
If there are fewer countries with valid CO2/capita data for the given year than the number specified by the user, print out a message at the end of the table saying how many of the values do not have data for the given year.
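Since CO2/capita is computed from two fields, either of which may be missing, one possible approach is a helper that returns the missing marker whenever it cannot divide. The name co2_per_capita and the guard against a population of 0 are assumptions made for this sketch.

```python
MISSING = -1.0   # sentinel the lab uses for a missing numeric field


def co2_per_capita(co2, population):
    """Return co2 / population, or MISSING if the division should not be done."""
    if co2 == MISSING or population == MISSING or population == 0:
        return MISSING
    return co2 / population


# United States of America, 1960, using the values from the sample output
print("%.4f" % co2_per_capita(2897.3153, 180.6710))   # 16.0364
print(co2_per_capita(-1.0, 180.6710))                 # -1.0
```

Returning -1 for missing results means a decreasing sort on this value places those countries at the end of the list, just as in the other options.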
4.2. Hints/Tips
-
Use your functions from Lab 8 to get input values from the user for the number of countries to print and for the year.
-
Try running your program on the smaller file to help you debug, then comment out this call from main and try on the bigger file. Be sure that your submission is reading from the larger file (/usr/local/doc/co2data.csv).
-
You can add debug print statements to print out values to see if you are finding the right answer. (Be sure to remove all debug print statements from the code you submit.)
-
Test the sorting function independently of the full option functionality (the smaller file will be easier to use for this testing).
-
Refer to the CO2Data class documentation for methods that might be helpful for implementing this menu option. (this is a link to documentation on the Lab 8 page)
5. Menu Option 4
This menu option is identical to menu option 4 from Lab 8. However,
the list of CO2Data objects may not be in sorted order of country name
anymore due to the actions of other menu options by the user.
As a result, you need to sort the list by Country name first before
doing a binary search to find the matching country.
Here is some example program output for this option:
---------- Menu Options -----------
1. n countries w/highest CO2 in a given year (listed high to low)
2. n countries w/highest Population in a given year (high to low)
3. n countries w/highest CO2/Capita in a given year (high to low)
4. print a country's info
5. quit
Select a menu option
Enter a value between 1 and 5: 4
Enter the name of a country: Mexico
Mexico
1960 pop: 37.771861 2020 pop: 128.932753
1960 GDP: 13.040000 2020 GDP: 1087.117783
1960 1980 2000 2020 2022
63.052300 267.743800 391.725000 442.289100 511.972000
---------- Menu Options -----------
1. n countries w/highest CO2 in a given year (listed high to low)
2. n countries w/highest Population in a given year (high to low)
3. n countries w/highest CO2/Capita in a given year (high to low)
4. print a country's info
5. quit
Select a menu option
Enter a value between 1 and 5: 4
Enter the name of a country: Kenya
Kenya
1960 pop: 8.120082 2020 pop: 53.771300
1960 GDP: 0.791265 2020 GDP: 100.666543
1960 1980 2000 2020 2022
2.424100 6.189300 10.408800 21.982400 24.851900
5.1. Required Features
-
Satisfies all required features for Option 4 from Lab 8, including that you use binary search on the list of CO2Data objects to find the matching country.
-
You should implement a helper function that does the sorting part. It should sort the list by the name of each country in INCREASING order (from lowest to highest alphabetically). This is the same order as the data appear in the file. Your binary search function from Lab 8 will work with no modification if you sort the list of CO2Data objects by country name in increasing order.
-
You may use either Selection Sort or Bubble Sort to sort the list.
5.2. Hints/Tips
-
To implement this option, you should be able to just call your sorting by country name function and then use the rest of your menu option 4 code from Lab 8.
-
Test your sorting function in isolation first, using the smaller data file and debug print statements to verify it is correct, before implementing the rest of the functionality.
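The sort-then-search sequence for this option can be sketched as follows. The Country class and get_name are hypothetical stand-ins for CO2Data, and binary_search here is a generic version; your own Lab 8 binary search should work unchanged once the list is sorted by name.

```python
class Country:
    """Hypothetical stand-in for CO2Data, for illustration only."""

    def __init__(self, name):
        self.name = name

    def get_name(self):                  # hypothetical accessor name
        return self.name


def sort_by_name(countries):
    """Selection sort in place, increasing alphabetical order by name."""
    n = len(countries)
    for i in range(n - 1):
        smallest = i
        for j in range(i + 1, n):
            if countries[j].get_name() < countries[smallest].get_name():
                smallest = j
        countries[i], countries[smallest] = countries[smallest], countries[i]


def binary_search(countries, target):
    """Return the index of target in a name-sorted list, or -1 if absent."""
    low = 0
    high = len(countries) - 1
    while low <= high:
        mid = (low + high) // 2
        name = countries[mid].get_name()
        if name == target:
            return mid
        elif name < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1


data = [Country("Mexico"), Country("China"), Country("Kenya")]
sort_by_name(data)
print(binary_search(data, "Kenya"))   # 1
print(binary_search(data, "Peru"))    # -1
```

Binary search only gives correct answers on a list sorted by the same field it compares, which is why the sort has to come first.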
Answer the Questionnaire
After each lab, please complete the short Google Forms questionnaire. Please select the right lab number (Lab 09) from the dropdown menu on the first question.
Once you’re done with that, you should run handin21 again.
Submitting lab assignments
Remember to run handin21 to turn in your lab files! You may run handin21
as many times as you want. Each time it will turn in any new work. We
recommend running handin21 after you complete each program or after you
complete significant work on any one program.
Logging out
When you’re done working in the lab, you should log out of the computer you’re using.
First quit any applications you are running, including your vscode editor, the
browser and the terminal. Then click on the logout icon and
choose "log out".
If you plan to leave the lab for just a few minutes, you do not need to log
out. It is, however, a good idea to lock your machine while you are gone. You
can lock your screen by clicking on the lock
icon.
PLEASE do not leave a session locked for a long period of time. Power may go
out, someone might reboot the machine, etc. You don’t want to lose any work!