CS21 Lab 9: Sorting a CO2 Database

Due Monday, April 20, by 11:59pm


Please read through the entire lab before starting!

In this lab you will continue using the CO2 Emissions data set from Lab 8, this time supporting a set of menu options that requires sorting the list on different features as well as searching.

Additionally, all the "standard" requirements from previous labs such as programming tips, function comments, etc. still apply.

Goals

  • Practice with sorting by different data fields

  • More practice working with a list of objects

  • More practice using binary search

  • More practice with file I/O and menu-driven programs

  • More practice with Top-Down Design (TDD), functions, and incremental implementation and testing

  • Connect CS topics with real-world data

1. Lab09 Overview

You will write a menu-driven program that computes the results of some queries on a set of historical CO2 emission data from countries around the world. We will be working with a real-world dataset from Climate Watch. The professors cleaned up the raw data from this source for you, and created a file consisting of some historical annual carbon dioxide ("CO2") total emission amounts (in Megatons), gross domestic product (or GDP, in billions of US dollars), and population (in millions of people) data for most countries in the world. However, like most real world data, there are some missing values in this data set, and your program will need to appropriately ignore missing values when computing the result of a user’s query on the data.

This is the second assignment using CO2 emissions data. Much of the code you wrote for Lab 8 can be re-used for this assignment, leaving you to focus on implementing new menu options for queries on the data that require sorting on different field values.

You will implement your program in a file named co2_ordering.py that you will create from a copy of your Lab 8 solution (instructions in Section 1.2).

1.1. Details

In this lab you will focus on sorting and searching a list of CO2Data objects, one object for each country read in from the file. After reading in the data, your program will answer queries specified by the user. Each query requires sorting the data on a different field, and some also use binary search to find the answer.

At a high level, your program will do the following (each of these steps is described in more detail below):

  • Read in the data file (again, you can use the small.csv data file for testing).

  • Present the user with a menu of options, specifically:

    1. n countries w/highest CO2 in a given year (high to low)
    2. n countries w/highest Population in a given year (high to low)
    3. n countries w/highest CO2/Capita in a given year (high to low)
    4. print a country's info
    5. quit
  • Repeatedly get an option from the user, and sometimes some other information that the specific option needs, and search the data based on the selected option. Each option requires that the list is first sorted using one or more fields from the CO2Data objects, and option 4 additionally uses binary search.

    Note that menu options 4 and 5 are identical to those in Lab 8.

  • Exit the program once the user selects the "quit" option
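
The control flow above can be sketched roughly as follows. This is only an illustrative outline, not the required design: get_choice, run_menu, and the handlers dictionary are hypothetical names (your Lab 8 functions will differ), and the read parameter exists only so the loop can be exercised without typing at a keyboard.

```python
def get_choice(low, high, read=input):
    """Keep prompting until the user enters an integer in [low, high]."""
    while True:
        text = read("Enter a value between %d and %d: " % (low, high))
        try:
            value = int(text)
        except ValueError:
            value = None  # not an integer at all
        if value is not None and low <= value <= high:
            return value
        print("  %s is not a valid choice, try again" % text)


def run_menu(handlers, quit_option, read=input):
    """Read menu options and call the matching handler until quit is chosen.

    handlers maps an option number to a zero-argument function (an
    assumption for this sketch; a real program would pass along the list
    of CO2Data objects). Returns the options handled, in order.
    """
    handled = []
    choice = get_choice(1, quit_option, read)
    while choice != quit_option:
        handlers[choice]()
        handled.append(choice)
        choice = get_choice(1, quit_option, read)
    return handled
```

Keeping input validation in one function means every prompt in the program (menu option, n, year) can reuse the same retry logic.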

1.2. Getting your starting point file

You are going to use your solution to Lab 8 as a starting point for Lab 09. Perform the following steps to get your starting point file (ninjas and instructors can help you with these steps):

  1. After running update21, cd into your cs21/labs/09 directory:

    $ update21
    $ cd ~/cs21/labs/09
  2. Next, use the cp command to copy your Lab 8 solution into your cs21/labs/09 directory as a file named co2_ordering.py:

    $ cp ../08/co2.py co2_ordering.py
  3. Finally, open this file in code and edit it to implement your lab 09 solution.

    $ pwd
    /home/you/cs21/labs/09/
    
    $ code co2_ordering.py

    Specifically, you will remove the functions that implement menu options 1-3 and replace them with new ones that implement the new menu options 1-3 for this lab assignment. You will also edit the menu and the main control flow to get the required input from the user for each option, and to call your new functions to handle these options.

If you used good generic function design, most of the other functions from the previous lab will be used in this lab as they are, not requiring any changes.

1.3. Reading in the data

The code you wrote for Lab 8 to read file data into a list of CO2Data objects is identical in this lab, and can be used unchanged. Refer to Lab 8 for more details about this step, including the file format.

Remember that there are missing data for some fields of some countries. Missing values are encoded as -1 in the file. Any field that stores a numeric value could have a missing value (i.e., any CO2 value, population value, or GDP value could be -1 in the input file, and -1.0 as the float field value in the CO2Data object). A country’s name field will never be missing: no country has a name of -1.

Your program needs to handle missing values correctly: do not do arithmetic using -1 values, but skip over these values instead.
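
As a small illustration of skipping missing values, the sketch below averages a list of plain floats while ignoring any -1 entries. It works on bare numbers rather than CO2Data objects, and the names MISSING, is_missing, and average_valid are made up for this example.

```python
MISSING = -1.0  # the lab's encoding for a missing numeric field


def is_missing(value):
    """Return True if value is the missing-data marker."""
    return value == MISSING


def average_valid(values):
    """Average the non-missing values; return MISSING if none are valid."""
    total = 0.0
    count = 0
    for v in values:
        if not is_missing(v):  # skip -1 entries instead of adding them
            total += v
            count += 1
    if count == 0:
        return MISSING
    return total / count
```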

1.4. CO2Data Class

You are using the CO2Data class in this lab. Refer to CO2Data class documentation from the Lab 8 page for details about the CO2Data class and its methods.

1.5. Sample Output

Here are some sample runs of a working program. Note the output for each menu option and the error handling for bad input.

Also note that sample output for each individual menu item is shown in the "Menu Option" sections below.

2. Menu Option 1

For this menu option you will:

  1. prompt the user to enter two values:

    1. the number of countries they want to list

    2. the year of CO2 emissions data they want to order them by, one of 1960, 1980, 2000, 2020, or 2022.

  2. sort the list of CO2Data objects by their CO2 emissions values from the specified year (sort in decreasing order, from highest to lowest).

  3. print out the first n countries in the list (where n is the number the user entered) in tabular format, with a header. Your output should be in tabular format similar to our example. For each country print:

    1. its position in the ordered list

    2. its name

    3. its CO2 level for the given year

  4. If there is not valid CO2 level data for the number of countries the user wants listed, you should also print out a message saying how many of the countries have no data (note: these should all be at the end of your sorted list since their CO2 values are -1!)

Below is example program output for this option. Note:

  • we are not including the full output from the first query due to space; your program should list all 178 countries with data in order for this example

  • countries with identical values (e.g. 0.0000) can be listed in any order in your output (the exact ordering of objects in the list with duplicate values just depends on your particular sorting algorithm).

% python co2_ordering.py

----------  Menu Options -----------
1. n countries w/highest CO2 in a given year (listed high to low)
2. n countries w/highest Population in a given year (high to low)
3. n countries w/highest CO2/Capita in a given year (high to low)
4. print a country's info
5. quit

Select a menu option
Enter a value between 1 and 5: 1
Enter a value for n:
Enter a value between 1 and 193: 3000
  3000 is not a valid choice, try again
Enter a value between 1 and 193: 193
Enter a year (one of 1960, 1980, 2000, 2020, 2022): 1960

 The 193 largest CO2 emitters of 1960
    Country                                    Mt of CO2
----------------------------------------------------------
  1. United States of America                  2897.3153
  2. European Union (27)                       2099.3709
  3. Russia                                     884.5549
  4. Germany                                    813.9502
  5. China                                      798.7999
  6. United Kingdom                             584.0200
      ...  < skipping some output for space > ...
174. Dominica                                     0.0110
175. Kiribati                                     0.0000
176. Vanuatu                                      0.0000
177. Botswana                                     0.0000
178. Seychelles                                   0.0000

15 of 193 countries are missing CO2 data for 1960

----------  Menu Options -----------
1. n countries w/highest CO2 in a given year (listed high to low)
2. n countries w/highest Population in a given year (high to low)
3. n countries w/highest CO2/Capita in a given year (high to low)
4. print a country's info
5. quit

Select a menu option
Enter a value between 1 and 5: 1
Enter a value for n:
Enter a value between 1 and 193: 5
Enter a year (one of 1960, 1980, 2000, 2020, 2022): 2022

 The 5 largest CO2 emitters of 2022
    Country                                    Mt of CO2
----------------------------------------------------------
  1. China                                    11396.7774
  2. United States of America                  5057.3038
  3. India                                     2829.6442
  4. European Union (27)                       2761.9071
  5. Russia                                    1652.1773
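
Rows like the ones above can be built with Python's % string formatting. The sketch below is one possible layout: the function name format_row is hypothetical, and the column widths (40 and 10) are guesses that you should adjust until your table lines up with the sample heading.

```python
def format_row(position, name, value):
    """Build one table row: position, left-justified name, 4-decimal value."""
    # %3d.   -> position right-justified in 3 columns, then a literal period
    # %-40s  -> name left-justified and padded to 40 columns
    # %10.4f -> value right-justified in 10 columns with 4 decimal places
    return "%3d. %-40s %10.4f" % (position, name, value)


def print_rows(names_and_values):
    """Print a numbered row for each (name, value) pair, starting at 1."""
    for i in range(len(names_and_values)):
        name, value = names_and_values[i]
        print(format_row(i + 1, name, value))
```

Calling print_rows([("China", 11396.7774), ("India", 2829.6442)]) prints two numbered rows whose values share a right edge, which is what keeps the decimal points aligned.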

2.1. Required Features

  • This option must be implemented in a separate function: don’t include its code in your function that gets the menu option or in your main function. Instead, your code should make a call to your function that implements this feature when the user selects menu option 1.

  • You should implement a helper function that does the sorting part. It should sort the list on CO2 level for a given year. Think about how to design this function so that a single CO2 sorting function can sort data on CO2 levels for any of the 5 year options.

    Be sure that you are sorting data in decreasing order (from highest to lowest). This will make printing out the results much easier.

  • You should use Selection Sort to sort the list by CO2 level for the given year.

  • Use methods of the CO2Data class to access appropriate values from each country’s data: Section 1.4

  • Print out each matching country’s position in the ordering, name, and CO2 emissions for the specified year in tabular format, with a heading that is identical to ours. Use formatted print to line up output in columns. For example, to print out the position you could use %3d. (the trailing . is just a period character printed after the int value). CO2 values should be printed with 4 places beyond the decimal point.

  • You should handle missing values (ignore them in your output). Missing values are represented with -1.

  • If there are fewer countries with valid CO2 data for the given year than the number specified by the user, print out a message at the end of the table saying how many of the values do not have data for the given year. See the output above for an example of this.
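
A minimal sketch of a descending Selection Sort keyed on one year's CO2 value is shown below. FakeCountry and its co2 method are stand-ins invented for this example so the code runs on its own; the real CO2Data class has its own methods (see the Lab 8 documentation), and your function should call those instead.

```python
class FakeCountry:
    """Stand-in for CO2Data; the real class and its method names differ."""

    def __init__(self, name, co2_by_year):
        self.name = name
        self.co2_by_year = co2_by_year  # dict: year -> Mt of CO2 (-1.0 if missing)

    def co2(self, year):
        return self.co2_by_year[year]


def selection_sort_by_co2(countries, year):
    """Sort countries in place, highest CO2 value for `year` first."""
    n = len(countries)
    for i in range(n - 1):
        # find the largest remaining value and move it to position i
        largest = i
        for j in range(i + 1, n):
            if countries[j].co2(year) > countries[largest].co2(year):
                largest = j
        countries[i], countries[largest] = countries[largest], countries[i]
```

Passing the year as a parameter lets one function serve all five year options. Because valid emission values are never negative, countries whose value is -1 end up at the end of the list after a descending sort.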

2.2. Hints/Tips

  • Use your functions from Lab 8 to get input values from the user for the number of countries to print and for the year.

  • Try running your program on the smaller file to help you debug, then comment out this call from main and try on the bigger file. Be sure that your submission is reading from the larger file (/usr/local/doc/co2data.csv).

  • You can add debug print statements to print out values to see if you are finding the right answer. (Be sure to remove all debug print statements from the code you submit.)

  • Test the sorting function independently of the full option functionality (the smaller file will be easier to use for this testing).

  • Refer to the CO2Data class documentation for methods that might be helpful for implementing this menu option. (this is a link to documentation on the Lab 8 page)
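
One way to test a sorting function on its own is to pull the sorted field values into a plain list and check that they never increase. The helper below is a hypothetical debugging aid, not a required part of the lab; is_descending is a made-up name.

```python
def is_descending(values):
    """Return True if each value is >= the one that follows it."""
    for i in range(len(values) - 1):
        if values[i] < values[i + 1]:
            return False
    return True
```

After sorting the small file's data, you could build a list of the CO2 values in list order and print is_descending of that list. Ties are allowed, so equal neighbors still count as sorted.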

3. Menu Option 2

For this menu option you will:

  1. prompt the user to enter two values:

    1. the number of countries they want to list

    2. the year of Population data that they want to order them by, either 1960 or 2020.

  2. sort the list of CO2Data objects by their Population value of the specified year (sort in decreasing order, from highest to lowest).

  3. print out the first n countries in the list (where n is the number the user entered) in tabular format, with a header. Your output should be in tabular format similar to our example. For each country, print:

    1. its position in the ordered list

    2. its name

    3. its Population in the given year.

  4. If there is not valid Population data for the number of countries the user wants listed, you should also print out a message saying how many of the countries have no data (note: these should all be at the end of your sorted list since their Population values are -1!)

Below is example program output for this option.

Note:

  • we are not including the full output from the first query due to space; your program should list all 192 countries with data in the correct order for this example.

  • countries with identical values (e.g. 0.0000) can be listed in any order in your output (the exact ordering of objects in the list with duplicate values just depends on your particular sorting algorithm).

----------  Menu Options -----------
1. n countries w/highest CO2 in a given year (listed high to low)
2. n countries w/highest Population in a given year (high to low)
3. n countries w/highest CO2/Capita in a given year (high to low)
4. print a country's info
5. quit

Select a menu option
Enter a value between 1 and 5: 2
Enter a value for n:
Enter a value between 1 and 193: 193
Enter 1960 or 2020 for the year: 1960

 The 193 Most Populous Countries of 1960
    Country                                  population (in millions)
-----------------------------------------------------------------
  1. China                                       667.0700
  2. India                                       450.5477
  3. European Union (27)                         356.9061
  4. United States of America                    180.6710
  5. Russia                                      119.8970
  6. Japan                                        93.2160
      ...  < skipping some output for space > ...
191. Tuvalu                                        0.0053
192. Nauru                                         0.0044

1 of 193 countries are missing Population data for 1960

----------  Menu Options -----------
1. n countries w/highest CO2 in a given year (listed high to low)
2. n countries w/highest Population in a given year (high to low)
3. n countries w/highest CO2/Capita in a given year (high to low)
4. print a country's info
5. quit

Select a menu option
Enter a value between 1 and 5: 2
Enter a value for n:
Enter a value between 1 and 193: 10
Enter 1960 or 2020 for the year: 2020

 The 10 Most Populous Countries of 2020
    Country                                  population (in millions)
-----------------------------------------------------------------
  1. China                                      1411.1000
  2. India                                      1380.0044
  3. European Union (27)                         447.4795
  4. United States of America                    331.5011
  5. Indonesia                                   273.5236
  6. Pakistan                                    220.8923
  7. Brazil                                      212.5594
  8. Nigeria                                     206.1396
  9. Bangladesh                                  164.6894
 10. Russia                                      144.0731

3.1. Required Features

  • This option must be implemented in a separate function: don’t include its code in your function that gets the menu option or in your main function. Instead, your code should make a call to your function that implements this feature when the user selects menu option 2.

  • You should implement a helper function that does the sorting part. It should sort the list on Population level for a given year. Think about how to design this function so that a single Population sorting function can sort population data for any of the year options.

    Be sure that you are sorting data in decreasing order (from highest to lowest). This will make printing out the results much easier.

  • You should use Bubble Sort to sort the list by Population in the specified year.

  • Use methods of the CO2Data class to access appropriate values from each country’s data: Section 1.4

  • Print out each matching country’s information in tabular format, with a heading that is identical to ours.

  • You should handle missing values (ignore them in your output). Missing values are represented with -1.

  • If there are fewer countries with valid population data for the given year than the number specified by the user, print out a message at the end of the table saying how many of the values do not have data for the given year.
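
A descending Bubble Sort keyed on one year's population might look like the sketch below. To keep the example self-contained it sorts plain dictionaries, and the population accessor is hypothetical; in your program the list holds CO2Data objects and you would call the appropriate CO2Data method instead.

```python
def population(record, year):
    """Hypothetical accessor; a real program would call a CO2Data method."""
    return record["pop"][year]


def bubble_sort_by_population(records, year):
    """Sort records in place, highest population for `year` first."""
    n = len(records)
    for end in range(n - 1, 0, -1):
        swapped = False
        # bubble the smallest remaining value toward position `end`
        for i in range(end):
            if population(records[i], year) < population(records[i + 1], year):
                records[i], records[i + 1] = records[i + 1], records[i]
                swapped = True
        if not swapped:
            break  # no swaps in a full pass means the list is sorted
```

The early exit when a pass makes no swaps is optional, but it avoids extra passes when the list is already in order.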

3.2. Hints/Tips

  • Use your functions from Lab 8 to get input values from the user for the number of countries to print and for the year.

  • Try running your program on the smaller file to help you debug, then comment out this call from main and try on the bigger file. Be sure that your submission is reading from the larger file (/usr/local/doc/co2data.csv).

  • You can add debug print statements to print out values to see if you are finding the right answer. (Be sure to remove all debug print statements from the code you submit.)

  • Test the sorting function independently of the full option functionality (the smaller file will be easier to use for this testing).

  • Refer to the CO2Data class documentation for methods that might be helpful for implementing this menu option. (this is a link to documentation on the Lab 8 page)

4. Menu Option 3

For this menu option you will:

  1. prompt the user to enter two values:

    1. the number of countries they want to list

    2. the year for which they want CO2/capita data, either 1960 or 2020.

  2. sort the list of CO2Data objects by their CO2/capita values of the specified year (sort in decreasing order, from highest to lowest).

  3. print out the first n countries in the list (where n is the number the user entered) in tabular format, with a header. Your output should be in tabular format similar to our example. For each country, print:

    1. its position in the ordered list

    2. its name

    3. its CO2/capita value for the given year

    4. its CO2 value and its population in parentheses

  4. If there is not valid data for the number of countries the user wants listed, you should also print out a message saying how many of the countries have no data (note: these should all be at the end of your sorted list since their CO2 values are -1!)

Below is example program output for this option. Note:

  • we are not including the full output from the first query due to space; your program should list all 178 countries with data in the correct order for this example.

  • countries with identical values (e.g. 0.0000) can be listed in any order in your output (the exact ordering of objects in the list with duplicate values just depends on your particular sorting algorithm).

----------  Menu Options -----------
1. n countries w/highest CO2 in a given year (listed high to low)
2. n countries w/highest Population in a given year (high to low)
3. n countries w/highest CO2/Capita in a given year (high to low)
4. print a country's info
5. quit

Select a menu option
Enter a value between 1 and 5: 3
Enter a value for n:
Enter a value between 1 and 193: 193
Enter 1960 or 2020 for the year: 1960

 The 193 countries with the largest CO2/capita in 1960
    Country                               CO2/capita  ( Mtons / millions )
---------------------------------------------------------------------------
  1. Luxembourg                            36.6525   (  11.5078 /   0.3140)
  2. Kuwait                                28.9823   (   7.7970 /   0.2690)
  3. United States of America              16.0364   (2897.3153 / 180.6710)
      ...  < skipping some output for space > ...
174. Nepal                                  0.0080   (0.080608 /  10.1051)
175. Kiribati                               0.0000   (0.000000 /   0.0412)
176. Vanuatu                                0.0000   (0.000000 /   0.0637)
177. Botswana                               0.0000   (0.000000 /   0.5027)
178. Seychelles                             0.0000   (0.000000 /   0.0417)

15 of 193 countries are missing data for 1960

----------  Menu Options -----------
1. n countries w/highest CO2 in a given year (listed high to low)
2. n countries w/highest Population in a given year (high to low)
3. n countries w/highest CO2/Capita in a given year (high to low)
4. print a country's info
5. quit

Select a menu option
Enter a value between 1 and 5: 3
Enter a value for n:
Enter a value between 1 and 193: 5
Enter 1960 or 2020 for the year: 2020

 The 5 countries with the largest CO2/capita in 2020
    Country                               CO2/capita  ( Mtons / millions )
---------------------------------------------------------------------------
  1. Qatar                                 35.5776   ( 102.5012 /   2.8811)
  2. Brunei                                25.3768   (  11.1019 /   0.4375)
  3. Trinidad and Tobago                   25.0303   (  35.0297 /   1.3995)
  4. Kuwait                                22.8804   (  97.7124 /   4.2706)
  5. Bahrain                               21.9771   (  37.3959 /   1.7016)

4.1. Required Features

  • This option must be implemented in a separate function: don’t include its code in your function that gets the menu option or in your main function. Instead, your code should make a call to your function that implements this feature when the user selects menu option 3.

  • You should implement a helper function that does the sorting part. It should sort the list by CO2/population for the given year. Think about how to design this function so that a single CO2 per capita sorting function can sort the data for either of the year options.

    Be sure that you are sorting data in decreasing order (from highest to lowest). This will make printing out the results much easier.

  • You may use either Selection Sort or Bubble Sort to sort the list.

  • Use methods of the CO2Data class to access appropriate values from each country’s data: Section 1.4

  • Print out each matching country’s information in tabular format, with a heading that is identical to ours.

  • You should handle missing values (ignore them in your output). Missing values are represented with -1.

  • If there are fewer countries with valid CO2 and population data for the given year than the number specified by the user, print out a message at the end of the table saying how many of the countries do not have data for the given year.
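
Since CO2/capita is computed from two fields, either of which may be missing, it can help to isolate the division in a small helper. The sketch below is one possible approach under that assumption; co2_per_capita and MISSING are made-up names, and the function takes plain numbers rather than a CO2Data object.

```python
MISSING = -1.0  # the lab's encoding for a missing numeric field


def co2_per_capita(co2, population):
    """Return co2 / population, or MISSING if the ratio cannot be computed."""
    # a negative CO2 value or a non-positive population means no valid ratio
    if co2 < 0 or population <= 0:
        return MISSING
    return co2 / population
```

Returning -1 for countries without data means that a descending sort on this value places them at the end of the list, because a valid ratio is never negative.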

4.2. Hints/Tips

  • Use your functions from Lab 8 to get input values from the user for the number of countries to print and for the year.

  • Try running your program on the smaller file to help you debug, then comment out this call from main and try on the bigger file. Be sure that your submission is reading from the larger file (/usr/local/doc/co2data.csv).

  • You can add debug print statements to print out values to see if you are finding the right answer. (Be sure to remove all debug print statements from the code you submit.)

  • Test the sorting function independently of the full option functionality (the smaller file will be easier to use for this testing).

  • Refer to the CO2Data class documentation for methods that might be helpful for implementing this menu option. (this is a link to documentation on the Lab 8 page)

5. Menu Option 4

This menu option is identical to menu option 4 from Lab 8. However, the list of CO2Data objects may not be in sorted order of country name anymore due to the actions of other menu options by the user. As a result, you need to sort the list by Country name first before doing a binary search to find the matching country.

Here is some example program output for this option:

----------  Menu Options -----------
1. n countries w/highest CO2 in a given year (listed high to low)
2. n countries w/highest Population in a given year (high to low)
3. n countries w/highest CO2/Capita in a given year (high to low)
4. print a country's info
5. quit
Select a menu option
Enter a value between 1 and 5: 4

Enter the name of a country: Mexico

Mexico
  1960 pop:    37.771861  2020 pop:   128.932753
  1960 GDP:    13.040000  2020 GDP:  1087.117783
          1960         1980         2000         2020         2022
     63.052300   267.743800   391.725000   442.289100   511.972000

----------  Menu Options -----------
1. n countries w/highest CO2 in a given year (listed high to low)
2. n countries w/highest Population in a given year (high to low)
3. n countries w/highest CO2/Capita in a given year (high to low)
4. print a country's info
5. quit

Select a menu option
Enter a value between 1 and 5: 4

Enter the name of a country: Kenya

Kenya
  1960 pop:     8.120082  2020 pop:    53.771300
  1960 GDP:     0.791265  2020 GDP:   100.666543
          1960         1980         2000         2020         2022
      2.424100     6.189300    10.408800    21.982400    24.851900

5.1. Required Features

  • Satisfies all required features for Option 4 from Lab 8, including that you use binary search on the list of CO2Data objects to find the matching country.

  • You should implement a helper function that does the sorting part. It should sort the list by the name of each country in INCREASING order (from lowest to highest alphabetically). This is the same order as the data appear in the file. Your binary search function from Lab 8 will work with no modification if you sort the list of CO2Data objects by country name in increasing order.

  • You may use either Selection Sort or Bubble Sort to sort the list.
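
The sort-then-search sequence can be sketched as follows, using plain dictionaries with a "name" key as stand-ins for CO2Data objects. Both function names are hypothetical, and the binary search here only illustrates the idea; your own Lab 8 binary search is the one to reuse.

```python
def sort_by_name(records):
    """Selection sort, in place, by name in increasing (A to Z) order."""
    n = len(records)
    for i in range(n - 1):
        smallest = i
        for j in range(i + 1, n):
            if records[j]["name"] < records[smallest]["name"]:
                smallest = j
        records[i], records[smallest] = records[smallest], records[i]


def binary_search_by_name(records, target):
    """Return the index of target in name-sorted records, or -1 if absent."""
    low = 0
    high = len(records) - 1
    while low <= high:
        mid = (low + high) // 2
        name = records[mid]["name"]
        if name == target:
            return mid
        elif name < target:
            low = mid + 1   # target can only be in the upper half
        else:
            high = mid - 1  # target can only be in the lower half
    return -1
```

Binary search is only correct on a list sorted by the same field it compares, which is why the list must be re-sorted by name after options 1-3 have reordered it.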

5.2. Hints/Tips

  • To implement this option, you should be able to just call your sorting by country name function and then use the rest of your menu option 4 code from Lab 8.

  • Test your sorting function in isolation first. Use the smaller data file and debug print statements to verify it is correct before implementing the rest of the functionality.

Answer the Questionnaire

After each lab, please complete the short Google Forms questionnaire. Please select the right lab number (Lab 09) from the dropdown menu on the first question.

Once you’re done with that, you should run handin21 again.

Submitting lab assignments

Remember to run handin21 to turn in your lab files! You may run handin21 as many times as you want. Each time it will turn in any new work. We recommend running handin21 after you complete each program or after you complete significant work on any one program.

Logging out

When you’re done working in the lab, you should log out of the computer you’re using.

First quit any applications you are running, including your vscode editor, the browser and the terminal. Then click on the logout icon and choose "log out".

If you plan to leave the lab for just a few minutes, you do not need to log out. It is, however, a good idea to lock your machine while you are gone. You can lock your screen by clicking on the lock icon. PLEASE do not leave a session locked for a long period of time. Power may go out, someone might reboot the machine, etc. You don’t want to lose any work!