Lab 8: Searching a CO2 Database

Due Monday, April 13, by 11:59pm


Please read through the entire lab before starting!

This is a one week lab. In the previous lab, you used a full week to practice the TDD component and a second week to implement your solution. For this lab you will be doing the TDD and full implementation in a single week. We encourage you to start this lab early.

Additionally, all the "standard" requirements from previous labs such as programming tips, function comments, etc. still apply; feel free to refer to previous lab pages as a reminder.

Goals

  • Write a program that uses linear search

  • Write a program that uses binary search

  • Continue practicing Top-Down Design (TDD), functions, and incremental implementation and testing

  • Continue practicing with a menu-driven program

  • Connect CS topics with real-world data

1. Lab08 Overview

You will write a menu-driven program that computes the results of some queries on a set of historical CO2 emission data from countries around the world. Climate scientists often use computer programs to simulate the interactions of the earth’s atmosphere, oceans, land surface, and ice, and also to analyze past data in order to understand the effects of human activity on the climate and to make predictions of future climate changes. Your program will use historical CO2 emission data and allow a user to answer some specific questions about the data. Your program will use linear and binary search over a list of CO2 data to answer these queries.

We will be working with a real-world dataset from Climate Watch. The professors cleaned up the raw data from this source for you, and created a file consisting of some historical annual carbon dioxide ("CO2") total emission amounts (in Megatons), gross domestic product (or GDP, in billions of US dollars), and population (in millions of people) data for most countries in the world. However, like most real world data, there are some missing values in this data set, and your program will need to appropriately ignore missing values when computing the result of a user’s query on the data.

You will also use this data set in the next lab assignment. As a result, some of the functions you write for this lab will be used by the next one too, particularly functions that do the initial processing of reading in data from the file and some other helper functions. This lab and the next will be an example of function reuse across programs.

1.1. Details

In this lab you will focus on searching a list of CO2Data objects, one object for each country read in from the file. The data in the file are in sorted order by country name, and your list of CO2Data objects should keep the data sorted by country name. After reading in the data, your program will answer queries specified by the user, using either linear or binary search to find the answers.

You will implement a program called co2.py, which will allow the user to search through this dataset and find particular information.

At a high level, your program will do the following (each of these steps is described in more detail below):

  • Read in the data file. The starting point code has calls to open one of two files you can use.

  • Present the user with a menu of options, specifically:

    1. list all countries w/C02 level above a given value in given year
    2. list country w/lowest and country w/highest GDP in given year
    3. list all countries whose CO2 level decreased in 2022 from its
       largest value over all previous years
    4. print all of a country's information
    5. quit
  • Repeatedly get a menu option from the user, and other information from the user that the option may need, and search the data to answer the query of the selected option. Some options use linear search on different fields in the CO2Data object and others use binary search.

  • Exit the program when the user selects the "quit" option.

1.2. Sample Output

Here are some sample runs of a working program. Note the output for each menu option and the error handling for bad input:

Also note that sample output for each individual menu item is shown in the "Menu Option" sections below.

1.3. Required Functions and TDD

Unlike the previous lab where you did the full TDD yourself, for this lab there are several required functions that you need to implement. The details about these functions are given in the following sections about reading in the data and implementing individual menu options.

There are, however, opportunities for you to do some design as you refine some of these functions. You may find that there are other functions you want to add to implement sub-steps or common functionality. You are welcome to add any additional functions to this assignment.

2. Reading in the Data

Your program should read in data from a file, storing the information in the CO2 input file into a list of CO2Data objects.

Each line of the file corresponds to one country’s information. As you read in the data, your program should represent each line using a CO2Data object and store that object in a list.

2.1. The CO2 data files

The data we will be working with are stored in a file as comma-separated values.

The starting point code has two definitions for the input file to read from (comment or uncomment the calls to open to choose a specific file):

  • /usr/local/doc/co2data.csv is a large file of the full data set, the one you should do your main testing with, and the one you should use for your submission. You can read it from its current location (you do NOT need to copy this file into your labs directory).

  • small.csv: a smaller file of just 20 countries' data, included in your lab directory. It is useful for initial debugging and testing, particularly to limit the amount of debug output if you add debug print statements. You can cat out this file to view its contents:

    cat small.csv

Each line in the input file contains 10 comma-separated values for a country in the following order:

Name,1960,1980,2000,2020,2022,pop_1960,pop_2020,gdp_1960,gdp_2020

For example, here is the information for two countries from the file. Note that Namibia has some missing data (the 1960 and 1980 CO2 levels and the 1960 GDP), which are represented as -1 in the data files:

Namibia,-1,-1,1.6048,3.6818,3.953,0.634138,2.540916,-1,10.56263738
Nepal,0.080608,0.5413,3.0374,14.9024,15.5,10.10506,29.136808,0.508334414,33.43367051

Each line of the file contains the following information for one country, with each value separated by a comma. The values for each country on each line are as follows:

  • The first value is the name of the country, which may include white space characters and other non-alphabetic characters. For example, European Union (27) is the name of one "country" in this list. Your program will use this value as a str. Note that files are sorted alphabetically by country name (i.e., by the first entry in each row).

  • The next five values are yearly total CO2 emission values for the years 1960, 1980, 2000, 2020, and 2022 (the most recent year in the data set). The amounts are in units of Mt (Megatons of CO2 emissions). Your program will use these as float values.

  • The next two values are the country’s populations for the years 1960 and 2020. These are in units of millions of people. Your program will use these as float values.

  • The final two values are the country’s total GDP for the years 1960 and 2020. These are in units of billions of US dollar equivalents. Your program will use these as float values.

about missing values

There are missing data for some fields of some countries. Missing values are encoded as -1 in the file. Your program should initialize any CO2Data object field with a -1.0 float value for these missing values.

Your program needs to handle any missing values correctly: do not do arithmetic using -1 values, but skip over these values instead.

Any field that stores a numeric value could have a missing value (i.e., any CO2 value, population value, or GDP value could be -1 in the input file).

A country’s name field will never be missing: no country has a name of -1.
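As one possible illustration of the file format described above, the sketch below splits a single line into its parts and converts the numeric fields to float. The function name parse_line and its return shape are hypothetical, not part of the starting point code, and the sketch assumes that no country name contains a comma.

```python
def parse_line(line):
    """Split one CSV line into (name, co2_vals, pops, gdps).

    parse_line is a hypothetical helper name. Assumes the country
    name itself contains no commas.
    """
    fields = line.strip().split(",")
    name = fields[0]
    # Convert the nine numeric fields to float; a missing -1 becomes -1.0.
    numbers = [float(text) for text in fields[1:]]
    co2_vals = numbers[0:5]   # 1960, 1980, 2000, 2020, 2022
    pops = numbers[5:7]       # pop_1960, pop_2020
    gdps = numbers[7:9]       # gdp_1960, gdp_2020
    return name, co2_vals, pops, gdps
```

Calling parse_line on the Namibia line shown above would give the name "Namibia" and a co2_vals list whose first two entries are -1.0, which later code must skip rather than use in comparisons or arithmetic.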

2.2. The CO2Data class

You will create a CO2Data object for each country’s information you read in from the data file, and store these as a list of CO2Data objects in your program. Be sure to add them to the list in the same order in which they appear in the file (in sorted order by country name).

To create a CO2Data object, invoke its constructor, passing in argument values from the data you read in from the file.

next_item = CO2Data(name, co2_vals, pop_1960, pop_2020, gdp_1960, gdp_2020)
  • name: the name of the country (str)

  • co2_vals: a list containing the five CO2 emission values for 1960, 1980, 2000, 2020, 2022 (a list of float)

  • pop_1960: 1960 population in Millions (float)

  • pop_2020: 2020 population in Millions (float)

  • gdp_1960: 1960 GDP in Billions of US dollars (float)

  • gdp_2020: 2020 GDP in Billions of US dollars (float)

Once you have created a list of CO2Data objects, each initialized with data read in from the file, your program can interact with the objects in the list by invoking some of their method functions.

The method functions in the CO2Data class:

Table 1. CO2Data Methods

Method               Returns         Description

getName()            str             returns the name of the country

getCO2(year)         float           returns the CO2 emission for the passed year (int)
                                     returns -1 if data are missing for the year
                                     returns -2 if the passed year is not a valid year for CO2 data (valid years are: 1960, 1980, 2000, 2020, 2022)

getCO2Vals()         list of float   returns a list of CO2 emissions for all 5 years for the country

getGDP(year)         float           returns the GDP for the passed year (int)
                                     returns -1 if data are missing for the year
                                     returns -2 if the passed year is not a valid year for GDP data (valid years are: 1960 and 2020)

getPopulation(year)  float           returns the population for the passed year (int)
                                     returns -1 if data are missing for the year
                                     returns -2 if the passed year is not a valid year for population data (valid years are: 1960 and 2020)

print()              None            prints all information for this object (nicely formatted)

The caller of methods is expected to check the return values and handle error return values (handle a -2 return value).

Your program should not use -1 as a valid data value; -1 represents a missing value in the data set.

3. Menu and main control flow

After reading the data from a file into the list, your program should enter a loop printing out the menu of options and getting the user’s choice:

----------  Menu Options -----------
1. list all countries w/C02 level above a given value in given year
2. list country w/lowest and country w/highest GDP in given year
3. list all countries whose CO2 level decreased in 2022 from its
   largest value over all previous years
4. print all of a country's information
5. quit

Select a menu option
Enter a value between 1 and 5: 10
  10 is not a valid choice, try again
Enter a value between 1 and 5: 8
  8 is not a valid choice, try again
Enter a value between 1 and 5: 3

3.1. Required Features

  • Your program should list the 5 menu options in the exact order as the example shown above. Do not choose a different ordering of operations on the data (e.g., option 2 must be the option to list the country with the lowest and the country with the highest GDP in a given year).

  • Your program should gracefully handle, and re-prompt for, invalid menu options entered by the user.

    • You should implement a function, get_value_between(low, high, prompt), that takes two int values for the low and high values in the range, and a string with an instruction message (e.g., "Select a menu option"), and returns a value between low and high inclusive. If the value of the low parameter is larger than the high parameter, the function should just return the value of low and not prompt for any input.

      In the example output above, we called our function like this:
      option = get_value_between(1, 5, "Select a menu option")

      Your program does not need to handle non-integer value input like the user entering hello there at the menu options prompt.

      This function will be useful in other parts of your program.
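A minimal sketch of get_value_between, following the description above, might look like the following. The exact prompt and error wording are taken from the sample output; the internal variable names are our own, and the sketch assumes the user always types an integer.

```python
def get_value_between(low, high, prompt):
    """Return an int between low and high (inclusive) entered by the user.

    If low > high, return low without prompting. Assumes the user
    always enters an integer.
    """
    if low > high:
        return low
    print(prompt)
    request = "Enter a value between %d and %d: " % (low, high)
    value = int(input(request))
    # Keep asking until the value falls inside the range.
    while value < low or value > high:
        print("  %d is not a valid choice, try again" % value)
        value = int(input(request))
    return value
```

With this sketch, a call such as get_value_between(1, 5, "Select a menu option") keeps re-prompting until the user enters a value from 1 to 5.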

At this point you can implement and test the main loop of your program, and test implementing option 5. quit.

You could add function stubs for the functions for the other 4 menu options to test out the main control flow of your program. Remember that you may need to go back and modify the set of parameters to these function stubs when you implement each one.

4. Menu Option 1

For this menu option you will prompt the user to enter two values:

  1. the year for which they want to list CO2 values

  2. the CO2 level above which to list countries for the given year

It will then use linear search on the list of CO2Data objects to print out all countries above the given limit for the given year, and their CO2 emission data for that year in tabular format.

Here is an example run of this option (note the error handling of some invalid input values):

----------  Menu Options -----------
1. list all countries w/C02 level above a given value in given year
2. list country w/lowest and country w/highest GDP in given year
3. list all countries whose CO2 level decreased in 2022 from its
   largest value over all previous years
4. print all of a country's information
5. quit

Select a menu option
Enter a value between 1 and 5: 1
Enter a year (one of 1960, 1980, 2000, 2020, 2022): 2001
  2001 is not a valid value, try again
Enter a year (one of 1960, 1980, 2000, 2020, 2022): 2000
Enter a CO2 limit value
Enter a value between 0 and 20000: 300000
  300000 is not a valid choice, try again
Enter a value between 0 and 20000: 1000

 Country                                   2000 CO2 emissions
                                           > 1000.0000
-----------------------------------------------------------
 China                                       3649.2009
 European Union (27)                         3601.5088
 Japan                                       1263.7548
 Russia                                      1479.1425
 United States of America                    6010.1359

Total of 5 countries > 1000.0000 in 2000

4.1. Required Features

  • This option must be implemented in a separate function: don’t include its code in your function that gets the menu option or in your main function. Instead, your code should make a call to your function that implements this feature when the user selects menu option 1.

  • Add and use a helper function, get_value_in_set, that takes a list of year values and a prompt string, and returns the year selected by the user. A call to your function should pass a list of years (1960, 1980, 2000, 2020, and 2022, which are the five years with CO2 emissions values) from which the user selects, and a prompt string (which could be "Enter a year (one of 1960, 1980, 2000, 2020, 2022)"). The function should not return until the user picks a valid year from the list (see the error input handling in the example output above). This function will be useful in other menu options.

  • Use your helper function, get_value_between to get a co2 limit value, a value for the lower limit (a value between 0 and 20000).

  • Use linear search of the list of CO2Data objects to find any countries whose CO2 emissions data are above the limit and print out the name and CO2 emission level for the given year as you find matches.

  • Use methods of the CO2Data class to access appropriate values from each country’s data (see Section 2.2).

  • Print out each matching country’s name and CO2 emissions in tabular format.

    • Print a header with Country and the CO2 emissions for the given year, with the given limit value also shown in the header (the format should be close to identical to our output).

    • Use formatted print to ensure that country names and printed CO2 levels line up. For example %-34s prints a string in a 34 space width, left justified.

    • CO2 values should be printed with 4 places beyond the decimal point. For example, %12.4f is a placeholder for a float value, printed in a field width of 12 with 4 places beyond the decimal point.

  • You should handle missing values (ignore them in your output). Missing values are represented with -1.
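The linear search and formatted printing described in the list above can be sketched as follows. To keep the example self-contained, it searches a plain list of (name, value) pairs instead of CO2Data objects; the function name list_above and that pair representation are hypothetical. In your program the value would come from a call such as getCO2(year) on each object.

```python
def list_above(pairs, limit):
    """Linear search: print each (name, value) pair whose value > limit.

    list_above and the list of pairs are hypothetical stand-ins for a
    search over CO2Data objects. Returns the number of matches.
    """
    count = 0
    for name, value in pairs:
        # Skip missing data explicitly, then compare against the limit.
        if value != -1 and value > limit:
            # Left-justified name in 34 columns, value with 4 decimals.
            print(" %-34s %12.4f" % (name, value))
            count += 1
    return count
```

For example, with the 1980 values from the two sample lines in Section 2.1, list_above([("Namibia", -1.0), ("Nepal", 0.5413)], 0) prints only the Nepal row and returns 1, because Namibia's 1980 value is missing.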

4.2. Hints/Tips

  • Look at example in-class code for lists, searching, objects, and output formatting.

  • Test your get_value_in_set function independently by adding some calls to it from main. Try passing different values to ensure that it works correctly.

  • Try running your program on the smaller file to help you debug, then comment out this call from main and try on the bigger file. You can add debug print statements to print out values to see if you are finding the right answer (be sure to remove all debug print statements from the code you submit).

  • You may add other helper functions to implement this feature if you’d like.

  • Refer to the CO2Data Class Documentation for methods that might be helpful for implementing this menu option.
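One way to write and independently test get_value_in_set, as suggested in the hints above, is sketched below. It assumes the user always types an integer and that the function appends ": " to the prompt; the sample output for menu option 2 suggests your prompt formatting may need to differ slightly.

```python
def get_value_in_set(values, prompt):
    """Return a value from the list values, chosen by the user.

    Re-prompts until the entered int is in values. Appending ": " to
    the prompt is an assumption made for this sketch.
    """
    value = int(input(prompt + ": "))
    while value not in values:
        print("  %d is not a valid value, try again" % value)
        value = int(input(prompt + ": "))
    return value
```

A quick test from main could call get_value_in_set([1960, 2020], "Enter a year") and print the result, trying both valid and invalid inputs.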

5. Menu Option 2

For this option, find the country with the lowest and the country with the highest GDP in the year specified by the user (either 1960 or 2020), and print out each country’s name and its GDP value for the year using formatted output.

Here are two runs of this option (note the error handling of some invalid input values):

----------  Menu Options -----------
1. list all countries w/C02 level above a given value in given year
2. list country w/lowest and country w/highest GDP in given year
3. list all countries whose CO2 level decreased in 2022 from its
   largest value over all previous years
4. print all of a country's information
5. quit

Select a menu option
Enter a value between 1 and 5: 2
Enter one of 1960 or 2020 for the year 2000
  2000 is not a valid value, try again
Enter one of 1960 or 2020 for the year 1960

---- GDP in 1960
lowest:       0.01201 in Seychelles
highest:    543.30000 in United States of America

----------  Menu Options -----------
1. list all countries w/C02 level above a given value in given year
2. list country w/lowest and country w/highest GDP in given year
3. list all countries whose CO2 level decreased in 2022 from its
   largest value over all previous years
4. print all of a country's information
5. quit

Select a menu option
Enter a value between 1 and 5: 2
Enter one of 1960 or 2020 for the year 2020

---- GDP in 2020
lowest:       0.05505 in Tuvalu
highest:  20893.74383 in United States of America

5.1. Required Features

  • This option must be implemented in a separate function. You may add additional helper functions if you wish.

  • Use your get_value_in_set function to get the year value from the user passing in a list of the two year values (1960 and 2020).

  • You should handle missing values from the data set and ignore them in your output. Missing values are represented with -1 in the data file.

  • Print out the name of the country with the lowest GDP and the name of the country with the highest GDP, each with its GDP value to 5 decimal places (%.5f), below a header line showing the year, in a format identical to our output.

5.2. Hints/Tips

  • It is okay to have one search to find the largest and another search to find the smallest GDP for the given year. You may also combine them into a single search. Either way is fine.

  • Try running your program on the smaller file to help you debug, then comment out this call from main and try on the bigger file. Be sure that your submission is reading from the larger file (/usr/local/doc/co2data.csv).

  • When using the smaller file in particular, you can add debug print statements to print out values to see if you are finding the right answer (be sure to remove all debug print statements from the code you submit).

  • Refer to the CO2Data Class Documentation for methods that might be helpful for implementing this menu option.
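The single-pass version of the lowest/highest search mentioned in the hints can be sketched as below. As an assumption for illustration, it works on a list of (name, gdp) pairs rather than CO2Data objects, and find_low_high is a hypothetical name; in your program each GDP value would come from getGDP(year).

```python
def find_low_high(pairs):
    """Return (lowest, highest) as (name, value) pairs, skipping -1 values.

    find_low_high is a hypothetical helper. Either result is None if
    every value in pairs is missing.
    """
    lowest = None
    highest = None
    for name, value in pairs:
        if value == -1:
            continue  # missing data: never a candidate for min or max
        if lowest is None or value < lowest[1]:
            lowest = (name, value)
        if highest is None or value > highest[1]:
            highest = (name, value)
    return lowest, highest
```

Starting from None instead of from the first list element avoids accidentally treating a missing -1 in the first country as the smallest GDP.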

6. Menu Option 3

For this option you will find all countries whose 2022 CO2 emission level is lower than their maximum CO2 emission level over all previous years. You will print out each country’s name, 2022 level, and max level in tabular format. Your program should also print the total number of countries for which this is true at the end.

Here is a run of this option (note we are not showing the full output as there are a lot of matches):

----------  Menu Options -----------
1. list all countries w/C02 level above a given value in given year
2. list country w/lowest and country w/highest GDP in given year
3. list all countries whose CO2 level decreased in 2022 from its
   largest value over all previous years
4. print all of a country's information
5. quit

Select a menu option
Enter a value between 1 and 5: 3

     Country                             2022 level   Previous High
---------------------------------------------------------------------
  Albania                                    4.9547          5.1708
  Andorra                                    0.3686          0.5240
  Angola                                    16.0703         16.7648
  Antigua and Barbuda                        0.6022          0.6192
  Armenia                                    6.4078          7.7198
  Australia                                392.2793        396.6852
  Austria                                   61.4884         66.1717
  Azerbaijan                                38.0627         46.1098
  ...
  United States of America                5057.3038       6010.1359
  Uzbekistan                               120.6102        123.4773
  Venezuela                                 76.8920        142.8349
  Vietnam                                  343.6066        363.3427
  Yemen                                     11.3563         15.7253
  Zimbabwe                                   8.8560         13.8182

Total of 89 countries with decrease in CO2 in 2022

6.1. Required Features

  • This option must be implemented in a separate function. You may add additional helper functions if you wish.

  • You should handle missing values from the data set and ignore them in your output. Missing values are represented with -1 in the data file.

  • Print out each matching country’s name, 2022 CO2 emissions, and max CO2 emissions data in tabular format.

    • Print a header. Its format should be the same as our output.

    • Use formatted print to ensure that country names and printed CO2 levels line up. For example %-34s prints a string in a 34 space width, left justified.

    • CO2 values should be printed with 4 places beyond the decimal point. For example, %12.4f is a placeholder for a float value, printed in a field width of 12 with 4 places beyond the decimal point.

  • Print out the total number of countries whose 2022 was less than a previous max at the end. See our example output for what this should look like.

6.2. Hints/Tips

  • You could first try to find and print out the max value of each country over all years.

  • Try running your program on the smaller file to help you debug, then comment out this call from main and try on the bigger file. You can add debug print statements to print out values to see if you are finding the right answer (be sure to remove all debug print statements from the code you submit).

  • Refer to the CO2Data Class Documentation for methods that might be helpful for implementing this menu option.
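The first hint above (find each country's max over the earlier years) can be sketched as follows. The helper names previous_high and decreased are hypothetical; the input is assumed to be a list of five floats in year order, like the one returned by getCO2Vals().

```python
def previous_high(co2_vals):
    """Largest non-missing value over all years before the last one.

    Hypothetical helper; returns -1.0 if every earlier value is missing.
    """
    high = -1.0
    for value in co2_vals[:-1]:
        if value != -1 and value > high:
            high = value
    return high


def decreased(co2_vals):
    """True if the last (2022) value is below the previous high.

    Countries with a missing 2022 value, or with no earlier data at
    all, are not counted as a decrease.
    """
    latest = co2_vals[-1]
    high = previous_high(co2_vals)
    return latest != -1 and high != -1 and latest < high
```

Using Zimbabwe's values from the sample output for menu option 4, previous_high returns 13.8182 and decreased returns True, which agrees with the Zimbabwe row in the table above.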

7. Menu Option 4

This menu option takes a country name from the user and searches the list of CO2Data objects using binary search to find a matching country. If found, its information is printed, otherwise a message is printed that the country could not be found in the database.

Here are three examples of this option (note the output when a country with the matching name cannot be found):

----------  Menu Options -----------
1. list all countries w/C02 level above a given value in given year
2. list country w/lowest and country w/highest GDP in given year
3. list all countries whose CO2 level decreased in 2022 from its
   largest value over all previous years
4. print all of a country's information
5. quit

Select a menu option
Enter a value between 1 and 5: 4

Enter the name of a country: Canada
Canada
  1960 pop:    17.909009  2020 pop:    38.037204
  1960 GDP:    40.461722  2020 GDP:  1645.423408
          1960         1980         2000         2020         2022
    192.716200   442.846900   567.096100   522.845300   547.943900

----------  Menu Options -----------
1. list all countries w/C02 level above a given value in given year
2. list country w/lowest and country w/highest GDP in given year
3. list all countries whose CO2 level decreased in 2022 from its
   largest value over all previous years
4. print all of a country's information
5. quit

Select a menu option
Enter a value between 1 and 5: 4

Enter the name of a country: Zimbabwe
Zimbabwe
  1960 pop:     3.776679  2020 pop:    14.862927
  1960 GDP:     1.052990  2020 GDP:    18.051171
          1960         1980         2000         2020         2022
      5.943100     9.614000    13.818200     7.849600     8.856000

----------  Menu Options -----------
1. list all countries w/C02 level above a given value in given year
2. list country w/lowest and country w/highest GDP in given year
3. list all countries whose CO2 level decreased in 2022 from its
   largest value over all previous years
4. print all of a country's information
5. quit

Select a menu option
Enter a value between 1 and 5: 4

Enter the name of a country: USA
Sorry, USA is not in the database

7.1. Required Features

  • You must use binary search to find the country with a matching name in your list of CO2Data objects (your list should be in sorted order by country name, matching the order of the file).

  • This option must be implemented in a separate function: don’t include its code in your function that gets the menu option. Instead, your code should make a call to the function that implements this feature when the user selects menu option 4.

  • Use the print method of the CO2Data object class to print out its information in the format shown above (the print method prints it out in this format for you!).

  • Use methods of the CO2Data class to access appropriate values from each country’s data (see Section 2.2).

  • If the country is not in the database, print out a message saying it is not present. Your program does not have to handle cases when the name of a country is entered using a case that doesn’t match how it appears in the database (e.g., it is fine if your program says that canada is not in the database even though Canada is).

7.2. Hints/Tips

  • Try running your program on the smaller file to help you debug, then comment out this call from main and try on the bigger file. You can add debug print statements to print out values to see if you are finding the right answer (be sure to remove all debug print statements from the code you submit).

  • Make sure to test your code for different search cases. The smaller file might be helpful for this at first.

  • Refer to the CO2Data Class Documentation for methods that might be helpful for implementing this menu option.
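As a reminder of the general shape of binary search, here is a minimal sketch over a sorted list of plain name strings. This is an illustration only: in your program the list holds CO2Data objects, so each comparison would use the result of getName() on the middle object. The sketch also assumes that Python's string ordering matches the order in which names are sorted in the file.

```python
def binary_search(names, target):
    """Return the index of target in the sorted list names, or -1.

    Illustrative sketch over plain strings; assumes names is sorted
    in the same order that Python's < uses for str.
    """
    low = 0
    high = len(names) - 1
    while low <= high:
        mid = (low + high) // 2
        if names[mid] == target:
            return mid
        elif names[mid] < target:
            low = mid + 1    # target can only be in the upper half
        else:
            high = mid - 1   # target can only be in the lower half
    return -1
```

Good test cases include the first and last names in the list, a name in the middle, and a name that is not present (such as USA, or canada with a lowercase c).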

8. General Tips for the lab

  • Refer to this page as you work: As you implement this lab, keep the lab assignment page open in a browser and refer to it often as you go.

    • Look at the requirements and hints for the part you are working on.

    • Look at the example output to see how the option works, and for what your output should look like.

    • Refer to the CO2Data Documentation for method functions that will help you implement your solution.

  • Continue to use TDD: We strongly recommend that you create a top-down design with a fully fleshed out main and stubs for all other functions. Note that you may not work with a partner on this lab. You don’t need to get approval from an instructor for your design, though we’d be glad to review it with you in lab, office hours, or ninja sessions. You should only begin implementing the other functions after you have a clear plan in place for the structure of main.

  • Use incremental testing and debugging: continue to test and debug functions in isolation.

    • Run your program on the smaller input file first for debugging. You can cat out the file contents and look at the values to see if your program is doing the right thing (be sure to revert to using the large file for further testing and for the final version you submit).

    • Add debug print statements to help debug (be sure to remove them once the functionality is working).

  • Create modular, reusable functions: Avoid duplicating similar code in multiple places throughout the program. Think about how you could re-use the same function. We have given you definitions of functions you are required to write, and a guide for TDD of your program, but for other functions you design think about this goal.

Answer the Questionnaire

After each lab, please complete the short Google Forms questionnaire. Please select the right lab number (Lab 08) from the dropdown menu on the first question.

Once you’re done with that, you should run handin21 again.

Submitting lab assignments

Remember to run handin21 to turn in your lab files! You may run handin21 as many times as you want. Each time it will turn in any new work. We recommend running handin21 after you complete each program or after you complete significant work on any one program.

Logging out

When you’re done working in the lab, you should log out of the computer you’re using.

First quit any applications you are running, including your vscode editor, the browser and the terminal. Then click on the logout icon and choose "log out".

If you plan to leave the lab for just a few minutes, you do not need to log out. It is, however, a good idea to lock your machine while you are gone. You can lock your screen by clicking on the lock icon. PLEASE do not leave a session locked for a long period of time. Power may go out, someone might reboot the machine, etc. You don’t want to lose any work!