Lab 8: Searching a CO2 Database
Due Monday, April 13, by 11:59pm
Please read through the entire lab before starting!
This is a one-week lab. In the previous lab, you used a full week to practice the TDD component and a second week to implement your solution. For this lab you will be doing the TDD and full implementation in a single week. We encourage you to start this lab early.
Additionally, all the "standard" requirements from previous labs such as programming tips, function comments, etc. still apply; feel free to refer to previous lab pages as a reminder.
Goals
-
Write a program that uses linear search
-
Write a program that uses binary search
-
Continue practicing Top-Down Design (TDD), functions, and incremental implementation and testing
-
Continue practicing with a menu-driven program
-
Connect CS topics with real-world data
1. Lab08 Overview
You will write a menu-driven program that computes the results of some queries on a set of historical CO2 emission data from countries around the world. Climate scientists often use computer programs to simulate the interactions of the earth’s atmosphere, oceans, land surface, and ice, and also to analyze past data in order to understand the effects of human activity on the climate and to make predictions of future climate changes. Your program will use historical CO2 emission data and allow a user to answer some specific questions about the data. Your program will use linear and binary search over a list of CO2 data to answer these queries.
We will be working with a real-world dataset from Climate Watch. The professors cleaned up the raw data from this source for you, and created a file consisting of some historical annual carbon dioxide ("CO2") total emission amounts (in Megatons), gross domestic product (or GDP, in billions of US dollars), and population (in millions of people) data for most countries in the world. However, like most real world data, there are some missing values in this data set, and your program will need to appropriately ignore missing values when computing the result of a user’s query on the data.
You will also use this data set in the next lab assignment. As a result, some of the functions you write for this lab will be used by the next one too, particularly functions that do the initial processing of reading in data from the file and some other helper functions. This lab and the next will be an example of function reuse across programs.
1.1. Details
In this lab you will focus on searching a list of CO2Data objects, one
object for each country read in from the file. The data in the file
are in sorted order by country name. Your list of CO2Data objects
should keep the data sorted by country name. After reading the data in, your
program will then answer queries specified by the user on the data
using either linear or binary search to find the answers.
You will implement a program called co2.py, which will allow the user
to search through this dataset and find particular information.
At a high level, your program will do the following (each of these steps is described in more detail below):
-
Read in the data file. The starting point code has calls to open one of two files you can use.
-
Present the user with a menu of options, specifically:
1. list all countries w/C02 level above a given value in given year
2. list country w/lowest and country w/highest GDP in given year
3. list all countries whose CO2 level decreased in 2022 from its largest value over all previous years
4. print all of a country's information
5. quit
-
Repeatedly get a menu option from the user, and any other information from the user that the option may need, and search the data to answer the query of the selected option. Some options use linear search on different fields in the CO2Data object and others use binary search.
-
Exit the program when the user selects the "quit" option.
1.2. Sample Output
Here are some sample runs of a working program. Note the output for each menu option and the error handling for bad input:
Also note that sample output for each individual menu item is shown in the "Menu Option" sections below.
1.3. Required Functions and TDD
Unlike the previous lab where you did the full TDD yourself, for this lab there are several required functions that you need to implement. The details about these functions are given in the following sections about reading in the data and implementing individual menu options.
There are, however, opportunities for you to do some design as you refine some of these functions. You may find that there are other functions you may want to add to implement sub-steps or to implement common functionality. You are welcome to add any additional function to this assignment.
2. Reading in the Data
Your program should read in data from a file, storing the information in the
CO2 input file into a list of CO2Data objects.
As you read in the data, each line of the file corresponds to one country’s
information, which your program should represent using a
CO2Data object, and store it in a list.
2.1. The CO2 data files
The data we will be working with are stored in a file as comma-separated values.
The starting point code has two definitions for the input file to read from (comment or uncomment the calls to open to select a specific file):
-
/usr/local/doc/co2data.csv is a large file of the full data set, the one you should do your main testing with, and the one you should use for your submission. You can read it from its current location (you do NOT need to copy this file into your labs directory).
-
small.csv: a smaller file of just 20 countries’ data, included in your lab directory. It is useful for initial debugging and testing, particularly to limit the amount of debug output if you add debug print statements. You can cat out this file to view its contents: cat small.csv
Each line in the input file contains 10 comma-separated values for a country in the following order:
Name,1960,1980,2000,2020,2022,pop_1960,pop_2020,gdp_1960,gdp_2020
For example, here is the information for two countries from the file. Note
that Namibia has some missing data (for 1960 and 1980 CO2 levels and for
1960 GDP); missing values are represented as -1 in the data files:
Namibia,-1,-1,1.6048,3.6818,3.953,0.634138,2.540916,-1,10.56263738
Nepal,0.080608,0.5413,3.0374,14.9024,15.5,10.10506,29.136808,0.508334414,33.43367051
Each line of the file contains the following information for one country, with each value separated by a comma. The values for each country on each line are as follows:
-
The first value is the name of the country, which may include white space characters and other non-alphabetic characters. For example, European Union (27) is the name of one "country" in this list. Your program will use this value as a str. Note that files are sorted alphabetically by country name (i.e., by the first entry in each row).
-
The next five values are yearly total CO2 emission values for the years 1960, 1980, 2000, 2020, and 2022 (the most recent year in the data set). The amounts are in units of Mt (Megatons of CO2 emissions). Your program will use these as float values.
-
The next two values are the country’s populations for the years 1960 and 2020. These are in units of millions of people. Your program will use these as float values.
-
The final two values are the country’s total GDP for the years 1960 and 2020. These are in units of billions of US dollar equivalents. Your program will use these as float values.
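The field layout described above can be sketched as a small parsing helper. This is only an illustration: parse_line is a hypothetical name, not a required function, and it assumes that country names never contain a comma.

```python
def parse_line(line):
    """Split one line of the CSV file into its six logical pieces.

    Hypothetical helper (not a required function). Returns
    (name, co2_vals, pop_1960, pop_2020, gdp_1960, gdp_2020).
    Missing values are left as -1.0; they are not filtered here.
    Assumes the country name itself contains no commas.
    """
    fields = line.strip().split(",")
    name = fields[0]
    # fields[1] through fields[5] are the CO2 values for the five years
    co2_vals = []
    for i in range(1, 6):
        co2_vals.append(float(fields[i]))
    pop_1960 = float(fields[6])
    pop_2020 = float(fields[7])
    gdp_1960 = float(fields[8])
    gdp_2020 = float(fields[9])
    return name, co2_vals, pop_1960, pop_2020, gdp_1960, gdp_2020


sample = "Namibia,-1,-1,1.6048,3.6818,3.953,0.634138,2.540916,-1,10.56263738"
print(parse_line(sample))
```

The returned values are in the same order as the columns in the file, which makes it easy to check the result against a line printed with cat small.csv.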
A note about missing values:
There are missing data for some fields of some countries. Missing
values are encoded as -1. Your program needs to handle any missing values correctly:
do not do arithmetic using -1. Any field that stores a numeric value could have a missing value
(i.e., any CO2 value, population value, or GDP value could be -1). A country’s name field will never be missing: no country has
a name of -1.
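To see why the -1 marker must be skipped rather than treated as data, here is a minimal sketch. The names MISSING and present_values are made up for this illustration and are not part of the lab's starting code; the CO2 values are Namibia's, from the example line above.

```python
MISSING = -1  # marker the data files use for a missing value


def present_values(values):
    """Return a new list holding only the values that are not missing."""
    kept = []
    for v in values:
        if v != MISSING:
            kept.append(v)
    return kept


namibia_co2 = [-1, -1, 1.6048, 3.6818, 3.953]
print(min(namibia_co2))                   # picks up the -1 marker by mistake
print(min(present_values(namibia_co2)))   # smallest real measurement
```

Whether you filter into a new list or test each value inside your search loop is a design choice; either way, the -1 check happens before any comparison or arithmetic.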
2.2. The CO2Data class
You will create a CO2Data object for each country’s information you read in
from the data file, and store these as a list of CO2Data objects in your
program. Be sure to add them to the list in a way that keeps them
in the same order in the list as they appear in
the file (in sorted order by country name).
To create a CO2Data object invoke its constructor passing in argument
values from data you read in from the file.
next_item = CO2Data(name, co2_vals, pop_1960, pop_2020, gdp_1960, gdp_2020)
-
name: the name of the country (str)
-
co2_vals: a list containing the five CO2 emission values for 1960, 1980, 2000, 2020, 2022 (a list of float)
-
pop_1960: 1960 population in millions (float)
-
pop_2020: 2020 population in millions (float)
-
gdp_1960: 1960 GDP in billions of US dollars (float)
-
gdp_2020: 2020 GDP in billions of US dollars (float)
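As a rough sketch of how the constructor call and the list fit together, the code below builds a list from values that have already been parsed. The CO2Data class defined here is only a stand-in so the sketch runs on its own: its constructor parameters match the call shown above, but its attribute names are invented. In your program, use the provided class instead. build_list is also a hypothetical name.

```python
class CO2Data:
    """Stand-in for the provided class (attribute names are invented)."""

    def __init__(self, name, co2_vals, pop_1960, pop_2020, gdp_1960, gdp_2020):
        self.name = name
        self.co2_vals = co2_vals
        self.pops = [pop_1960, pop_2020]
        self.gdps = [gdp_1960, gdp_2020]


def build_list(rows):
    """Create one object per row, keeping the rows in their original order."""
    data = []
    for row in rows:
        name, co2_vals, pop_1960, pop_2020, gdp_1960, gdp_2020 = row
        next_item = CO2Data(name, co2_vals, pop_1960, pop_2020, gdp_1960, gdp_2020)
        data.append(next_item)  # append adds to the end, so file order is kept
    return data


rows = [
    ("Namibia", [-1.0, -1.0, 1.6048, 3.6818, 3.953],
     0.634138, 2.540916, -1.0, 10.56263738),
    ("Nepal", [0.080608, 0.5413, 3.0374, 14.9024, 15.5],
     10.10506, 29.136808, 0.508334414, 33.43367051),
]
data = build_list(rows)
print(len(data))
```

Because append always adds to the end of the list, processing the file from top to bottom keeps the list sorted by country name with no extra work.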
Once you have created a list of CO2Data objects, each initialized with
data read in from the file, your program can interact with the
objects in the list by invoking some of their method functions.
The method functions in the CO2Data class:
Method | Returns | Description
| str | returns the name of the country
| float | returns the CO2 emission for the passed year (int)
| list of float | returns a list of CO2 emissions for all 5 years for the country
| float | returns the GDP for the passed year (int)
| float | returns the population for the passed year (int)
| None | prints all information for this object (nicely formatted)
The caller of methods is expected to check the return values and handle
error return values (handle a Your program should not use |
3. Menu and main control flow
After reading the data from a file into the list, your program should enter a loop printing out the menu of options and getting the user’s choice:
---------- Menu Options -----------
1. list all countries w/C02 level above a given value in given year
2. list country w/lowest and country w/highest GDP in given year
3. list all countries whose CO2 level decreased in 2022 from its
largest value over all previous years
4. print all of a country's information
5. quit
Select a menu option
Enter a value between 1 and 5: 10
10 is not a valid choice, try again
Enter a value between 1 and 5: 8
8 is not a valid choice, try again
Enter a value between 1 and 5: 3
3.1. Required Features
-
Your program should list the 5 menu options in the exact order as the example shown above. Do not choose a different ordering of operations on the data (e.g., option 2 must be the option to list the country with the lowest and the country with the highest GDP in a given year).
-
Your program should gracefully handle, and re-prompt for, invalid menu options entered by the user.
-
You should implement a function, get_value_between(low, high, prompt), that takes two int values for the low and high values in the range, and a string with an instruction message (e.g., "Select a menu option"), and returns a value between low and high inclusive. If the value of the low parameter is larger than the high parameter, the function should just return the value of low and not prompt for any input.
In the example output above, we called our function like this:
option = get_value_between(1, 5, "Select a menu option")
Your program does not need to handle non-integer input, like the user entering hello there at the menu options prompt.
This function will be useful in other parts of your program.
-
At this point you can implement and test the main loop of your program, and
test implementing option 5. quit.
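One possible shape for get_value_between, following the description above, is sketched here. The message wording is copied from the sample output; everything else (variable names, the exact loop structure) is just one reasonable choice, not the required implementation.

```python
def get_value_between(low, high, prompt):
    """Return an int between low and high (inclusive) entered by the user.

    If low is larger than high there is no valid range, so low is
    returned without prompting. Assumes the user types an integer.
    """
    if low > high:
        return low
    print(prompt)
    ask = "Enter a value between %d and %d: " % (low, high)
    value = int(input(ask))
    # keep asking until the value falls inside the range
    while value < low or value > high:
        print("%d is not a valid choice, try again" % (value))
        value = int(input(ask))
    return value
```

A quick way to test it in isolation is to call it from main with a few different ranges, including one where low is larger than high, and print what it returns.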
You could add function stubs for the functions for the other 4 menu options to test out the main control flow of your program. Remember that you may need to go back and modify the set of parameters to these function stubs when you implement each one.
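A stub-based skeleton of the main loop might look like the sketch below. Every function name other than get_value_between is hypothetical, each stub only prints a placeholder message, and get_value_between is itself stubbed here (no range checking) so the skeleton runs on its own.

```python
def get_value_between(low, high, prompt):
    """Stub: no range checking yet, just enough to drive the loop."""
    print(prompt)
    return int(input("Enter a value between %d and %d: " % (low, high)))


def print_menu():
    print("---------- Menu Options -----------")
    print("1. list all countries w/C02 level above a given value in given year")
    print("2. list country w/lowest and country w/highest GDP in given year")
    print("3. list all countries whose CO2 level decreased in 2022 from its")
    print("   largest value over all previous years")
    print("4. print all of a country's information")
    print("5. quit")


def option_one(data):
    print("option 1 is not implemented yet")


def option_two(data):
    print("option 2 is not implemented yet")


def option_three(data):
    print("option 3 is not implemented yet")


def option_four(data):
    print("option 4 is not implemented yet")


def main():
    data = []  # stub: later this will be the list read from the file
    choice = 0
    while choice != 5:
        print_menu()
        choice = get_value_between(1, 5, "Select a menu option")
        if choice == 1:
            option_one(data)
        elif choice == 2:
            option_two(data)
        elif choice == 3:
            option_three(data)
        elif choice == 4:
            option_four(data)
    # the loop ends when the user picks 5 (quit)
```

Add a call to main() at the bottom of your file to run it. Once the loop and the quit option behave correctly, each stub can be replaced by a real implementation one at a time.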
4. Menu Option 1
For this menu option you will prompt the user to enter two values:
-
the year for which they want to list CO2 values
-
the CO2 level above which to list countries for the given year
It will then use linear search on the list of CO2Data objects
to print out all countries above the given limit for the given
year, and their CO2 emission data for that year in tabular format.
Here is an example run of this option (note the error handling of some invalid input values):
---------- Menu Options -----------
1. list all countries w/C02 level above a given value in given year
2. list country w/lowest and country w/highest GDP in given year
3. list all countries whose CO2 level decreased in 2022 from its
largest value over all previous years
4. print all of a country's information
5. quit
Select a menu option
Enter a value between 1 and 5: 1
Enter a year (one of 1960, 1980, 2000, 2020, 2022): 2001
2001 is not a valid value, try again
Enter a year (one of 1960, 1980, 2000, 2020, 2022): 2000
Enter a CO2 limit value
Enter a value between 0 and 20000: 300000
300000 is not a valid choice, try again
Enter a value between 0 and 20000: 1000
Country 2000 CO2 emissions
> 1000.0000
-----------------------------------------------------------
China 3649.2009
European Union (27) 3601.5088
Japan 1263.7548
Russia 1479.1425
United States of America 6010.1359
Total of 5 countries > 1000.0000 in 2000
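A table like the one above can be produced by a single linear scan. The sketch below works on a plain list of (name, value) pairs rather than CO2Data objects, so it does not depend on any method names; print_above is a hypothetical name, and the exact header spacing is a guess because it depends on the column widths you choose. The 1960 values come from the examples on this page.

```python
def print_above(pairs, year, limit):
    """Linear search: print every pair whose value is above limit.

    pairs is a list of (country name, CO2 value for year) tuples.
    A value of -1 means the data is missing, so it is skipped.
    Returns the number of matching countries.
    """
    print("%-34s %d CO2 emissions" % ("Country", year))
    print("%-34s > %.4f" % ("", limit))
    print("-" * 59)
    count = 0
    for name, value in pairs:
        # check for the missing marker first, then compare to the limit
        if value != -1 and value > limit:
            print("%-34s %12.4f" % (name, value))
            count = count + 1
    print("Total of %d countries > %.4f in %d" % (count, limit, year))
    return count


co2_1960 = [("Canada", 192.7162), ("Namibia", -1),
            ("Nepal", 0.080608), ("Zimbabwe", 5.9431)]
print_above(co2_1960, 1960, 1)
```

Because the scan visits the list from front to back and the list is sorted by country name, the matches print in alphabetical order without any extra sorting.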
4.1. Required Features
-
This option must be implemented in a separate function: don’t include its code in your function that gets the menu option or in your main function. Instead, your code should make a call to your function that implements this feature when the user selects menu option 1.
-
Add and use a helper function, get_value_in_set, that takes a list of year values and a prompt string, and returns the year selected by the user. A call to your function should pass a list of years (1960, 1980, 2000, 2020, or 2022, which are the five years with CO2 emissions values) from which the user selects, and a prompt string (which could be "Enter a year (one of 1960, 1980, 2000, 2020, 2022)"). The function should not return until the user picks a valid year from the list (see the error input handling in the example output above). This function will be useful in other menu options.
-
Use your helper function, get_value_between, to get a CO2 limit value: the lower limit above which countries are listed (a value between 0 and 20000).
-
Use linear search of the list of CO2Data objects to find any countries whose CO2 emissions data are above the limit and print out the name and CO2 emission level for the given year as you find matches.
-
Use methods of the CO2Data class to access appropriate values from each country’s data: Section 2.2
-
Print out each matching country’s name and CO2 emissions in tabular format.
-
Print a header with Country and CO2 emission for the given year, with the given limit value in the header (format should be close to identical to our output).
-
Use formatted print to ensure that country names and printed CO2 levels line up. For example, %-34s prints a string in a 34 space width, left justified.
-
CO2 values should be printed with 4 places beyond the decimal point. For example, %12.4f is a placeholder for a float value, printed in a field width of 12 with 4 places beyond the decimal point.
-
You should handle missing values (ignore them in your output). Missing values are represented with -1.
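The get_value_in_set helper described in the list above could be sketched as follows. The error message is copied from the sample output; appending ": " to the prompt is an assumption about how the prompt string is punctuated, so adjust it to match the expected output.

```python
def get_value_in_set(values, prompt):
    """Return a value from the list values, chosen by the user.

    Re-prompts until the user enters one of the allowed values.
    Assumes the user types an integer.
    """
    value = int(input(prompt + ": "))
    while value not in values:
        print("%d is not a valid value, try again" % (value))
        value = int(input(prompt + ": "))
    return value
```

A call for this menu option might look like get_value_in_set([1960, 1980, 2000, 2020, 2022], "Enter a year (one of 1960, 1980, 2000, 2020, 2022)"). Since the list of allowed values is a parameter, the same function also works for a different set of years.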
4.2. Hints/Tips
-
Look at example in-class code for lists, searching, objects, and output formatting.
-
Test your get_value_in_set function independently by adding some calls to it from main. Try passing different values to ensure that it works correctly.
-
Try running your program on the smaller file to help you debug, then comment out this call from main and try on the bigger file. You can add debug print statements to print out values to see if you are finding the right answer (be sure to remove all debug print statements from code you submit).
-
You may add other helper functions to implement this feature if you’d like.
-
Refer to the CO2Data Class Documentation for methods that might be helpful for implementing this menu option.
5. Menu Option 2
For this option, find the country with the lowest and the country with the highest GDP in the year specified by the user (one of 1960 and 2020) and print out each country’s name and its GDP value for the year using formatted output.
Here are two runs of this option (note the error handling of some invalid input values):
---------- Menu Options -----------
1. list all countries w/C02 level above a given value in given year
2. list country w/lowest and country w/highest GDP in given year
3. list all countries whose CO2 level decreased in 2022 from its
largest value over all previous years
4. print all of a country's information
5. quit
Select a menu option
Enter a value between 1 and 5: 2
Enter one of 1960 or 2020 for the year 2000
2000 is not a valid value, try again
Enter one of 1960 or 2020 for the year 1960
---- GDP in 1960
lowest: 0.01201 in Seychelles
highest: 543.30000 in United States of America
---------- Menu Options -----------
1. list all countries w/C02 level above a given value in given year
2. list country w/lowest and country w/highest GDP in given year
3. list all countries whose CO2 level decreased in 2022 from its
largest value over all previous years
4. print all of a country's information
5. quit
Select a menu option
Enter a value between 1 and 5: 2
Enter one of 1960 or 2020 for the year 2020
---- GDP in 2020
lowest: 0.05505 in Tuvalu
highest: 20893.74383 in United States of America
5.1. Required Features
-
This option must be implemented in a separate function. You may add additional helper functions if you wish.
-
Use your get_value_in_set function to get the year value from the user, passing in a list of the two year values (1960 and 2020).
-
You should handle missing values from the data set and ignore them in your output. Missing values are represented with -1 in the data file.
-
Print out the name of the lowest and the highest country and each one’s GDP value to 5 decimal places (%.5f), with a header line that shows the year, in a format identical to our output.
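Finding the lowest and highest values in one pass, while skipping missing data, can be sketched like this. The function works on a plain list of (name, GDP) pairs instead of CO2Data objects, and find_lowest_and_highest is a hypothetical name. The sample values are the 1960 GDP figures that appear elsewhere on this page.

```python
def find_lowest_and_highest(pairs):
    """Return (index of lowest value, index of highest value).

    pairs is a list of (country name, GDP) tuples. Values of -1 are
    missing and are ignored. Returns (-1, -1) if every value is missing.
    """
    low_i = -1
    high_i = -1
    for i in range(len(pairs)):
        value = pairs[i][1]
        if value != -1:
            # the first present value becomes both the lowest and highest so far
            if low_i == -1 or value < pairs[low_i][1]:
                low_i = i
            if high_i == -1 or value > pairs[high_i][1]:
                high_i = i
    return low_i, high_i


gdp_1960 = [("Canada", 40.461722), ("Namibia", -1),
            ("Nepal", 0.508334414), ("Zimbabwe", 1.052990)]
low_i, high_i = find_lowest_and_highest(gdp_1960)
print("---- GDP in 1960")
print("lowest: %.5f in %s" % (gdp_1960[low_i][1], gdp_1960[low_i][0]))
print("highest: %.5f in %s" % (gdp_1960[high_i][1], gdp_1960[high_i][0]))
```

Tracking indices rather than values means the country name is available at the end without a second search. Starting both indices at -1 avoids assuming that the first country in the list has a GDP value.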
5.2. Hints/Tips
-
It is okay to have one search to find the largest and another search to find the smallest GDP for the given year. You may also combine them into a single search. Either way is fine.
-
Try running your program on the smaller file to help you debug, then comment out this call from main and try on the bigger file. Be sure that your submission is reading from the larger file (/usr/local/doc/co2data.csv).
-
When using the smaller file in particular, you can add debug print statements to print out values to see if you are finding the right answer (be sure to remove all debug print statements from code you submit).
-
Refer to the CO2Data Class Documentation for methods that might be helpful for implementing this menu option.
6. Menu Option 3
For this option you will find all countries whose 2022 CO2 emission level is lower than their maximum CO2 emission level over all previous years. You will print out each country’s name, 2022 level, and max level in tabular format. Your program should also print the total number of countries for which this is true at the end.
Here is a run of this option (note we are not showing the full output as there are a lot of matches):
---------- Menu Options -----------
1. list all countries w/C02 level above a given value in given year
2. list country w/lowest and country w/highest GDP in given year
3. list all countries whose CO2 level decreased in 2022 from its
largest value over all previous years
4. print all of a country's information
5. quit
Select a menu option
Enter a value between 1 and 5: 3
Country 2022 level Previous High
---------------------------------------------------------------------
Albania 4.9547 5.1708
Andorra 0.3686 0.5240
Angola 16.0703 16.7648
Antigua and Barbuda 0.6022 0.6192
Armenia 6.4078 7.7198
Australia 392.2793 396.6852
Austria 61.4884 66.1717
Azerbaijan 38.0627 46.1098
...
United States of America 5057.3038 6010.1359
Uzbekistan 120.6102 123.4773
Venezuela 76.8920 142.8349
Vietnam 343.6066 363.3427
Yemen 11.3563 15.7253
Zimbabwe 8.8560 13.8182
Total of 89 countries with decrease in CO2 in 2022
6.1. Required Features
-
This option must be implemented in a separate function. You may add additional helper functions if you wish.
-
You should handle missing values from the data set and ignore them in your output. Missing values are represented with -1 in the data file.
-
Print out each matching country’s name, 2022 CO2 emissions, and max CO2 emissions data in tabular format.
-
Print a header. Its format should be the same as our output.
-
Use formatted print to ensure that country names and printed CO2 levels line up. For example, %-34s prints a string in a 34 space width, left justified.
-
CO2 values should be printed with 4 places beyond the decimal point. For example, %12.4f is a placeholder for a float value, printed in a field width of 12 with 4 places beyond the decimal point.
-
Print out the total number of countries whose 2022 level was less than a previous max at the end. See our example output for what this should look like.
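The per-country test for this option can be sketched with two small helpers that work on a list of the five CO2 values, ordered from 1960 to 2022. Both function names are hypothetical, and the sketch assumes that a country with a missing 2022 value, or with no earlier values at all, should not be listed.

```python
def previous_high(co2_vals):
    """Largest present value among all years before the last one.

    Returns -1 if every earlier value is missing.
    """
    best = -1
    for i in range(len(co2_vals) - 1):  # stop before the final (2022) value
        if co2_vals[i] != -1 and co2_vals[i] > best:
            best = co2_vals[i]
    return best


def decreased(co2_vals):
    """True if the last value is present and below the previous high."""
    latest = co2_vals[len(co2_vals) - 1]
    high = previous_high(co2_vals)
    return latest != -1 and high != -1 and latest < high


zimbabwe = [5.9431, 9.614, 13.8182, 7.8496, 8.856]
nepal = [0.080608, 0.5413, 3.0374, 14.9024, 15.5]
print(previous_high(zimbabwe), decreased(zimbabwe))
print(previous_high(nepal), decreased(nepal))
```

Zimbabwe's result agrees with the sample table above (2022 level 8.8560, previous high 13.8182). Nepal's 2022 value is its largest, so it would not be listed.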
6.2. Hints/Tips
-
You could first try to find and print out the max value of each country over all years.
-
Try running your program on the smaller file to help you debug, then comment out this call from main and try on the bigger file. You can add debug print statements to print out values to see if you are finding the right answer (be sure to remove all debug print statements from code you submit).
-
Refer to the CO2Data Class Documentation for methods that might be helpful for implementing this menu option.
7. Menu Option 4
This menu option takes a country name from the user and searches the
list of CO2Data objects using binary search to find a matching
country. If found, its information is printed, otherwise a message
is printed that the country could not be found in the database.
Here are three examples of this option (note the output when a country with the matching name cannot be found):
---------- Menu Options -----------
1. list all countries w/C02 level above a given value in given year
2. list country w/lowest and country w/highest GDP in given year
3. list all countries whose CO2 level decreased in 2022 from its
largest value over all previous years
4. print all of a country's information
5. quit
Select a menu option
Enter a value between 1 and 5: 4
Enter the name of a country: Canada
Canada
1960 pop: 17.909009 2020 pop: 38.037204
1960 GDP: 40.461722 2020 GDP: 1645.423408
1960 1980 2000 2020 2022
192.716200 442.846900 567.096100 522.845300 547.943900
---------- Menu Options -----------
1. list all countries w/C02 level above a given value in given year
2. list country w/lowest and country w/highest GDP in given year
3. list all countries whose CO2 level decreased in 2022 from its
largest value over all previous years
4. print all of a country's information
5. quit
Select a menu option
Enter a value between 1 and 5: 4
Enter the name of a country: Zimbabwe
Zimbabwe
1960 pop: 3.776679 2020 pop: 14.862927
1960 GDP: 1.052990 2020 GDP: 18.051171
1960 1980 2000 2020 2022
5.943100 9.614000 13.818200 7.849600 8.856000
---------- Menu Options -----------
1. list all countries w/C02 level above a given value in given year
2. list country w/lowest and country w/highest GDP in given year
3. list all countries whose CO2 level decreased in 2022 from its
largest value over all previous years
4. print all of a country's information
5. quit
Select a menu option
Enter a value between 1 and 5: 4
Enter the name of a country: USA
Sorry, USA is not in the database
7.1. Required Features
-
You must use binary search to find the country with a matching name in your list of CO2Data objects (your list should be in sorted order by country name, matching the order of the file).
-
This option must be implemented in a separate function: don’t include its code in your function that gets the menu option. Instead, that function should make a call to your function that implements this feature when the user selects menu option 4.
-
Use the print method of the CO2Data object class to print out its information in the format shown above (the print method prints it out in this format for you!).
-
Use methods of the CO2Data class to access appropriate values from each country’s data: Section 2.2.
-
If the country is not in the database, print out a message saying it is not present. Your program does not have to handle cases when the name of a country is entered using a case that doesn’t match how it appears in the database (e.g., it is fine if your program says that canada is not in the database even though Canada is).
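For reference, here is a minimal binary search over a sorted list of strings. It searches plain names rather than CO2Data objects, so in your program the comparison would use the name obtained from each object instead. It also assumes the file's alphabetical order agrees with Python's string comparison.

```python
def binary_search(names, target):
    """Return the index of target in the sorted list names, or -1."""
    low = 0
    high = len(names) - 1
    while low <= high:
        mid = (low + high) // 2
        if names[mid] == target:
            return mid
        elif names[mid] < target:
            low = mid + 1    # target can only be in the upper half
        else:
            high = mid - 1   # target can only be in the lower half
    return -1


names = ["Canada", "Namibia", "Nepal", "Zimbabwe"]
name = "USA"
index = binary_search(names, name)
if index == -1:
    print("Sorry, %s is not in the database" % (name))
else:
    print("found %s at index %d" % (name, index))
```

Returning the index (or -1) rather than printing inside the search keeps the search reusable: the caller decides whether to print the country's information or the "not in the database" message.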
7.2. Hints/Tips
-
Try running your program on the smaller file to help you debug, then comment out this call from main and try on the bigger file. You can add debug print statements to print out values to see if you are finding the right answer (be sure to remove all debug print statements from code you submit).
-
Make sure to test your code for different search cases. The smaller file might be helpful for this at first.
-
Refer to the CO2Data Class Documentation for methods that might be helpful for implementing this menu option.
8. General Tips for the lab
-
Refer to this page as you work: As you implement this lab, keep the lab assignment page open in a browser and refer to it often as you go.
-
Look at the requirements and hints for the part you are working on.
-
Look at the example output to see how the option works, and for what your output should look like.
-
Refer to the CO2Data Documentation for method functions that will help you implement your solution.
-
Continue to use TDD: We strongly recommend that you create a top-down design with a fully fleshed out main and stubs for all other functions. However, you may not work with a partner on this lab. You don’t need to get approval from an instructor for your design, though we’d be glad to review it with you in lab, office hours, or ninja sessions. You should only begin implementing the other functions after you have a clear plan in place for the structure of main.
-
Use incremental testing and debugging: continue to test and debug functions in isolation.
-
Run your program on the smaller input file first for debugging. You can cat out the file contents and look at the values to see if your program is doing the right thing (be sure to revert to using the large file for further testing and for the final version you submit).
-
Add debug print statements to help debug (be sure to remove them once you debug functionality).
-
Create modular, reusable functions: Avoid duplicating similar code in multiple places throughout the program. Think about how you could re-use the same function. We have given you definitions of functions you are required to write, and a guide for TDD of your program, but for other functions you design think about this goal.
Answer the Questionnaire
After each lab, please complete the short Google Forms questionnaire. Please select the right lab number (Lab 08) from the dropdown menu on the first question.
Once you’re done with that, you should run handin21 again.
Submitting lab assignments
Remember to run handin21 to turn in your lab files! You may run handin21
as many times as you want. Each time it will turn in any new work. We
recommend running handin21 after you complete each program or after you
complete significant work on any one program.
Logging out
When you’re done working in the lab, you should log out of the computer you’re using.
First quit any applications you are running, including your vscode editor, the
browser and the terminal. Then click on the logout icon and
choose "log out".
If you plan to leave the lab for just a few minutes, you do not need to log
out. It is, however, a good idea to lock your machine while you are gone. You
can lock your screen by clicking on the lock
icon.
PLEASE do not leave a session locked for a long period of time. Power may go
out, someone might reboot the machine, etc. You don’t want to lose any work!