CS21 Lab 8: Climate Database: Searching

Due Saturday, April 6, by 11:59pm

Goals

The goals for this lab assignment are:

  • Practice programming with file I/O

  • Practice using real-world data

  • Practice using binary and linear search

  • Practice writing a menu-driven program

  • Get more practice with top down design and writing functions

  • Get more practice with incremental implementation and testing

Getting and Working on Lab08 Code

Make sure you are working in the correct directory when you edit and run lab 08 files.

Run update21 to get the starting point code of the next lab assignment. This will create a new ~/cs21/labs/08 directory for your lab work. Edit and run Lab 08 programs from within your cs21/labs/08 directory.

Make sure all programs are saved to your cs21/labs/08 directory! Files outside that directory will not be graded.

$ update21
$ cd ~/cs21/labs/08
$ pwd
/home/username/cs21/labs/08
$ ls
(should see your program files here)

Then edit and run program files in your ~/cs21/labs/08 directory:

$ code filename.py
$ python3 filename.py

Programming Tips

As you write programs, use good programming practices:

  • Use a comment at the top of the file to describe the purpose of the program (see example).

  • All programs should have a main() function (see example).

  • Use variable names that describe the contents of the variables.

  • Write your programs incrementally and test them as you go. This is really crucial to success: don’t write lots of code and then test it all at once! Write a little code, make sure it works, then add some more and test it again.

  • Don’t assume that if your program passes the sample tests we provide that it is completely correct. Come up with your own test cases and verify that the program is producing the right output on them.

  • Avoid writing any lines of code that exceed 80 columns.

    • Always work in a terminal window that is 80 characters wide (resize it to be this wide).

    • In vscode, at the bottom right in the window, there is an indication of both the line and the column of the cursor.

Function Comments

All functions should have a top-level comment! Please see our function example page if you are confused about writing function comments.

Lab08 Overview

Climate scientists often use computer programs to simulate the interactions of the earth’s atmosphere, oceans, land surface, and ice, and also to analyze past data in order to understand the effects of human activity on the climate and to make predictions of future climate changes.

In this assignment you will write a program, climate.py, that reads in climate data from a file, and then allows the user to select a specific query about the data from a menu of options and displays the result.

The data we are using was obtained from Climate Watch.

The professors cleaned up the raw data from this source for you, and created a file consisting of some historical annual carbon dioxide ("CO2") total emission amounts (in Megatons), gross domestic product (or GDP, in billions of US dollars), and population (in millions of people) data for most countries in the world.

However, like most real world data, there are some missing values in this data set, and your program will need to appropriately ignore missing values when computing the result of a user’s query on the data.

We will use this data set in the next lab assignment, too, so you will reuse some of the initial processing of the data and some helper functions you write for this lab when you complete the next one.

General Requirements

  • Your program should have many functions, including: one for reading in the data from the file, one for performing each operation, one for the main data processing loop, and some helper functions for reading in and checking input values in different ways. We leave much of the function definition and design up to you, but you should define some functions the way we specify.

  • Your program will use binary search on different data fields; note that the data in the file is sorted by countries' names.

  • Your program should be well designed: use good modular design, have complete function comments, use descriptive variable names, have no line wrapping, and be robust to input errors. The requirements for each part of the assignment describe what types of bad input your program does and does not need to handle.

General Hints/Tips

  • Use what you know about good top-down design and incremental implementation and testing to implement this large program. Do not try to complete it all in one sitting! Rather, implement and test each piece a little bit at a time.

  • Refer to in-class code for examples. You likely need to refer to code from many different weeks depending on the example you are looking for, as we’ve covered functions, while loops, formatted output, file I/O, searching, strings, lists, etc. over many weeks.

  • Use the print() function to add debug statements to help you see what your program is doing as you try to find and fix bugs. Be sure to remove these statements after you fix the bugs!

Example Run

The link below is output from an example run of a working program that chooses each menu option, some more than one time. Additionally, with the details of each Menu Option in the sections below, we also show example output of just that menu option.

The following are the details of each of the main parts of your program:

1. Create a List of Country Objects

We will provide the definition of a class called Country that is further described below.

The first thing your program will do is read in data from the file and create a list of Country objects. Each country’s information is on a single line of the file. Individual values are comma separated.

The file is in sorted order by country name, and your resulting list should maintain that order (entries in sorted order by country name).

We suggest that you write a get_data function to perform this action. get_data should take the name of the climate data file as its argument and return a list of Country objects, one per line in the file. Your code can assume that the file exists.

There are two files you can use to run your program:

  • /usr/local/doc/climatewatchdata.csv is the full set of climate data. Your submitted solution should open and work correctly with this file.

  • small.csv is a smaller file with information about just 20 countries. It may be useful to use this file when you are first debugging and testing some of your program’s functionality.

For each line of the file you read in, you should:

  1. process the line from the file to extract the 10 values for the country (one string and 9 floats). See the details about the input file’s format below.

  2. create a Country object with these data. See information about the Country class below.

  3. add the Country object to the list to be returned by the function, making sure that the resulting list will be in sorted order by country name (matching the order of the set of country information in the file).

1.1. Input File Format

Each line in the input file contains 10 comma-separated values for a country in the following order:

Name,1960,1980,2000,2020,2022,pop_1960,pop_2020,gdp_1960,gdp_2020

For example, here is the information for two countries from the file. Note that Namibia has some missing data (the 1960 and 1980 CO2 levels and the 1960 GDP), which are represented with -1 values:

Namibia,-1,-1,1.6048,3.6818,3.953,0.634138,2.540916,-1,10.56263738
Nepal,0.080608,0.5413,3.0374,14.9024,15.5,10.10506,29.136808,0.508334414,33.43367051

The values for each country on each line are as follows:

  • The first value is the name of the Country, which may include white space characters and other non-alphabetic characters. For example, European Union (27) is the name of one "country" in this list. Your program will use this value as a str.

  • The next five values are yearly total CO2 emission values for the years 1960, 1980, 2000, 2020, and 2022 (the most recent year in the data set). The amounts are in units of Mt (Megatons of CO2 emissions). Your program will use these as float values.

  • The next two values are the country’s populations for the years 1960 and 2020. These are in units of millions of people. Your program will use these as float values.

  • The final two values are the country’s total GDP for the years 1960 and 2020. These are in units of billions of US dollar equivalents. Your program will use these as float values.
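The field layout above can be turned into typed values with a few lines of string processing. The following is only a sketch: parse_line is a hypothetical helper name (it is not part of the starter code), it returns a plain tuple rather than a Country object so that it runs on its own, and it assumes that no country name contains a comma.

```python
def parse_line(line):
    """Split one line of the climate file into typed values.

    Hypothetical helper: returns (name, co2_vals, pop_1960, pop_2020,
    gdp_1960, gdp_2020), the same order the Country constructor uses.
    """
    fields = line.strip().split(",")
    name = fields[0]
    # fields 1-5 are the CO2 values for 1960, 1980, 2000, 2020, 2022
    co2_vals = []
    for text in fields[1:6]:
        co2_vals.append(float(text))
    pop_1960 = float(fields[6])
    pop_2020 = float(fields[7])
    gdp_1960 = float(fields[8])
    gdp_2020 = float(fields[9])
    return name, co2_vals, pop_1960, pop_2020, gdp_1960, gdp_2020


line = "Namibia,-1,-1,1.6048,3.6818,3.953,0.634138,2.540916,-1,10.56263738\n"
name, co2_vals, pop_1960, pop_2020, gdp_1960, gdp_2020 = parse_line(line)
print(name, co2_vals, gdp_1960)
```

Note that a -1 in the file becomes the float -1.0 after conversion, which is the value used to mark missing data.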

about missing values

There are missing data for some fields of some countries. Missing values are encoded as -1 in the file. Your program should initialize any Country object field with a -1.0 float value for these missing values.

Your program needs to handle missing values correctly: do not do arithmetic using -1 values, but skip over these values instead.

Any field that stores a numeric value could have a missing value (i.e., any CO2 value, population value, or GDP value could be -1 in the input file).

A country’s name field will never be missing: no country has a name of -1.
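To illustrate what skipping over missing values can look like, here is a minimal sketch that finds the smallest valid number in a list of floats. The names MISSING and lowest_valid are hypothetical, and the list stands in for values you would get from your Country objects. Without the check, the -1.0 entry would wrongly be reported as the lowest value.

```python
MISSING = -1.0  # the file encodes a missing value as -1


def lowest_valid(values):
    """Return the smallest non-missing value, or MISSING if none exist."""
    lowest = MISSING
    for value in values:
        if value != MISSING:
            if lowest == MISSING or value < lowest:
                lowest = value
    return lowest


# 1960 CO2 values for Namibia (missing), Nepal, and Greece
co2_1960 = [-1.0, 0.080608, 9.3915]
print(lowest_valid(co2_1960))
```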

1.2. The Country Class

The starter code imports the Country class that you should use: you will create a list of Country objects, one for each line in the file.

The Country class constructor is invoked passing in the following information:

next_country = Country(name, co2_vals, pop_1960, pop_2020, gdp_1960, gdp_2020)

This creates a new Country object with the following values:

  • name: the name of the country (str)

  • co2_vals: a list containing the five CO2 emission values for 1960, 1980, 2000, 2020, 2022 (a list of float)

  • pop_1960: 1960 population in Millions (float)

  • pop_2020: 2020 population in Millions (float)

  • gdp_1960: 1960 GDP in Billions of US dollars (float)

  • gdp_2020: 2020 GDP in Billions of US dollars (float)

1.3. All Country Class Methods

Here is complete information about the Country class and its method functions: Country Class Documentation

1.4. Example output (reading file)

Here is program output from the first step: reading in the file and creating the list of objects. Your program should print out the number of countries in the file, i.e. the number of elements in the list that is returned from the call to your get_data function:

$ python3 climate.py
There are data for 193 countries in this file

2. Main Loop and the Menu

After your program reads in the file data and creates a list of Country objects, it should call a function that, in a loop:

  1. prints out a menu of options for the user to choose from

  2. performs the operation on the data, and displays the results (some output needs to be in tabular form, details below)

Your program should repeat these steps until the user chooses the menu option to quit.

The six menu options for getting information about the climate data set include:

  1. Print the name and population value of the least populated country and the most populated country in the year given by the user (either 1960 or 2020).

  2. List all countries with a yearly CO2 emissions level above a lower bound level, for a specified year. The user enters values for the lower bound level and for the year (one of 1960, 1980, 2000, 2020, 2022).

  3. Print the name and CO2 level of the country with largest CO2 per GDP for a given year. The user enters the year (either 1960 or 2020).

  4. Print the name and population of all countries with population larger than a given value for a given year. The user enters the value and the year (either 1960 or 2020).

  5. Print out all information about a country given its name entered by the user.

  6. Quit

Be sure to read the "Required Features" and "Hints/Tips" before you start implementing this part. One requirement is that you implement a specific helper function, get_value_between, that you will use in several places in your program.

2.1. Example Output: Menu

Here is example output from a working program that reads in the data, prints out the menu, reads in an option from the user with 6 as the quit option, and performs the action (note how it handles bad input values):

$ python3 climate.py
There are data for 193 countries in this file

========  Menu Options: ========
1. country w/lowest and country w/highest population in given year
2. list all countries with CO2 above some value for a given year
3. country with largest CO2 per GDP for a given year
4. countries with population larger than given value for given year
5. print a country's info
6. quit

Select a menu option
Enter a value between 1 and 6: 9
  9 is not a valid choice, try again
Enter a value between 1 and 6: 7
  7 is not a valid choice, try again
Enter a value between 1 and 6: -3
  -3 is not a valid choice, try again
Enter a value between 1 and 6: 6
bye bye

2.2. Required Features

  • Your program should list the 6 menu options in the exact order as the example shown above. Do not choose a different ordering of operations on the data (e.g., option 3 needs to show the country with the largest CO2 emissions per GDP for a given year).

  • Your program should gracefully handle, and re-prompt for, invalid menu options entered by the user.

    You should implement a function, get_value_between that takes a string with instructions (like "Select a menu option") and two int values, low and high, and returns a value between low and high inclusive. If the value of the low parameter is larger than the high parameter, the function should just return the value of low and not prompt for any input.

    In the example output above, we called our function like this: option = get_value_between("Select a menu option", 1, 6)

    Your program does not need to handle non-integer value input like the user entering hello there at the menu options prompt.

    This function will be useful for implementing some other parts of your program.

  • Menu option 6 (quit) should be implemented with this step. Print out a good bye message and return from your main menu function back to main.

  • Your main menu looping should work at this point. For menu options not yet implemented, just print out the menu option selected by the user, and then your function should repeat its main actions: print out the menu; get the next selection from the user; repeat.

  • Feel free to add any other helper functions you’d like.
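As a starting point, get_value_between could be sketched as shown here. This is one possible shape rather than the required implementation: the prompt and error wording are copied from the example output in Section 2.1, and the code assumes the user always types an integer.

```python
def get_value_between(instructions, low, high):
    """Prompt until the user enters an int in [low, high]; return it.

    If low is larger than high, return low without prompting.
    """
    if low > high:
        return low
    print(instructions)
    prompt = "Enter a value between %d and %d: " % (low, high)
    value = int(input(prompt))
    while value < low or value > high:
        print("  %d is not a valid choice, try again" % (value))
        value = int(input(prompt))
    return value
```

Calling get_value_between("Select a menu option", 1, 6) from main is a quick way to try it out before wiring it into the menu loop.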

2.3. Hints/Tips

  • Implement and test your get_value_between function independently by adding some calls to it from main to test different values. Then add the call to this function to your main loop function. Try for values other than just 1 and 6, and for different instruction strings as well.

  • Refer to the in-class programs and to previous lab assignments that use while-loops and functions.

  • Refer to the Country Class Documentation for methods that might be helpful for implementing this menu option.

3. Menu Option 1

Implement a function to perform menu option 1: print out information about the country with the lowest and the country with the highest population in a given year.

Be sure to read the "Required Features" and "Hints/Tips" before you start implementing this option. One requirement is that you implement a specific helper function, get_value_in_set, that you will use in this, and in other, menu options.

3.1. Example Output

Here is some example output from this option:

$ python3 climate.py
There are data for 193 countries in this file

========  Menu Options: ========
1. country w/lowest and country w/highest population in given year
2. list all countries with CO2 above some value for a given year
3. country with largest CO2 per GDP for a given year
4. countries with population larger than given value for given year
5. print a country's info
6. quit

Select a menu option
Enter a value between 1 and 6: 1
Enter one of 1960 or 2020 for the year: 1981
  1981 is not a valid value, try again
Enter one of 1960 or 2020 for the year: 1956
  1956 is not a valid value, try again
Enter one of 1960 or 2020 for the year: 2020

---- Population in 2020 ----
lowest:       0.01083 M in Nauru
highest:   1411.10000 M in China

========  Menu Options: ========
1. country w/lowest and country w/highest population in given year
2. list all countries with CO2 above some value for a given year
3. country with largest CO2 per GDP for a given year
4. countries with population larger than given value for given year
5. print a country's info
6. quit

Select a menu option
Enter a value between 1 and 6: 1
Enter one of 1960 or 2020 for the year: 1
  1 is not a valid value, try again
Enter one of 1960 or 2020 for the year: 1960

---- Population in 1960 ----
lowest:       0.00438 M in Nauru
highest:    667.07000 M in China

========  Menu Options: ========
...

3.2. Required Features

  • This option must be implemented in a separate function: don’t include its code in your function that gets the menu option. Instead, that function should make a call to your function that implements this feature when the user selects menu option 1.

  • Implement a helper function, get_value_in_set that takes a list of values and a prompt string and returns the user’s entered choice from one of the values in the set. You will use this function in many other menu options. For this one, you will use it to have the user enter a value for one of the two years for which there are population data (either 1960 or 2020).

  • You should handle missing values from the data set and ignore them in your output. Missing values are represented with -1 in the data file. Note: any field in a Country object that stores a numeric value could be missing (i.e., a CO2 value, a population value, or a GDP value). A country’s name will not be a missing value.

  • You should print out each country’s name and the population, and a header line with the year in a format similar to our output.

    • Use formatted print to ensure that the population and name values of the two countries with the lowest and highest populations for the given year line up vertically in the output.

    • Population values should be printed with 4 places beyond the decimal point. For example, %12.4f is a placeholder for a float value, printed in a field width of 12 with 4 places beyond the decimal point.

    • Print out a heading for the data returned that includes the year.

  • Use methods of the Country class to access appropriate values from each country’s data: Section 1.3
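A minimal sketch of get_value_in_set follows. It assumes the values in the list are ints, that the user always types an integer, and that the parameters come in the order (list of values, prompt string); the error wording is taken from the example output in Section 3.1.

```python
def get_value_in_set(values, prompt):
    """Prompt until the user enters one of the ints in values; return it."""
    choice = int(input(prompt))
    while choice not in values:
        print("  %d is not a valid value, try again" % (choice))
        choice = int(input(prompt))
    return choice
```

For menu option 1, a call such as get_value_in_set([1960, 2020], "Enter one of 1960 or 2020 for the year: ") would return only once the user has typed 1960 or 2020.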

3.3. Hints/Tips

  • Test your get_value_in_set function independently by adding some calls to it from main. Try passing different values to ensure that it works correctly.

  • Look at example in-class code for lists, searching, objects, and output formatting.

  • Try running your program on the smaller file to help you debug, then comment out this call from main and try on the bigger file.

  • You may add other helper functions to implement this feature if you’d like.

  • Refer to the Country Class Documentation for methods that might be helpful for implementing this menu option.

4. Menu Option 2

Implement a function to perform menu option 2: list all countries in a given year that have a CO2 level above an amount entered by the user.

Be sure to read the "Required Features" and "Hints/Tips" before you start implementing this option.

4.1. Example Output

Here is some example output from this option:

========  Menu Options: ========
1. country w/lowest and country w/highest population in given year
2. list all countries with CO2 above some value for a given year
3. country with largest CO2 per GDP for a given year
4. countries with population larger than given value for given year
5. print a country's info
6. quit

Select a menu option
Enter a value between 1 and 6: 2
Enter a year (one of 1960, 1980, 2000, 2020, 2022): 2
  2 is not a valid value, try again
Enter a year (one of 1960, 1980, 2000, 2020, 2022): 1980
Enter a lower CO2 limit value
Enter a value between 0 and 20000: 1000

      Country                1980 CO2 emissions > 1000.0000 Mt
--------------------------------------------------------------
 China                                       1494.4959
 European Union (27)                         4077.5007
 Germany                                     1100.0660
 Russia                                      2129.1103
 United States of America                    4808.5564

========  Menu Options: ========
...

Select a menu option
Enter a value between 1 and 6: 2
Enter a year (one of 1960, 1980, 2000, 2020, 2022): 2022
Enter a lower CO2 limit value
Enter a value between 0 and 20000: 1000

      Country                2022 CO2 emissions > 1000.0000 Mt
--------------------------------------------------------------
 China                                      11396.7774
 European Union (27)                         2761.9071
 India                                       2829.6442
 Japan                                       1053.7979
 Russia                                      1652.1773
 United States of America                    5057.3038

========  Menu Options: ========
...

4.2. Required Features

  • This option must be implemented in a separate function: don’t include its code in your function that gets the menu option. Instead, that function should make a call to your function that implements this feature when the user selects menu option 2.

  • Use your helper function, get_value_in_set to get a value for the year from the user (one of 1960, 1980, 2000, 2020 or 2022, which are the five years with CO2 emissions values).

  • Use your helper function, get_value_between to get a value for the lower limit (a value between 0 and 20000).

  • You should handle missing values (ignore them in your output). Missing values are represented with -1.

  • You should print out each country’s name and CO2 emissions in tabular format, and a header line with the year in a format similar to our output.
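For the tabular output, formatted print with fixed field widths keeps the columns lined up. The sketch below uses hypothetical helper names (format_row, print_table) and hypothetical column widths, and it works on plain (name, value) pairs instead of Country objects so that it can be run on its own.

```python
def format_row(name, value):
    """Return one table row: name left-justified, value right-justified."""
    # %-28s pads the name on the right; %20.4f pads the number on the left
    return " %-28s %20.4f" % (name, value)


def print_table(rows, year, limit):
    """Print a header line, a dashed line, and one row per (name, value)."""
    header = " %-28s %d CO2 emissions > %.4f Mt" % ("Country", year, limit)
    print(header)
    print("-" * len(header))
    for name, value in rows:
        print(format_row(name, value))


rows = [("China", 1494.4959), ("Germany", 1100.0660)]
print_table(rows, 1980, 1000)
```

A name longer than the chosen width pushes its number to the right, so pick a width at least as long as the longest country name you expect to print.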

4.3. Hints/Tips

  • Try running your program on the smaller file to help you debug, then comment out this call from main and try on the bigger file.

  • You may add other helper functions to implement this feature if you’d like.

  • Refer to the Country Class Documentation for methods that might be helpful for implementing this menu option.

5. Menu Option 3

Implement a function to perform menu option 3: print out information about the country with the highest CO2 emissions per GDP for a given year, i.e. the one for which emissions divided by GDP is largest in that year.

Be sure to read the "Required Features" and "Hints/Tips" before you start implementing this option.

5.1. Example Output

Here is some example output from this option:

========  Menu Options: ========
1. country w/lowest and country w/highest population in given year
2. list all countries with CO2 above some value for a given year
3. country with largest CO2 per GDP for a given year
4. countries with population larger than given value for given year
5. print a country's info
6. quit

Select a menu option
Enter a value between 1 and 6: 3
Enter one of 1960 or 2020 for the year: 2000
  2000 is not a valid value, try again
Enter one of 1960 or 2020 for the year: 2020

---------- largest per GDP in 2020
  Iran                                  2.9325/B

========  Menu Options: ========
...

5.2. Required Features

  • This option must be implemented in a separate function: don’t include its code in your function that gets the menu option. Instead, that function should make a call to your function that implements this feature when the user selects menu option 3.

  • Use your helper function, get_value_in_set to get a value for the year from the user (one of 1960 or 2020, the two years with GDP info).

  • You should handle missing values (ignore them in your output). Missing values are represented with -1.

  • You should print out each country’s name and GDP, and a header line with the year in a format similar to our output.
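The search for the largest CO2 per GDP is a linear scan that skips any country missing either value. Here is a sketch under stated assumptions: largest_ratio is a hypothetical helper, and it takes (name, co2, gdp) tuples in place of Country objects. The sample numbers are the 1960 values for the three countries quoted elsewhere in this write-up.

```python
def largest_ratio(records):
    """Return (name, ratio) for the largest co2/gdp, skipping missing data.

    records is a list of (name, co2, gdp) tuples; returns ("", -1.0)
    if no record has both values.
    """
    best_name = ""
    best_ratio = -1.0
    for name, co2, gdp in records:
        # skip the record if co2 is missing (-1); requiring gdp > 0
        # skips a missing gdp and also avoids dividing by zero
        if co2 != -1.0 and gdp > 0:
            ratio = co2 / gdp
            if ratio > best_ratio:
                best_name = name
                best_ratio = ratio
    return best_name, best_ratio


# 1960 CO2 (Mt) and GDP (billions of US dollars)
records_1960 = [
    ("Greece", 9.3915, 4.335186),
    ("Namibia", -1.0, -1.0),
    ("Nepal", 0.080608, 0.508334414),
]
name, ratio = largest_ratio(records_1960)
print("  %-30s %12.4f/B" % (name, ratio))
```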

5.3. Hints/Tips

  • Try running your program on the smaller file to help you debug, then comment out this call from main and try on the bigger file.

  • You may add other helper functions to implement this feature if you’d like.

  • Refer to the Country Class Documentation for methods that might be helpful for implementing this menu option.

6. Menu Option 4

Implement a function to perform menu option 4: print out information about all countries with a population larger than a given value for the given year.

Be sure to read the "Required Features" and "Hints/Tips" before you start implementing this option.

6.1. Example Output

Here is some example output from this option:

========  Menu Options: ========
1. country w/lowest and country w/highest population in given year
2. list all countries with CO2 above some value for a given year
3. country with largest CO2 per GDP for a given year
4. countries with population larger than given value for given year
5. print a country's info
6. quit

Select a menu option
Enter a value between 1 and 6: 4
Enter one of 1960 or 2020 for the year: 2322
  2322 is not a valid value, try again
Enter one of 1960 or 2020 for the year: -3
  -3 is not a valid value, try again
Enter one of 1960 or 2020 for the year: 2020
Enter population lower bound in millions of people
Enter a value between 0 and 8000: -3
  -3 is not a valid choice, try again
Enter a value between 0 and 8000: 66666666666666
  66666666666666 is not a valid choice, try again
Enter a value between 0 and 8000: 300

 Country                   2020  Population > 300.000000 M
----------------------------------------------------------
 China                                    1411.100000
 European Union (27)                       447.479493
 India                                    1380.004385
 United States of America                  331.501080

========  Menu Options: ========
...

6.2. Required Features

  • This option must be implemented in a separate function: don’t include its code in your function that gets the menu option. Instead, that function should make a call to your function that implements this feature when the user selects menu option 4.

  • Use your helper function, get_value_in_set to get a value for the year from the user (one of 1960 or 2020, the two years with population info).

  • Use your helper function, get_value_between to get a value for the lower bound for the population (a value between 0 and 8000).

  • You should handle missing values (ignore them in your output). Missing values are represented with -1.

  • You should print out each country’s name and the population in tabular format, and with a header line with the year. Your output should be similar to ours.

6.3. Hints/Tips

  • Try running your program on the smaller file to help you debug, then comment out this call from main and try on the bigger file.

  • You may add other helper functions to implement this feature if you’d like.

  • Refer to the Country Class Documentation for methods that might be helpful for implementing this menu option.

7. Menu Option 5

Implement a function to perform menu option 5: print out a country’s information.

Be sure to read the "Required Features" and "Hints/Tips" before you start implementing this option. One requirement is that you use binary search.

7.1. Example Output

Here is some example output from this option:

========  Menu Options: ========
1. country w/lowest and country w/highest population in given year
2. list all countries with CO2 above some value for a given year
3. country with largest CO2 per GDP for a given year
4. countries with population larger than given value for given year
5. print a country's info
6. quit

Select a menu option
Enter a value between 1 and 6: 5

Enter the name of a country: Greece

Greece
  1960 pop:     8.331725  2020 pop:    10.700556
  1960 GDP:     4.335186  2020 GDP:   188.835202
          1960         1980         2000         2020         2022
      9.391500    50.888700   102.973200    55.619800    59.662800

========  Menu Options: ========
1. country w/lowest and country w/highest population in given year
2. list all countries with CO2 above some value for a given year
3. country with largest CO2 per GDP for a given year
4. countries with population larger than given value for given year
5. print a country's info
6. quit

Select a menu option
Enter a value between 1 and 6: 5

Enter the name of a country: hello

Sorry, hello is not in the database

========  Menu Options: ========
...

7.2. Required Features

  • You must use binary search to find the country with a matching name in your list of Country objects.

  • This option must be implemented in a separate function: don’t include its code in your function that gets the menu option. Instead, that function should make a call to your function that implements this feature when the user selects menu option 5.

  • You may print out the country’s information by calling the print function and passing in the Country object. (See the Country Class Documentation)

  • If the country is not in the database, print out a message saying it is not present.

  • Your program does not need to handle users entering a country’s name in the wrong case. For example, if the user enters greece instead of Greece, it is fine if your program prints out that greece is not in the database.
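As a reminder of the technique, here is a sketch of binary search over a sorted list of plain name strings. In your program the list holds Country objects, so the comparisons would use each country's name (see the Country Class Documentation for how to get it). The sketch also assumes the file's sort order agrees with Python's string comparison.

```python
def binary_search(names, target):
    """Return the index of target in the sorted list names, or -1."""
    low = 0
    high = len(names) - 1
    while low <= high:
        mid = (low + high) // 2
        if names[mid] == target:
            return mid
        elif names[mid] < target:
            low = mid + 1   # target can only be in the upper half
        else:
            high = mid - 1  # target can only be in the lower half
    return -1


names = ["China", "Germany", "Greece", "Namibia", "Nepal", "Russia"]
print(binary_search(names, "Greece"))  # prints 2
print(binary_search(names, "greece"))  # prints -1 (case matters)
```

Returning an index (or -1) lets the calling function decide whether to print the country or the "not in the database" message.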

7.3. Hints/Tips

  • Try running your program on the smaller file to help you debug, then comment out this call from main and try on the bigger file.

  • You may add other helper functions to implement this feature if you’d like.

  • Refer to the Country Class Documentation for methods that might be helpful for implementing this menu option.

Answer the Questionnaire

After each lab, please complete the short Google Forms questionnaire. Please select the right lab number (Lab 08) from the dropdown menu on the first question.

Once you’re done with that, you should run handin21 again.

Submitting lab assignments

Remember to run handin21 to turn in your lab files! You may run handin21 as many times as you want. Each time it will turn in any new work. We recommend running handin21 after you complete each program or after you complete significant work on any one program.

Logging out

When you’re done working in the lab, you should log out of the computer you’re using.

First quit any applications you are running, including your vscode editor, the browser and the terminal. Then click on the logout icon and choose "log out".

If you plan to leave the lab for just a few minutes, you do not need to log out. It is, however, a good idea to lock your machine while you are gone. You can lock your screen by clicking on the lock icon. PLEASE do not leave a session locked for a long period of time. Power may go out, someone might reboot the machine, etc. You don’t want to lose any work!