CS21 Lab 5: Baby Names

Due before 11:59pm Saturday, March 1st, 2014

This lab assignment requires you to write one large program in Python. First, run update21. This will create the cs21/labs/05 directory and copy over the starting-point files for your program. Next, move into your cs21/labs/05 directory and begin working on the Python program for this lab. We will only grade files submitted by handin21 in this labs directory, so make sure your program is in this directory!

Baby Names

In this week's lab, you will create an interface that allows people to search the New York State baby name database. The database includes the names of all babies born in New York from 2007-2011 (provided at least 5 babies were born with that name in a county). The users of your program will be able to ask questions like "How many babies were born in New York in 2007?", "How many babies were born with the name Sebastian?", and "What was the most popular girl's name in New York in 2010?"

For this lab, you will write one large program rather than a series of smaller programs. In past labs, you have written the whole program from scratch; this week you will begin with a partially completed program. In its current state, the program runs, though it doesn't do very much. You will fill in the gaps in the program to make it work properly.

We have given you a skeleton for the program in the file names.py. The skeleton provides you with a program that includes 9 functions. You will need to complete each of these 9 functions, but you do not need to write any new functions on your own.

Here is an example of what your program will look like when you're all done:

Welcome to the New York State baby database.

What would you like to do?
1. Count how many babies were born in a specific year.
2. Count how many people had a specific name in the entire database.
3. Count how many boys or girls had a specific name in a specific year.
4. Find the most popular boy or girl name for a specific year.
5. Quit.

Your choice? 1
What year are you interested in? 2008
In 2008, 120472 babies were born.

What would you like to do?
1. Count how many babies were born in a specific year.
2. Count how many people had a specific name in the entire database.
3. Count how many boys or girls had a specific name in a specific year.
4. Find the most popular boy or girl name for a specific year.
5. Quit.

Your choice? 2
What name are you interested in? Richard
In the database, 1025 babies had that name

What would you like to do?
1. Count how many babies were born in a specific year.
2. Count how many people had a specific name in the entire database.
3. Count how many boys or girls had a specific name in a specific year.
4. Find the most popular boy or girl name for a specific year.
5. Quit.

Your choice? 3
What name are you interested in? Dylan
What year are you interested in? 2007
What sex is the child (M/F)? F
18 babies matched that search criteria.

What would you like to do?
1. Count how many babies were born in a specific year.
2. Count how many people had a specific name in the entire database.
3. Count how many boys or girls had a specific name in a specific year.
4. Find the most popular boy or girl name for a specific year.
5. Quit.

Your choice? 3
What name are you interested in? Dylan
What year are you interested in? 2006
Enter an integer between 2007 and 2011.
What year are you interested in? last year
Enter an integer between 2007 and 2011.
What year are you interested in? 2007
What sex is the child (M/F)? boy
Please enter M or F.
What sex is the child (M/F)? M
809 babies matched that search criteria.

What would you like to do?
1. Count how many babies were born in a specific year.
2. Count how many people had a specific name in the entire database.
3. Count how many boys or girls had a specific name in a specific year.
4. Find the most popular boy or girl name for a specific year.
5. Quit.

Your choice? 3
What name are you interested in? $$$#@#!@#
What year are you interested in? 2007
What sex is the child (M/F)? M
0 babies matched that search criteria.

What would you like to do?
1. Count how many babies were born in a specific year.
2. Count how many people had a specific name in the entire database.
3. Count how many boys or girls had a specific name in a specific year.
4. Find the most popular boy or girl name for a specific year.
5. Quit.

Your choice? 4
What year are you interested in? 2008
What sex is the child (M/F)? M
Michael was the most popular name.

What would you like to do?
1. Count how many babies were born in a specific year.
2. Count how many people had a specific name in the entire database.
3. Count how many boys or girls had a specific name in a specific year.
4. Find the most popular boy or girl name for a specific year.
5. Quit.

Your choice? 4
What year are you interested in? 2008
What sex is the child (M/F)? F
Isabella was the most popular name.

What would you like to do?
1. Count how many babies were born in a specific year.
2. Count how many people had a specific name in the entire database.
3. Count how many boys or girls had a specific name in a specific year.
4. Find the most popular boy or girl name for a specific year.
5. Quit.

Your choice? 5

The Data
The database is stored in a list of lists. We have provided you with a function that reads in the data. In main(), you will see that the first line says:
dataset = read_names()
The read_names() function reads all of the data for you, and there is nothing you need to change in that function. If you were to print out the variable dataset, you'd realize that there were a lot of babies born in New York! Here is a snippet of what you'd see on the screen:
[[2007, 'AALIYAH', 'F', 138], [2007, 'AARON', 'M', 459], [2007, 'ABBY', 'F', 16], 
 ...
 [2011, 'ZOE', 'F', 338], [2011, 'ZOEY', 'F', 169], [2011, 'ZOIE', 'F', 5]]
You can read this if you know what you're looking at. Each element of the list of baby names is itself a list. Each of these sub-lists has 4 elements: a year, a name, a sex ('F' or 'M'), and the number of babies with that name and sex born in that year. So, in 2007, there were 138 girls ('F') named Aaliyah born and there were 459 boys ('M') named Aaron born. In 2011, there were 338 girls named Zoe and 169 girls named Zoey.

For any individual sublist, we can extract each of the fields using indexing. Here is an interactive Python session examining row 5092 of the dataset:

>>> row = dataset[5092]
>>> row
[2011, 'RICHARD', 'M', 175]
>>> row[0]
2011
>>> row[1]
'RICHARD'
>>> row[2]
'M'
>>> row[3]
175
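As a small illustration of the indexing shown above, here is a sketch that unpacks the four fields of each row into named variables. The list sample and the function describe() are made up for this example and are not part of names.py; the three rows are copied from the snippet above. The code is written assuming Python 3.

```python
# A tiny hand-made sample in the same shape as the lab's dataset.
# (sample and describe are hypothetical names, not part of names.py.)
sample = [
    [2007, 'AALIYAH', 'F', 138],
    [2007, 'AARON', 'M', 459],
    [2011, 'ZOE', 'F', 338],
]

def describe(row):
    """Return a one-line description of a [year, name, sex, count] row."""
    # Unpacking is shorthand for row[0], row[1], row[2], row[3].
    year, name, sex, count = row
    return "%d: %d babies named %s (%s)" % (year, count, name, sex)

for row in sample:
    print(describe(row))
```

Giving each field a name like this can make later code easier to read than repeating row[0] and row[3] everywhere.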
The Skeleton Code
The program we have given you (names.py) runs. You can try it now.
egg[05]$ python names.py
Welcome to the New York State baby database.

What would you like to do?
1. Count how many babies were born in a specific year.
2. Count how many people had a specific name in the entire database.
3. Count how many boys or girls had a specific name in a specific year.
4. Find the most popular boy or girl name for a specific year.
5. Quit.

egg[05]$  
Notice that the program prints out a menu and then just quits.

Open the program in either emacs or vi. Read through the main() function we have given you. You'll see that first we call the read_names() function to read all of the names into the variable dataset. Next, we print a welcome message. Then, we enter a while loop where we present the user with a menu(). Depending on the choice that the user makes, we decide whether we'd like to call a particular function or just quit. If the user didn't choose to quit, we continue in the while loop. When we run the program now, the program quits right away. Why?

If you continue to look through the program, you'll find that although we gave you a menu() function, there isn't much in it. It's supposed to present the user with a menu, ask them to enter a value between 1 and 5, and then return the user's selection. The menu() function does print out the menu, but then it calls the valid_int() function. As written, the valid_int() function always returns 5, which corresponds to our "Quit" choice above. Your task in this lab is to fill in the missing pieces not just in the valid_int() function, but in all of the functions we've given you -- including main().
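To see why a menu that always returns 5 ends the program immediately, here is a stripped-down sketch of a "loop until quit" pattern. It is not the code in names.py: run_menu_loop, get_choice, and QUIT are hypothetical names, and the real main() may be organized differently.

```python
QUIT = 5  # the menu number for "Quit" in the sample run

def run_menu_loop(get_choice):
    """Call get_choice() repeatedly until it returns QUIT.

    Returns the list of choices that were handled before quitting.
    """
    handled = []
    choice = get_choice()
    while choice != QUIT:
        handled.append(choice)  # a real program would act on the choice here
        choice = get_choice()
    return handled

# A stand-in that always answers 5, like the unfinished valid_int():
# the loop body never runs, so nothing is handled.
print(run_menu_loop(lambda: 5))
```

Because the very first choice is already 5, the while condition is false on the first check and the loop exits without doing any work.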

In each function, we tell you what is missing. For example, in the valid_int() function, the comment describes what the expected behavior of the function is. You have to write the code in that function so that your function matches the description of the behavior. In addition, you need to replace the line where we have written return 5 with a line that returns the valid integer that the user typed in.
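One possible shape for such an input-validation function is sketched below. The parameters (prompt, low, high) are assumptions, so follow the comment in your own names.py for the actual signature. The extra read parameter is also hypothetical; it only exists so the function can be tried without typing at the keyboard. It defaults to input as in Python 3 (under Python 2, raw_input would take its place).

```python
def valid_int(prompt, low, high, read=input):
    """Keep asking until the user types an integer between low and high.

    The signature is an assumption for this sketch, not the one in names.py.
    """
    while True:
        text = read(prompt)
        try:
            value = int(text)
        except ValueError:
            value = None  # the text was not an integer at all
        if value is not None and low <= value <= high:
            return value
        print("Enter an integer between %d and %d." % (low, high))
```

With the answers "2006", "last year", and "2007" from the sample run, a function like this would print the reminder twice and then return 2007.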

How to go about completing the lab

The absolute best way to go about completing the lab is to write one function at a time, test that function, be sure it's working properly, and then move on to the next function. Since not much is going to happen until the user can enter values at the menu, it makes sense to start by writing the valid_int() function. That will allow your users to start entering values at the menu.

Once you've written valid_int(), your users can enter 1 at the menu. We've already given you code in main() that handles the case where the user types 1: first it asks the user to enter a year (and validates this using the valid_int() function you just wrote) and then it calls the babies_born_in() function. Notice that the babies_born_in() function currently returns 1000. That is obviously not going to be the right answer for every year. However, we've given you 1000 because it's a plausible answer, and it allows you to run the whole program (without it crashing) until you complete the function properly. You'll want to be sure that you replace the line return 1000 with a line that returns the answer you calculate.

Go ahead and finish writing babies_born_in() and then test your program to be sure it works. (As a helpful check, there were 120472 babies born in 2008.)
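Counting babies for one year is an instance of the accumulator pattern: start a total at zero and add to it while looping over the rows. The sketch below shows the idea on a three-row sample copied from the data snippet earlier. The parameter order (dataset, year) is an assumption, so check the skeleton for the real signature.

```python
def babies_born_in(dataset, year):
    """Add up the counts of every row whose year matches."""
    total = 0
    for row in dataset:
        if row[0] == year:          # row[0] is the year
            total = total + row[3]  # row[3] is the number of babies
    return total

# Three rows copied from the snippet of the dataset shown earlier.
sample = [
    [2007, 'AALIYAH', 'F', 138],
    [2007, 'AARON', 'M', 459],
    [2011, 'ZOE', 'F', 338],
]
print(babies_born_in(sample, 2007))  # 138 + 459
```

Testing on a sample this small lets you check the answer by hand before running against the full dataset.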

You're now well on your way! Think about what you'll need to do to complete menu option 2. Try to think about what functions you'll need to call (from the ones we've already given you) and then write only the function or functions necessary for option 2. Run your program and be sure it's working. Don't go on to the next menu item until you're sure the previous one is done! Managing a large program like this can only be done by working step by step through the program and testing as you go along.
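One detail to watch for in option 2: the names in the dataset are stored in capital letters ('RICHARD'), while the sample run shows the user typing Richard. A minimal sketch of a name count that converts the input before comparing is shown below. The function name count_name is hypothetical, so use whichever function the skeleton provides for this job. The two DYLAN rows match the counts in the sample run (18 girls and 809 boys in 2007).

```python
def count_name(dataset, name):
    """Total babies with this name across all years and both sexes.

    count_name is a hypothetical name for this sketch.
    """
    name = name.upper()  # the dataset stores names in capitals
    total = 0
    for row in dataset:
        if row[1] == name:
            total = total + row[3]
    return total

sample = [
    [2007, 'DYLAN', 'F', 18],
    [2007, 'DYLAN', 'M', 809],
    [2011, 'RICHARD', 'M', 175],
]
print(count_name(sample, "Dylan"))  # 18 + 809
```

The same loop can be narrowed to a particular year and sex by adding more conditions to the if statement, which is the idea behind menu option 3.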

Hacker's Challenge

These problems are optional bonus problems. They are not required, and you should not attempt them until you are completely finished with the other lab problems. They will also not get you any extra points -- you will only be rewarded with extra knowledge and satisfaction. :)

  1. Display a chart showing how popular a particular name was in each of the 5 years in the data set.
  2. Display the top 10 most popular names across all five years. (See this webpage for hints.)
  3. There are lots of other cool ideas that we didn't even mention here. Implement one or two of your choosing.
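For the first challenge, one simple approach is a text chart: total the counts for the name in each year, then print a row of marks for each total. The sketch below is only one way to do it. The function names, the per_mark scale, and the sample rows (including the name EXAMPLE and its counts) are all invented for illustration.

```python
def yearly_totals(dataset, name, first_year=2007, last_year=2011):
    """Return a list of [year, total] pairs for one name (both sexes)."""
    name = name.upper()
    totals = []
    for year in range(first_year, last_year + 1):
        count = 0
        for row in dataset:
            if row[0] == year and row[1] == name:
                count = count + row[3]
        totals.append([year, count])
    return totals

def chart_lines(totals, per_mark=10):
    """Turn [year, total] pairs into lines with one '#' per per_mark babies."""
    lines = []
    for year, count in totals:
        bar = "#" * (count // per_mark)
        lines.append("%d %s %d" % (year, bar, count))
    return lines

# Invented rows, only to exercise the functions.
sample = [
    [2007, 'EXAMPLE', 'F', 30],
    [2007, 'EXAMPLE', 'M', 20],
    [2009, 'EXAMPLE', 'F', 25],
]
for line in chart_lines(yearly_totals(sample, "Example", 2007, 2009)):
    print(line)
```

For a name with hundreds of babies per year, a larger per_mark value keeps the bars short enough to fit on one line.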

Submit

Once you are satisfied with your programs, hand them in by typing handin21 at the unix prompt.

You may run handin21 as many times as you like. We will grade the most recent submission made prior to the deadline.

Remember: for this lab, programs must appear in your cs21/labs/05 directory. If you create your programs in a different directory, use the unix mv or cp commands to move or copy them into the cs21/labs/05 directory. For example:

 cp  names.py  ~/cs21/labs/05/names.py
Notes
The original data set can be downloaded from here, but the data set used for this lab combines the counts from all counties and can be downloaded here.