CS35 Lab 11: The Oracle of Bacon

Due by 11:59 p.m., Wednesday, April 25, 2012

Note the due date is a Monday. I encourage you to work with a partner as the concepts can be quite difficult to tackle on this lab.

The Oracle of Bacon searches a movie database to find a path between Kevin Bacon and other actors using a graph defined by the co-star relationship: two actors are connected by an edge if those two actors co-starred in a movie together. An actor's Bacon number is the length of the shortest path between that actor and Kevin Bacon. For example, the Oracle of Bacon might report that Kevin Bacon is connected to Kurt Russell with a length-2 path:

  Kurt Russell was in Mean Season, The (1985) with Andy Garcia
  Andy Garcia was in Air I Breathe, The (2007) with Kevin Bacon

  Kurt Russell has a Kevin Bacon number of 2

Are you a movie buff? Try guess the Bacon Number yourself and test it out. The number of movies you need to report is the link number. Can you find two movie stars with a link number greater than 3? Greater than 5? It is harder than it sounds.

This game extends to other actors. For example, we could ask what the Giant number is for Andre the Giant and any other actor, say Justin Bieber.

  Justin Bieber was in School Gyrls (2009) with Fred Willard
  Fred Willard was in I Could Never Be Your Woman (2007) with Wallace Shawn
  Wallace Shawn was in Princess Bride, The (1987) with Andre the Giant

  Justin Bieber has a Andre the Giant number of 3

In this lab you will first implement several graph algorithms and then use the Graph and other data structures from this course to respond to queries about movies and actors.

As usual, you can get the initial files for this lab by running update35. The new files are listed here, and the files that you will need to edit are highlighted in blue:

The remainder of this document describes your work to complete this lab.



Implementing the graph algorithms

You should start by reviewing our implementation of the Graph class. Our Graph uses an edge-list representation: for each vertex we store a list (an STL vector, actually) of the outgoing edges from that vertex. The adjacency-list representation for all vertices - that is, edges are stored as part of a vertex object so that we can access all outgoing edges for a single vertex efficiently. Our implementation uses a HashTable, with vertices as the keys and their edge-lists as the values. If the details are difficult to follow, you should at least be able to follow the data structure interface. You will not need to implement any Graph methods, but you will need to understand how to use them. Before moving on, create a Graph object in your testGraph.cpp file.

Please complete each function marked with a TODO comment in graph-algorithms.inl. These standard graph search algorithms will help you implement your Oracle of Bacon program later on. Before moving forward, you must test each function you implement using the testGraph program.

Feel free to add additional methods, although you will need to ensure you add their declaration in the header file. At a minimum, your functions are:

As an example, you could construct a graph to represent our search problem of trying to get from Parrish Halle to The Ville (way back in week 5). If you do this correctly, your test may look like this:

$ ./testGraph 
Let's build a map of Swarthmore!

The graph: 
Softball: {Ville, Soccer, Tunnel, Train}
Bridge: {Tennis, Wharton}
Sharples: {Clothier, Tunnel}
Soccer: {Softball, Tennis}
Wharton: {Bridge, Clothier}
Magill: {Parrish, Train}
Tennis: {Soccer, Bridge, Tunnel}
Parrish: {Magill, Clothier}
Train: {Magill, Ville, Softball}
Ville: {Train, Softball}
Tunnel: {Softball, Sharples, Tennis}
Clothier: {Wharton, Parrish, Sharples}

How do I get from Parrish to the Ville?
        Using DFS:
                Parrish
                Clothier
                Sharples
                Tunnel
                Softball
                Ville

        Using BFS:
                Parrish
                Magill
                Train
                Ville

The shortest path length is: 3

The average distance from Parrish to all other locations is: 2.54545



The Oracle of Bacon

Your main work for this lab is to implement oracle.cpp. This program should first read a movie database file, then provide an interactive menu that allows the user to request information from your movie database. Specifically, your program should support the following queries:

  1. Given an actor's name, report all the movies he or she appeared in.
  2. Given a movie title and year, report all the actors that appeared in that movie.
  3. Given two actor's names, report the shortest path between them using the graph defined by the co-star relationship. If no path exists, report that fact to the user.
  4. Given an actor's name, compute the average distance between that actor and every other actor in the given actor's connected component, using the graph defined by the co-star relationship.

Your program should allow the user to repeatedly select a menu option until they select a "Quit" option or type CTRL-D. When possible, you should gracefully handle incorrect input such as when the user enters an actor or movie name not in the database.



The Bacon data files

The data files for this lab are in the /usr/local/doc directory, accessible from the CS lab computers. Some of these files are moderately large. We strongly urge you to test your program using the smaller files , and only use the larger files when you are confident that your program works. The files are:

A typical line in the file looks like:
Kevin   Bacon   A Few Good Men  1992
where each field (name, title, year), is separated by a tab.



Some helpful hints

You might find the following hints more or less helpful:



Going further

As an extension to the Oracle game, implement Djikstra's algorithm as a W getDistance(V src, V dest, Graph* g) function inside graph-algorithms.h and graph-algorithms.inl. To test your implementation, use a variant of the Bacon co-star graph where an edge weight is 2011-y with y being the most-recent year for a movie that two actors co-starred in together.

A similar concept is the Oracle of Baseball which links baseball players through common teams. For example, linking Babe Ruth to David Ortiz reports:


Babe Ruth   played with   Ben Chapman 	for the 1930 New York Yankees
Ben Chapman   played with   Early Wynn 	for the 1941 Washington Senators   
Early Wynn 	played with 	Tommy John 	for the 1963 Cleveland Indians
Tommy John 	played with 	Roberto Kelly 	for the 1988 New York Yankees
Roberto Kelly 	played with 	David Ortiz 	for the 1997 Minnesota Twins 	

Give this Oracle a try too if you enjoy baseball. The terms Bacon Number and Erdős number are based on similar linking numbers in movie and math publication contexts. For more insanity, read about the Erdős-Bacon Number.



Submitting your work

Once you are satisfied with your program, hand it in by typing handin35 at the unix prompt. You and your partner should handin just one copy of your lab.

You may run handin35 as many times as you like, and only the most recent submission will be recorded. This is useful if you realize after handing in some programs that you'd like to make a few more changes to them.

About the Data
The data for the baseball player info is used in the site baseball-reference and is available in an alternate (more verbose) format at baseball-databank.org.

The movie data comes from The Internet Movie Database and is available in an alternate (way more huge) format as part of their alternate interfaces. The data has been cleaned up a bit to remove TV shows and a number of unsavory direct to video releases (e.g., "The Land Before Time XIII: The Wisdom of Friends").