CS35: Lab 10: Graph Algorithms and Applications

Due Friday May 1 at 11:59pm.

Overview

The main goal of this lab is to gain experience working with graph algorithms. A secondary goal is to apply them on several different graph applications. We’re providing you with starter code for the graph data structures, some basic unit tests, and a helper function that loads in a graph from a file. You will implement three graph algorithms and then write three short programs that use these algorithms.

This is an individual lab.

The URL for your repository will be

git@github.swarthmore.edu:cs35-s20/lab10-<username>

where username is your swarthmore user name.

Starting Code

Your starting code can be found in your lab10-<username> repo. The following is a description of the repository’s initial contents; bolded files are those which you must change to complete this lab.

  • adts/: As with your previous labs, the ADT interface declarations are stored here. The most notable additions for this lab are edge.h, graph.h and priorityQueue.h, stlMinPriorityQueue, stlMaxPriorityQueue.

  • directedGraph.h/directedGraph.cpp: A data structure for directed graphs, using adjacency lists.

  • undirectedGraph.h/undirectedGraph.cpp: An undirected graph data structure using adjacency lists.

  • graphAlgorithms.h/graphAlgorithms.cpp: a file containing functions for the three graph algorithms you will implement. This file also contains a helper function loadGraphFromFile which handles reading in a graph from a file.

  • test_data: a directory containing several graphs you can test/run your programs on.

  • tests.cpp: The unit testing program you will use to confirm that your graph algorithms are correct.

  • manualTests.cpp: A sandbox test program for you to use while you develop your program.

  • reachable.cpp: A program which reads in a graph and determines whether there is a path between a source and destination node.

  • communication.cpp: A program which reads in an undirected graph representing a wireless communication network and determines whether there is a short path between two wireless devices.

  • outbreak.cpp: A program which reads in a directed graph modeling potential infectious disease outbreaks and predicts how long it will take for an infectious disease to spread.

Graph File Format

All of the graphs you’ll use this lab will come from files, provided in the test_data directory in your repository. All graph files are written in the following format:

  • The first line contains two integers n and m, representing the number of vertices and edges in the graph respectively.

  • The second line contains either the string weighted or the string unweighted, specifying whether the graph is a weighted or unweighted graph.

  • The third line contains either the string directed or undirected, specifying whether or not the edges of the graph are directed.

  • The next n lines contain a single string (one word, no spaces) representing the name of a vertex

  • The next m lines each contain information about a single edge. First the line lists the source and destination vertices. If the graph is weighted, the line will also have an integer edge weight.

We have provided loadGraphFromFile(<filename>) in graphAlgorithms.h.,cpp to read graph files for you. It may still be useful to understand the format in case you want to write new test files. This function takes a the name of a graph file as input, creates the appropriate graph, and returns a pointer to the new Graph.

Part I. Implementing Graph Algorithms

In graphAlgorithms.cpp, you will implement three different graph algorithms.

Implementing reachableDFS

Given a source node (src), destination node (dest), and a pointer to a graph object (g), this method should return true if a path exists between src and dest. Your algorithm must utilize depth-first search to solve the problem.

Implementing shortestLengthPathBFS

Given a source node (src), destination node (dest), and a pointer to a graph object (g), this method should return a list of all vertices in the shortest length path between src and dest. If no path exists, the method will throw a runtime_error. The list should the be nodes in the path, in order, starting with src. Your algorithm must utilize breadth-first search to solve the problem.

Implementing singleSourceShortestPath

Given a source node (src), and a pointer to a graph object (g), this method should return the shortest cost path from src to each vertex in the graph. The value is returned as a (dynamically allocated) dictionary that maps a vertex (string) to the shortest path cost (int).

Note: For some graphs it might not be possible to reach every node in the graph starting from the source node. In class we interpreted unreachable nodes as having distance "infinity". However, C++ doesn’t have a predefined value for infinity. Instead, we’ll just not include unreachable nodes in the dictionary. The dictionary you return should contain only vertices reachable from the source node.

Also note: we won’t cover this algorithm until Tuesday’s class. We encourage you to start with the first two algorithms, then complete the first two applications (reachable.cpp, communication.cpp), and then implement this algorithm.

Part II. Graph Applications

To gain experience using the algorithms you implemented in part I, you will write three short programs that use these graph algorithms.

reachable.cpp

For your first application, implement a basic program which loads in a graph, checks whether two vertices are reachable in the graph, and prints the results. Use command-line arguments to specify, in order, the name of the file containing the graph, the source vertex, and the destination vertex.

$ ./reachable test_data/basic_8 1 6
You can get from 1 to 6!
$ ./reachable test_data/basic_8 4 2
You can NOT get from 4 to 2!

Use the file I/O function loadGraphFromFile to load the graph from the file into a new Graph object. Assuming you’ve already implemented reachableDFS, this program should not be much code.

Once you have compiled and tested your reachable application, you’ll be ready to move onto the next application!

Communication Networks.

You are in charge of product development at WirelessForAll.org, an NGO which provides affordable communication infrastructure in developing countries. You maintain a network of cheap devices which communicate wirelessly. Each wireless device can communicate only with other devices that are within 100m. However, messages can be relayed if there are device(s) in between. For example, if Alice and Carol are 200m apart, but Bob is 100m between each of them, a message from Alice to Carol can "hop" to Bob’s device and get sent on to Carol.

This communication network can be modeled as an undirected graph, where vertices correspond to wireless devices, and there is an edge between vertices if the devices are within 100m.

Unfortunately, each time a message is relayed, it gets garbled a bit! From experience, you know that once a message gets relayed four times, it becomes too garbled to be useful.

Write a program to help you understand when messages will get garbled. Your inputs should be an undirected, unweighted graph, a source vertex, and a destination vertex. Output (i) whether its possible to send any communication from source to destination, (ii) if so, what is the minimum number of steps it will take, and (iii) what the steps are along the shortest path.

You should not need to design new algorithms beyond those included in graphAlgorithms.h. Assuming you have already implemented and tested the first two graph algorithms (reachableDFS, shortestLengthPathBFS) you should not have to write extensive code for this problem.

Here is some sample output:

$ ./communication test_data/communication200 B01 D47
destination is 2 steps from the source vertex!
here is the path: [B01 O48 D47]
$ ./communication test_data/communication200 B82 A03
destination is 1 steps from the source vertex!
here is the path: [B82 A03]
$ ./communication test_data/communication2000 A01 J98
destination is 4 steps from the source vertex!
here is the path: [A01 A81 C72 X97 J98]
$ ./communication test_data/communication2000 E08 G72
destination is far from the source vertex.
here is the path: [E08 E28 H71 G13 G50 S42 A26 G72]

Modeling infectious diseases in a social network.

As part of a summer research project, you are collaborating with epidemiologists (infectious disease experts) at the CDC who want to model an infectious disease outbreak. The epidemiologists used sophisticated statistical modeling to predict how quickly one person would catch the disease from another. Specifically, the data set consists of a list of entries e.g. (Alice, Bob, 15) which indicates that if Alice was infected with the novel disease, then it would take Bob 15 days to become infected from Alice. However, the epidemiologists have a problem---Bob might get infected sooner if he catches the disease quickly from someone else who catches it from Alice! How can you find the shortest infection path from Alice to Bob?

This input can be modeled as a problem on a weighed directed graph, where vertices represent people, and a directed edge (u,v) with weight w represents that v can become infected directly from u in w days.

Write a program outbreak.cpp that takes a graph containing projected outbreak data, as well as a start person, and uses singleSourceShortestPath to compute and output the minimum time before each other person is infected.

See this example output.

Optional Extension: Social Distancing

One way to minimize the spread of a infectious disease is to practice social distancing. Extend your outbreak.cpp program to model the effects of social distancing. One way to do this is to take each vertex in the graph and remove 50% of its edges. How does that effect the infection time? What if you only removed 10% of the edges? What if you removed 90% of the edges?

You might want to use randomness to select which edges get removed. C++ has a library called random to handle random number generation. To include this library, add #include <random. See this page for information about the random library

Implementation Strategy

This lab involves writing a moderate amount of code. Much of it is modular, allowing you to work on and test small pieces of the assignment at a time. Completing this lab assignment will be much easier if you practice incremental development.

Below is a strategy which should organize your efforts; you are not required to proceed in this order, but it might help.

  1. Begin by reading the graph ADT (adts/graph.h) and understanding the DirectedGraph and Undirected Graph classes. Also read graphAlgorithms.h to make sure you understand what is expected of your graph algorithms.

  2. Run make and ./tests. Note that many unit tests will fail.

  3. Implement reachableDFS and run unit tests again. Expect to have many failed unit tests, but fewer than before.

  4. Implement and test shortestLengthPathBFS. At this point, you should only fail unit tests because you haven’t implemented singleSourceShortestPath.

  5. Implement and test the applications reachable.cpp and communication.cpp.

  6. Implement and test the singleSourceShortestPath algorithm. We’ll see pseudocode for this algorithm in lecture on Tuesday, so it’s productive to implement this later.

  7. Finally, implement and test outbreak.cpp.

Once you are finished, you should be able to compile and run the three graph applications. For your convenience, a few test graphs have been provided in the test_data directory.

Memory

For this lab, your program is required to run without memory errors or leaks. You should use valgrind as you proceed in your testing to track memory errors. When you have a complete first draft of your implementation:

  • Commit and push what you have (in case something goes wrong).

  • Run valgrind ./tests to make sure that the test program does not cause memory errors.

  • Once memory errors are cleared, run valgrind --leak-check=full ./tests to identify and correct memory leaks.

Note: while your program is required to run without memory leaks, this is not the central pedagogical goal of this assignment. As with previous labs, memory leaks will result in minor point deductions unless the leaks are truly egregious.

Coding Style Requirements

As usual, you will also be required to observe good coding practices:

  • Your C++ code must have proper and consistent indentations.

  • You must have proper and consistent usage of spacing and braces for if, else, for, and while conditions as well as class definitions and code blocks.

Summary of Requirements

When you are finished, you should have

  • A completed implementation of the three graph algorithms specified in graphAlgorithms.h.

  • A completed implementation of the three graph applications specified in reachable.cpp, communication.cpp, outbreak.cpp.

  • No memory errors

  • No memory leaks

  • Code which conforms to the style requirements

Sample code for outbreak program.

$ ./outbreak test_data/outbreak20 abagail
infection time from abagail to:
  --> abagail: 0
  --> ariona: 4
  --> bettyjane: 5
  --> chaye: 3
  --> correen: 5
  --> dakayden: 7
  --> hodosh: 5
  --> jhaniyah: 5
  --> krispin: 7
  --> kyris: 4
  --> marzana: 3
  --> nantambu: 5
  --> pyles: 4
  --> quirijn: 4
  --> rhyson: 5
  --> roandy: 7
  --> rumell: 6
  --> sretan: 6
  --> turmaine: 5
  --> yandier: 7
$ ./outbreak test_data/outbreak20 yandier
infection time from yandier to:
  --> abagail: 9
  --> ariona: 6
  --> bettyjane: 9
  --> chaye: 7
  --> correen: 7
  --> dakayden: 7
  --> hodosh: 7
  --> jhaniyah: 8
  --> krispin: 8
  --> kyris: 5
  --> marzana: 7
  --> nantambu: 9
  --> pyles: 6
  --> quirijn: 7
  --> rhyson: 9
  --> roandy: 6
  --> rumell: 7
  --> sretan: 7
  --> turmaine: 8
  --> yandier: 0