A CS2 1/2 Semester Course Project

Building a Web Browser and Search Engine in Java:
a 1/2 Semester Project for a CS2-type Course


This page describes a project that we use in the second half of our Data Structures course (CS35). The project is a web browser with a search engine implemented in Java. We had several pedagogical goals in designing this project and we found that this project easily met these goals. These goals include:
  1. reinforcing the data structures and algorithms presented in lecture.
  2. providing a large, real-world application that students found challenging and fun to implement.
  3. demonstrating the importance of complexity analysis in determining performance efficiency when considering design alternatives.
  4. illustrating the benefits of software engineering techniques.
  5. providing an opportunity for students to work in teams.

We discuss our experiences using this project in the following paper from the Proceedings of the 33rd ACM Technical Symposium on Computer Science Education: Paper(PDF), Talk Slides (gziped postscript)


The project consists of four separate assignments that incrementally build a web browser with a search engine. Each assignment adds to or improves on functionality implemented by the previous part by applying new algorithms and using data structures that are presented in class. The final part, which completes the implementation, uses all of the advanced data structures presented during the semester (binary trees, priority queues, hash tables, and graphs).


The individual parts include:
  1. Part 1: Parsing a webpage and building a word frequency tree of its contents (uses binary search trees)

  2. Part 2: The first implementation of the search engine: Processing User Queries to Find the Most Relevant web Pages (uses priority queues and binary search trees)

  3. Part 3: Adding caching to the search engine and implementing the GUI front-end to the web browser and search engine (uses hash tables, priority queues, and binary search trees)

  4. Part 4: Adding a hyper-link graph to the web browser that displays Shortest Path and reachability information between webpage nodes. And incorporating "link-to" information into the ordering of search results presented by the search engine. (uses graphs, hash tables, priority queues, and binary search trees)

    Extra Credit parts are included with this assignment:

The final version of the web browser has the following capabilities: