CS21 Lab 7: Searching

Due Saturday, April 17, before midnight (US/Eastern local to Swarthmore)

Read through this entire lab before you begin coding!

Goals

  • Continue to gain proficiency at top-down design and bottom-up implementation.

  • Practice reading data from a file.

  • Practice using a list of lists to store data.

  • Understand how to adapt linear search and binary search to real-world data.

Video game dataset

For this this assignment, we will be working with a real-world dataset from CORGIS (which is an acronym for: The Collection of Really Great, Interesting, Situated Datasets). We will be using a dataset that contains records on over a thousand video games released between 2004 and 2008. This includes a variety of information on consoles, formats, sales, review scores, and average playtime. While this dataset contains many features, but we will be focusing on only a small subset of these features.

The video game data is stored in CSV format and is located at: /data/cs21/videogames/video_games.csv

Here are the first five lines and last five lines from this file:

Title,MaxPlayers,Genre,ReviewScore,Sales,UsedPrice,Console,Rating,Year,PlayLength
AC/DC Live: Rock Band - Track Pack,4,Action / Simulation,63,0.19,14.95,X360,M,2008,3.933333333
AC/DC Live: Rock Band - Track Pack,1,Action / Simulation,60,0.19,14.95,PS3,E,2008,3.933333333
Ace Combat 6: Fires of Liberation,1,Action / Simulation,80,0.58,19.95,X360,E,2007,13.75
Ace Combat X: Skies of Deception,1,Action / Simulation,75,0.32,22.95,PSP,E,2006,4.116666667
...
Zendoku,1,Role-Playing (RPG) / Strategy,68,0.04,16.95,DS,E,2007,4
Zoids Assault,1,Role-Playing (RPG) / Strategy,46,0.07,17.95,X360,M,2007,12.05
Zoo Keeper,1,Action,74,0.1,14.95,DS,T,2004,5
Zoo Tycoon DS,1,Simulation / Strategy,44,0.92,12.95,DS,T,2005,-1
Zubo,1,Adventure / Role-Playing (RPG) / Strategy,75,0.05,14.95,DS,E,2008,15

Notice that the file is sorted by game title.

Each line contains 10 features. The following table provides more detailed information about each of these features.

Table 1. Features of Games Data

Feature

Type

Description

Notes

0

string

Game Title

1

integer

Max number of players

2

string

Genre

Listing multiple, seperated by slashes

3

integer

Review Score

4

float

Sales

Measured in millions of dollars

5

float

Price

Average "used" price (in dollars)

6

string

Console

7

string

Rating

8

integer

Release Year

9

float

Play Length

Average time in hours to play through the game (-1 if play through time is not available)

In this lab we will focus on features 0, 6, and 8 (plus 3, 4, and 9 if you do the optional challenge). In the next lab, which adds sorting, we will incorporate more functionality for the other features.

Using search to explore video game data

You will implement a program called search-games.py, which will allow the user to search through this dataset and learn particular information.

Users may search by console and release year:

Or users may search by game title:

Here are some examples of handling valid input:

Program Requirements

  • All numeric data in the file should be type cast to the correct type when stored in the program.

  • Compares data in lowercase format (so that user input is case insensitive) but displays results with their original case.

  • Repeatedly presents a menu to the user with three choices: to search by game title, to search by console/year, or to quit.

  • Ends when the user chooses to quit.

  • Ensures that the user provides a valid menu choice and a valid year.

  • Uses formatted printing to show search results in a table format that fits in an 80-character wide terminal window.

  • Uses linear search to find matches by console/year.

  • Uses binary search to find matches by game title.

Tips

1. Continue to use TDD

We strongly recommend that you create a top-down design with a fully fleshed out main and stubs for all other functions. However, you may not work with a partner on this lab. Also you don’t need to get approval from an instuctor for your design, though we’d be glad to review it with you in lab, office hours, or ninja sessions. You should only begin implementing the other functions after you have a clear plan in place for the structure of main.

2. Create modular, reusable functions

Avoid duplicating similar code in multiple places throughout the program. Think about how you could re-use the same function. For example, we are doing two types of searchs (by year/console and by title), but the results of those searches are displayed in exactly the same way. Consider having a function that takes in a list of search results and displays them. Then you could call this function for either type of search.

3. Displaying matches

We want the results of our searches to be nicely displayed in the terminal window. However, some game titles names are quite long. We can use slicing to truncate these strings to a fixed maximum length, allowing us to create an easy to read table format. This section of the online textbook explains how to slice a string. For instance, the following example shows how to slice out the first 10 characters from a string:

>>> alphaString = "abcdefghijklmnopqrstuvwxyz"
>>> alphaString[0:10]
'abcdefghij'

Notice how this nicely works, even if we provide an upper bound that is longer than the string we’re trying to slice (unlike when dealing with lists):

>>> shortString = "xyz"
>>> shortString[0:10]
'xyz'

The version of linear search that we discussed in class stops as soon as it finds the item it is looking for, and returns the index of that item or -1 if not found. For this problem, your linear search will need to accumulate a list of matching results and return that.

In addition, the basic version of linear search assumes that you are searching a flat list. We will be searching a list of lists, so will need to adapt linear search to use double indexing to get at the data within each sublist.

Because the data is sorted by game title, we can use binary search to very quickly locate a particular game. However, game titles can be somewhat long or may include a subtitle. Additionally, the data may contain multiple copies of the same game, if it was released for different consoles under the same name.

For example, there were apparently 16 "LEGO" games released on 5 different consoles.

...
Legendary                                 <- this has a different prefix
LEGO Batman: The Videogame
LEGO Batman: The Videogame
LEGO Batman: The Videogame
LEGO Batman: The Videogame
LEGO Batman: The Videogame
LEGO Indiana Jones: The Orig.
LEGO Indiana Jones: The Orig.
LEGO Indiana Jones: The Orig.
LEGO Indiana Jones: The Orig.
LEGO Star Wars II: The Orig.
LEGO Star Wars II: The Orig.
LEGO Star Wars II: The Orig.
LEGO Star Wars: The Complete Saga         <- suppose binary search finds this
LEGO Star Wars: The Complete Saga
LEGO Star Wars: The Complete Saga
LEGO Star Wars: The Complete Saga
Lemmings                                  <- this has a different prefix
...
Finding the first match

First write a binary search function that searches for a partial match to a game title and returns the index of the first match that it finds (or -1 otherwise). Make sure your binary search function uses startswith (see here) rather than == so it doesn’t miss any partial matches (e.g. using "LEGO" to match on "LEGO Star Wars: The Complete Saga").

However, there might be more matches that are just before or just after that initial index you locate. So next you’ll need to search in the vicinity of that location to find all of the other potential matches.

Finding nearby matches based on an initial seed

We have conveniently provided you with a function to help with finding nearby matches based on an initial seed. This is called lookForwardBackward() and you can import it from the cs21s21 library. This function takes four parameters and returns a list of matching items that it found looking either forward or backwards. Those parameters are:

  • A list with the data to search over

  • An integer where the search should start from (this is the index that your "binary search" function will return)

  • The partial match (string) that the function should be searching for (e.g. "LEGO" in the above example)

  • A search direction. This is an integer which should have the value 1 (to search forward) or -1 (to search backwards)

You will need to import the function at the top of the program (from cs21s21 import lookForwardBackward) and call it twice for each binary search (once to get the forward matches and again to get the backward matches).

A call to the lookForwardBackward() function might look something like this:

backwardMatches = lookForwardBackward(currData, currIndex, title, -1)

Extra Challenge

Here are TWO optional extensions! Neither is required, but either or both might be fun and rewarding.

Optional: Finding the top games

After completing a search by year/console, inform the user of which is the "best" game according to each of three measures: highest reviews, most sales, and longest average playtime. Be sure to ignore any games for which those fields are missing (recorded as -1).

Here is an example run showing how this extra challenge should work:

Rather than importing the provided lookForwardBackward function, try writing your own. This should work the same way as the provided version: it will look linearly forwards or linearly backwards based on an initial seed index that it takes as a parameter. The function will also need to take the search string as an input parameter (as well as potentially other information depending on your design). This function should return a list of all the partial matches based on that initial seed index.

The program can then add that list together with the entry at the initial seed to compute the total list of partial matches which are then displayed to the user.

Answer the Questionnaire

Please edit the Questions-07.txt file in your cs21/labs/07 directory and answer the questions in that file.

Once you’re done with that, run handin21 again.

Turning in your labs…​.

Remember to run handin21 to turn in your lab files! You may run handin21 as many times as you want. Each time it will turn in any new work. We recommend running handin21 after you complete each program or after you complete significant work on any one program.