CS21 Lab 8: Sorting

Due Saturday, April 24, before midnight (US/Eastern local to Swarthmore)

Read through this entire lab before you begin coding!

Goals

  • Continue to gain proficiency at top-down design and bottom-up implementation.

  • Continue practicing using a list of lists to store data.

  • Understand how to adapt selection sort to real-world data.

  • Understand how to adapt selection sort to sort from highest to lowest instead of lowest to highest.

Video game dataset

For this assignment, we will continue to work with the video game data we used in Lab 7. The video game data is stored in CSV format and is located at: /data/cs21/videogames/video_games.csv

As a reminder, here are some details about the data set, beginning with the first five lines and last five lines from this file:

Title,MaxPlayers,Genre,ReviewScore,Sales,UsedPrice,Console,Rating,Year,PlayLength
AC/DC Live: Rock Band - Track Pack,4,Action / Simulation,63,0.19,14.95,X360,M,2008,3.933333333
AC/DC Live: Rock Band - Track Pack,1,Action / Simulation,60,0.19,14.95,PS3,E,2008,3.933333333
Ace Combat 6: Fires of Liberation,1,Action / Simulation,80,0.58,19.95,X360,E,2007,13.75
Ace Combat X: Skies of Deception,1,Action / Simulation,75,0.32,22.95,PSP,E,2006,4.116666667
...
Zendoku,1,Role-Playing (RPG) / Strategy,68,0.04,16.95,DS,E,2007,4
Zoids Assault,1,Role-Playing (RPG) / Strategy,46,0.07,17.95,X360,M,2007,12.05
Zoo Keeper,1,Action,74,0.1,14.95,DS,T,2004,5
Zoo Tycoon DS,1,Simulation / Strategy,44,0.92,12.95,DS,T,2005,-1
Zubo,1,Adventure / Role-Playing (RPG) / Strategy,75,0.05,14.95,DS,E,2008,15

Notice that the file is sorted by game title.

Each line contains 10 features. The following table provides more detailed information about each of these features.

Table 1. Features of Games Data

Feature

Type

Description

Notes

0

string

Game Title

1

integer

Max number of players

2

string

Genre

Listing multiple, seperated by slashes

3

integer

Review Score

4

float

Sales

Measured in millions of dollars

5

float

Price

Average "used" price (in dollars)

6

string

Console

7

string

Rating

8

integer

Release Year

9

float

Play Length

Average time in hours to play through the game (-1 if play through time is not available)

Using sorting to explore video game data

You will implement a program called sort-games.py, which will allow the user to search through this dataset and learn particular information. To start, copy your Lab 07 search-games.py file into a new sort-games.py that you will use for Lab 08:

cp ~/cs21/labs/07/search-games.py ~/cs21/labs/08/sort-games.py

Users may get a list of games filtered by Console and Genre and sorted by review score from highest to lowest. Users can also get a list of Genres in the data (sorted alphabetically) if they need help crafting a search. Genre searches should include partial matches (so the results for the query "Racing" should include "Alone in the Dark" because its listed Genre is "Action / Adventure / Racing / Driving")

Users may get a list of games ranked by sales figures by year:

Program Requirements

  • All numeric data in the file should be type cast to the correct type when stored in the program.

  • Compares data in lowercase format (so that user input is case insensitive) but displays results with their original case.

  • Repeatedly presents a menu to the user with six choices: to search by console/year, to search by game title, to sort best reviewed games, to sort best selling games, to list the possible genres, or to quit.

  • Ends when the user chooses to quit.

  • Ensures that the user provides a valid menu choice.

  • Uses formatted printing to show search results in a table format that fits in an 80-character wide terminal window.

  • Uses linear search to find matches by console/year/genre.

  • Uses binary search to find matches by game title.

  • Searches are limited to returning a maximum of 25 results.

  • Uses selection sort to sort results (you must write this yourself and cannot use the built-in .sort() method)

  • Finds a list of genres present in the data and presents them to the user in alphabetical order (you cannot hardcode the genres)

  • You do not need to verify the user typed in a valid console or title. If the user types in a console or title that doesn’t exist in the data set, your program will report that no results were found.

Tips

Extracting the Genre Data

To implement menu item #5, you’ll need to store a list of possible genres somewhere. You can accomplish this by iterating through all the game entries, and dividing up the 'genre' field into its component parts using the <str>.split() function (notice that multiple genres are seperated by '/'), clearing away any whitespace with <str>.strip(), and accumulating the results. (Be sure not to add any repeat genres to your list!)

Sorting results

In order to display the games with the highest review score or the highest selling games, you should first use a search method to narrow down the amount of data you need to sort. You already have search functions written from Lab 7, so reuse your code if you can. For example, to display the highest selling games by year, first find all the games for that particular year (you already did that as part of Lab 7) and then sort the resulting list by the Sales value.

If two (or more) games have the same review score or sales numbers, it does not matter which order you display them. You don’t need to make sure the games remain in alphabetical order.

Sorting from largest to smallest

The selection sort algorithm that we covered in class finds the smallest item and puts it first. Then it finds the second smallest item and puts it second, etc. This gives you a list that is sorted from smallest to largest. If you want to build a list that is sorted from largest to smallest, you have two options:

  • You can change the selection sort algorithm so that it finds the largest element and puts it first, then finds the second largest element and puts it second, etc.

  • You can use your existing selection sort algorithm, then use the reverse() method on lists to reverse the order of the elements:

lst = [1, 2, 3]
lst.reverse()
print(lst)  # prints [3, 2, 1]

Extra Challenge

This is an optional extension and is not required.

Which console has games with the highest average review scores across the most categories? To answer this first find the list of genres and list of consoles. Then compute the mean review score within each genre for each of the five console. Next, use that to sort the five consoles (in rank order from 1-5) for each genre. Finally, sum up the number 1st place finishes for each console. Which one has the highest?

For another bonus, try to come up with a different way to rank consoles. Describe and implement your alternative system and see if it leads to a different result than the above.

Answer the Questionnaire

Please edit the Questions-08.txt file in your cs21/labs/08 directory and answer the questions in that file.

Once you’re done with that, run handin21 again.

Turning in your labs…​.

Remember to run handin21 to turn in your lab files! You may run handin21 as many times as you want. Each time it will turn in any new work. We recommend running handin21 after you complete each program or after you complete significant work on any one program.