CS21 Lab 9: Sorting, YouTube

Due Saturday night, April 13


Make sure all programs are saved to your cs21/labs/09 directory. Files outside this directory will not be graded.

$ update21
$ cd ~/cs21/labs/09

Topics for this assignment

Overview

This is a two-part assignment:

  1. Sort Visualization - In visualize_sort.py, you will write a sorting algorithm with the goal of visualizing how the algorithm sorts a list of numbers.

  2. Analyzing YouTube Data - In youtube.py, you will write a program that analyzes data about YouTube channels and prints out channels that have the most subscribers, uploads, and total video views.

Note: Feel free to use whatever sorting algorithm you want for each part of the assignment; however, you must use both SelectionSort and BubbleSort at least once each.


1. Visualizing a Sorting Algorithm

Part 1 of this assignment is more open-ended than our typical lab programs. Your task is to write a program, visual_sort.py, that displays a visualization of how one particular sorting algorithm works. This visualization can work by printing strings to the terminal, or — for the extra challenge — by creating an animation using Zelle graphics. Your program should allow its users to see how a particular sorting algorithm changes a list from unsorted to sorted, one swap at a time.

Requirements

Here are three examples of visualizing selection sort:

We would like to see what you can come up with. Note that the graphics example is not the expectation - this is more of a challenge/extension if you are interested. The goal is to help your users understand the inner workings of your chosen algorithm, however you see fit. Along the way, hopefully you will also deepen your own understanding.

TIP: It is a good idea to pause your program after each major step. In the graphics example above, you can use getMouse() or getKey() to make the program wait until the user is ready to proceed. For the command-line versions, you can use input(), e.g.:

# output some result

input("Hit Enter to Continue...")

# resume execution of program

It doesn't matter what the user inputs, so we did not bother to save the value in a variable.
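As one possible starting point for a terminal visualization, here is a minimal sketch of selection sort that records the list after every swap. The function names selection_sort_steps and format_step, and the bracket notation for the swapped items, are illustrative assumptions and not part of the lab's starter code.

```python
def selection_sort_steps(values):
    """Sort a copy of values with selection sort, recording each swap.

    Returns a list of (snapshot, i, j) tuples: the list contents right
    after the items at positions i and j were swapped.
    """
    items = list(values)  # work on a copy so the caller's list is unchanged
    steps = []
    for i in range(len(items) - 1):
        # find the index of the smallest item in the unsorted part
        smallest = i
        for j in range(i + 1, len(items)):
            if items[j] < items[smallest]:
                smallest = j
        if smallest != i:
            items[i], items[smallest] = items[smallest], items[i]
            steps.append((list(items), i, smallest))
    return steps


def format_step(snapshot, i, j):
    """Return the list as text with the two swapped items in brackets."""
    parts = []
    for k in range(len(snapshot)):
        if k == i or k == j:
            parts.append("[%d]" % snapshot[k])
        else:
            parts.append(" %d " % snapshot[k])
    return " ".join(parts)


for snapshot, i, j in selection_sort_steps([5, 2, 9, 1]):
    print(format_step(snapshot, i, j))
    # an interactive program could pause here with input(...)
```

Keeping the sorting logic separate from the display code means the same recorded steps could later drive a Zelle graphics animation instead of print statements.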


2. Analyzing YouTube data

In the second part of this lab, you will write a program to analyze YouTube data. YouTube is a video-sharing website where people can upload videos they created and/or view others' videos. Over 400 hours of content are uploaded to YouTube each minute, and each day one billion hours of content are watched. Video producers make money by having YouTube run advertisements with the videos they produce on their channel. The more subscribers a channel has and the more times consumers view a channel's videos, the more money the videos will earn.

Your task is to write a program to read in a file of data about YouTube channels and print out the top channels in terms of video uploads, total subscribers, and total views. Here is a snippet of a data file containing information about YouTube Channels:

Channel name,Video Uploads,Subscribers,Video views
10-Minute Crafts,180,346691,114871583
10-Minutes Amazing Life,193,170335,63251474
1000virtudes,59,770249,170131284
18th Asian Games 2018,306,598154,104006040
1Kilo Oficial,94,4795224,887547430
1MILLION Dance Studio,1202,11058659,2616585255
1theK (원더케이),12942,12918410,10657097331
20sarasa(にーさら),475,1525288,1225995975
20th Century Fox,1847,3113635,1773496880
...

Note: the first line is a "header line" and not actual information about a YouTube channel.

File Format:

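Each line of the YouTube channel file contains information about a different YouTube channel. For each channel, there are four pieces of information, separated by commas and given in the same order as the header line: the channel name, the number of video uploads, the number of subscribers, and the number of video views.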
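To make the file format concrete, here is a minimal sketch of turning one line of the file into typed values. The helper names parse_channel_line and parse_lines are hypothetical, and splitting from the right is a defensive choice in case a channel name contains a comma, which the lab text does not confirm.

```python
def parse_channel_line(line):
    """Split one data line into (name, uploads, subscribers, views)."""
    # Split from the right so that a comma inside a channel name (an
    # assumed possibility) stays part of the name.
    name, uploads, subscribers, views = line.strip().rsplit(",", 3)
    return name, int(uploads), int(subscribers), int(views)


def parse_lines(lines):
    """Parse every non-blank line after the header line."""
    records = []
    for line in lines[1:]:  # lines[0] is the header, not a channel
        if line.strip() != "":
            records.append(parse_channel_line(line))
    return records


sample = [
    "Channel name,Video Uploads,Subscribers,Video views\n",
    "10-Minute Crafts,180,346691,114871583\n",
    "1theK (원더케이),12942,12918410,10657097331\n",
]
for record in parse_lines(sample):
    print(record)
```

In the actual program, the list of lines would come from reading the data file, and each parsed record would be used to build a Channel object.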

Channel Object:

The ytchannel library contains a class for managing the YouTube channel data:

from ytchannel import *

This library contains a single Channel class, which encapsulates the information we know about a YouTube channel. You can create a new Channel object by providing a channel name, number of video uploads, number of subscribers, and number of video views. The channel name is expected to be a string, and the number of uploads, number of subscribers, and number of views are all expected to be integers. The following Channel methods allow you to access the information of a single Channel object: getName() returns the channel name, getUploads() returns the number of video uploads, getSubscribers() returns the number of subscribers, and getViews() returns the number of video views.

As an example, here is a snippet of code that creates a Channel object and calls its methods:

>>> ch = Channel("Gritty Fans", 4000, 89000, 5235262346436)
>>> print(ch.getName())
Gritty Fans
>>> print(ch.getSubscribers())
89000
>>> print(ch.getUploads())
4000
>>> print(ch.getViews())
5235262346436
>>>
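The getter methods are what a sorting function would compare on. The sketch below shows bubble sort ordering channels from most to fewest subscribers. Because the real ytchannel library is only available on the lab machines, the Channel class here is a stand-in written for illustration that mirrors only the constructor and the four methods shown above; bubble_sort_by_subscribers is a hypothetical name.

```python
class Channel:
    """Stand-in for ytchannel.Channel, for illustration only."""

    def __init__(self, name, uploads, subscribers, views):
        self.name = name
        self.uploads = uploads
        self.subscribers = subscribers
        self.views = views

    def getName(self):
        return self.name

    def getUploads(self):
        return self.uploads

    def getSubscribers(self):
        return self.subscribers

    def getViews(self):
        return self.views


def bubble_sort_by_subscribers(channels):
    """Sort channels in place, most subscribers first, using bubble sort."""
    for end in range(len(channels) - 1, 0, -1):
        swapped = False
        for i in range(end):
            # swap neighbors when the one with fewer subscribers comes first
            if channels[i].getSubscribers() < channels[i + 1].getSubscribers():
                channels[i], channels[i + 1] = channels[i + 1], channels[i]
                swapped = True
        if not swapped:
            break  # a full pass with no swaps means the list is sorted


channels = [
    Channel("10-Minute Crafts", 180, 346691, 114871583),
    Channel("1Kilo Oficial", 94, 4795224, 887547430),
    Channel("1000virtudes", 59, 770249, 170131284),
]
bubble_sort_by_subscribers(channels)
for ch in channels:
    print(ch.getName(), ch.getSubscribers())
```

Sorting by uploads or views would follow the same pattern with a different getter in the comparison.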

Requirements:

Sample output is shown below.

$ python3 youtube.py
Welcome to my YouTube Channel program!
Enter the name of the YouTube data file: /usr/local/doc/youtube/ytDataMed.csv
How many top channels do you want to see? 6

Top channels by number of uploads:
channel                   uploads   
AP Archive                422326    
Various Artists - Topic   207072    
Various Artists - Topic   203934    
AlHayah TV Network        129941    
Ennahar tv                121387    
ABP NEWS                  109223    

Top channels by number of subscribers:
channel                   subscribers
Canal KondZilla           39409726  
EminemMusic               30470865  
JuegaGerman               28889480  
EminemVEVO                26650488  
VEGETTA777                23775389  
VanossGaming              23590547  

Top channels by number of views:
channel                   views     
Canal KondZilla           19291034467
ABS-CBN Entertainment     17202609850
EminemVEVO                11317532576
Maroon5VEVO               10355362290
Markiplier                10053970560
VanossGaming              9880562011
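The aligned columns in the sample output can be produced with string formatting. The sketch below is one way to do it; the column widths of 26 and 10 characters are inferred from the sample output rather than stated by the lab, and format_row is a hypothetical helper name.

```python
NAME_WIDTH = 26   # inferred from the sample output, not specified by the lab
VALUE_WIDTH = 10  # likewise inferred


def format_row(name, value):
    """Left-align the name and the value in fixed-width columns."""
    return f"{name:<{NAME_WIDTH}}{str(value):<{VALUE_WIDTH}}"


print(format_row("channel", "uploads"))
print(format_row("AP Archive", 422326))
print(format_row("Various Artists - Topic", 207072))
```

Values longer than the column width, such as the "subscribers" header or an 11-digit view count, are printed in full without padding.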

Hints and Tips

3. Submit

Once you are satisfied with your programs, fill out the questionnaire in QUESTIONS-09.txt. Then run handin21 a final time to make sure we have access to the most recent versions of your files.

Acknowledgments

The YouTube dataset comes from the publicly available dataset "Top 5000 YouTube Channels Data", hosted on the data science website Kaggle.