Search engines, such as Google, YouTube and Apple iTunes, have had a huge impact on how people find and use information. In this course, we will explore how these text and multimedia information retrieval (IR) systems are designed and implemented.
The first half of the class will be devoted to developing traditional IR skills such as web crawling, text & multimedia processing, Boolean & vector-space modeling, classification, clustering, and recommendation.
The second half of the course will be devoted to creating SWAMOODIE: a Swarthmore Music Discovery Engine. This will be a collaborative class project in which groups of students will design and develop individual components of this large-scale music IR system. In the final weeks we will combine these components and (if all goes well) have a powerful new tool that helps people find music.
Room: Science Center 240
Time: Tuesday & Thursday 9:55am–11:10am
Text: Manning, Raghavan, & Schütze. Introduction to Information Retrieval (2008).
Wiki: mugwort.cs.swarthmore.edu/67wiki (link no longer available)
|WEEK||DAY||ANNOUNCEMENTS||TOPIC & READING||LAB|
|1||Jan 20||Introduction to IR
Chapter 1 (Both)
|Jan 22||Web-Crawling and Basic SQL
Chapter 20 (Rich)
|2||Jan 27||Advanced SQL and Database Design
Ends (Jan 30)
|Basic IR Models (boolean, vector-space, TF-IDF)
Chapter 1,6 (Rich)
|3||Feb 03||Lab 3
|Feb 05||Performance Evaluation
Chapter 9 (Doug)
|Feb 12||Document Classification
Chapter 13,14,15 (Doug/Rich)
|Feb 19||Lab 5
|6||Feb 24||Document Clustering
Chapter 16,17 (Doug/Rich)
In-Class Exam (Rescheduled for Monday March 23, 7pm)
|8||Mar 17|| Audio Signal Processing Tutorial
|Mar 19|| Music Classification Lab
|9||Mar 24||SWAMOODIE Planning Day
Five Approaches to Collecting Tags for Music
Turnbull, Barrington, Lanckriet (2008)
|Mar 26||Recommender Systems
Music Recommendation and Discovery in the Long Tail Celma (2009)
Chapter 2 ONLY (Joon)
|10||Mar 31||SWAMOODIE Architecture Session|
|Apr 02||Search Engine Architecture
The anatomy of a large-scale hypertextual Web search engine Brin, Page (1998)
Focus on Section 4 (Doug W.)
|11||Apr 07||Hubs and Authorities
Authoritative Sources in a Hyperlinked Environment Kleinberg (1999)
|Apr 09||PageRank
The PageRank citation ranking: Bringing order to the web Page, Brin, Motwani, Winograd (1998)
Autotagger: A model for predicting social tags from acoustic features on large music databases. Bertin-Mahieux, Eck, Maillet, Lamere (2009)
|Apr 16||HCI and Visualization
MusicSun: A new approach to artist recommendation Pampalk, Goto (2007)
Skim paper, also check out Pandora, Last.fm, Musicovery, Echotron, & other music discovery websites (Ashley)
|13||Apr 21||Other Topics:
Combining Data Sources (Brian)
Text-based Multimedia IR (Meggie)
Social Tags and IR (Jeff)
|14||Apr 28||SWAMOODIE Final Presentations|
Academic honesty is required in all work you submit to be graded. With the exception of your lab partner on lab assignments, you may not submit work done with (or by) someone else, or examine or use work done by others to complete your own work. You may discuss assignment specifications and requirements with others in the class to be sure you understand the problem. In addition, you are allowed to work with others to help learn the course material. However, with the exception of your lab partner, you may not work with others on your assignments in any capacity.
All code you submit must be your own with the following permissible exceptions: code distributed in class, code found in the course textbook, and code worked on with an assigned partner. In these cases, you should always include detailed comments that indicate which parts of the assignment you received help on, and what your sources were.
"It is the opinion of the faculty that for an intentional first offense, failure in the course is normally appropriate. Suspension for a semester or deprivation of the degree in that year may also be appropriate when warranted by the seriousness of the offense." - Swarthmore College Bulletin (2007-2008, Section 7.1.2)
Please see me if there are any questions about what is permissible.