CS67: Text and Multimedia Information Retrieval

 

Announcements

Introduction

Search engines, such as Google, YouTube, and Apple iTunes, have had a huge impact on how people find and use information. In this course, we will explore how these text and multimedia information retrieval (IR) systems are designed and implemented.

The first half of the class will be devoted to developing traditional IR skills such as web-crawling, text & multimedia processing, boolean & vector-space modeling, classification, clustering, and recommendation.

The second half of the course will be devoted to creating SWAMOODIE: a Swarthmore Music Discovery Engine. This will be a collaborative class project in which groups of students will design and develop individual components of this large-scale music IR system. In the final weeks we will combine these components and (if all goes well) have a powerful new tool that helps people find music.

Class information

Professors: Douglas Turnbull / Richard Wicentowski
Office: Science Center 255 / 251
Phone: (610) 597-6071 / (610) 690-5643
Office hours: TBA or by appointment

Room: Science Center 240
Time: Tuesday & Thursday 9:55am–11:10am
Text: Manning, Raghavan, & Schütze. Introduction to Information Retrieval (2008).
Wiki: mugwort.cs.swarthmore.edu/67wiki (link no longer available)

Schedule

| Week | Day | Announcements | Topic & Reading | Lab |
|---|---|---|---|---|
| 1 | Jan 20 | | Introduction to IR; Chapter 1 (Both) | Lab 1 (1/29) |
| | Jan 22 | | Web-Crawling and Basic SQL; Chapter 20 (Rich) | |
| 2 | Jan 27 | | Advanced SQL and Database Design (Doug) | Lab 2 (2/4) |
| | Jan 29 | Drop/Add Ends (Jan 30) | Basic IR Models (boolean, vector-space, TF-IDF); Chapter 1,6 (Rich) | |
| 3 | Feb 03 | | | Lab 3 (2/17) |
| | Feb 05 | | Performance Evaluation; Chapter 9 (Doug) | |
| 4 | Feb 10 | | | |
| | Feb 12 | | Document Classification; Chapter 13,14,15 (Doug/Rich) | Lab 4 (2/24) |
| 5 | Feb 17 | | | |
| | Feb 19 | | | Lab 5 (3/19) |
| 6 | Feb 24 | | Document Clustering; Chapter 16,17 (Doug/Rich) | |
| | Feb 26 | | | |
| 7 | Mar 03 | | | |
| | Mar 05 | | In-Class Exam (Rescheduled for Monday March 23, 7pm) | |
| | Mar 10 | | Spring Break | |
| | Mar 12 | | Spring Break | |
| 8 | Mar 17 | | Audio Signal Processing Tutorial (Doug T) | |
| | Mar 19 | | Music Classification Lab (Doug T) | |
| 9 | Mar 24 | | SWAMOODIE Planning Day; Five Approaches to Collecting Tags for Music, Turnbull, Barrington, Lanckriet (2008) (Doug T) | |
| | Mar 26 | | Recommender Systems; Music Recommendation and Discovery in the Long Tail, Celma (2009), Chapter 2 ONLY (Joon) | |
| 10 | Mar 31 | | SWAMOODIE Architecture Session | |
| | Apr 02 | | Search Engine Architecture; The anatomy of a large-scale hypertextual Web search engine, Brin, Page (1998), Focus on Section 4 (Doug W.) | |
| 11 | Apr 07 | | Hubs and Authority; Authoritative Sources in a Hyperlinked Environment, Kleinberg (1999) (Nick) | |
| | Apr 09 | | Page Rank; The pagerank citation ranking: Bringing order to the web, Page, Brin, Motwani, Winograd (1998) (Malcolm) | |
| 12 | Apr 14 | | Autotagging; Autotagger: A model for predicting social tags from acoustic features on large music databases, Bertin-Mahieux, Eck, Maillet, Lamere (2009) (Derek) | |
| | Apr 16 | | HCI and Visualization; MusicSun: A new approach to artist recommendation, Pampalk, Goto (2007). Skim paper, also check out Pandora, Last.fm, Musicovery, Echotron, & other music discovery websites (Ashley) | |
| 13 | Apr 21 | | Other Topics: Combining Data Sources (Brian); Text-based Multimedia IR (Meggie); Social Tags and IR (Jeff) | |
| | Apr 23 | | | |
| 14 | Apr 28 | | Swamoodie Final Presentations | |
| | Apr 30 | | | |

Grading

Your overall grade in the course will be determined as follows:
35% Lab Assignments
20% Final Exam
35% Swamoodie Project
10% Class Participation

Academic Integrity

Academic honesty is required in all work you submit to be graded. With the exception of your lab partner on lab assignments, you may not submit work done with (or by) someone else, or examine or use work done by others to complete your own work. You may discuss assignment specifications and requirements with others in the class to be sure you understand the problem. In addition, you are allowed to work with others to help learn the course material. However, with the exception of your lab partner, you may not work with others on your assignments in any capacity.

All code you submit must be your own with the following permissible exceptions: code distributed in class, code found in the course textbook, and code worked on with an assigned partner. In these cases, you should always include detailed comments that indicate which parts of the assignment you received help on, and what your sources were.

"It is the opinion of the faculty that for an intentional first offense, failure in the course is normally appropriate. Suspension for a semester or deprivation of the degree in that year may also be appropriate when warranted by the seriousness of the offense." - Swarthmore College Bulletin (2007-2008, Section 7.1.2)

Please see me if there are any questions about what is permissible.

Links

Links that are related to the course may be posted here. If you have suggestions for links, let me know.

Course Wiki

Python links

Related Courses