CS97: Computer Perception




This course focuses on computer perception: using computers to analyze images, sounds, and videos. We will specifically focus on object recognition and multimedia retrieval, but will also look at segmentation, localization, clustering, tracking and other perception tasks.

The first third of this class will follow a lecture-style format that introduces fundamental topics and tools. The remaining two-thirds will follow a seminar-style format in which students present academic papers and conduct research.

Class information

Professor: Douglas Turnbull
Office: Science Center 255
Phone: (610) 597-6071
Office hours: TBA or by appointment

Room: Science Center Conference Room
Time: Tuesday, Thursday 9:55am–11:10am
Text: None, but lots of suggested references and weekly readings...


Schedule

Week 1
Sep 01   Motivation & Organization
         Duda, Hart, & Stork, Ch 1 (Handout)
         Brainstorm Project Ideas, Find a Partner
Sep 03   Math Foundations (Linear Algebra, Probability, Statistics, Machine Learning)
         Duda, Hart, & Stork, Appendices A1-A4 (Handout)
         Vasconcelos' Review Notes on Linear Algebra & Probability

Week 2
Sep 08
Sep 10

Week 3
Sep 15   Audio Processing Exploration
         Lab 1, Problems 1-3 (Soft Deadline)
Sep 17   Color Indexing
         by Swain & Ballard (1991)
         (Image - Max)

Week 4
Sep 22   Mel Frequency Cepstral Coefficients for Music Modeling
         by Logan (2000)
         (Music - Geoff - Slides)
         Lab 1 Due
Sep 24   Distinctive Image Features from Scale-Invariant Keypoints
         by Lowe (2004/1999)
         (Image - Ryan)
         Hand in 3 Paper Summaries

Week 5
Sep 29   Shape matching and object recognition using shape contexts
         by Belongie, Malik, Puzicha (2002)
         (Image - Cyrus)
         Proposal Due
Oct 01   Musical Genre Classification of Audio Signals
         by Tzanetakis and Cook (2002)
         (Music - Derek)

Week 6
Oct 06   A Large-Scale Evaluation of Acoustic and Subjective Music-Similarity Measures
         by Berenzweig, Logan, Ellis, Whitman (2004)
         (Music - Maria)
Oct 08   Video Google: A text retrieval approach to object matching in videos
         by Sivic, Zisserman (2003)
         (Video - Joel)
         Proposal Update Due
         Hand in 4 Paper Summaries

Oct 13, Oct 15   October Holiday

Week 7
Oct 20   A robust mid-level representation for harmonic content in music signals
         by Bello, Pickens (2005)
         (Music - Dan)
Oct 22   Speaker Verification Using Adapted Gaussian Mixture Models
         by Reynolds, Quatieri, Dunn (2000)
         (Speech - Colin)

Week 8
Oct 27   Robust real-time object detection
         by Viola, Jones (2002/2001)
         (Image - Janis)
Oct 29   Class Rescheduled - Please attend Sorelle Friedler's Talk (Oct 23) or Barath Raghavan's Talk (Nov 6).

Week 9
Nov 03   Large-scale multimodal semantic concept detection for consumer video
         by Chang, Ellis, Jiang, Lee, Yanagawa, Loui, Luo (2007)
         (Video - Jake)
Nov 05   Early Integration of Vision and Manipulation
         by Metta, Fitzpatrick (2002)
         (Human Perception, skim first 6 pages, read the rest - Rachel)
         Hand in 5 Paper Summaries

Week 10
Nov 10   Literature Reviews Begin
         Music Autotagging (Derek)
         Easy As CBA: A Simple Probabilistic Model for Tagging Music, Hoffman, Blei, Cook (2009)
Nov 12   Image Matching in Videos (Jake & Colin)
         Learning a Sparse Representation for Object Detection, Agarwal, Roth (2002)
         Manuscripts Due

Week 11
Nov 17   Features for Image Similarity (Janis)
         Features for Image Retrieval: An Experimental Comparison, Deselaers, Keysers, Ney (2008)
         Peer Review of Manuscripts Due
Nov 19   Image Description (Rachel & Maria)
         Improving Image Retrieval Performance by Using Both Color and Texture Features, Zhang (2004)

Week 12
Nov 24   Multimodal Emotion Classification (Joel & Ryan)
         Facial Emotion Recognition Using Multi-modal Information, De Silva, Miyasato, Nakatsu (1997)
         Video Classification (Max & Dan)
         Recent trends in video analysis: a taxonomy of video classification, Roach, Mason, Xu, Stentiford (2002)
         (Optional) 2nd Round of Manuscript Reviews
Nov 26

Week 13
Dec 01   Finding Inorganic Objects in X-Rays (Cyrus & Geoff)
         Supervised Learning of Semantic Classes for Image Annotation and Retrieval, Carneiro, Chan, Moreno, Vasconcelos (2007)
         2nd Round Reviews Due
Dec 03   Swarthmore Computer Perception Conference
         Each group will have 15 minutes to present, followed by 3 minutes of Q&A
         Session 1: Image Analysis
         Rachel/Maria, Janis, Geoff/Cyrus

Week 14
Dec 08   Session 2: Video and Audio Analysis
         Colin/Jake, Max/Dan, Joel/Ryan, Derek

Final Paper Due on Thursday 12/10 at 5pm - Please email me a PDF copy.


Grading

This course is structured like a graduate seminar: each student is graded on both their contribution to the seminar and their research project.
Course Work: 40%
    Lab 1: 10%
    Assigned Paper Presentation: 10%
    Weekly Notes: 20%
Project: 60%
    Proposal: 5%
    Proposal Update: 5%
    Literature Review Presentation: 10%
    Manuscript: 10%
    Manuscript Reviews: 5%
    Conference Presentation: 10%
    Final Paper: 15%
You will automatically get an A if you get your research paper accepted to a top-tier, peer-reviewed academic conference.
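
As a rough illustration of how the percentages above combine, here is a minimal sketch that computes a weighted course score. It assumes each component is scored on a 0-100 scale; that scale, the function name final_score, and the dictionary layout are assumptions made for this example, not the official grading procedure.

```python
# Component weights in percent, copied from the grading breakdown.
WEIGHTS = {
    # Course Work (40%)
    "Lab 1": 10,
    "Assigned Paper Presentation": 10,
    "Weekly Notes": 20,
    # Project (60%)
    "Proposal": 5,
    "Proposal Update": 5,
    "Literature Review Presentation": 10,
    "Manuscript": 10,
    "Manuscript Reviews": 5,
    "Conference Presentation": 10,
    "Final Paper": 15,
}


def final_score(scores):
    """Return the weighted score, assuming each component is on a 0-100 scale."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    # Multiply each score by its percentage weight, then divide by 100.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


if __name__ == "__main__":
    # Hypothetical student: 85 on every component except a 95 on the final paper.
    example = {name: 85 for name in WEIGHTS}
    example["Final Paper"] = 95
    print(final_score(example))
```

Because the Final Paper carries 15% of the grade, raising that one score by 10 points raises the weighted total by 1.5 points.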

Weekly Notes

For every academic paper that we read for class, you should prepare a 1-page summary. The format should be as follows:

Academic Integrity

Academic honesty is required in all work you submit to be graded. With the exception of your lab partner on lab assignments, you may not submit work done with (or by) someone else, or examine or use work done by others to complete your own work. You may discuss assignment specifications and requirements with others in the class to be sure you understand the problem. In addition, you are allowed to work with others to help learn the course material. However, with the exception of your lab partner, you may not work with others on your assignments in any capacity.

All code you submit must be your own, with the following permissible exceptions: code distributed in class, code found in the course textbook, and code worked on with an assigned partner. In these cases, you should always include detailed comments that indicate which parts of the assignment you received help on and what your sources were.

"It is the opinion of the faculty that for an intentional first offense, failure in the course is normally appropriate. Suspension for a semester or deprivation of the degree in that year may also be appropriate when warranted by the seriousness of the offense." - Swarthmore College Bulletin (2007-2008, Section 7.1.2)

Please see me if there are any questions about what is permissible.

Links

Links related to the course may be posted here. If you have suggestions for links, let me know.

Machine Learning and Pattern Recognition

Image and Audio Processing

Matlab - General Info

Matlab - Computer Perception

Other Software (Weka, Matlab, etc.)