CS67 Multimedia Information Retrieval Demo
Douglas Turnbull & Rich Wicentowski
Fall 2009
--------------------------------------------

LAST TIME

1) We have a "time series" of 13-dimensional vectors, each of which is a
   compact representation of a short-term spectral shape.

2) Each spectrum encodes information about the "timbre" of music.
   -> Timbre is the color of a sound
   -> Signature of an instrument or human voice
   -> also encodes information about the harmonic and inharmonic nature of sound
   -> broadband "noise" of a snare drum

3) Often, we ignore the temporal component of the time series so that we
   can think of the data as a "bag of feature vectors".

--------------------------------------------------------------------

MUSIC CLASSIFICATION (by Genre)

We will look at two ways to classify music by genre using MFCCs.

1) "Audio Term" Histograms
2) "Gaussian" Similarity

   -> both use a k-nearest-neighbor classifier
   -> both use pre-computed MFCCs from 10 seconds of audio per song
   -> MFCCs are stored in ir80_mfcc.mat

--------------------------------------------------------------

1) "Audio Term" Histogram (audioTermKNN.m)

   a) Use kmeans to cluster "out of sample" MFCCs to find 64 "centers"
      -> think of each center as an "audio term" (e.g., acoustic word)
   b) For each song in the dataset, construct a tf vector over "audio terms"
      -> map each MFCC vector to the closest center
      -> provides us
   c) Transform the term-document matrix using SMART variants (i.e., tf-idf)
   d) Use a nearest neighbor classifier to classify music by genre
   e) Compute accuracy and confusion matrices to evaluate performance.

2) "Gaussian Similarity" (gaussSimKNN.m)
   -> probabilistic approach
   -> see Mandel & Ellis MIREX 05

   a) For each song, model the distribution of MFCCs using a Gaussian distribution.
      -> use mean() and cov() in MATLAB to get the mean vector and covariance matrix
   b) Calculate the symmetric KL divergence between each pair of songs
   c) Convert the divergence matrix into a similarity matrix
   d) Use a nearest neighbor classifier to classify music by genre
   e) Compute accuracy and confusion matrices to evaluate performance.

---------------------------------------------------------------------

Questions:

1) What is the best overall recorded performance for each of the two approaches?
2) Which approach is faster to compute?
3) What improvement could you make to either approach?
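
---------------------------------------------------------------------

Steps 2a and 2b of the "Gaussian Similarity" approach can be sketched as
follows. This is a minimal illustration in plain Python, not the contents of
gaussSimKNN.m. The function names (mean_and_cov, inverse, trace_of_product,
symmetric_kl) are hypothetical, and each song is assumed to be a list of MFCC
frames. It relies on the closed form for two Gaussians, in which the
log-determinant terms of KL(p||q) and KL(q||p) cancel when the two are summed.

```python
def mean_and_cov(frames):
    """Sample mean vector and covariance matrix (normalized by n - 1)."""
    n, d = len(frames), len(frames[0])
    mu = [sum(f[j] for f in frames) / n for j in range(d)]
    cov = [[sum((f[i] - mu[i]) * (f[j] - mu[j]) for f in frames) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mu, cov


def inverse(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    d = len(m)
    # Augment m with the identity matrix on the right.
    aug = [list(row) + [float(i == j) for j in range(d)]
           for i, row in enumerate(m)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(d):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[d:] for row in aug]


def trace_of_product(a, b):
    """trace(a * b) computed without forming the full matrix product."""
    d = len(a)
    return sum(a[i][k] * b[k][i] for i in range(d) for k in range(d))


def symmetric_kl(mu_p, cov_p, mu_q, cov_q):
    """KL(p||q) + KL(q||p) for two Gaussians p and q."""
    d = len(mu_p)
    inv_p, inv_q = inverse(cov_p), inverse(cov_q)
    diff = [a - b for a, b in zip(mu_p, mu_q)]
    # Quadratic form diff^T (inv_p + inv_q) diff.
    quad = sum(diff[i] * (inv_p[i][j] + inv_q[i][j]) * diff[j]
               for i in range(d) for j in range(d))
    return 0.5 * (trace_of_product(inv_q, cov_p)
                  + trace_of_product(inv_p, cov_q) - 2 * d + quad)
```

With one (mean, covariance) pair per song, calling symmetric_kl on every pair
of songs fills the divergence matrix of step 2b. A covariance matrix estimated
from too few frames can be singular, in which case inverse raises an error.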