CS33 -- Laboratory 11

Due: Tuesday, Nov. 23 by 11:59 pm

You are encouraged to work with a partner.

Work in a subdirectory called initslab11, where inits is your initials. Once you have completed the C program and made script files to demonstrate your testing, remove any extraneous files so that your directory contains just the .c file(s) and the script files. Then make a tarball of that directory and email it to cfk@cs.swarthmore.edu as an attachment, with the subject line CS33 Lab11.

Each program must follow these guidelines:


LAB 11: Reading ID3v2.2 Tags

WARNING!!! This program will probably take you longer than the average program in the class. There is lots of text to read below, and the program has lots of little details that you can mess up. START EARLY AND WORK WITH A PARTNER!


This lab was designed by Rich Wicentowski. There is only one program, but the program is complex. I once again encourage you to work with a partner.

The goal of this assignment is to read in an MP3 file and display the contents of its ID3 tag. For those of you who are unfamiliar, MP3 files are files that contain compressed audio and may optionally contain information specifying the artist, song title, album name, etc. This additional non-audio data comprises the ID3 tag.

In 1996, the first specification for ID3 tags, ID3v1, was created. While this specification was simple, there were complaints about the format. One problem with the specification was the small amount of space (only 128 bytes) that was made available for storing non-audio data. Artist names, song titles, and album names all had to be 30 characters or less. This meant that you couldn't store "Bruce Springsteen and the E-Street Band" as the artist for their "Live/1975-85" album. Since the maximum length of the artist name was 30 characters, you'd only be able to store "Bruce Springsteen and the E-St".

In 1998, the ID3v2 specification was created. This allowed much more non-audio data (256 megabytes instead of 128 bytes) to be stored, with no limits on the number of characters that could be stored for any of the data. One new piece of non-audio data that can be stored is an image (of the album cover, for example). Along with information such as artist and album name, the program you write will be able to extract this image.

The most popular version of the ID3 specification is ID3v2.3. However, if you import a CD in iTunes, iTunes uses the older ID3v2.2 specification. iTunes can read the newer specification, and can even convert your MP3s to use any of several specifications; but if you don't do anything to your MP3s after you've imported them, you've got ID3v2.2 tags. (At least, this was true for me in 2009 on both Windows and Mac, and I'm fairly certain I haven't messed with any settings.)

Since specifications have important differences between them, and since it doesn't make sense for you to write a data extractor for each of the available ID3 versions, you're going to write the extractor for the ID3v2.2 specification. This means that if you use iTunes to import CDs, you've already got a bunch of MP3s which you can try this on. If you don't use iTunes (or you don't use the MP3 format), you're not out of luck: I have provided you with ten MP3 files to play with. Six of them are copyrighted by major record labels, and to avoid their wrath, I have snipped them down to 30 seconds each. The seventh song ("Happy Together") is performed by a friend of Rich's band, and they allow you to download their songs from their MySpace page, so you've got the song in its entirety.

You can find the sample music files in /home/cfk/pub/cs33/music/. If you have not already done so, copy them to /scratch/username/cs33/, where username is your username on the CS machines. The files are located there to avoid issues with your quota, and they will be deleted shortly after the assignment is due.

You can work out the details of how to do the extraction by reading the ID3v2.2 reference. But, you'd probably do much better reading my distilled version below.


Your program will be called id3.c. To run your program, you will type ./id3 <filename.mp3> on the command line in your lab11 directory, substituting <filename.mp3> with the path to the MP3 file. When the program runs, it will print the information in the ID3 tag to the terminal, skipping over some fields which you won't worry about, and write the image file to disk in your /scratch/username/cs33/ subdirectory for later viewing. I'll show you how to name executable files soon.

You will need to store lots of variable-sized strings and byte-arrays in this assignment. All strings and arrays should be dynamically allocated. There are almost no cases in the assignment where using a statically allocated array makes sense, and even if you think it does, use a dynamically allocated array anyway (for practice). You won't have to allocate any multi-dimensional arrays.
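
Here is one way the "allocate exactly what you need" idea might look for a chunk of bytes read from a file. This is only a sketch: the function name read_bytes is made up for illustration, and the extra terminating '\0' byte is my own convenience so the buffer can also be printed as a string.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read exactly n bytes from fp into a freshly malloc'd buffer.
 * One extra byte is allocated and set to '\0' so the result can also
 * be printed as a C string. Returns NULL if the allocation fails or
 * fewer than n bytes could be read. The caller must free() the result. */
unsigned char *read_bytes(FILE *fp, size_t n)
{
    unsigned char *buf = malloc(n + 1);

    if (buf == NULL) {
        return NULL;
    }
    if (fread(buf, 1, n, fp) != n) {
        free(buf);          /* do not leak the buffer on a short read */
        return NULL;
    }
    buf[n] = '\0';
    return buf;
}
```

A caller would check the result for NULL, use the bytes, and then free them when done.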

C reference

You probably shouldn't read this part closely now. Instead, skim the titles of the bullet points and come back to them when you're writing your program and need to know how to do one of these things.

Here are the new C things you'll need to know for the lab:

  1. Command-line arguments: So far, whenever we've written the main function, we've specified that it took no parameters. However, main can be written to take two parameters, as follows:
    int main(int argc, char **argv) {
      ...
      return 0;
    }
    
    There is always at least one command-line argument since the name of the program you ran is stored in argv[0]. So, if you typed ./id3 Happy_Together.mp3, argc would equal 2, argv[0] would equal ./id3, and argv[1] would equal Happy_Together.mp3

    Your program will exit with an informative error message if the user runs the program without specifying an MP3 file as an argument.

  2. Printing error messages and exiting gracefully: In the previous sentence, I mentioned that you should display an informative error message if the user does not specify an MP3 file as a command-line argument. Throughout the program, you'll do similar things. For example, if the MP3 file contains ID3v2.3 tags, you'll tell the user that your program can only handle ID3v2.2 tags, then exit the program.
  3. Reading binary data from a file: To read the ID3 information from the MP3 file, you'll need to use four new functions, and two new data types.
  4. Writing binary data to a file: To write binary data to a file, you'll once again employ file pointers and unsigned chars. You'll also need fopen and fclose, and a new function, fputc. The following code opens two files, one for reading and one for writing. It reads ten bytes from the first file and writes them to the second file. It carelessly does no error checking.
      unsigned char uc;
      int result;  /* fputc returns an int, so result can be compared to EOF */
      FILE *inptr, *outptr;
      int i;
    
      inptr = fopen("first", "rb");    /* "rb" == "read binary" */
      outptr = fopen("second", "wb");  /* "wb" == "write binary" */
    
      for (i = 0; i < 10; i++) {
        uc = fgetc(inptr);
        result = fputc(uc, outptr);  /* if result == EOF, there was an error */
      }
      }
    
      fclose(inptr);
      fclose(outptr);
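
Items 1 through 3 above can be tied together in a few small helpers. Treat this as a sketch rather than the required design: the names die, mp3_path_from_args, and starts_with_id3 are hypothetical, and rejecting extra arguments (argc != 2) is my own choice.

```c
#include <stdio.h>
#include <stdlib.h>

/* Print a message to stderr and stop the program with a failure status. */
void die(const char *msg)
{
    fprintf(stderr, "id3: %s\n", msg);
    exit(EXIT_FAILURE);
}

/* Return the MP3 path given on the command line, or NULL if the user
 * did not supply exactly one argument after the program name. */
const char *mp3_path_from_args(int argc, char **argv)
{
    if (argc != 2) {
        return NULL;
    }
    return argv[1];
}

/* Return 1 if the next three bytes of fp are 'I', 'D', '3', else 0.
 * fgetc returns an int so that EOF can be told apart from real bytes. */
int starts_with_id3(FILE *fp)
{
    int a = fgetc(fp);
    int b = fgetc(fp);
    int c = fgetc(fp);

    return a == 'I' && b == 'D' && c == '3';
}
```

In main, you might call mp3_path_from_args(argc, argv) and pass a usage message to die if it returns NULL, then fopen the path with "rb" and call die again if fopen returns NULL.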
    

ID3 reference

You should definitely read this part now! And no skimming! There are lots of important details. If you're unclear, read the relevant portion of the ID3v2.2 reference. If you're still unclear, come find me.

ID3v2 header

When you open the MP3 file, the ID3 tag, if it exists, will be at the beginning of the file and will start with the following header of 10 bytes: ID3abcdefg, where ID3 are literally the characters 'I', 'D', and '3'. If the file does not begin with these three letters, it does not contain an ID3v2 tag.

The letters a-g are each one byte (bytes 3-9, counting from 0) and represent the following:
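
As a rough sketch of how those seven bytes might be decoded, going by my reading of the public ID3v2.2 specification (treat the layout as an assumption and check it against the reference): bytes 3 and 4 are the major version and revision, byte 5 holds flags, and bytes 6-9 hold the tag size, with only the low 7 bits of each byte used. That gives 28 bits in total, which is where the 256-megabyte limit comes from. The struct and function names here are made up for illustration.

```c
#include <stddef.h>

/* Decoded form of the 10-byte header. Field names are illustrative. */
struct id3_header {
    unsigned char major;     /* byte 3: expected to be 2 for ID3v2.2 */
    unsigned char revision;  /* byte 4 */
    unsigned char flags;     /* byte 5 */
    unsigned long tag_size;  /* bytes 6-9 decoded; excludes the header */
};

/* Decode four size bytes, most significant first, where each byte
 * contributes only its low 7 bits (28 bits in total). */
unsigned long syncsafe_to_size(const unsigned char b[4])
{
    return ((unsigned long)(b[0] & 0x7F) << 21) |
           ((unsigned long)(b[1] & 0x7F) << 14) |
           ((unsigned long)(b[2] & 0x7F) << 7)  |
            (unsigned long)(b[3] & 0x7F);
}

/* Fill *out from the 10 raw header bytes. Returns 0 on success, or -1
 * if the bytes do not begin with the characters 'I', 'D', '3'. */
int parse_header(const unsigned char raw[10], struct id3_header *out)
{
    if (raw[0] != 'I' || raw[1] != 'D' || raw[2] != '3') {
        return -1;
    }
    out->major = raw[3];
    out->revision = raw[4];
    out->flags = raw[5];
    out->tag_size = syncsafe_to_size(raw + 6);
    return 0;
}
```

After a successful parse_header, a program could compare the major field against 2 and exit with an informative message for any other version.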

ID3v2 frames

After the header come a series of frames. Each frame begins with a frame header consisting of a frame name (three characters == three bytes) followed by the size of the frame (three bytes). You will only handle text frames and the PIC (picture) frame. All text frames have a name starting with the letter 'T' (and no frames start with a 'T' unless they are text frames). You will process frames one at a time until either:

Here is how you will deal with the frames:


Congratulations, you made it all the way down here! There are no specific functions that you have to write, but I would expect that writing the following functions would be helpful: You'll probably want a function called readID3 which reads the header and then loops calling a function called readFrame. The readFrame function determines the type of frame and either calls processTextFrame (if it's a text frame), processPictureFrame (if it's a picture frame), or ignores (and skips past) the frame.
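
The decision readFrame has to make can be sketched as below. The six-byte frame header (three name bytes, three size bytes) comes from the description above; reading the size as a plain big-endian 24-bit number is my understanding of the ID3v2.2 specification, so verify it against the reference. The type and function names are hypothetical.

```c
#include <string.h>

enum frame_kind { FRAME_TEXT, FRAME_PICTURE, FRAME_OTHER };

struct frame_header {
    char name[4];        /* three-character frame name plus '\0' */
    unsigned long size;  /* bytes of frame data after the 6-byte header */
};

/* Decode the six raw bytes of a frame header into *out. */
void parse_frame_header(const unsigned char raw[6], struct frame_header *out)
{
    out->name[0] = (char)raw[0];
    out->name[1] = (char)raw[1];
    out->name[2] = (char)raw[2];
    out->name[3] = '\0';
    /* Assumption: three size bytes, most significant byte first. */
    out->size = ((unsigned long)raw[3] << 16) |
                ((unsigned long)raw[4] << 8)  |
                 (unsigned long)raw[5];
}

/* Text frames start with 'T'; the picture frame is named "PIC";
 * everything else is skipped. */
enum frame_kind classify_frame(const struct frame_header *fh)
{
    if (fh->name[0] == 'T') {
        return FRAME_TEXT;
    }
    if (strcmp(fh->name, "PIC") == 0) {
        return FRAME_PICTURE;
    }
    return FRAME_OTHER;
}
```

A readFrame built on this would switch on the result of classify_frame, calling processTextFrame or processPictureFrame, or skipping size bytes for any other frame.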

When you're done with memory, be sure you free it. When you're done with a file, be sure you fclose it.

NOTE: When you run the program from your cs33/lab11 directory on an MP3 in the /scratch/your-username/cs33 directory, the output image will also be saved in the /scratch directory. To verify that your image saved properly, you can run display filename from the command prompt.
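
One way to get the image to land next to the MP3 is to build the output name from the MP3's own path, swapping the extension. This is a sketch only: image_path is a hypothetical helper, and I am assuming you already have a short format string such as "JPG" in hand to use as the new extension.

```c
#include <stdlib.h>
#include <string.h>

/* Build a new, malloc'd path from mp3_path with its extension replaced
 * by format, e.g. "dir/Song.mp3" + "JPG" -> "dir/Song.JPG".
 * If the last path component has no '.', the extension is appended.
 * Returns NULL if allocation fails. The caller must free() the result. */
char *image_path(const char *mp3_path, const char *format)
{
    const char *slash = strrchr(mp3_path, '/');
    const char *dot = strrchr(mp3_path, '.');
    size_t stem_len;
    char *out;

    /* Only treat the dot as an extension if it is in the file name,
     * not in one of the directory names. */
    if (dot == NULL || (slash != NULL && dot < slash)) {
        stem_len = strlen(mp3_path);
    } else {
        stem_len = (size_t)(dot - mp3_path);
    }

    /* stem + '.' + format + terminating '\0' */
    out = malloc(stem_len + 1 + strlen(format) + 1);
    if (out == NULL) {
        return NULL;
    }
    memcpy(out, mp3_path, stem_len);
    out[stem_len] = '.';
    strcpy(out + stem_len + 1, format);
    return out;
}
```

The returned string can be handed straight to fopen with "wb" and should be freed once the image has been written.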

To view the actual binary contents of the MP3 file, you can use the program hexedit. For example, this command will open the specified MP3 file:

hexedit Happy_Together.mp3
Use page-up and page-down to look around the file, and Ctrl-C to quit. Despite the message, pressing F1 does not seem to provide Help.

Here is a sample run from my lab11 subdirectory. I am using my user name as a student, which is chas. First, my scratch directory before I start:

    licorice[chas]$ pwd
    /scratch/chas/cs33
    licorice[chas]$ ls
    01-Jackie.mp3       Can-Utility_and_the_Coastliners.mp3  Happy_Together.mp3
    02-Act_Of_Love.mp3  Even-Keel.mp3                        Shankill_Butchers.mp3
    10-Girl_Sailor.mp3  Goodbye.mp3
    Around_The_Sun.mp3  Happy_Alone.mp3
    licorice[chas]$

Now my lab11 directory:

    licorice[cfklab11]$ pwd
    /home/chas/cs33/week11/cfklab11
    licorice[cfklab11]$ ls
    id3*  id3.c  id3version.c
    licorice[cfklab11]$

Now I will run id3 in my lab11 directory with the full path to an mp3 file:

    licorice[cfklab11]$ ./id3 /scratch/chas/cs33/Around_The_Sun.mp3
    TT2: Around The Sun
    TP1: R.E.M.
    TAL: Around The Sun
    TRK: 13/13
    TYE: 2004
    TCO: (17)
    COM: (SKIPPING 16 bytes)
    TEN: iTunes v7.5
    COM: (SKIPPING 104 bytes)
    COM: (SKIPPING 130 bytes)
    PIC: (WRITING /scratch/cs33/chas/Around_The_Sun.JPG)

This prints some output in my lab11 terminal window and writes to my scratch directory. Just to check, notice that ls in my scratch directory now shows a JPG file that was not there before:

    licorice[chas]$ pwd
    /scratch/cs33/chas
    licorice[chas]$ ls
    01-Jackie.mp3       Around_The_Sun.mp3                   Happy_Alone.mp3
    02-Act_Of_Love.mp3  Can-Utility_and_the_Coastliners.mp3  Happy_Together.mp3
    10-Girl_Sailor.mp3  Even-Keel.mp3                        Shankill_Butchers.mp3
    Around_The_Sun.JPG  Goodbye.mp3
    licorice[chas]$

Note: As of 1 Oct, the final exam for CS33 has been scheduled by the registrar for Friday 12/17/2009, 9:00am-12:00pm in SC240. Plan to attend.