Lab 09: Reading ID3v2.2 Tags

Due 11:59pm MONDAY November 28, 2011

Introduction

The program handin33 will only submit files in the cs33/lab/09 directory. (You should run update33 first to set up the directory and create any necessary files.)

This program will probably take you longer than the average program in the class. There is lots of text to read below, and the program has lots of little details that you can mess up. Start early and consider working with a partner. You will have 10 days to complete the lab, but this includes the 4 days of Thanksgiving break, so plan accordingly.

Your program must adhere to the following guidelines:


There is only one program this week, but the program is complex. I once again encourage you to work with a partner.

The goal of this week's assignment is to read in an MP3 file and display the contents of its ID3 tag. For those of you who are unfamiliar, MP3 files are files that contain compressed audio and may optionally contain information specifying the artist, song title, album name, etc. This additional non-audio data comprises the ID3 tag.

In 1996, the first specification for ID3 tags, ID3v1, was created. While this specification was simple, there were complaints about the format. One problem with the specification was the small amount of space (only 128 bytes) that was made available for storing non-audio data. Artist names, song titles, and album names all had to be 30 characters or less. This meant that you couldn't store "Elvis Costello and the Attractions" as the artist for their album "This Year's Model". Since the maximum length of the artist name was 30 characters, you'd only be able to store "Elvis Costello and the Attract".

In 1998, the ID3v2 specification was created. It allowed much more non-audio data (256 megabytes instead of 128 bytes) to be stored, without the unreasonably small limits on the number of characters that could be stored for any individual piece of data. (The new format allows each piece of data to be up to 16 megabytes, which is plenty big for the artist's name.) One new piece of non-audio data that could be stored was an image (of the album cover, for example). Along with information such as artist and album name, the program you write will be able to extract this image.

The most popular version of the ID3 specification is ID3v2.3. However, when you import a CD, iTunes uses the older ID3v2.2 specification. iTunes can read the newer specification and can even convert your MP3s to any one of 5 different specifications, but if you don't do anything to your MP3s after importing them, they have ID3v2.2 tags. (At least, this was true for me using iTunes 10.5.1 on a Mac, and I'm fairly certain I haven't changed any settings. To convert a song from one ID3 format to another, control-click/right-click on the file name in iTunes and choose "Convert ID3 Tags".)

Since specifications have important differences between them, and since it doesn't make sense for you to write a data extractor for each of the 5 available ID3 versions, you're going to write the extractor for the ID3v2.2 specification. This means that if you use iTunes to import CDs, you've already got a bunch of MP3s which you can try this on. If you don't use iTunes (or you don't use the MP3 format), you're not out of luck: I have provided each of you with seven MP3 files to play with. Since these songs are copyrighted, and I'd like to avoid the wrath of the RIAA, I have snipped them down to 30 seconds each.

You can find the sample music files in the following directory, where username is your username on the CS machines: /scratch/richardw/music/username/. The files are located there (instead of via update33) to avoid issues with your quota, and they will be deleted shortly after the assignment is due.

You can work out the details of how to do the extraction by reading the ID3v2.2 reference. However, you'll probably do much better reading my distilled version below.


Your program will be called id3.c. To run your program, you will type ./id3 <filename.mp3> on the command line, substituting <filename.mp3> with the actual name of the MP3 file. When you run the program, you will print out the information in the ID3 tags, skipping over some fields which you won't worry about, and write the image file to the disk for later viewing.

You will need to store lots of variable-sized strings and byte-arrays in this assignment. All strings and arrays should be dynamically allocated. There are almost no cases in the assignment where using a statically allocated array makes sense, and even if you think it does, use a dynamically allocated array anyway (for practice). You won't have to allocate any multi-dimensional arrays.
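Most of what you read from the MP3 file is a run of bytes whose length you only learn while the program is running, which is exactly the situation malloc is for. Below is a minimal sketch of the "allocate, then fill with fgetc" pattern. The helper name readBytes and its return-NULL-on-failure convention are illustrative assumptions, not something the lab requires.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read exactly n bytes from fptr into a newly
 * malloc'd buffer and add a '\0' after them, so the buffer can also be
 * printed as a string. Returns NULL if malloc fails or the file ends
 * early. The caller must free() the result. */
unsigned char *readBytes(FILE *fptr, long n) {
  unsigned char *buf;
  long i;
  int c;

  buf = malloc(n + 1);            /* +1 for the terminating '\0' */
  if (buf == NULL) {
    return NULL;
  }
  for (i = 0; i < n; i++) {
    c = fgetc(fptr);              /* int, so EOF is not confused with 0xFF */
    if (c == EOF) {
      free(buf);                  /* don't leak the partial buffer */
      return NULL;
    }
    buf[i] = (unsigned char)c;
  }
  buf[n] = '\0';
  return buf;
}
```

Every successful call hands you memory you now own, so pair each readBytes call with a free once you have printed or written the data.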

C reference

You probably shouldn't read this part carefully now. Instead, skim the titles of the bullet points and come back to them when you're writing your program and need to know how to do one of these things.

Here are the new C things you'll need to know for the lab:

  1. Command-line arguments: So far, whenever we've written the main function, we've specified that it took no parameters. However, main can be written to take two parameters, as follows:
    int main(int argc, char **argv) {
      ...
      return 0;
    }
    
    • argc stores the number of command-line arguments.
    • argv stores each of the arguments. Notice that the type of argv is char **. This is because argv is an array of pointers to char, where each pointer refers to one argument stored as an ordinary null-terminated string. Since a string in C is a one-dimensional array of characters, you can think of argv as a one-dimensional array of strings.
    There is always at least one command-line argument since the name of the program you ran is stored in argv[0]. So, if you typed ./id3 Pump-It-Up.mp3, argc would equal 2, argv[0] would equal "./id3", and argv[1] would equal "Pump-It-Up.mp3".

    Your program will exit with an informative error message if the user runs the program without specifying an MP3 file as an argument.

  2. Printing error messages and exiting gracefully: In the previous sentence, I mentioned that you should display an informative error message if the user does not specify an MP3 file as a command-line argument. Throughout the program, you'll do similar things. For example, if the MP3 file contains ID3v2.3 tags, you'll tell the user that your program can only handle ID3v2.2 tags, then exit the program.
    • Normal print statements have their output sent to standard out. However, it is typical for error messages to be sent to standard error. To do this, you need the fprintf function. Here is an example which you should be able to adapt as necessary:
        for (i = 0; i < 5; i++) {
          fprintf(stderr, "Message %d sent to standard error.\n", i);
        }
      
    • We talked previously about how, when your program exits, main typically returns 0 to indicate successful completion. After a program-ending error occurs, you'll want to first print an error message to standard error and then exit the program, indicating a failure. To do this, you use the function exit (declared in <stdlib.h>), which takes as a parameter the return code you'd like to send. To indicate failure, you'll use exit(1);. Adapting the code above:
        for (i = 0; i < 5; i++) {
          fprintf(stderr, "Message %d sent to standard error.\n", i);
          if (i == 3) {
            exit(1);  /* Exit, indicating a failure */
          }
        }
      
  3. Reading binary data from a file: To read the ID3 information from the MP3 file, you'll need to use four new functions, and two new data types.
    • The first data type you'll be using is called a file pointer, and you declare it as follows:
        FILE *fptr;    /* fptr is a file pointer */
      
      The file pointer contains information such as how to find the file on disk, and where in the file the next byte you'll read will come from.
    • You'll be reading individual bytes from the file. There is no "byte" data type in C, but the char type is defined to be exactly 1 byte, so you can use it to store a byte of data. One problem is that on most systems plain char is signed (which means it stores a two's complement value; the C standard leaves the choice to the compiler), and you don't want to interpret the bytes you read from the file as signed values. So, you'll use the unsigned char type:
        unsigned char uc;
      
      You use unsigned characters just as you would regular characters.
    • You'll need to associate the file pointer with a particular file. You'll do this by using the fopen function as follows:
        fptr = fopen(filename, "rb");  /* filename is a string storing the name of a file */
      
      If fptr equals NULL after calling fopen, there was an error. If that happens, print an informative error message and exit. The string "rb" indicates that the file will be open for reading (r) and that the contents of the file should be interpreted as binary data (b), not text data.
    • You'll need to read data from the file one byte at a time. To do this, you'll use the fgetc function as follows:
        uc = fgetc(fptr);
      
      If fgetc(fptr) reaches the end of the file, the return value is a constant called EOF. For example:
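      #include <stdio.h>

      int main(void) {
        FILE *fptr;
        int c;   /* int, not char: fgetc returns an int so EOF can be told apart from real data */

        fptr = fopen("test.c", "r");
        if (fptr == NULL) {
          fprintf(stderr, "Could not open test.c\n");
          return 1;
        }
        while ((c = fgetc(fptr)) != EOF) {
          printf("%c", c);
        }

        fclose(fptr);
        return 0;
      }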
      
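      Note that EOF is a negative int (its value is -1 on virtually every system), which is why fgetc returns an int rather than a char. Don't compare an unsigned char against EOF: once the return value is squeezed into an unsigned char, EOF becomes 255 and looks exactly like a genuine 0xFF byte, and MP3 files contain plenty of 0xFF bytes. Instead, keep the return value in an int c, compare it to EOF, and only then store it in your unsigned char uc:
        while ((c = fgetc(fptr)) != EOF) {  /* c is an int */
          uc = (unsigned char)c;            /* now safely a real byte */
        }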
      
      Remember that each time you call fgetc, the file pointer moves one byte forward in the file.
    • A file pointer is a pointer to a dynamically created structure stored on the heap. However, when you're done using a file, you don't use free on the pointer name to deallocate the space. Instead, you use the fclose function:
        fclose(fptr);
      
      The fclose function tells the operating system that the file is no longer in use and, if the file was being written to, ensures that any pending writes complete before the memory is freed.
    • To find out how many bytes you are from the start of the file (helpful for knowing when you should stop reading), use the function ftell. It returns a long, so store its result in one:
         long position;
      
         position = ftell(fptr);  /* current offset, in bytes, from the start of the file */
      
  4. Writing binary data to a file: To write binary data to a file, you'll once again use file pointers and unsigned chars. You'll also need fopen and fclose, and a new function, fputc. The following code opens two files, one for reading and one for writing. It reads the first ten bytes from one file and writes them to the second file. It carelessly does no error checking.
      unsigned char uc;
      int result;  /* fputc returns an int, so result must be an int to hold EOF */
      FILE *inptr, *outptr;
      int i;
    
      inptr = fopen("first", "rb");
      outptr = fopen("second", "wb");  /* "w" == "write" */
    
      for (i = 0; i < 10; i++) {
        uc = fgetc(inptr);
        result = fputc(uc, outptr);  /* if result == EOF, there was an error */
      }
    
      fclose(inptr);
      fclose(outptr);
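Putting items 1 through 3 together, the start of main might look something like the sketch below: check argc, open argv[1] in binary mode, and exit(1) with a message on standard error if either step fails. The helper name openInputOrExit is hypothetical; the lab does not require a function with this name.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper for the start of main: make sure an MP3 file name
 * was given on the command line, open it for binary reading, and exit
 * with status 1 (after printing to standard error) if anything fails. */
FILE *openInputOrExit(int argc, char **argv) {
  FILE *fptr;

  if (argc < 2) {                 /* argv[0] is the program name itself */
    fprintf(stderr, "usage: ./id3 <filename.mp3>\n");
    exit(1);
  }
  fptr = fopen(argv[1], "rb");    /* "rb": read, binary */
  if (fptr == NULL) {
    fprintf(stderr, "id3: could not open %s\n", argv[1]);
    exit(1);
  }
  return fptr;
}
```

In main you would then write FILE *fptr = openInputOrExit(argc, argv);, do your reading, and finish with fclose(fptr); before returning 0.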
    

ID3 reference

You should definitely read this part now! And no skimming! There are lots of important details. If you're unclear, read the relevant portion of the ID3v2.2 reference. If you're still unclear, come find me.

ID3v2 header

When you open the MP3 file, the ID3 tag, if it exists, will be at the beginning of the file and will start with the following header of 10 bytes: ID3abcdefg, where ID3 are literally the characters 'I', 'D', and '3'. If the file does not begin with these three letters, it does not contain an ID3v2 tag.

The letters a-g are each one byte (bytes 3-9, counting from 0) and represent the following:
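According to the ID3v2.2 specification, a is the major version (2 for ID3v2.2), b is the revision number, c holds flags, and d through g hold the size of the tag, not counting the 10-byte header. The size uses only the low 7 bits of each of those four bytes (the high bit is always 0), giving 28 bits in total, which is where the 256 megabyte limit comes from. A minimal sketch of reading the header that way follows; the name readID3Header and its return codes are illustrative assumptions, and this sketch ignores the flags byte.

```c
#include <stdio.h>

/* Sketch: read the 10-byte ID3v2 header (layout per the ID3v2.2 spec).
 *   bytes 0-2  'I' 'D' '3'
 *   byte  3    major version (2 for ID3v2.2)
 *   bytes 4-5  revision and flags (ignored in this sketch)
 *   bytes 6-9  tag size, 7 bits per byte, excluding this header
 * Returns 0 and stores the size in *tagSize on success; -1 if the file
 * does not start with "ID3"; -2 if it is not version 2.2; -3 on EOF. */
int readID3Header(FILE *fptr, long *tagSize) {
  const char *magic = "ID3";
  int c, i;

  for (i = 0; i < 3; i++) {
    c = fgetc(fptr);
    if (c != magic[i]) {          /* also catches EOF */
      return -1;
    }
  }
  c = fgetc(fptr);                /* byte 3: major version */
  if (c == EOF) {
    return -3;
  }
  if (c != 2) {
    return -2;
  }
  for (i = 4; i < 6; i++) {       /* bytes 4-5: revision, flags */
    if (fgetc(fptr) == EOF) {
      return -3;
    }
  }
  *tagSize = 0;
  for (i = 6; i < 10; i++) {      /* bytes 6-9: keep only the low 7 bits */
    c = fgetc(fptr);
    if (c == EOF) {
      return -3;
    }
    *tagSize = (*tagSize << 7) | (c & 0x7F);
  }
  return 0;
}
```

For example, size bytes 0x00 0x00 0x02 0x01 decode to 2 * 128 + 1 = 257, not the 513 you would get by treating them as an ordinary 8-bit-per-byte number.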

ID3v2 frames

After the header comes a series of frames. Each frame begins with a frame header consisting of a frame name (three characters == three bytes) followed by the size of the frame (three bytes). You will only handle text frames and the PIC (picture) frame. All text frames have a name starting with the letter 'T' (and no frames start with a 'T' unless they are text frames). You will process frames one at a time until either:

Here is how you will deal with the frames:


Congratulations, you made it all the way down here! There are no specific functions that you have to write, but I would expect that writing the following functions would be helpful: You'll probably want a function called readID3 which reads the header and then loops calling a function called readFrame. The readFrame function determines the type of frame and either calls processTextFrame (if it's a text frame), processPictureFrame (if it's a picture frame), or ignores (and skips past) the frame.
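The first thing readFrame has to do is read the 6-byte frame header. Here is a minimal sketch of that step. The helper name readFrameHeader and its return codes are hypothetical. The field layout comes from the ID3v2.2 specification: unlike the tag header, the frame size is a plain big-endian number using all 8 bits of each byte, and it does not include the 6 header bytes. The specification also allows a run of 0x00 padding bytes after the last frame, so a zero first name byte is treated here as "no more frames". That is one reasonable stopping test, not necessarily the exact rule the lab expects.

```c
#include <stdio.h>

/* Sketch: read one ID3v2.2 frame header (hypothetical helper).
 *   bytes 0-2  frame name, e.g. "TT2" or "PIC"
 *   bytes 3-5  frame size, big-endian, 8 bits per byte, excluding
 *              these 6 header bytes
 * name must point to room for at least 4 chars (e.g. from malloc(4));
 * it is '\0'-terminated here. Returns 0 on success, 1 if the first name
 * byte is 0x00 (padding, so no more frames), and -1 on EOF. */
int readFrameHeader(FILE *fptr, char *name, long *frameSize) {
  int c, i;

  for (i = 0; i < 3; i++) {
    c = fgetc(fptr);
    if (c == EOF) {
      return -1;
    }
    name[i] = (char)c;
  }
  name[3] = '\0';
  if (name[0] == '\0') {
    return 1;                     /* reached the zero-byte padding */
  }
  *frameSize = 0;
  for (i = 0; i < 3; i++) {
    c = fgetc(fptr);
    if (c == EOF) {
      return -1;
    }
    *frameSize = (*frameSize << 8) | c;   /* all 8 bits count here */
  }
  return 0;
}
```

A readID3 loop could keep calling this while ftell(fptr) is less than 10 plus the tag size from the header and the return value is 0, then dispatch on name: names starting with 'T' (such as TT2 for the title, TP1 for the artist, and TAL for the album) go to processTextFrame, "PIC" goes to processPictureFrame, and anything else is skipped by reading past frameSize bytes.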

When you're done with memory, be sure you free it. When you're done with a file, be sure you fclose it.
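Text frames are a good place to practice that malloc/free discipline. Per the ID3v2.2 specification, the first byte of a text frame's body is the text encoding (0 for ISO-8859-1, 1 for Unicode) and the remaining bytes are the text. The sketch below, with a hypothetical processTextFrame signature that returns the text instead of printing it, decodes only encoding 0 and skips anything else; how the lab wants other encodings handled is an assumption left open here.

```c
#include <stdio.h>
#include <stdlib.h>

/* Sketch of a processTextFrame helper (hypothetical signature).
 * frameSize is the size from the frame header. Encoding 0 is decoded;
 * any other encoding is skipped and NULL is returned. On success the
 * result is a malloc'd, '\0'-terminated string the caller must free. */
char *processTextFrame(FILE *fptr, long frameSize) {
  char *text;
  long i;
  int encoding, c;

  if (frameSize < 1) {
    return NULL;
  }
  encoding = fgetc(fptr);         /* first body byte: text encoding */
  if (encoding != 0) {
    /* Unsupported encoding (or EOF): skip the rest of the frame. */
    for (i = 1; i < frameSize; i++) {
      if (fgetc(fptr) == EOF) {
        break;
      }
    }
    return NULL;
  }
  text = malloc(frameSize);       /* frameSize - 1 text bytes + '\0' */
  if (text == NULL) {
    return NULL;
  }
  for (i = 0; i < frameSize - 1; i++) {
    c = fgetc(fptr);
    if (c == EOF) {
      free(text);
      return NULL;
    }
    text[i] = (char)c;
  }
  text[frameSize - 1] = '\0';
  return text;
}
```

The caller might print the result with printf("Title: %s\n", text) for a TT2 frame and then call free(text).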

NOTE: If you run the program from your cs33/labs/09 directory on an MP3 in the /scratch/richardw/music/username/ directory, the output image will also be saved in that directory. To verify that your image saved properly, you can run display filename from the command prompt.
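Writing that image file is the job of processPictureFrame. Per the ID3v2.2 specification, a PIC frame body holds a 1-byte text encoding, a 3-byte image format (such as "JPG" or "PNG"), a 1-byte picture type, a description ended by 0x00 (or 0x00 0x00 for encoding 1), and then the image data itself. The sketch below skips everything before the image and copies the rest with fputc. The signature, including passing in the output file name, is a hypothetical choice; this sketch reads but discards the format, which a fuller version would probably use to pick the file extension.

```c
#include <stdio.h>

/* Sketch of a processPictureFrame helper (hypothetical signature) for an
 * ID3v2.2 PIC frame. frameSize is the size from the frame header; the
 * image data is copied into outName. Returns 0 on success, -1 on any
 * read or write error. */
int processPictureFrame(FILE *fptr, long frameSize, const char *outName) {
  FILE *outptr;
  long used = 5;                  /* encoding + image format + type */
  long i;
  int encoding, c, prev;

  encoding = fgetc(fptr);
  if (encoding == EOF) {
    return -1;
  }
  for (i = 0; i < 4; i++) {       /* skip 3-byte format and picture type */
    if (fgetc(fptr) == EOF) {
      return -1;
    }
  }
  if (encoding == 1) {            /* 2-byte characters; 0x00 0x00 ends it */
    do {
      prev = fgetc(fptr);
      c = fgetc(fptr);
      if (prev == EOF || c == EOF) {
        return -1;
      }
      used += 2;
    } while (prev != 0 || c != 0);
  } else {                        /* 1-byte characters; 0x00 ends it */
    do {
      c = fgetc(fptr);
      if (c == EOF) {
        return -1;
      }
      used++;
    } while (c != 0);
  }
  if (used > frameSize) {         /* description ran past the frame */
    return -1;
  }
  outptr = fopen(outName, "wb");
  if (outptr == NULL) {
    return -1;
  }
  for (i = used; i < frameSize; i++) {   /* the rest is image data */
    c = fgetc(fptr);
    if (c == EOF || fputc(c, outptr) == EOF) {
      fclose(outptr);
      return -1;
    }
  }
  fclose(outptr);
  return 0;
}
```

Note that the image copy loop keeps c as an int, so a 0xFF byte in the JPEG data (which is very common) is never mistaken for EOF.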

To view the actual binary contents of the MP3 file, you can use the program hexedit. For example, this command will open the specified MP3 file:

hexedit Pump-It-Up.mp3
Use page-up and page-down to look around the file, and Ctrl-C to quit. Despite the message, pressing F1 does not seem to provide Help.