CS97 Lab 0: Memory-mapped file I/O

Due: 11:59 p.m., Sunday, 04 September.

This is the first lab in a four-lab series in which you will implement several data access techniques, evaluate their performance, and write a simple query cost analyzer to choose the best (estimated) access technique for simple persistent data. The goal of this lab is to familiarize yourself with memory-mapped file I/O in C/C++, which will save you effort for the later labs in the series.



Memory-mapped file I/O

Memory-mapped file I/O is a common I/O technique for high-performance data systems because it allows programs to efficiently edit parts of a file, share file data among concurrent threads or processes, and control when parts of a file are read from and written to disk more precisely than other I/O techniques allow. The idea is simple: rather than use a system call for each read or write to a file, map part (or all) of the file into memory. The program can then read and edit that memory directly (without requiring system calls) and synchronize (write) the changes back to disk with a single system call as needed.

The basic procedure for using memory-mapped file I/O is:

  1. Open the file, obtaining a file descriptor.
  2. Using the file descriptor, map part (or all) of the file to memory, obtaining a pointer to the memory-mapped region.
  3. Use the pointer to directly read and edit the memory-mapped region, synchronizing the data back to the file as needed.
  4. After all changes to the file are complete and synchronized, unmap the memory-mapped region.
  5. Using the file descriptor, close the file.
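
The five steps above can be sketched roughly as follows. This is only an illustration, not the lab solution: the helper name set_first_byte is made up for this example, it assumes the file already exists and is non-empty, and it maps the whole file with MAP_SHARED so that edits reach the file on disk.

```cpp
#include <fcntl.h>      // open
#include <sys/mman.h>   // mmap, msync, munmap
#include <sys/stat.h>   // fstat
#include <unistd.h>     // close
#include <cassert>
#include <cstddef>

// Hypothetical helper: overwrite the first byte of an existing,
// non-empty file through a memory mapping. Returns true on success.
bool set_first_byte(const char* path, char value) {
    assert(path != nullptr);

    // Step 1: open the file, obtaining a file descriptor.
    int fd = open(path, O_RDWR);
    if (fd < 0) return false;

    // Ask for the file size so we know how many bytes to map.
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return false;
    }
    size_t len = static_cast<size_t>(st.st_size);

    // Step 2: map the whole file, obtaining a pointer to the region.
    void* region = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                        MAP_SHARED, fd, 0);
    if (region == MAP_FAILED) {
        close(fd);
        return false;
    }

    // Step 3: edit the memory directly, then synchronize it to the file.
    static_cast<char*>(region)[0] = value;
    bool ok = (msync(region, len, MS_SYNC) == 0);

    // Step 4: unmap the region.
    ok = (munmap(region, len) == 0) && ok;

    // Step 5: close the file.
    ok = (close(fd) == 0) && ok;
    return ok;
}
```

A call such as set_first_byte("scratch.dat", 'j') (where scratch.dat is any non-empty scratch file of your own) would change only the first byte of that file. Error handling here is deliberately minimal; check the man pages for the error cases you care about.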



Using mmap to read and write structured binary file data

In this lab you will use memory-mapped file I/O to read and write structured binary file data. Run update97 to copy the initial files to your cs97/labs/0 directory. These files are:

Your task is to write a single program in main.cc that uses memory-mapped file I/O to do each of the following:

  1. Print the monster contained in maru.dat to standard output.
  2. Create a new monster and place that monster directly after the two existing monsters in twomonsters.dat.
  3. In a new file legions.dat, place 100 copies of your new monster sequentially starting from the beginning of the file.

If you corrupt the data files and want to restore them to their initial state, you can do this by removing the data files from your directory (rm *.dat) and re-running update97.



Helpful C system calls

Unlike the later labs, completing this assignment is more like discovering and following the right recipe than applying awesome problem-solving skills. (I'm sorry!) I hope the time you invest learning memory-mapped file I/O now helps you avoid debugging nightmares on later labs.

Here are some useful C system calls that you'll probably need. Check their man (manual) pages for details about how to use them and which header files you need to include:

One annoyance of memory-mapped file I/O is that you can't use mmap to extend a file, i.e., it is a bug to map a page past the end of the file and write into that page. If you want to use mmap to edit past the end of the file, you first need to extend the file past the point you intend to write and then map the desired pages. The recommended method to extend the file is to lseek to the end of the file and then write zero ("\0") bytes until the file reaches past the point you want to map. A non-recommended but simpler method is to use ftruncate to extend the file. (This method might ruin your performance measurements on the later labs.)
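
As a rough sketch of the recommended lseek-and-write method, the code below grows a file with zero bytes until it reaches a requested size. The function names extend_with_zeros and ensure_file_size are invented for this example, the 4096-byte chunk size is an arbitrary choice, and an interrupted or failed write is simply reported as failure.

```cpp
#include <fcntl.h>      // open
#include <sys/types.h>  // off_t, ssize_t
#include <unistd.h>     // lseek, write, close
#include <cassert>
#include <cstddef>

// Hypothetical helper: make the file behind fd at least new_size bytes
// long by seeking to its end and writing '\0' bytes. Does nothing if
// the file is already large enough. Returns true on success.
bool extend_with_zeros(int fd, off_t new_size) {
    assert(fd >= 0);

    // Seeking to the end both positions us there and reports the size.
    off_t end = lseek(fd, 0, SEEK_END);
    if (end < 0) return false;

    static const char zeros[4096] = {0};
    while (end < new_size) {
        off_t remaining = new_size - end;
        size_t chunk = remaining < static_cast<off_t>(sizeof zeros)
                           ? static_cast<size_t>(remaining)
                           : sizeof zeros;
        ssize_t written = write(fd, zeros, chunk);
        if (written <= 0) return false;  // treat any failed write as an error
        end += written;
    }
    return true;
}

// Hypothetical helper: open path (creating it if needed) and make sure
// it is at least min_size bytes long, so that many bytes can be mapped.
bool ensure_file_size(const char* path, off_t min_size) {
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) return false;
    bool ok = extend_with_zeros(fd, min_size);
    ok = (close(fd) == 0) && ok;
    return ok;
}
```

After a call like this succeeds, mapping the first min_size bytes of the file and writing into them stays within the file. In your own code you would more likely call the extension step on the descriptor you already have open, just before mmap.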

A helpful trick: mmap returns a void* pointer. Casting this pointer to a different type allows you to easily treat the mapped memory region as an array of that type. For example, if you have a Monster bob then

  Monster *m = (Monster*) mmap(...);
  m[41] = bob;
would set the 42nd Monster in the memory-mapped region to the value of bob (provided the 42nd Monster lies within the mapped region).
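
To make the "if it is in the mapped region" condition concrete, here is a hedged sketch that checks the index against the file size before writing through the cast pointer. The Monster struct below is a made-up stand-in (the lab's real definition comes from the provided files), and set_monster is a hypothetical helper name.

```cpp
#include <fcntl.h>      // open
#include <sys/mman.h>   // mmap, msync, munmap
#include <sys/stat.h>   // fstat
#include <unistd.h>     // close
#include <cassert>
#include <cstddef>

// Hypothetical layout, for illustration only; use the lab's own
// Monster definition in your solution.
struct Monster {
    char name[16];
    int  hit_points;
};

// Hypothetical helper: copy value into slot index of the Monster array
// stored in the file at path. Returns false if the file is too short
// to contain that slot, or if any system call fails.
bool set_monster(const char* path, size_t index, const Monster& value) {
    assert(path != nullptr);
    int fd = open(path, O_RDWR);
    if (fd < 0) return false;

    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return false;
    }
    size_t len = static_cast<size_t>(st.st_size);
    size_t count = len / sizeof(Monster);  // whole Monsters in the file
    if (index >= count) {                  // slot is past the end: refuse
        close(fd);
        return false;
    }

    void* region = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                        MAP_SHARED, fd, 0);
    if (region == MAP_FAILED) {
        close(fd);
        return false;
    }

    // Treat the mapped bytes as an array of Monster and assign one slot.
    Monster* monsters = static_cast<Monster*>(region);
    monsters[index] = value;

    bool ok = (msync(region, len, MS_SYNC) == 0);
    ok = (munmap(region, len) == 0) && ok;
    ok = (close(fd) == 0) && ok;
    return ok;
}
```

With a file holding three Monsters, set_monster(path, 2, bob) would overwrite the third one, while set_monster(path, 3, bob) would return false instead of writing past the end of the file.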



Finding man pages

Finding the man page for a C topic is often easier than searching the web for the topic; if you know the topic's name, you can usually just type man <topicname> in your terminal window. For some topics or if you don't know the topic name, however, finding the man page can be difficult. For instance, Unix contains multiple man pages for open and the default open page is not for the C library function.

There are at least three good techniques to find the right man page for a topic:

  1. Use man -k <keyword> to search all pages for a given keyword. Be prepared for a lot of output; you'll probably want to pipe the output to less or more.
  2. If you know the man page for a related topic, you might be able to find the page for your desired topic by checking the related topic's "SEE ALSO" section.
  3. Search the web. A search for "C++ <topicname>" will sometimes even find the man page for the topic.

To check the non-default man page for a topic, you can specify the manual section to search. For example, the C open function is in the "3posix" section of the manual; to see that man page you would type man -S3posix open.



Submitting your work

When you are done, run handin97 lab0 to submit your work. You can submit as many times as necessary before the late deadline, but only your last submission will be graded.

handin97 lab0 will submit all files in your cs97/labs/0 directory; please remove the additional .dat files you've created before submitting your lab.

Finally, note that there is an unusual late policy for the labs in this course.