This is the first lab in a four-lab series in which you will implement several data access techniques, evaluate their performance, and write a simple query cost analyzer to choose the best (estimated) access technique for simple persistent data. The goal of this lab is to familiarize yourself with memory-mapped file I/O in C/C++, which will save you effort for the later labs in the series.
Memory-mapped file I/O is a common I/O technique for high-performance data systems. Compared with other I/O techniques, it lets programs efficiently edit parts of a file, share file data among concurrent threads or processes, and control more precisely when parts of a file are read from and written to disk. The idea is simple: rather than use a system call for each read or write to a file, map part (or all) of the file into memory. The program can then read and edit that memory directly (without requiring system calls), and synchronize (write) the file back to disk with a single system call as needed.
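As a rough illustration of that idea, here is a minimal sketch that maps an existing file, edits one byte with an ordinary memory write, and synchronizes the change back to disk. The helper name set_first_byte is made up for this example, and mapping the whole file is just one reasonable choice; treat it as a starting point rather than the required structure for your solution.

```cpp
#include <fcntl.h>      // open
#include <sys/mman.h>   // mmap, msync, munmap
#include <sys/stat.h>   // fstat
#include <unistd.h>     // close
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of the lab files): overwrite the first byte
// of an existing, non-empty file through a memory mapping.
// Returns 0 on success, -1 on any failure.
int set_first_byte(const char *path, char value) {
    assert(path != nullptr);

    int fd = open(path, O_RDWR);
    if (fd < 0) return -1;

    // Ask the file system how long the file is so we never map past its end.
    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) { close(fd); return -1; }
    size_t len = static_cast<size_t>(st.st_size);

    // Map the whole file. MAP_SHARED means edits are carried back to the file.
    void *addr = mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) { close(fd); return -1; }

    // An ordinary memory write; no system call is needed for the edit itself.
    static_cast<char *>(addr)[0] = value;

    // One system call to synchronize the edited region back to disk.
    int rc = msync(addr, len, MS_SYNC);

    munmap(addr, len);
    close(fd);
    return rc;
}
```

Calling set_first_byte("some.dat", 'X') on a small scratch file and then inspecting it with a tool such as xxd is a quick way to convince yourself the edit reached the disk.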
The basic procedure for using memory-mapped file I/O is:
In this lab you will use memory-mapped file I/O to read and write structured binary file data. Run update97 to copy the initial files to your cs97/labs/0 directory. These files are:
Your task is to write a single program in main.cc that uses memory-mapped file I/O to do each of the following:
If you corrupt the data files and want to restore them to their initial state, you can do this by removing the data files from your directory (rm *.dat) and re-running update97.
Unlike the later labs, completing this assignment is more like discovering and following the right recipe than applying awesome problem-solving skills. (I'm sorry!) I hope the time you invest learning memory-mapped file I/O now helps you avoid debugging nightmares on later labs.
Here are some useful C system calls that you'll probably need. Check their man (manual) pages for details about how to use them and which header files you need to include:
open(<filename>, O_RDWR|O_CREAT, S_IRUSR|S_IWUSR|S_IRGRP|S_IROTH)
One annoyance of memory-mapped file I/O is that you can't use mmap to extend a file, i.e., it is a bug to map a page past the end of the file and write into that page. If you want to use mmap to edit past the end of the file, you first need to extend the file past the point you intend to write and then map the desired pages. The recommended method to extend the file is to lseek to the end of the file and then write "\0" bytes past the point you want to map. A non-recommended but simpler method is to use ftruncate to extend the file. (This method might ruin your performance measurements on the later labs.)
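The lseek-then-write approach described above might look something like the following sketch. The helper names extend_with_zeros and map_extended and the 4096-byte chunk size are assumptions made for illustration; they do not come from the lab files.

```cpp
#include <fcntl.h>      // open flags
#include <sys/mman.h>   // mmap, MAP_FAILED
#include <sys/stat.h>   // fstat
#include <unistd.h>     // lseek, write
#include <cassert>
#include <cstddef>

// Hypothetical helper: make sure the file behind fd is at least `needed`
// bytes long by seeking to its end and appending zero bytes.
// Returns 0 on success, -1 on failure.
int extend_with_zeros(int fd, off_t needed) {
    assert(needed >= 0);

    // Seeking to the end reports the current size and positions the next
    // write right after the last existing byte.
    off_t size = lseek(fd, 0, SEEK_END);
    if (size < 0) return -1;

    const char zeros[4096] = {0};
    while (size < needed) {
        off_t left = needed - size;
        size_t chunk = left < (off_t)sizeof(zeros) ? (size_t)left : sizeof(zeros);
        ssize_t n = write(fd, zeros, chunk);
        if (n <= 0) return -1;   // sketch: treat any short or failed write as an error
        size += n;
    }
    return 0;
}

// Hypothetical helper: extend the file to `len` bytes if necessary, then map
// bytes [0, len) for reading and writing. Returns MAP_FAILED on failure.
void *map_extended(int fd, size_t len) {
    if (extend_with_zeros(fd, (off_t)len) < 0) return MAP_FAILED;
    return mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}
```

Because the file is extended before mmap is called, every page in the returned region is backed by real file bytes, so writing anywhere in [0, len) is safe.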
A helpful trick: mmap returns a void* pointer. Casting this pointer to a different type lets you treat the mapped memory region as an array of that type. For example, if you have a Monster bob, then
Monster *m = (Monster*) mmap(...); m[41] = bob;
would set the 42nd Monster in the memory-mapped region to the value of bob (provided the 42nd Monster lies within the mapped region).
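The caveat about staying inside the mapped region is easy to get wrong, so here is a small sketch of the same trick with an explicit bounds check. The Monster layout below is invented purely for illustration (the lab's real struct may differ), and put_monster is a hypothetical helper name.

```cpp
#include <sys/mman.h>   // MAP_FAILED
#include <cassert>
#include <cstddef>
#include <cstdint>

// HYPOTHETICAL layout for illustration only; use the lab's real Monster
// definition in your own code.
struct Monster {
    int32_t id;
    int32_t hit_points;
    char name[24];
};

// Hypothetical helper: store `value` as record `index` of a region that was
// obtained from mmap and is `mapped_len` bytes long. Returns false (and
// writes nothing) if that record is not entirely inside the mapped region.
bool put_monster(void *mapped, size_t mapped_len, size_t index,
                 const Monster &value) {
    assert(mapped != nullptr && mapped != MAP_FAILED);

    // Only whole records that fit in the region are valid array slots.
    if (index >= mapped_len / sizeof(Monster)) return false;

    Monster *m = static_cast<Monster *>(mapped);  // view the bytes as an array
    m[index] = value;                             // index 41 is the 42nd record
    return true;
}
```

With this helper, put_monster(addr, len, 41, bob) performs the same assignment as m[41] = bob, but refuses to write when the mapping is too short to hold 42 records.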
Finding the man page for a C topic is often easier than searching the web for the topic; if you know the topic's name, you can usually just type man <topicname> in your terminal window. For some topics or if you don't know the topic name, however, finding the man page can be difficult. For instance, Unix contains multiple man pages for open and the default open page is not for the C library function.
There are at least three good techniques to find the right man page for a topic:
To check the non-default man page for a topic, you can specify the manual section to search. For example, the C open function is in the "3posix" section of the manual; to see that man page you would type man -S3posix open.
When you are done, run handin97 lab0 to submit your work. You can submit as many times as necessary before the late deadline, but only your last submission will be graded.
handin97 lab0 will submit all files in your cs97/labs/0 directory; please remove the additional .dat files you've created before submitting your lab.
Finally, note that there is an unusual late policy for the labs in this course.