CS 45 — Lab 4: Simulating a Page Table

Due: Sunday, April 5 @ 11:59 PM

1. Overview

For this lab, you’ll be translating a sequence of virtual memory accesses into their corresponding physical memory addresses. You’ll build a simulator that will keep a single-level page table for one process and update the page-to-frame memory mappings as the process accesses virtual memory addresses. You’ll be simulating the page table and physical memory rather than using Linux because:

  1. The details of memory hardware are nasty, particularly for architectures like x86 that have been around for a long time and support lots of backwards-compatibility features.

  2. If your non-simulated implementation breaks, you’re unlikely to get much feedback, making development very frustrating.

  3. Building an abstract simulator should cover the important high-level concepts without bogging us down in architectural details.

The provided starter code has three main components:

  • frames.h & frames.c: Simulates physical memory (manages physical frames).

  • pages.h & pages.c: Simulates virtual memory mappings (page table) and tracks the page replacement policy.

  • simulator.c: The main file that reads a sequence of addresses, performs memory mappings, and prints the results.

Your simulator will take several parameters to vary the characteristics of the memory system you’re simulating, including the size in bytes of a page/frame, the size in bytes of the virtual address space, the size in bytes of the physical address space, and the page replacement policy to use.

1.1. Goals

  • Design and implement a single-level paging system.

  • Compare and contrast page replacement policies.

  • Use the mmap() function to map a file into a process’s virtual address space.

  • Reason about the sizes and number of bits used to track pages and frames.

1.2. Handy References

1.3. Lab Recordings

Week 1

Week 2

2. Requirements

2.1. Memory and Files

Your simulator should free all memory, close all files it’s no longer using, and get a clean bill of health from valgrind: no invalid memory accesses, uninitialized variables, or memory leaks.

Running valgrind shouldn’t be an afterthought! Use it as you incrementally test your code, and it’ll help you to catch memory problems sooner.

2.2. Address Translation

Your simulator should read a sequence of virtual memory addresses from a memory-mapped file. The file will contain addresses that are stored as 64-bit integers in a little-endian binary format (the native format for integers on x86).

The simulator already reads in the sizes of a page/frame, virtual address space, and physical address space (all in bytes) as command line parameters. It passes those values to the physical and virtual memory initializers, which you will write. Those initializers should set up the memory based on the provided sizes. You may assume that:

  1. The page/frame size will evenly divide the VAS and PAS sizes.

  2. The page/frame size will be a power of two.

  3. The VAS size will be a power of two.

  4. Your simulator will not be asked to translate a virtual address that is larger than the provided VAS size.

2.2.1. Mapping the Input File

The mmap() function will allow you to place the contents of an input file directly into your process’s virtual address space. That way, you won’t need to call functions like read() or fread() to access the file, you’ll be able to access it as if it were an array loaded into memory.

When you map the file, you’ll want to map it as private (only accessible to the simulator process) and read only (your simulator doesn’t change the contents). While mmap() allows you to express where in the virtual address space you’d like the file to be mapped, for our purposes we don’t actually care — it’s easier to just let the OS decide for us.

PLEASE look over the manual page for mmap(). It will tell you how to express all of the preferences stated above. It will also describe how you can check the return value for errors. I’m happy to answer mmap() related questions, but I’d also like you to experiment a bit with memory mapping the file.

2.3. Reporting Results

Your simulator can print any debugging information you’d like to the standard error stream, but standard out should be reserved for your results, which should be formatted as follows: for each input virtual address, you should print the corresponding full physical address that it maps to, followed by a newline (\n). Each address should be printed as a 64-bit hexadecimal number with a "0x" prefix and any necessary leading zeroes to pad the length of the address to 16 hex digits. The starter code has an example of the printf format you should use.

Finally, after all the addresses have been translated and printed, you should print just a single number on a line by itself: the total number of page faults that occurred during your translations.

2.4. Free Frame Management

When initializing physical memory, frame 0 should be permanently reserved for the OS. You should initialize all other frames as free. When you detect a page fault (you are asked to map a page that is not in physical memory), you should first check physical memory to see if any frames are available. If so, you should give the faulting page the lowest numbered free frame and mark the frame as in-use.

2.5. Page Replacement

When you detect a page fault after all frames are full, you’ll need to evict a mapped page to make room for the one that’s faulting. Your simulator should support three page replacement policies to choose a victim page: LRU, Clock, and a third policy that you invent. The selected policy will be passed in as an integer command line parameter using constants that are defined in pages.h: POL_LRU (1), POL_CLOCK (2), and POL_CUSTOM (3).

  • For LRU, you should track the full order of page usage and evict the least recently used page.

  • For clock, you should initialize the clock hand to page 0.

  • For your custom policy, you can implement any non-trivial policy you want. To be eligible for extra credit, your custom policy must be deterministic (i.e., not random — every run should produce the same results if given the same inputs).

2.6. Data Storage and Function Interface

You’re welcome to add or modify structs, but you should not change any of the function prototypes in pages.h and frames.h. Any global state or helper functions you add to pages.c and frames.c should be static (only accessible in the file it’s defined in). Adding helper functions in those files is strongly encouraged, but the simulator should ONLY be able to interact with physical memory using the functions declared in frames.h and the page table using functions declared in pages.h.

To be clear:

  • pages.c should not include frames.h

  • frames.c should not include pages.h

  • data should not be shared between either module and the main simulator unless it’s through the provided function interface, which you should not change.

3. Checkpoint

There is no checkpoint for this lab. Don’t let that stop you from beginning the lab early though!

I would suggest that before our next lab meeting, you:

  1. Add enough to translate() to see the sequence of virtual addresses that are arriving for translation (for debugging purposes).

  2. Sketch out a design for LRU (data structures and the operations you’ll need).

4. Tips & FAQ

  • When getting started, you should sketch out your translate() function using the functions that are available to you in pages.h and frames.h, even if you haven’t implemented them yet.

  • The starter code provides several example sequences in the input/ directory that you can use for testing. The notes file contains more information about them. If you’re using these tests, make sure your VAS is at least 65536 bytes:

    ./simulator 4096 65536 <PAS> <policy> <input file>
  • You can test your own memory address sequences by using the write_bytes.py python script to create a new mmapable input file. To use it, add a sequence of memory addresses in the list at the top and then run it with one argument: the filename you’d like to write.

  • Even though you’ll only be using one page replacement algorithm at a time, it’s probably easiest to keep state for all the algorithms in your page struct rather than trying to dynamically allocate just one or the other (e.g., using lots void *'s, casting, and/or function pointers).

5. Bonus Competition

This portion of the lab is not required. It’s for fun and a small amount of extra credit.

For a chance at extra credit and eternal CS 45 glory, you may optionally nominate your custom page replacement algorithm to compete against other submissions.

5.1. Rules for Participation

  1. To participate, you must define a third replacement policy, numbered 3, and write a brief description of the intuition behind your policy in a file named competition.txt. Just a couple of sentences is fine. If no competition.txt file is present, your submission will not enter the competition.

  2. You may NOT look ahead in the stream of memory accesses. That is, your simulator must only consider the address it’s currently translating plus any past information that it has seen, but it may not look at the addresses that will be accessed in the future.

  3. You may NOT submit your LRU or Clock implementation as your competition submission. It needs to be a new algorithm of your design!

  4. Your policy must be deterministic. That is, given the same inputs it must always produce the same output.

5.2. Prizes

I’ll run your new policy on two address streams: one that has good locality, and another that has poor locality. For each of the two inputs, I’ll declare a winner and runner-up based on the total number of page faults. The winner gets 1 extra lab point, and the runner-up gets 1/2. In the interest of balance, a group can only win one of the two files, with the good locality file being determined first.

6. Submitting

Please remove any excessive debugging output prior to submitting.

To submit your code, commit your changes locally using git add and git commit. Then run git push while in your lab directory.