CS45 Lab 4: Paging Experiments

Code Due: Sunday April 6 before 11:59pm
Report Due: Wednesday April 9 at the beginning of class

Content:

Problem Introduction
This assignment is a study of Linux's implementation of virtual memory. It is a two part assignment, the first part involves implementing a simple system call to obtain page fault statistics for individual processes, groups of processes, and for the system as a whole. The second part involves evaluating Linux's page replacement policy using your system call (and other utilities) to obtain performance measures of workload tests you design and run. You will write a 3-5 page report describing your experiments and results.

This project requires a fair amount of time in designing, writing, and running user-level experiments, and very little time writing and testing your system call. I'd recommend leaving yourself at least one full week, and better yet, 1.5 weeks, for designing, writing, and running experiments and for writing your final report.

Project Details
Part 1: Adding a new system call
Start by setting up a git repo for your lab 4 work (remember to add three users: you, partner, and your shared cs45X user). Then, copy over my starting point files and add them to your repo:
cp ~newhall/public/cs45/lab04/* .
The starting point contians files with #includes and a Makefile for a test program.

Implement a new system call:

int pgfltstats(pid_t pid, int flag, pf_info_struct *info)

The pf_info_struct should contain fields for:

  1. total number of minor page faults
  2. total number of major page faults
  3. number of processes included in these counts

If pid is:

You can use fields in task_struct for obtaining the values you need. Make sure to check for error conditions and return appropriate error values as in the previous assignments.

You can use vmstat, ps and /proc/pid/ files to help you determine if your system call is implemented correctly (look at vmstat, ps, and proc man pages for more info). Here is an example of printing out the uid of every process in the system, and then sorting the results by pid:

ps axo pid,ruid,time,cmd | sort -k 2
You may also want to use your procinfo system call from lab 2 (perhaps modifying it to include uid information).

Part 2: Experiments Testing the VM system and a Written Report
For this part, you will use your system call (and other utilities) to collect information about page faults and other performance data for different programming loads. I want you to (through experiments) determine what page replacement algorithm Linux likely implements. I'm sure you can figure out what Linux's page replacement policy is just by reading Linux documentation and/or reading Linux source code. However, the point of this assignment is to see if you can verify experimentally what you know (or suspect) about Linux's page replacement policy.

Kernel code you may want to browse:

Think about running experiments to answer the questions about how the policy works for different types of workloads, and what that tells you about the policy. In designing your experiments, you should write user-level test programs that access memory in different patterns, and run different experiments using them to collect performance data to help you answer questions like:

  1. When does Linux's page replacement policy work well (what types of workloads)?
  2. When does it not work well?
  3. How well does it work for a mixed workload?
  4. etc.
And what do the answers to these questions tell you about Linux's page replacement policy? For example try and answer questions like:
  1. is it most like FIFO?
  2. or MRU?
  3. or LRU?
  4. or Working Set?

    Why or why not? And, in what way(s)?

The idea is to design experiments that support your answers to these questions.

Getting Started With Testing

To get started, you will likely want to read kernel code and other Linux documentation to help you develop some initial hypotheses. You can test negative hypotheses as well as positive ones (i.e. "I don't think Linix implements the X policy, and I'm going to test this by ..."), but you should have some positive hypotheses as well; there is a difference between negative hypotheses that can help support your positive ones, and bad hypotheses (and bad experiments) that don't really show anything.

Next,think about how to design experiments to test your hypotheses:

Look at the details in the Report section below to help guide your hypothesis and experiments design and testing.

To write test programs that use enough memory to trigger paging, look at the /proc/meminfo file. This give the total amount of free memory (MemFree, in Kbytes). If you are running an experiments, make sure the the process(es) allocate enough memory to trigger page replacement (they need to allocate enough virtual memory space so that it won't all fit into free physical memory). However, be careful that you do not allocate too much memory, otherwise, you can run out of swap space and the kernel will start killing process. The amount of free swap space can be found in /proc/meminfo (SwapFree, in Kbytes).

Note: it will take several rounds of trial and error to get your experiments set so that they are really testing the thing you want to test. As a result, start early, run some initial tests, see what they show (or don't show) and then re-design to get tests that show what you want.

Also, each individual experiment should be run several times; a single run of an experiment is not definitive. You should run several iterations of each experiment and take the average (and compute stddev) over this set.

You may also want to periodically re-boot and start with a clean system from time to time as you go: RAM will get warmed up after the first run of something (the a.out file can still be in RAM from the previous run), and this may or may not be important to your experiments.

Report

You should write 3-5 page report (do not give me anything longer than 5 pages) describing your experiments and results.

In particular, your report should have the following:

  1. Introduction. Describe at a high-level what you are doing and why, and describe what your high-level hypothesis is (i.e. "We started out guessing that Linux's page replacement policy is X, but our experiments will show that it is in fact, more like Y" or "we start out making no assumptions about Linux's page replacement policy, and our experiments will answer the question of is it more like X, Y or Z.").

  2. Short paragraph describing how you implemented your system call to get get per-process, per-owner, and system wide page fault counts? Also, describe other Unix utilities you used to collect performance data for your experiments.

  3. Main part: describe your experiments and results: For each experiment, you should:
    1. Clearly state what your hypothesis is
      (i.e. "Application that does X should be a bad/good case for Linux's page replacement policy because ...)
    2. Explain how your experiment is testing that hypothesis
      (We are testing this hypothesis by running application P, which does ..., N times on and collecting X,Y, and Z metrics using our system call, vmstat, time,... With these measures, we can see whether or not ... because ...).
    3. Present your Results.
    4. Explain your results!
      (Our results show that ... This does/does not match our expected result, because ... (if it doesn't, think about some other tests you could run to explain why it doesn't, or look for data you do have to explain it)
    Quality of Experiments is much, much better than quantity; a few well thought out and well done experiments is sufficient.

  4. Conclusion. It should include a statement of the major result(s) of your experiments (which policy(ies) is Linux's most like? and in what way(s)?). Also, tell me what you found to be the most difficult part of the hypothesis testing phase of this assignment.


Some useful system calls and utilities for testing purposes


Latex
You can use any word processing software you'd like to write your report. However, if you want to learn latex (word processing the Unix way), there are some starting points for a report (paper) and a report using bibtex for references (bibtex) in the following directory:
/home/newhall/public/latex_examples

For this assignment, the example in the paper subdirectory is likely a good starting point. In the paper subdirectory is an README file with information on how to create a .dvi .ps and .pdf file from the latex source file (example.tex). In addition, there is a makefile that contains rules for building all of these. The example.tex file, contains examples of importing postscript figure files, creating sections and subsections, using different fonts, creating lists, and referencing figures within a document. Probably the easiest thing to do is to grab the contents of this directory as a starting point for your report, then edit example.tex with your report.

Also, off my Unix and C help pages, I have much more documentation about creating documents, and figures, graphs: Tools for Documents and Figures

Submit
  1. Code: using cs45handin, submit a tar ball with the following:
    1. README file with (1) your names, (2) how many late days you have used on this assignment, and how many you have used total, (3) a description of how to run your test programs (show example command lines, or how to run bash scripts).
    2. copies of all .c and .h files you added or modified in the kernel to implement your system call
    3. copies of all your test programs used in experiments, and the test program you used to test your system call for correctness.

      also include any bash scripts you wrote for running experiments.

  2. Report: turn in a hardcopy of your report in class on the same day that you submit your tar ball. You should also email me a pdf file; but I want a print out handed in in class too.

    NOTE: If you turn in your report later than the beginning of class on the due date, then your assignment is late (even if your tar ball was submitted before the due date).



Demo

You and your partner will sign up for a 20 minute demo slot. You should show me that your system call is correct, robust, and implements all required features (and think about verification of this to demo me). In addition, I'd like to see you run one or more of your test programs that you ran as part of your experiments.