On this page:
15.1 Language
15.2 Memory Management
15.2.1 Runtime Representation
15.2.2 Memory Layout and Allocation
15.2.3 Contexts and Stack
15.2.4 Collecting Garbage
15.3 Testing and Debugging
15.4 Submission Steps and Deadlines
15 Memory Manager

This assignment has you implement an automated memory management system for an existing interpreter.

15.1 Language

You are provided with a mostly-complete interpreter for Vulture, which has the following concrete and abstract syntax:

exp := (<op> <exp> <exp>)

    |  (let (<id> <exp>) <exp>) [single-binding let]

    |  (if <exp> <exp> <exp>)

    |  true

    |  false

    |  <number>

 

    |  (lam (<id>) <exp>) [single-argument functions]

    |  (<exp> <exp>)

 

    |  (pair <exp> <exp>)

    |  (left <exp>)

    |  (right <exp>)

    |  (set-left <exp> <exp>)

    |  (set-right <exp> <exp>)

 

    |  (do <exp> <exp> ...)

 

    |  (gc <exp>)

 

op   := + | - | < | >

id   := any symbol

data Op: | op-plus | op-minus | op-greater | op-less end data Expr: | e-num(n :: Number) | e-bool(b :: Boolean) | e-op(op :: Op, left :: Expr, right :: Expr) | e-if(cond :: Expr, consq :: Expr, els :: Expr) | e-let(x :: String, bind :: Expr, body :: Expr) | e-lam(arg :: String, body :: Expr) | e-app(f :: Expr, arg :: Expr) | e-id(x :: String) | e-pair(first :: Expr, rest :: Expr) | e-left(pair :: Expr) | e-right(pair :: Expr) | e-set-left(pair :: Expr, newl :: Expr) | e-set-right(pair :: Expr, newr :: Expr) | e-do(exprs :: List<Expr>) | e-gc(expr :: Expr) end

15.2 Memory Management

The most important difference between other interpreters you’ve seen and the Vulture interpreter is that, given enough allocations, the interpreter simply runs out of space! You will write both the functions that allocate on the heap, and a function to collect garbage on the heap when it runs out of space.

15.2.1 Runtime Representation

First, Vulture defines a data structure called Heap for keeping track of stop & copy metadata, along with an Array representing memory:

data Heap: | heap(memory :: Array, offset :: Box<Number>, front-half :: Box<Boolean>) end

Note that the offset and front-half fields are boxes, not just numbers. In addition, the memory field is a mutable array. You should use the provided set-box and get-box functions to manipulate offset and front-half, and the operations get-now and set-now on arrays to mutably update the heap when necessary. Your memory manager should only create new heaps at the beginning of the run of a program, and shouldn’t create new ones on the fly.

15.2.2 Memory Layout and Allocation

There are four kinds of values, each with its own representation. All of the values are stored as a string tag followed by a number of values in adjacent indices of memory.

So, for example, in this heap:

[array: "pair", 3, 5, "num", 1, "clos", 1, "x", 0, "y", e-id("x")]

the first location 0 holds a pair, whose left element is the number at location 3 (the number’s value is 1), and whose right element is the closure at location 5. The closure at location 5 has a parameter "y" (which it doesn’t use), and closes over an evironment where "x" points back to the pair at location 0. Its body is the expression e-id("x").

Your first task is to write an allocator function for each of the value types in the language. The interpreter calls these functions when creating values, and it’s the job of the memory management system to fill them in:

alloc-num :: (Heap, Stack, Number -> Loc)

alloc-bool :: (Heap, Stack, Bool -> Loc)

alloc-pair :: (Heap, Stack, Loc, Loc -> Loc)

alloc-closure :: (Heap, Stack, Env, String, Expr -> Loc)

The data definition for the heap, along with a description of where to allocate, is described below. When an alloc- function is called, it puts the value’s data in the heap as described above, and returns the location of the beginning of the data (the place where the string tag, like "num" or "clos" is stored). If there isn’t enough space on the heap, an out-of-memory exception should be thrown. Note that the heap runs out of space when the current live half of the heap is full; the allocator should never fill up more than half of the heap with live data.

15.2.3 Contexts and Stack

Vulture’s interpreter is defined with Contexts as in the explicit stack interpreter. They differ in that in the positions that held values before now hold locations, which reference positions on the heap:

data Context: | c-op-left(op :: Op, right :: Expr) | c-op-right(op :: Op, left :: Box<Loc>) | c-if-cond(thn :: Expr, els :: Expr) | c-app-f(arg :: Expr) | c-app-arg(f :: Box<Loc>) | c-let-bind(x :: String, body :: Expr) | c-pair-left(right :: Expr) | c-pair-right(left :: Box<Loc>) | c-left-pair | c-right-pair | c-set-left-pair(newl :: Expr) | c-set-left-newl(pair :: Box<Loc>) | c-set-right-pair(newr :: Expr) | c-set-right-newr(pair :: Box<Loc>) | c-do(exprs :: List<Expr>) | c-binding(x :: String, v :: Box<Loc>) end

Note that these location are stored in Boxes, which gives the garbage collector a point of indirection for updating the stack after moving values around on the heap. This is discussed more in the next section.

Other than this difference, the contexts are handled much the same as in the explicit stack assignment. The provided interpreter is provided for you to explore for more detail about how they are used.

15.2.4 Collecting Garbage

For collecting garbage, you must implement a stop and copy collector. Your garbage collector will have the following signature:

collect-garbage :: (Heap, Stack -> Nothing)

This function will be called in two different situations:

Since collect-garbage has a return type of Nothing, it must do all of its work by mutating the heap.

The locations on the Stack provided to collect-garbage make up the set of locations to start from (the root set) in finding the live data on the heap. When called from a gc expression, this will be the current stack. In the alloc- functions, you may need to do some extra work with the stack to keep track of some values. With that in mind, the locations on the stack define the root set, and from this root set, collect-garbage should proceed to copy all the live data from one half of the heap to the other.

The algorithm you write should:

Example

For a heap of (total) size 40, here’s what it might look like before and after a garbage collection. We assume as input to garbage-collect a stack that only has a single binding in the root set, which points to location 5:

[list:c-binding("f", box(5))]

So, we assume that stack, and the following heap, are passed to collect-garbage:

front is live, offset is 16

      0          1          2          3          4          5          6          7          8          9

     --------------------------------------------------------------------------------------------------------------

0   | "pair"     3          5          "num"      1          "clos"     1          "x"        0          "y"

10  | <expr>     "pair"     0          14          "num"      72         0          0          0          0

20  | 0          0          0          0          0          0          0          0          0          0

30  | 0          0          0          0          0          0          0          0          0          0

40  |

The live data is the closure itself, and the pair and number value that are reachable from it (at locations 0 and 3), so they need to be copied over. The pair and number at locations 11 and 14 are garbage, and should be collected. Here’s what a post-collection heap might look like:

back is live, offset is 31

      0          1          2          3          4          5          6          7          8          9

     --------------------------------------------------------------------------------------------------------------

0   | "fwd"      26         5          "fwd"      29         "fwd"      20         "x"        0          "y"

10  | <expr>     "pair"     0          14         "num"      72         0          0          0          0

20  | "clos"     1          "x"        26         "y"        <expr>     "pair"     29         20         "num"

30  | 1          0          0          0          0          0          0          0          0          0

40  |

You don’t have to produce this exact output; there are several possible traversal orders that will work. Those values are copied over to the back half of the heap (starting at location 20), with updated pointers that reference the copied values in the new live half of the heap. The runtime "offset" is 31, and "front-half" should now be false. Note also that some residue is left on the front half of the heap: there are forwarding pointers with tag "fwd" left there. You might find the representation of forwarding pointers that this suggests to be useful.

Finally, your collector needs to ensure that any references on the stack are updated to handle any moved data. So the initial stack in this example, which held a single environment with a single binding, needs to have the Box within it updated so the binding points to 20, the new location of the closure that was originally at location 5. Note that we use Boxes so that you can perform mutable update on the bindings in the environment to achieve this extra level of indirection. You shouldn’t construct a new stack in garbage-collect, but should use stateful operations on boxes to make sure all of the bindings point to the right locations.

15.3 Testing and Debugging

There is no provided reference implementation of the garbage collector – it’s up to you how to design the traversal and copying algorithms, and make sure you sufficiently test the collector.

A few notes to help you test:

For debugging, the support code provides a function called print-heap that you can use to pretty-print a heap in the grid style above. Feel free to copy, adapt, and extend this function if you want more or different information printed.

Similar to the last assignment, the interpreter is defined by a single-step function step, so you can easily walk through the evaluation of a program and print the heap at each point by using step.

15.4 Submission Steps and Deadlines

The provided files are:

By midnight Saturday, April 25, submit 10 interesting test cases in whatever style you think is most useful. Partners should collaborate on testing strategy, but still submit separate tests.

By midnight Monday, April 27, submit reviews for tests you received.

By midnight Wednesday, May 6, submit a complete bundle of tests and GC implementation.