Garbage Snake
Due on Monday, April 17th at 11:59 PM. This is a compiler lab. If you have a partner, the two of you will complete this lab as a team. If you do not have a partner, you are effectively in a team by yourself.
If you are working with a partner, please be familiar with the Partner Etiquette guidelines. You and your partner share a single repository and, barring unusual circumstances, will receive the same grade. If you experience any difficulties in your partnership, please alert your instructor as soon as possible.
If you are working alone, please don’t hesitate to seek help if you get stuck. Since there are no ninjas for this course and you don’t have a partner, your instructor is the only interactive resource available to you and is happy to help you through any problems you might encounter.
In either case, be familiar with the academic integrity policy! You are permitted to discuss high-level details of your compiler project with other students, but you should never see or share anything that will be turned in for a grade. If you need help, please post on the Piazza forum or, if necessary, contact the instructor directly. Make sure to post privately if you are sharing code. If you have any doubts about what is okay and what is not, it’s much safer to ask than to risk violating the Academic Integrity Policy.
Overview
In this lab, we will extend the Foxsnake compiler to a new language called Garbage Snake. Garbage Snake introduces mutation (the ability to change some variables or memory locations). Garbage Snake also introduces a garbage collector, a tool which automatically manages the allocation of memory in user programs. The Python runtime, for instance, has a garbage collector while the C runtime does not.
Transitioning to Garbage Snake
When you git pull, you’ll discover the following updates:
- src/languages/garbageSnake/, the directory containing our lexer, parser, and ASTs.
- resources/gc.hand- resources/gc.c, which will contain the code for our garbage collector.
As usual, we will begin by changing the LANGUAGE variable in language.mk from foxsnake to garbageSnake.  After doing so, run make clean, make tests, and ./tests.byte.  Everything should run just fine, although you’ll get a few warnings due to the one new AST node that has been introduced.
The Garbage Snake Language
Garbage Snake is syntactically identical to Foxsnake except for the addition of an expression which can change the value a tuple element. The garbage collector will run automatically in the background as our program runs and does not affect the syntax of the language.
Concrete Syntax
As the addition to Garbage Snake is very small, we show only the new syntactic form here:
<expression> ::=
  | <expression>[<expression>] := <expression>
  ...
All other syntax is the same as in Foxsnake.
Abstract Syntax
As usual, you can see the definitions of expressions for Garbage Snake in src/language/garbageSnake/expressions.ml.  The only addition is the ESet constructor to accommodate the new expression above (and its A-normalized version, CSet).
Semantics
The new expression is the first form of mutation in our language!  The := syntax allows us to change the value in a particular position of a tuple; this update will be visible to anyone with a reference to the tuple.  The update always returns the value which was assigned to the tuple (much like print).  For instance,
let t = (1,2,3) in
let x = (t[0] := 5) in
t[0] + t[1] + x
evaluates to 12.  First, a tuple (1,2,3) is allocated on the heap and a pointer to it is assigned to the variable t.  Then, t[0] := 5 changes the first element of the tuple to a 5, so all references to it now point to heap memory containing (5,2,3).  The assignment returns the 5, so x is bound to that value.  Finally, t[0] + t[1] + x evaluates to 5 + 2 + 5: the t[0] value has changed to a 5, the t[1] value is still the 2 from before, and x was bound to the 5 produced by the update.
Note that this new update syntax allows us to create cycles in our memory graph, which leads to interesting scenarios for garbage collection! For instance, we can write
let t = (1,0) in
let x = (t[1] := t) in
t
to produce a tuple whose second element points to the tuple itself. Make sure to try cases like this in your garbage collector once basic functionality appears to be working!
Errors
Errors for the tuple update syntax are the same as errors for the tuple access syntax:
- If the first expression evaluates to a non-tuple value, then you should indicate that a tuple was expected (error code 4).
- Otherwise, if the second expression evaluates to a non-integer index, then you should indicate that an integer was expected (error code 1).
- Otherwise, if the index is out of bounds (negative or too large), you should indicate that the index is invalid (error code 5).
Implementing Mutation
If you like, you can set the rest of this lab aside for now and implement the tuple mutation feature described above. This is not the main part of the lab; we’re only adding this feature to test how your future garbage collector reacts to cycles in data. Fortunately, you can use the code you’ve written for tuple access as a guide to implement this feature quickly. You’ll want to do so before you start on your garbage collector; whether you’d prefer to implement this feature first or read the rest of the lab first is up to you.
Supporting Garbage Collection
The bulk of the lab assignment is the implementation of a Mark-Compact garbage collection algorithm.  To save us a great deal of tedium and frustration, we will write this garbage collector in C rather than x86 assembly.  The new files resources/gc.h and resources/gc.c will be included in any program your compiler builds and you will make use of this code to automatically reclaim any memory that the program is no longer using.
This section of the lab describes the changes we will make to our compiler to support garbage collection. Later in this lab, we will describe the mark-compact algorithm we will use. We will also discuss an implementation plan for approaching the problem.
Cleaning Up the Stack
Our garbage collector will work by looking through the stack and heap for pointers to heap objects and then reclaiming memory used by heap objects for which we don’t have pointers. In order to do this correctly and safely, we need to be sure that the stack contains only valid heap pointers. If old pointers are left on the stack, they may prevent the reclamation of memory or even crash the program (if the garbage collector tries to use an old pointer value). For instance, consider this code:
let a =
  let x = 4 in
  let t = (1,2) in
  x
in
some_function a
In the above code, we may have the following stack memory assignments: a maps to [ebp-4], x maps to [ebp-4] (because a and x never exist at the same time), and t maps to [ebp-8].  This means that the pointer to the tuple (1,2) will be stored in [ebp-8] and then left there even after the variable t goes out of scope.  The garbage collector will not be able to tell that t is no longer bound, so the pointer in [ebp-8] will prevent the tuple (1,2) from being collected.
A more extreme problem can arise due to the left-over pointers that previous function calls leave in stack memory: future function calls may allocate that stack memory for their local variables and then cause a GC call before that memory is used, causing the garbage collector to use a left-over (and possibly invalid) pointer during collection; this could lead to a segmentation fault!
We can solve this problem just by making sure there’s never any junk on the stack. To do this, we just need to zero out some memory.
- At the start of each function (including snake_main) when we moveespto allocate stack memory, we should write zeros to all of the memory betweenebpandesp.
- Whenever we stop using a local variable (at the end of handling an ALet), we should write a zero to its memory location.
The above steps allow us to maintain an invariant that any memory in the stack which is not actively in use will contain a zero.  This way, our garbage collector will only consider valid pointers.  There are a number of ways you might choose to zero out memory.  One of them is an instruction, rep stosd, which works much like rep movsd but copies ecx words of memory into the address in edi from the eax register (rather than from the address in esi).  If you set eax to zero, you can use rep stosd to zero out arbitrary memory without writing a loop yourself.
Heap Layout
To avoid complexity, we will use an algorithm similar to the LISP2 variation of mark-compact. This requires us to have enough memory to store some garbage collection data about each object on the heap. To make sure we reserve enough memory, we will store this information in the heap itself!
We will modify tuples and closures so that the second word of memory is reserved for the garbage collector.  Everything other than the first word moves to the right to make room.  For instance, the tuple (2,5) will be represented as follows:
| tag & size (32 bits) | GC word (32 bits) | 2(32 bits) | 5(32 bits) | 
|---|---|---|---|
| 0x00000002 | 0x0000000 | 0x00000004 | 0x0000000a | 
As another example, the closure for a four parameter function which currently carries two arguments 6 and 8 might look like this:
| tag & arg count (32 bits) | GC word (32 bits) | param count (32 bits) | code address (32 bits) | 6(32 bits) | 8(32 bits) | 
|---|---|---|---|---|---|
| 0x80000002 | 0x00000000 | 0x00000004 | 0x080484f8 | 0x0000000c | 0x00000010 | 
The GC word will be 0x00000000 except when the garbage collector is running.
Global Variables
The addition of tuples in Egg-Eater prompted the addition of a global variable heap_cursor.  This variable contains a pointer to the next free byte of heap memory.  If we are to manage memory properly, we will need additional information.  These variables will be defined within your assembly code, but they are declared (as extern) within resources/gc.c so our garbage collection code can make use of them.  These variables are:
- start_of_stack: a pointer to the bottom of your program’s stack. This pointer will be set by your- snake_maincode. We will use this to keep track of the first stack frame that we control; this will make it easy to ignore stack frames from the C runtime.
- end_of_stack: a pointer to the top of your program’s stack. This will be set by your assembly code immediately before the garbage collector runs.
- start_of_heap: a pointer to the start of your heap. This will be initialized by your- snake_maincode just as- heap_cursoris currently initialized. We will continue to move- heap_cursoras we allocate memory, but- start_of_heapwill not move.
- end_of_heap: a pointer to the end of you heap. This will also be initialized by- snake_mainby adding a second parameter to that function.
Calling the Garbage Collector
So far, we have simply incremented heap_cursor whenever we needed more memory.  If the program allocates enough memory to move the heap_cursor beyond the end of the heap, it simply crashes.  In this lab, our program will instead call the garbage collector.
There are two places in the compiler which allocate more memory: the creation of a tuple and the creation of new closures. We will modify the assembly generated in those cases to
- Calculate how much memory is needed in advance (if you haven’t already done so)
- Check to see if that much memory is available (using heap_cursor,end_of_heap, and the size you calculated)
- If not enough memory is free, call gcwith the needed amount of memory
- In either case, proceed as before to update heap_cursorand allocate memory
If the heap is full and gc is called, the gc function may change the value of heap_cursor and free up enough memory for the program to use.  If not enough memory can be freed (because it’s actually all in use), the gc function will terminate the program with exit code 7, which we will take to mean “out of memory”.  (We could instead choose to try to allocate a larger heap at this point.  For this lab, we choose not to try to be that clever.)
Make sure to read the documentation in gc.h to understand how the gc function is expected to behave!
Mark-Compact Garbage Collection
We will now discuss the variation of mark-compact algorithm we will use for our compiler. This algorithm consists of five phases:
- Mark, which identifies all heap objects which are still reachable.
- Forward, which determines new memory locations for all reachable heap objects (but does not yet move them).
- Update, which changes all pointers on the heap to refer to the new memory locations (even though the objects haven’t moved yet).
- Compact, which moves all reachable heap objects into their new locations (so that the updated pointers are now correct).
- Remark, which zeroes the GC words of reachable heap objects (to be ready for the next garbage collection pass).
There are more phases in this version of mark-compact than are strictly necessary, but this separation makes it easier to approach each phase one at a time.
Phase 1: Mark
The first phase of this algorithm requires us to identify which heap objects are reachable; any unreachable heap objects can be discarded. A heap object is reachable if
- There is a pointer to it on the stack, meaning that some parameter or local variable currently refers to it; or
- There is a pointer to it from another object on the heap, and that other heap object is reachable.
We can use a depth-first graph traversal algorithm similar to the depth-first search from CS35. Do not create your own stack data structure; instead, you can rely on the program call stack to perform your search recursively. Such a recursive approach has potential downsides in a language like C, but we’ll ignore them for this lab.
To find all of the pointers to our heap, we can just scan the stack from top to bottom.  Between start_of_stack and end_of_stack, our stack has the following kinds of information on it:
- Values in our binary representation
- Stack pointers (i.e. old ebpvalues)
- Return addresses (pushed by callinstructions)
On the other hand, our heap pointers have the following properties:
- Last two bits are 01
- Greater than or equal to start_of_heap
- Less than heap_cursor
Non-pointer values will not end in the bits 01, so we can easily avoid e.g. treating a boolean like a heap pointer.  Pointers onto our stack and pointers to code will not be within the bounds of our heap, so we can use start_of_heap and heap_cursor to ignore those values as well.
Using these observations, we will perform a depth-first traversal starting with each heap pointer on the stack.  We will mark the GC word of each object we reach during this traversal with a non-zero value (e.g. 0x00000001).  Since all GC words are 0x00000000 at the start of garbage collection, any heap object which still has a zero GC word after the mark phase is unreachable and can be destroyed.
Phase 2: Forward
After we have identified reachable heap objects, we will use the GC words to pick new locations for them. To do this, we will use the following algorithm:
- Create a new pointer next_heap_objectto the start of the heap.
- Create a new pointer next_live_destinationto the start of the heap.
- While the next_heap_objectpointer points to a location less than theheap_cursor:- Calculate the size of this heap object.
- If this heap object is reachable:
        - Set its GC word to the next_live_destinationpointer
- Increase next_live_destinationby the size of this object.
 
- Set its GC word to the 
- Increase the next_heap_objectpointer by the size of this object.
 
By the time we are finished forwarding, each reachable heap object’s GC word will contain a new heap address (in machine representation – it won’t have the 01 bits on the end).  Our goal is to move all of the heap objects to that location, but we have to make sure that every pointer is updated as well!
Phase 3: Update
Before we move the heap objects, we want to update any pointers to them. If we moved the objects first, we’d have no memory-efficient way to determine how to transform old pointers to new pointers. Before we move the heap objects, though, each heap pointer points almost exactly to the new memory address!
For instance, if we have a heap pointer value 0xfff78045, this refers to memory address 0xfff78044 (after we strip the 01 from the end).  Because every heap object’s GC word is four bytes from the beginning of the object, the new pointer to replace 0xfff78044 is at memory location 0xfff78048.  We can read that value, add 01, and use this result as the new pointer.
Since there may be heap pointers in the stack or within heap objects themselves, we must perform this update on all heap pointers we find in either of these places.  Note that this is not a depth-first search!  Instead, we can simply iterate over all stack values between start_of_stack and end_of_stack (non-recursively) and then iterate over all heap objects between start_of_heap and heap_cursor (as we did in the Forward Phase above).
Phase 4: Compact
Once we have updated all of the pointers, the task remains to move each heap object to its new location.  Once we have done this, all of the new pointers we set in the Update Phase will point to the same data as they did before the garbage collection began.  This phase is comparatively simple (if a little delicate): we must iterate over each object from start_of_heap to heap_cursor (as we did in previous phases), copying those objects into the memory location described by their GC words.
Phase 5: Remark
The garbage collection algorithm relies upon unreachable objects having the value 0x00000000 in their GC words.  To make sure this is true in the future, we must now set the GC word of each heap object back to zero.  We can use the same recursive depth-first traversal we used in the Mark Phase to do this.
Tidying Up
At the end of this process, our gc function will need to perform a few tasks:
- Set the heap_cursorto the end of the allocated heap memory. This way, future allocations can begin by using the memory that we’ve just freed up.
- Check to see if we have the desired amount of memory free.  The caller to gcpasses an argument indicating how much memory is needed; if we don’t have at least that much distance betweenheap_cursorandend_of_heapnow, then there just isn’t enough memory to do what the caller wants to do and we should stop the program with an “out of memory” error.
- Optional but recommended: mark all of the unused heap memory (between heap_cursorandend_of_heap) with some unused binary representation value (such as0xbadbadff). This will help you during debugging; if you see that value appear, you know that you’ve accessed uninitialized heap memory. A “real” garbage collector would not include this step – it’s a waste of runtime – but we can use it to diagnose problems during development.
Implementing the Garbage Snake Compiler
This implementation task is not exceedingly complex, but there are a lot of small details and moving parts to observe. When writing your garbage collector, try to move as slowly as possible! It takes a lot longer to find bugs in code like this, so it’s to your benefit to take extra time during implementation to try to create as few bugs as possible in your first draft.
Here’s a strategy you can use to guide your implementation:
- 
    Implement the tuple update language feature ( t[i] := v) if you haven’t done so already. Write a couple tests for it.
- 
    Modify your compiler to zero out unused stack memory. This will not involve significant changes to compiler.ml. After you are finished, your unit tests should all pass.
- 
    Modify your compiler to use the new heap layout that includes a GC word as the second word of each heap object. This will mostly involve editing the assembly code you generated for Egg-Eater and Foxsnake to allocate new indicates and use new memory addresses. Make sure to update print.cas well; the exercise of doing so will give you practice with the new heap layout. After you’re finished, all of your unit tests should run correctly. Make sure your unit tests are complete enough to find mistakes you might’ve made here: use asymmetric operations (like subtraction), use more than one parameter in a few functions, etc.
- 
    Add the global variable declarations for start_of_heap,end_of_heap,start_of_stack, andend_of_stackto your generated assembly code. Addgc.cto the list of compiled files inbuilder.mland make sure all of your tests still pass.
- 
    Set the global variables properly. This will require you to add a new parameter to the snake_mainfunction to represent the end of the heap as well as modify yoursnake_mainassembly code to copy the argument into theend_of_heapvariable. You should setstart_of_heap,end_of_heap,heap_cursor, andstart_of_stackat the beginning ofsnake_main. Theend_of_stackvariable will be set just before we call the garbage collector.You might also want to take this opportunity to set your heap to something small (like 64words) so it’ll be easy to look at the whole heap and test the garbage collector as you write it.Once you’ve made these changes, make sure your tests still pass. 
- 
    Modify your code for CTupleto call thegcfunction whenever there isn’t enough heap memory to create a new tuple. For now, leavegcas it is: whenever the garbage collector is called, your program will just stop with an “out of memory” error. Make sure to setend_of_stackbefore you push the argument forgconto the stack. See below in this lab write-up for hints as to how to test your garbage collector. Your previous unit tests should all pass.
- 
    Likewise modify your code for CAppl, calling thegcfunction whenever a new closure must be constructed but there isn’t enough memory to do so. Again, make sure to setend_of_stackand see below for some hints on testing. Your unit tests should still pass.
- 
    Implement the mark-and-compact garbage collector. You should implement and then manually test each phase of the collector individually. It’s pretty hard to write automated tests for the garbage collector, so we’ll have to stick to manual tests until you have a working first draft, but you can use gdband other debugging tools (see below) to determine if e.g. your Mark Phase is working before you write the other phases.You’ll want to write several helper functions for your garbage collector. It’s helpful, for instance, to have a function which determines the size of a heap object given a pointer to it. You could also write functions to perform each phase of garbage collection and even break the recursive phases down into more helpers. Don’t try to write your garbage collector as one giant C function! 
- 
    Write unit tests to verify that your garbage collector is working as intended. Cases you’ll want to test include: - Allocating and deallocating a lot of memory to make sure the garbage collector runs and frees up memory correctly.
- Allocating and deallocating even more memory to make sure the garbage collector runs more than once in the execution of a given program (to verify that GC invariants are correctly re-established by the e.g. Remark Phase).
- Allocating a lot of memory but not deallocating it to make sure the garbage collector terminates the program with an “out of memory” error when appropriate.
- Doing all of the above with both tuples and closures to make sure your code works for both cases.
- Trigger garbage collection when at least one cycle exists in the heap.
- Increasing your heap size in driver.cback up to at least1048576words to make sure your garbage collector works well on larger heaps.
 Again, see below for some hints about testing. 
Once you’ve completed the above steps, you’ll have a garbage collector for your programming language’s runtime!
Testing the Garbage Collector with Garbage Snake Code
To test our garbage collector, we will need to write Garbage Snake code that uses up memory in specific patterns. In particular, we want to be able to either allocate and hold a lot of memory or be able to allocate and release a lot of memory; we also want to be able to do so with tuples, closures, or both. Here are some suggestions for how to do so.
- 
    The following code allocates lots of memory in the form of tuples but then releases that memory very quickly. The cycle_tuple_memoryfunction allocates one tuple at each call and will make \(O(2^n)\) calls, where \(n\) is the argument initially passed to it. The allocated tuples are kept on the stack and not passed between calls, however, and so at most \(n\) of them will be reachable at one time. You can use this code to trigger your garbage collector in a situation where you should not run out of memory.def cycle_tuple_memory n = let x = (4, 5) in if n < 1 then 1 else cycle_tuple_memory (n-1) + cycle_tuple_memory (n-1) end cycle_tuple_memory 20
- 
    The following code allocates lots of memory in the form of tuples and then holds that memory. The use_tuple_memoryfunction allocates one tuple at each call and builds a tree of those tuples. The tree will be a full binary tree of height \(n\), meaning that it will have \(O(2^n)\) nodes, all of which will be reachable (even though the stack only grows to at most \(O(n)\) stack frames at a time). You can use this code to trigger your garbage collector in a situation where you should run out of memory.def use_tuple_memory n = if n < 1 then false else (use_tuple_memory (n-1), use_tuple_memory (n-1)) end use_tuple_memory 20
- 
    The following code allocates lots of memory in the form of closures and then releases that memory. It uses the same strategy as the cycle_tuple_memoryfunction but builds closures instead of tuples.def f x y z = z end def cycle_closure_memory n = let c = f 4 5 in if n < 1 then 1 else cycle_closure_memory (n-1) + cycle_closure_memory (n-1) end cycle_closure_memory 20
- 
    The following code allocates lots of memory in the form of closures and holds that memory. It uses the same strategy as the use_tuple_memoryfunction but builds closures instead of tuples.def f x y z = z end def use_closure_memory n = if n < 1 then false else f (use_closure_memory (n-1)) (use_closure_memory (n-1)) end use_closure_memory 20
To create both types of memory, you could write a program which mutually recurses between creating tuples and closures.
Debugging the Garbage Collector
The initial gc.c file includes a macro called debugf which will help you debug your mark-compact garbage collector.  The debugf macro works just like the C printf function: it accepts a format string and a series of values and prints them.  For instance, you can write
debugf(
  "Marking heap object at memory location %p: writing %d to address %p",
  heap_ptr, mark_value, gc_word_ptr);
The value of debugf over printf is that the gc.c file contains two C preprocessor macros, one of which is commented out, that control whether debugf does anything or not.  You can have the second macro uncommented:
// This macro disables all debugf statements.  (They become no-ops.)
// #define debugf(fmt, ...) ;
// This macro enables all debugf statements.  (They become printf statements.)
#define debugf(fmt, ...) printf(fmt, ##__VA_ARGS__); fflush(stdout)
This will cause debugf to behave just like printf.  This is useful because you can have print debugging as your garbage collector runs, but it will break all of your unit tests (since they don’t expect to get the debugf output).  When you want to run your unit tests, use the first macro (as in the gc.c you are given in the starter code) like so:
// This macro disables all debugf statements.  (They become no-ops.)
#define debugf(fmt, ...) ;
// This macro enables all debugf statements.  (They become printf statements.)
// #define debugf(fmt, ...) printf(fmt, ##__VA_ARGS__); fflush(stdout)
This will cause every debugf call to do nothing.  You don’t need to comment out those debugf calls or do anything to them; changing the macro as above is enough to silence all of this output.  This way, you can switch to silent debugf to run your unit tests and then switch back to noisy debugf to debug your garbage collector.
Examining the Heap
gdb provides a few mechanisms for examining heap memory.  While debugging gc or one of the helper functions you will write, you can use gdb commands like print/x start_of_heap to print the start_of_heap variable in hexadecimal.  Sometimes, however, you want to see the entire heap.  The provided gc.c file contains a function dump_heap which will show the entire contents of the heap (as described by start_of_heap and end_of_heap).  This function uses debugf and so can be silenced using the macros above.
You may wish to consider dumping the heap before and after each phase of the garbage collector or even e.g. after you move each heap object. This combined with step debugging should provide you ample tools to understand what your garbage collection code is doing and why. As always, please ask if you have any questions or need help figuring out your bugs!
Some Notes About C
We are writing our garbage collector in C to avoid the error-prone and tedious nature of writing large amounts of assembly code. But while our assembly code has been able to treat every pointer as a simple number, C has a type system which assigns different meaning to pointer variables than to numeric values and which is insistent about keeping those types separate. This is a boon when it helps us avoid simple mistakes in our code (like assigning an integer to an integer pointer variable by accident) but a problem when we’re doing complex things with our own garbage collector (like assigning an integer to an integer pointer variable on purpose).
Here are some notes about C which may be helpful as you develop your garbage collector:
- 
    C has a built-in type named unsigned int. For instance, you can declareunsigned int n = 0;This variable is interpreted as an unsigned value: the highest bit is just another digit rather than a sign bit. This is useful if you are e.g. trying to check or mask off the bit which indicates whether a heap object is a tuple or a closure, since you don’t want to think of that bit as a sign bit. 
- 
    The &operator in C is a binary and operator; it translates to theandinstruction on x86 architectures, for instance. We can use this to mask and check bits. For instance,unsigned int value = ...; // some value in our binary representation unsigned int tag_bits = value & 0x3; // the last two bits of the valueThe &operator has very low precedence. This means you’ll want to use it in parentheses if you put it in a condition:unsigned int value = ...; // that value again if ((value & 0x3) == 0x1) // if the value is tagged as a heap pointer { ... }
- 
    C has a syntax for casting values. A cast forces the compiler to interpret a value in a particular way even if the type system disagrees. For instance, we may have a pointer to stack memory that we want to examine. Consider the following: unsigned int* stack_memory_ptr = ...; unsigned int stack_value = *stack_memory_ptr; // dereference pointer if ((stack_value & 0x3) == 0x1) { // then this is a heap pointer! unsigned int* heap_ptr = (unsigned int*)stack_value; ... }The above code forces the contents of the stack_valuevariable to be interpreted as anunsigned int*value. The compiler would normally raise an error here; the cast forces the compiler to allow us to reinterpret the value this way.
- 
    C has a concept of pointer arithmetic: addition, for instance, works differently on pointers than on values. For instance, consider this code: unsigned int x = 4; // the number 4 unsigned int* y = (unsigned int*)x; // the memory address 4 x++; // increase x to 5 y++; // increase y by the size of one unsigned integer (4 bytes) printf("%d, %p\n", x, y); // prints "5, 0x00000008"This can be convenient, since it allows you to move a pointer one word at a time. For instance, the following code could loop over your stack memory: for (unsigned int* p = (unsigned int*)end_of_stack; p < start_of_stack; p++) { ... }Note that we are starting at the end_of_stackand going up to thestart_of_stacksince the end of stack memory (top of the stack) is at a lower memory address than the beginning of stack memory (bottom of the stack). We would not need such a reversal for the heap memory. Here,p++increases the pointer by the size of anunsigned int, which is 4 bytes in our case as we are targeting 32-bit x86.
- 
    The C standard library has two functions for copying large amounts of memory: memcpyandmemmove, both included fromstring.h. For instance, I might copy 20 words fromsrc_ptrtodst_ptrby writingmemmove(dst_ptr, src_ptr, sizeof(unsigned int)*20);Note the following details: - The destination comes first and the source comes second.
- The third argument is the number of bytes to copy; to copy 20 words, I can determine the number of bytes in an unsigned int(assuming 32-bit x86) and then multiple.
- The name memmoveis a misnomer: the source memory is not really “moved” any more than themovinstruction “moves” a value.
 You will want to use memmovefor this lab; never usememcpy. The difference between these two functions is thatmemmoveis guaranteed to work correctly in the event that these memory locations overlap whilememcpyis not. For instance, suppose I have a dynamic array pointerunsigned int* pof lengthint lengthand I want to shift all of the elements from index 1 to the end to the left; that is, the array[4, 6, 8, 10]would become[6, 8, 10, 10]. The following code will work:memmove( p, // destination: start of array p p + 1, // source: one element from the start of p sizeof(unsigned int)*(length-1)); // size: one word for each element to copyBecause the source and destination regions overlap, however, memcpyis not guaranteed to work correctly. The only advantage ofmemcpyis that it might be faster if the regions don’t overlap. Just usememmoveeverywhere and avoidmemcpyentirely for this lab. You can also just write a loop and copy things yourself if you like.
Summary
To complete this assignment, you’ll need to
- Add tuple mutation to your compiler
- Implement a mark-compact garbage collection algorithm and add it to your compiler
Of course, the second part is most of the work.
Submitting
To submit your lab, just commit and push your work. The most recent pushed commit as of the due date will be graded. For more information on when assignments are due, see the Policies page.
If You Have Trouble…
…then please contact your instructor! Piazza is the preferred method, but you can reach out via e-mail as well. Good luck!
