Compilers

Foxsnake

foxsnake
Foxsnake

Due on Monday, April 3rd at 11:59 PM. This is a compiler lab. If you have a partner, the two of you will complete this lab as a team. If you do not have a partner, you are effectively in a team by yourself.

If you are working with a partner, please be familiar with the Partner Etiquette guidelines. You and your partner share a single repository and, barring unusual circumstances, will receive the same grade. If you experience any difficulties in your partnership, please alert your instructor as soon as possible.

If you are working alone, please don’t hesitate to seek help if you get stuck. Since there are no ninjas for this course and you don’t have a partner, your instructor is the only interactive resource available to you and is happy to help you through any problems you might encounter.

In either case, be familiar with the academic integrity policy! You are permitted to discuss high-level details of your compiler project with other students, but you should never see or share anything that will be turned in for a grade. If you need help, please post on the Piazza forum or, if necessary, contact the instructor directly. Make sure to post privately if you are sharing code. If you have any doubts about what is okay and what is not, it’s much safer to ask than to risk violating the Academic Integrity Policy.

Overview

In this lab, we will extend the Egg-eater compiler to a new language called Foxsnake. Foxsnake introduces first-class functions, partial application, and anonymous functions.

Transitioning to Foxsnake

When you git pull, you’ll discover the following updates:

As usual, we will begin by changing the LANGUAGE variable in language.mk from eggEater to foxsnake. Once you have done so, your code will not compile! We will explain this below.

The Foxsnake Language

The syntax of Foxsnake function declarations and function calls is incompatible with the Diamondback syntax and the Foxsnake abstract syntax reflects this. We first present the syntax and semantics of Foxsnake and then discuss how to update the Egg-eater compiler incrementally to accommodate these changes.

Concrete Syntax

Foxsnake’s concrete syntax includes all of the features of Egg-eater, but function declarations and calls are written differently. Foxsnake also includes anonymous function syntax. For clarity, we present the full Foxsnake syntax here:

<program> ::=
  | <declaration_list> <expression>
  | <expression>

<declaration_list> ::=
  | <declaration> <declaration_list>
  | <declaration>

<declaration> ::=
  | def <identifier> <parameter_list> = <expression> end

<parameter_list> ::=
  | <identifier> <parameter_list>
  | <identifier>

<expression> ::=
  | <integer>
  | true
  | false
  | <identifier>
  | let <identifier> = <expression> in <expression>
  | if <expression> then <expression> else <expression>
  | inc(<expression>)
  | dec(<expression>)
  | print(<expression>)
  | isbool(<expression>)
  | isint(<expression>)
  | istuple(<expression>)
  | <expression> + <expression>
  | <expression> - <expression>
  | <expression> * <expression>
  | <expression> < <expression>
  | <expression> > <expression>
  | <expression> = <expression>
  | <expression> && <expression>
  | <expression> || <expression>
  | <expression>[<expression>]
  | <expression> <expression>
  | (<expression>)
  | (<expression>, <expression_list>)
  | fun <identifier> -> <expression>

<expression_list> ::=
  | <expression> , <expression_list>
  | <expression>

The notable changes from Egg-eater are:

Abstract Syntax

As usual, you can see the definitions of expressions for Foxsnake in src/language/foxsnake/expressions.ml. Make sure to read over those definitions. In particular, note that EAppl now only carries two expressions; multiple arguments are passed by chaining EAppl constructors (e.g. EAppl(EAppl(EVar f, EVar x), EVar y)).

Semantics

Unlike our previous languages, function declarations in Foxsnake create new variables named after those functions. These variables are bound to closures which define the function and the arguments which have been passed to that function so far. For instance,

def f x y =
  x + y
end

let g = f in
g 2 4

evaluates to 6. The variable f is bound to the closure of function f with no arguments. These closures will be values in Foxsnake and, as such, can be assigned to other variables (such as g). Arguments can then be applied to the closure to call the function.

Foxsnake supports partial application, so the code

def f x y =
  x + y
end

let increment = f 1 in
(increment 3, increment 7)

evaluates to the tuple (4, 8); the inc closure carries the argument 1 for parameter x and this argument is used each time inc is called in the tuple.

Foxsnake’s anonymous functions work much as in OCaml. The above code could be rewritten

let f = (fun x -> fun y -> x + y) in
let increment = f 1 in
(increment 3, increment 7)

which would produce the same result.

While we will represent closures in a fashion similar to tuples, closures are not tuples; they are a new kind of value. That is,

let identity = fun x -> x in
istuple(identity)

will evaluate to false.

Errors

The introduction of higher-order functions causes some errors which are easily recognized by Egg-eater to become extremely difficult in Foxsnake. You are no longer required to recognize the following compile-time errors:

You are also now responsible for an additional type of runtime error:

For instance, exit code 6 would result from the code (1 + 2) (3 + 4) because the evaluation of the first expression, 3, is not a function. This runtime error should be reported in addition to the runtime errors from previous labs.

Implementing the Foxsnake Compiler

To implement higher-order functions in Foxsnake, we plan to modify the compiler to process the user’s program as follows:

  1. Parse the source file (as we have been doing)
  2. Check the well-formedness of the AST (with functions as simple variables now)
  3. Closure-convert the surface AST (to eliminate anonymous functions)
  4. A-normalize the resulting AST (as we have been doing)
  5. Compile the A-normalized AST
    • binding each function name to a global closure value
    • creating new closures as necessary

We will implement these stages in a different order in order to ensure that we can evaluate our progress as we go.

Getting Started

To begin your work on Foxsnake, you should get your code to a point where it compiles, even if this means losing some functionality. The following is recommended:

Once you have done this, all of your unit tests which don’t use function calls should be able to pass. You will eventually need to update the syntax of your test files to match the new syntax of Foxsnake.

Step 1: Closures

Once you’ve gotten your code to compile, you’re ready to add closures to your compiler. A closure must be able to store:

We will lay closures out in memory in that order. As with tuples, we will always store closures in the heap (since they require several bytes of memory). To distinguish closures from tuples, we will set the first bit of the first word of the closure to 1. Consider the following example:

(32 bits) (32 bits) (32 bits) (32 bits) (32 bits)
0x80000002 0x00000004 0x080484f8 0x7FFFFFFF 0x0000001e

This sequence of words represents a closure in the Foxsnake heap. They are:

Creating closures for declared functions

The first step toward implementing closures in Foxsnake is to create a set of global variables containing the zero-argument closures for the functions declared by the program. We are creating global variables here for two reasons:

In the Egg-eater lab, you created a single global label heap_cursor to track heap memory. You should now create an additional label for the closure of each declared function in the program. For instance, for the program

def f x =
  x + 1
end

def g x y =
  x * y
end

f (g 2 3)

the data section of your assembly might look something like this:

section .data
align 4
heap_cursor:
  dd 0
align 4
closure_of_f:
  dd 0x80000000, 1, _f
align 4
closure_of_g:
  dd 0x80000000, 2, _g

Note that these initial closures always carry zero arguments. The second word indicates the number of parameters the function declares while the third word is the label given to the user-declared function. The label, named by itself, refers to the memory address of the labeled code.

Checkpoint

Once you’ve completed the above steps, you should be able to write a program which declares a function but does not use it. Your compiler should work correctly on this program and produce a data section similar to the above.

Step 2: Binding Variables to Closures

Once you’ve declared the existence of these closures, your next step should be to have some means to refer to them. When a Foxsnake program refers to a function by its name (e.g. f), you should be able to retrieve that closure. Our compiler’s environment currently only has the ability to refer to stack offsets, so we will have to make it a bit more sophisticated.

Flexible environments

First, change your environment so that it contains a mapping from strings to arguments (instead of a mapping from strings to integers). You will need to update environment.ml and compiler.ml to do this. When you’ve finished, the code should behave in exactly the same way as it did before; you can use your unit tests to confirm this. The objective here is to ensure that the environment continues to have the same functionality but its data structure is now more flexible.

Our next step will be to add bindings for function closures. You’ll likely want to add a function to environment.ml to add an arbitrary string-to-argument binding to your environment to make this easier, but you don’t need to do it yet.

Bindings for closures

Next, we must ensure that each function declaration results in a single variable binding. So far, we’ve written our compiler to start with an empty environment in the main expression and an environment containing parameters for each function body expression. We will add to both of these environments a binding for each declared function. For instance, in the program

def f x =
  x + 1
end

f 0

the main expression should be compiled with an environment which binds f to its closure. For instance, the assembly code for evaluating f in the above program might be

mov eax, closure_of_f + 1

Note the use of + 1. The label closure_of_f should be aligned to a 4 byte boundary, so the last two bits of its address will be 00. If we add 1 to this, we get a value which conforms to the Foxsnake binary representation for a memory pointer!

To handle the above, you probably want to add a new assembly argument to assemblyLanguage.ml. The name XLabelLocationOffset is recommended as it is consistent with the previous names we have chosen.

In addition to updating compiler.ml with these new bindings, you will need to update wellformedness.ml so that it knows that these variable names are legal. Make sure to update both files before proceeding.

Checkpoint

Although you cannot yet call these functions, it should now be possible for you to produce the closure as a value. Write a program which declares a function f; then, allow the main expression for that program to be just f. Using the new version of printer.c you received as part of the starter code for Foxsnake, you should get a result that looks something like this:

<closure@080484f8>[0/1](?)

Step 3: Growing Closures

Once we have established closures in the compiler’s environment, we should work toward growing them: creating new closures which carry additional arguments. The compilation of CAppl should generate assembly code that follows approximately this algorithm:

The above algorithm handles the case of growing closures as well as the case of calling functions (which is step 4 below). For now, focus on the first part. For the case in which you would call the function, you can simply generate an appropriate label and no instructions.

The edx register

Implementing the closure-growing algorithm is a bit tricky with only two registers. Feel free to add edx to your registers data type and use it for this lab.

Copying memory with rep movsd

Writing a loop to copy memory in x86 assembly could be a bit tricky and rather error-prone. Thankfully, the processor comes with an instruction which makes it easy to copy several words of memory from one location to another.

To use this instruction:

The rep movsd instruction will proceed by copying one word from esi to edi, moving esi and edi one word forward in memory, decrementing ecx, and continuing until ecx is 0. For instance, consider the following code:

mov esi, ebp
add esi, 8
mov edi, [heap_cursor]
mov ecx, 3
rep movsd

At the start of rep movsd:

After rep movsd is complete:

To use this instruction, you can simply add an appropriate XRepMovsd constructor to your instruction type in assemblyLanguage.ml. You are not required to use rep movsd, but it will likely make your code much cleaner and easier to debug. It’s also considerably more efficient for the CPU.

Checkpoint

Once you’ve finished the code which grows closures, you should be able to see the closures growing for functions with several parameters. Try writing a program which declares a function f of six parameters. Then, call f with an argument and see if the result is an apporpriate closure. Call that closure with another argument to see if it gets copied in correctly as well.

Step 4: Calling Functions

Once you’ve managed to make your closures grow appropriately, move on to implementing the function-calling part of the algorithm above. Probably the trickiest part of this is to copy the arguments from the closure onto the stack in the right order, although this is easier than it seems.

Remember: arguments are pushed onto the stack in reverse order only because stack memory grows downwards; the first argument to a function is lowest in memory. The closures also keep their arguments in ascending memory order, so it is enough simply to copy those arguments from the closure onto the stack and then copy the last argument after them.

Checkpoint

At this point, your unit tests from Egg-eater should all pass! You’ll need to update the syntax of function declarations and function calls if you haven’t already done so. You’ll also need to change any tests that relied on zero-parameter functions. In those cases, just add a parameter that you don’t use and then call the function with a dummy value, like 0.

What’s more: you should now be able to pass functions as arguments to other functions! Try writing some programs which make use of your new compiler features. For instance, you could write an ntimes function that takes parameters f, n, and x and calls f on x a total of n times.

Step 5: Closure Conversion

The only remaining feature we must address is the anonymous function. We will convert anonymous functions like fun x -> x into named functions (of the form EFunction) by searching the AST for them, picking a fresh name to use, creating a declaration with that name, and replacing the anonymous function with a reference to that new name as a variable. For instance, we might transform

def twice f x =
  f (f x)
end

twice (fun a -> a * 2) 5

into

def twice f x =
  f (f x)
end

def $lambda$1 a =
  a * 2
end

twice $lambda$1 5

Non-Local Variables

When we closure-convert anonymous functions, we must be careful to handle non-local variables correctly. A non-local variable is one which is used within the function but not defined by it. For instance, in the program

let x = 2 in
(fun y -> x + y) 4

the variable x is non-local to the anonymous function. When we closure-convert this code, we will add the non-locals as parameters to the resulting declaration:

def $lambda$1 x y =
  x + y
end

let x = 2 in
$lambda$1 x 4

To do this correctly, you will want to implement a function for calculating the free variables in an expression. The non-local variables of an anonymous function are the free variables of its body minus the parameter it declares.

Some of the work of closure-converting Foxsnake programs is provided in the file closureConversion.ml. Your job is to address the other cases and then call upon this code in the appropriate place in your compiler.

Miscellaneous details

Make sure to update the code for istuple as well as the code that validates that an argument is a tuple for e.g. tuple indexing. The Egg-eater version of this code can simply check the tag bits of the argument, but Foxsnake uses the same tag bits for tuple pointers and closure pointers. You’ll need to check both the tag bits and the first bit of the heap value to determine if an argument is a tuple or not.

Checkpoint

At this point, we have added all three features of Foxsnake: first-class functions, partial application, and anonymous functions. Write some unit tests to be confident of your implementation and then congratulate yourself!

Summary

To complete this assignment, you’ll need to

Submitting

To submit your lab, just commit and push your work. The most recent pushed commit as of the due date will be graded. For more information on when assignments are due, see the Policies page.

If You Have Trouble…

…then please contact your instructor! Piazza is the preferred method, but you can reach out via e-mail as well. Good luck!