Compilers

Boa

boa
Boa

Due on Monday, February 13th at 11:59 PM. This is a compiler lab. If you have a partner, the two of you will complete this lab as a team. If you do not have a partner, you are effectively in a team by yourself.

If you are working with a partner, please be familiar with the Partner Etiquette guidelines. You and your partner share a single repository and, barring unusual circumstances, will receive the same grade. If you experience any difficulties in your partnership, please alert your instructor as soon as possible.

If you are working alone, please don’t hesitate to seek help if you get stuck. Since there are no ninjas for this course and you don’t have a partner, your instructor is the only interactive resource available to you and is happy to help you through any problems you might encounter.

In either case, be familiar with the academic integrity policy! You are permitted to discuss high-level details of your compiler project with other students, but you should never see or share anything that will be turned in for a grade. If you need help, please post on the Piazza forum or, if necessary, contact the instructor directly. Make sure to post privately if you are sharing code. If you have any doubts about what is okay and what is not, it’s much safer to ask than to risk violating the Academic Integrity Policy.

Overview

In this lab, we will extend the Adder compiler to a new language called Boa. The Boa language, in addition to the features of Adder, includes binary operators and conditional expressions. To implement Boa, we will use the strategy of A-normalization that we discussed in class. Since we are extending your previous compiler, you will be using the same Git repository as you did for the previous assignment.

Transitioning to Boa

You will find that some new files have been added to your repository. In order to see these files, you should git pull. They are:

The Makefile for your project has already been configured to allow you to switch the AST, parser, etc. to the Boa language. Just open the file language.mk and change the LANGUAGE variable in that file from adder to boa. Then, run make clean.

If you now run make, you will discover many compiler warnings; this is because we have not yet written code to handle the new forms of Boa AST. As long as make runs correctly, you are now ready to work on your assignment.

The Boa Language

As in the previous lab, we introduce the language you will compile in three parts: the concrete syntax, the abstract syntax, and the semantics. The Boa language extends Adder to include both binary operators (such as +) and if-then-else expressions similar to OCaml’s.

Concrete Syntax

The concrete syntax of Boa is:

<expression> ::=
  | <integer>
  | <identifier>
  | let <identifier> = <expression> in <expression>
  | if <expression> then <expression> else <expression>
  | inc(<expression>)
  | dec(<expression>)
  | <expression> + <expression>
  | <expression> - <expression>
  | <expression> * <expression>
  | (<expression>)

Syntactic precedence is as in OCaml. To clarify: multiplication has higher precedence than addition and subtraction, which have higher precedence than if-then-else and let. For instance, the expression if x then 4 else 2 + 6 parses as if x then 4 else (2 + 6); the expression (if x then 4 else 2) + 6 must be written with parentheses to have that meaning.

Abstract Syntax

The abstract syntax of Boa has two parts: the surface syntax (corresponding to the concrete syntax above) and the A-normal form. Both are defined in the file src/language/boa/expressions.ml. The expr data type defines the abstract syntax for the surface language. The A-normal form is defined in terms of three other types: a_expr (for ANF expressions), c_expr (for complex expressions), and i_expr (for immediate expressions). Each of the four expression data types is prefixed with a unique letter (E, A, C, or I) to make it easy to see the type to which they belong.

show Functions

In examining the ASTs, you will likely notice an OCaml syntax we did not discuss. The unary_operator type, for instance, now looks like this:

type unary_operator =
  | OpInc
  | OpDec
[@@deriving eq, ord, show]
;;

The [@@deriving ...] part of the declaration is called a PPX extension and works sort of like a template: it represents code that will be generated for you by the compiler. This PPX extension doesn’t affect the meaning of the type and you can ignore it if you wish.

The important part for you is that this PPX extension generates a show function for the type. For instance, the above declaration creates a function named show_unary_operator with type unary_operator -> string, which means you can write e.g.

print_endline (show_unary_operator OpInc)

and the string “OpInc” will be printed. This will be especially helpful in your unit tests. We didn’t use this extension for assembly code because the output is always in an OCaml-like form, which doesn’t match the concrete syntax for x86 assembly. It is useful for debugging and testing purposes, though!

Mutually Recursive Types

Careful readers may notice the use of and in the A-normalized expression types:

type c_expr =
  ...

and a_expr =
  ...
;;

In OCaml, this allows the two types to refer to each other; that is, these are mutually recursive types. This allows c_expr to refer to a_expr, which otherwise wouldn’t be allowed.s

Semantics

As in Adder, every Boa expression evaluates to a single number and we will use the driver to print that number. The inc, dec, let, variable, and constant expressions have the same meaning as before. Binary operators evaluate their arguments and then compute a result (e.g. + adds numbers together). Conditionals in Boa operate as they do in C: the conditional part is evaluated, followed by either the then part (if the conditional evaluated to non-zero value) or the else part (if the conditional evaluated to 0) but never both.

Examples

Here are some examples of Boa programs and how they run.

Implementing the Boa Compiler

There are two major steps to this assignment:

  1. Write code to A-normalize a Boa expression.
  2. Update compiler.ml to compile ANF expressions and include functionality for the new features of Boa.

Below are some tips to bear in mind as you proceed.

A-Normalization

Your task in aNormalization.ml is to implement the a_normalize_expression_inner routine. Make sure to read the other code in this file, though; you’ll need to use it! Also, make sure to look at freshening.ml so you can make use of it to create new names for your variables.

Because Merlin relies upon compiled versions of your code to produce error messages and because none of your code is using this file yet, you may initially have some difficulty getting correct error messages for aNormalization.ml. One quick way to fix this is to put open ANormalization;; in your compile.ml and then run make.

New Assembly Instructions

Your next goal is to update compiler.ml, but we won’t be able to add the new features of Boa without improving our model for x86 assembly language. Again, a Guide to x86 Assembly from the University of Virginia may be helpful here. You’ll probably want to the following constructors to your instruction type:

When you add these instructions, you’ll have to add them to your code_of_instruction function as well.

Recall that, in x86 assembly, we use cmp to compare two arguments and then decide what to do about it as a separate instruction. For instance, the code if 6 then 4 else 8 might be written in x86 assembly like so:

  mov eax, 6
  cmp eax, 0
  je label_else
  mov eax 4
  jmp label_endif
label_else:
  mov eax 8
label_endif:

Here, the je will jump when the comparison succeeds (meaning that the conditional was 0), leading 8 to be stored in eax. If the comparison succeeded (and the conditional was non-zero), the je instruction would have no effect and the code would keep running, storing 4 in eax. Then, it would jump to the end of the instructions shown here, which prevents the 8 from overwriting the 4.

As you compile your code, you will need to generate new names for each label. We can use the same freshening function that we used in the previous part to do this.

Updating Your Compiler

Now we need to put it all together. A-normalization is a pretty significant change, so we’ll have to reorganize things quite a bit in comparison to Adder. That said, you can use your Adder implementation as a reference. Start by commenting out the compile_expression_with_environment function.

We want new functions to compile each of the A-normalized types of our code, so let’s start by writing stubs for them (so we can get the types right and compile a bit at a time). You can use the following stubs to get started:

let rec compile_i_expression (env : environment) (ie : i_expr) : argument =
  failwith "TODO"

and compile_c_expression (env : environment) (ce : c_expr) : instruction list =
  failwith "TODO"

and compile_a_expression (env : environment) (ae : a_expr) : instruction list =
  failwith "TODO"
;;

Note the use of and and the missing ;; in these declarations. This is because we are defining these functions to be mutually recursive (in a fashion similar to prototyping in C). You can call any of these functions from within the others.

Next, we need to update the compile_program function. You should change it so that it A-normalizes the code first and then uses compile_a_expression to compile it into assembly.

Finally, you need to fill in the stubs. Proceed one feature at a time just like you did with Adder: first, get integers working. Then, rewrite inc and dec. Move on to variables, let and if expressions, and binary operations after that. Make sure to write tests as you go! This is where the commented-out version of your compile_expression_with_environment function from the previous assignment will come in handy.

Testing the Boa Compiler

You are not required to write tests for this lab, but you definitely should! You still have tests from your first lab, so you can use those to test the Adder language features. You’ll want to write some more tests for Boa to make sure that things are working correctly. A good way to deal with this is as follows: every time you feel like you’ve finished a new feature, write a test for it. Every time you find a bug, write a test that fails because of the bug and then fix the bug afterward. If you do this, you’ll (1) have a good way of running your code as you fix your bugs and (2) quickly build up a library of tests so you don’t have to worry about breaking things in the future.

The freshening.ml module may behave somewhat strangely in your unit tests. The counter starts at 0 when the OCaml program starts and doesn’t reset unless you call reset_fresh_names (). You can prevent your unit tests from using weirdly large numbers (or numbers that change because of other unit tests!) by adding the line “reset_fresh_names ();” to the functions in testUtils.ml (immediately after the fun _ -> part). This will ensure that each test starts with names beginning at one.

Summary

To complete this assignment, you’ll need to

Unit tests are not listed on the bullet points above, but you should think of them as a natural part of the process.

Submitting

To submit your lab, just commit and push your work. The most recent pushed commit as of the due date will be graded. For more information on when assignments are due, see the Policies page.

If You Have Trouble…

…then please contact your instructor! Piazza is the preferred method, but you can reach out via e-mail as well. Good luck!