Compilers

Bluebird

bluebird
Bluebird

Due on Wednesday, February 13th at 11:59 PM. This is a compiler lab. If you have a partner, the two of you will complete this lab as a team. If you do not have a partner, you are effectively in a team by yourself.

If you are working with a partner, please be familiar with the Partner Etiquette guidelines. You and your partner share a single repository and, barring unusual circumstances, will receive the same grade. If you experience any difficulties in your partnership, please alert your instructor as soon as possible.

If you are working alone, please don’t hesitate to seek help if you get stuck. Since there are no ninjas for this course and you don’t have a partner, your instructor is the only interactive resource available to you and is happy to help you through any problems you might encounter.

In either case, be familiar with the academic integrity policy! You are permitted to discuss high-level details of your compiler project with other students, but you should never see or share anything that will be turned in for a grade. If you need help, please post on the Piazza forum or, if necessary, contact the instructor directly. Make sure to post privately if you are sharing code. If you have any doubts about what is okay and what is not, it’s much safer to ask than to risk violating the Academic Integrity Policy.

Overview

In this lab, we will extend the Auklet compiler to a new language called Bluebird. The Bluebird language includes the features of the Auklet language as well as let bindings and conditional expressions. Since we are extending your previous compiler, you will be using the same Git repository as you did for the previous assignment.

Note that you need to have finished Auklet before working on this assignment!

Transitioning to Bluebird

To transition between assignments, we will use Git branches. A branch names a point in time in your repository; your auklet branch, for instance, points to the last commit that you made when working on the Auklet compiler. Branches are often used to keep track of multiple lines of development in a repository; a project may have a “stable” branch and an “experimental” branch, for example, where most development is done in the latter and distributions of software are done in the former.

We will use branches in this course to keep track of the different versions of your compiler. Your Bluebird starter code is in a branch called bluebird which has been stored in your GitHub repository. First, you need to get the information about that branch by running

git fetch

You can then switch to the bluebird branch by running

git checkout bluebird

in your compiler directory. This causes Git to change all of your files to the most recent commit on the bluebird branch; this branch exists because your instructor created it for you. Of course, bluebird currently contains only the starter files; it doesn’t contain any of the work you did to complete the Auklet compiler. To bring that work into the bluebird branch, we can now run

git merge auklet

This will combine all of the changes in auklet (the work you did on the last assignment) with the starter code in bluebird. Once you’re finished, you’ll be on the bluebird branch and you’ll have all of your Auklet code. At this point, you should be able to make clean, make tests, and run ./tests.byte without seeing any difference in how Auklet worked. (You will, however, see some compiler warnings.)

This merge won’t change a lot. Most notably, it updates src/language/asts.ml file now contains the Bluebird AST constructors and the corresponding parser reads Bluebird syntax.

The Bluebird Language

As in the previous lab, we introduce the language you will compile in three parts: the concrete syntax, the abstract syntax, and the semantics. The Bluebird language extends Auklet to include let bindings and ifnz expressions with similar syntax to OCaml.

Concrete Syntax

The concrete syntax of Bluebird is:

<expression> ::=
  | <integer>
  | after(<expression>)
  | before(<expression>)
  | <expression> + <expression>
  | <expression> - <expression>
  | <expression> * <expression>
  | <identifier>
  | let <identifier> = <expression> in <expression>
  | ifnz <expression> then <expression> else <expression>
  | (<expression>)

Syntactic precedence is as in OCaml. To clarify: multiplication has higher precedence than addition and subtraction, which have higher precedence than let. For instance, the expression ifnz x then 4 else 2 + 6 parses as ifnz x then 4 else (2 + 6); the expression (ifnz x then 4 else 2) + 6 must be written with parentheses to have that meaning.

Abstract Syntax

The abstract syntax of Bluebird is still defined in src/language/asts.ml. It now includes a few more constructors: one for variables and the other for let expressions. Consider the following examples:

Concrete syntax Abstract syntax Evaluated result
let x = 5 in before(x) ELet("x",EInt(5),EUnaryOp(OpBefore,EVar("x"))) 4
let x = (let y = 4 in after(y)) in after(x) ELet("x",ELet("y",EInt(4),EUnaryOp(OpAfter,EVar("y"))),EUnaryOp(OpAfter,EVar("x"))) 6
let x = 1 in ifnz x then 4 else 8 ELet("x",EInt(1),EIfNonZero(EVar("x"),EInt(4),EInt(8))) 4
let x = after(4) in y ELet("x",EUnaryOp(OpAfter,EInt(4)),EVar("y")) compile error

In the last example, the variable y is unbound. In this case, your compiler must throw an exception of type UnboundVariable, which is an exception type defined in compiler.ml. See below for more information.

Semantics

Bindings

As in Auklet, every Bluebird expression evaluates to a single number and we will use the driver to print that number. The unary, binary, and constant expressions have the same meaning as before. Let bindings have the same meaning as in OCaml: the first expression is evaluated and bound to a variable; the second expression is then evaluated with that binding.

Bindings are only visible in the latter part of the let expression in which they are defined. For instance, consider:

let x =
  let y = 5 in
  y
in
x + y

While this Bluebird code will parse correctly to produce an AST, it is meaningless; we can’t translate it to assembly language. In x + y, the variable y is undefined; this is because the let y expression only binds y in its own in expression; it does not bind it elsewhere in the code. As stated above, an UnboundVariable exception is required here.

Conditionals

Bluebird supports conditions similar to C: for the purpose of an ifnz expression, any non-zero value is true while zero is false. For instance, the Bluebird expression

ifnz 0 then 1 else 2

evaluates to 2 because the condition was 0. The expression

ifnz 5 then 4 else 8

evaluates to 4 because 5 is non-zero.

Implementing the Bluebird Compiler

Bindings

In order to implement let bindings, we’ll need a place to store the contents of variables. We already have the environment type which stores the locations of temporary data; we can use it to store variables as well. For Auklet, our environment type was simply an int giving the address of the next free stack memory location. For variables, we’ll need something a bit more complex (since we need to be able to look variables up again in the future).

The file utils.ml contains a definition of a module StringMap. This module defines a dictionary type in which the keys are strings; since variable names are strings, the StringMap should be appropriate for tracking where each variable wound up on the stack. We can change environment.ml to something like this:

type environment = int StringMap.t * int;;
let empty_environment = (StringMap.empty, 4);;

The int still contains the next available stack address like before. The int StringMap.t type is a dictionary mapping string keys to int values; it will map the name of each variable to the location in stack memory at which we have placed it.

You’ll also want to create a couple of functions: one which binds a new variable to the next free location (named e.g. environment_add_var) and another to get the location where a variable is stored (named e.g. environment_get_var). You can use these functions in the compile.ml file to implement the two new forms of expression.

Note again that you are not required to use this environment type. This type is simply recommended because it carries all of the information you need.

Conditionals

Implementing ifnz expressions will require that you extend your compiler’s model of assembly language. So far, we have relied upon the add, sub, and imul instructions; we’ll now need some more instructions to support conditionals. Again, the Guide to x86 Assembly from the University of Virginia may be helpful here. You’ll probably want to the following constructors to your instruction type:

When you add these instructions, you’ll have to add them to your code_of_instruction function as well.

Recall that, in x86 assembly, we use cmp to compare two arguments and then decide what to do about it as a separate instruction. For instance, the code ifnz 6 then 4 else 8 might be written in x86 assembly like so:

  mov eax, 6
  cmp eax, 0
  je label_else
  mov eax 4
  jmp label_endifnz
label_else:
  mov eax 8
label_endifnz:

Here, the je will jump when the comparison succeeds (meaning that the conditional was 0), leading 8 to be stored in eax. If the comparison succeeded (and the conditional was non-zero), the je instruction would have no effect and the code would keep running, storing 4 in eax. Then, it would jump to the end of the instructions shown here, which prevents the 8 from overwriting the 4.

Label Names

Each time you compile a conditional expression, you’ll need to pick unique label names. While this is possible in a pure functional style, it’s quite tedious. Instead, your support code for the Bluebird compiler includes a file freshening.ml with a function fresh_name that can provide you with unique names. For instance, you might write

open Freshening;;

...
    let name = fresh_name "else" in
    ...

which might assign the fresh_name variable the value "else_4" or "else_200" or something similar. This name is guaranteed to be distinct from any name you’ve gotten from a previous call to fresh_name; the string you provide is simply for convenience, as it might help you debug the output.

Testing the Bluebird Compiler

You are not required to write tests for this lab, but you definitely should! You still have tests from your first lab, so you can use those to test the Auklet language features. You’ll want to write some more tests for Bluebird to make sure that things are working correctly. A good way to deal with this is as follows: every time you feel like you’ve finished a new feature, write a test for it. Every time you find a bug, write a test that fails because of the bug and then fix the bug afterward. If you do this, you’ll (1) have a good way of running your code as you fix your bugs and (2) quickly build up a library of tests so you don’t have to worry about breaking things in the future.

Since there’s a new kind of behavior – compilation failure due to unbound variables – you’ll want to write some tests for that. The test_utils.ml file includes a function test_compile_failure which takes a filename and expected output and verifies that the compiler fails with the specified output. For instance, suppose the file unbound_x.bird contained

x

Then the following test should pass because the compiler will fail appropriately:

test_compile_failure "test_code/unbound_x.bird" "Unbound variable x with environment {}"

When you run your unit tests, you’ll also be testing your Auklet features too. This is intentional; it helps to be sure you haven’t broken anything!

Summary

To complete this assignment, you’ll need to

Unit tests are not listed on the bullet points above, but you should think of them as a natural part of the process.

Submitting

To submit your lab, just commit and push your work. The most recent pushed commit on the appropriate branch as of the due date will be graded. For more information on when assignments are due, see the Policies page.

After each lab, you will complete a brief questionnaire found here. In most cases, the questionnaire should take less than a minute. This questionnaire is required and will be used as part of your participation grade.

If You Have Trouble…

…then please contact your instructor! Piazza is the preferred method, but you can reach out via e-mail as well. Good luck!