Compilers

Dove

a ring-necked dove
Ring-Necked Dove

Due on Wednesday, March 6th at 11:59 PM. This is a compiler lab. If you have a partner, the two of you will complete this lab as a team. If you do not have a partner, you are effectively in a team by yourself.

If you are working with a partner, please be familiar with the Partner Etiquette guidelines. You and your partner share a single repository and, barring unusual circumstances, will receive the same grade. If you experience any difficulties in your partnership, please alert your instructor as soon as possible.

If you are working alone, please don’t hesitate to seek help if you get stuck. Since there are no ninjas for this course and you don’t have a partner, your instructor is the only interactive resource available to you and is happy to help you through any problems you might encounter.

In either case, be familiar with the academic integrity policy! You are permitted to discuss high-level details of your compiler project with other students, but you should never see or share anything that will be turned in for a grade. If you need help, please post on the Piazza forum or, if necessary, contact the instructor directly. Make sure to post privately if you are sharing code. If you have any doubts about what is okay and what is not, it’s much safer to ask than to risk violating the Academic Integrity Policy.

Overview

This lab will extend the Cardinal compiler to a new language called Dove. Dove includes first-order functions (similar to C functions) and function calls. It also adds some compile-time error checking related to functions.

Transitioning to Dove

As usual, you will first change to the dove branch, where your Dove starter files are located:

git fetch
git checkout dove

You will then merge your work on the Cardinal compiler into that branch:

git merge cardinal

The changes for Dove are contained in two locations:

Note that, this time, you will not be able to build your compiler after the merge. This is because the Dove compiler’s parser produces a program instead of an expr. This is expected.

Your first task will be to update compile_program and compile_to_assembly_code to take program as an argument. For now, just make your code ignore the list of declarations in the program value and compile the expression it contains. Afterwards, you can make clean, make tests.byte, and run ./tests.byte to ensure that everything is working again.

The Dove Language

In the Dove language, programs are a list of declarations followed by an expression. The expression is essentially the “main” part of the program; the declarations allow you to create functions that expressions can call. Below, we discuss the syntax and semantics for these declarations.

Concrete Syntax

Here, we use a common convention that our language starts at the first non-terminal – program – below. The concrete syntax of Dove is:

<program> ::=
  | <declaration_list> <expression>
  | <expression>

<declaration_list> ::=
  | <declaration> <declaration_list>
  | <declaration>

<declaration> ::=
  | def <identifier>(<parameter_list>) <expression> end
  | def <identifier>() <expression> end

<parameter_list> ::=
  | <identifier> , <parameter_list>
  | <identifier>

<expression> ::=
  | true
  | false
  | <integer>
  | after(<expression>)
  | before(<expression>)
  | print(<expression>)
  | isbool(<expression>)
  | isint(<expression>)
  | <expression> + <expression>
  | <expression> - <expression>
  | <expression> * <expression>
  | <expression> < <expression>
  | <expression> > <expression>
  | <expression> = <expression>
  | <expression> && <expression>
  | <expression> || <expression>
  | <identifier>
  | let <identifier> = <expression> in <expression>
  | ifnz <expression> then <expression> else <expression>
  | if <expression> then <expression> else <expression>
  | <identifier>(<argument_list>)
  | <identifier>()
  | (<expression>)

<argument_list> ::=
  | <expression> , <argument_list>
  | <expression>

Syntactic precedence is the same as in Cardinal.

Abstract Syntax

The abstract syntax of Dove has additional types: declaration and program. Make sure to examine src/language/asts.ml to see their definitions. Each of these types correspond to their like-named non-terminals in the grammar above.

Semantics

Function declarations in Dove have semantics much like those of Python or C. When a function is called, each argument is evaluated from left to right. The values of those arguments are stored in the parameter variables of the function and the body of the function is executed. The result of the function call is the value returned by the expression in the function body. You must generate code that conforms to the C calling conventions.

Here are some examples of Dove code:

Compile-Time Errors

The introduction of function definitions and function calls also introduces several new forms of mistake that a programmer can make which we can detect at compile time. Your Dove compiler will be required to detect and handle several classes of errors:

Report all the errors!

Unlike our previous compilers, the Dove compiler must report all errors for a given file. For instance, the code

def f(x)
  g(y,z)
end

f()

contains four errors: an unbound variable y, an unbound variable z, an undefined function g, and an argument count error where f is called. The compiler must report all of these errors, like so:

$ ./hatch.byte above_code.bird
Unbound variable y.
Unbound variable z.
Function g is not defined.
Function f is called with an incorrect number of arguments.
$

The order of the errors is not relevant, but each error must appear on its own line.

To do this, you will need to write compile-time error checks for your AST and use them before proceeding with compilation. Note that you will not be able to rely upon your previous code for detecting unbound variables; that code throws an exception when it encounters a single variable and does not take any other errors into account. Instead, you will need to recurse over the AST and collect all of the errors it contains.

A small stub is provided in wellformedness.ml to give you an approach to use. If you use this approach, you will need to modify the compile_to_assembly_code function in builder.ml (not your own version in compiler.ml) so that it catches the IllFormed exception and throws a BuildFailure with the appropriate text. The command-line-handling code in hatch.ml will catch the BuildFailure exception and print the message it contains.

Runtime errors

You should report the same runtime errors that you did in the Cardinal lab, using the same exit codes:

Implementing the Dove Compiler

Here’s a rough outline of how to approach this lab:

  1. Update your code so that your Cardinal unit tests pass. Initially, ignore the declarations that appear in the program so you can get your unit tests working again.

  2. Add function declarations and function calls. Don’t worry about error handling for now; only test good programs. As you write your code, add thorough unit tests for a variety of cases. For instance, make sure to test calling functions with zero, one, or more parameters, especially in cases where the order of the arguments matters (e.g. subtraction). Use what you learned about C calling conventions in the Cardinal lab to implement your function calls and returns correctly.

  3. Add error checking. Write unit tests for your compile-time errors. Be sure to be thorough.

C Calling Conventions

There is one aspect of the C calling conventions which we ignored in Cardinal: callee- and caller-saved registers. After a C function call, the called function may have changed a number of registers. The C calling conventions specify a contract about which of those registers may have changed and which must not have.

The caller-saved registers are those registers which the called function may change arbitrarily. If the caller cares about the values in caller-saved registers, then the caller must save those values and restore them later. If the caller does not do this, the values may be lost. Those registers are eax, ecx, and edx.

The callee-saved registers are those registers which the called function must preserve. If the called function changes those registers, their values must be restored by the end of the function call. The caller may assume that those registers contain the same values that they contained when the call was made. Those registers are ebx, esi, and edi.

Below is a complete description of the C calling conventions with the relevant parts highlighted in bold text:

Strictly speaking, the callee may save callee-saved register values at any point as long as the stack is returned to the same condition it was in when the call was made. Inserting the save and restore and the points indicated above, however, ensures that you do not have to change your logic for allocating temporary variables.

OCaml Data Structures

You may find that, in addition to the StringMap module in utils.ml, you need a set of strings. You can create such a module in a similar fashion. Feel free to modify utils.ml to include something like the following:

module StringSet = Set.Make(
  struct
    type t = string
    let compare = Pervasives.compare
  end
  );;

The StringSet module will then contain functions for using sets in OCaml (e.g. StringSet.union). Just like OCaml’s dictionaries, these sets are immutable and all modifying functions derive a new set. StringSet isn’t necessary for completing this lab, but it might be a useful tool.

Summary

To complete this assignment, you’ll need to

Submitting

To submit your lab, just commit and push your work. The most recent pushed commit on the appropriate branch as of the due date will be graded. For more information on when assignments are due, see the Policies page.

After each lab, you will complete a brief questionnaire found here. In most cases, the questionnaire should take less than a minute. This questionnaire is required and will be used as part of your participation grade.

If You Have Trouble…

…then please contact your instructor! Piazza is the preferred method, but you can reach out via e-mail as well. Good luck!