Compilers

Auklet

Auklet
Auklet

Due on Friday, February 8th at 11:59 PM. This is a compiler lab. If you have a partner, the two of you will complete this lab as a team. If you do not have a partner, you are effectively in a team by yourself.

If you are working with a partner, please be familiar with the Partner Etiquette guidelines. You and your partner share a single repository and, barring unusual circumstances, will receive the same grade. If you experience any difficulties in your partnership, please alert your instructor as soon as possible.

If you are working alone, please don’t hesitate to seek help if you get stuck. Since there are no ninjas for this course and you don’t have a partner, your instructor is the only interactive resource available to you and is happy to help you through any problems you might encounter.

In either case, be familiar with the academic integrity policy! You are permitted to discuss high-level details of your compiler project with other students, but you should never see or share anything that will be turned in for a grade. If you need help, please post on the Piazza forum or, if necessary, contact the instructor directly. Make sure to post privately if you are sharing code. If you have any doubts about what is okay and what is not, it’s much safer to ask than to risk violating the Academic Integrity Policy.

Overview

This is the first graded lab in which you will produce a compiler. Your team name was sent to you in an e-mail prior to the start of this lab. Your compiler work will be done in a GitHub repository at the following URL:

git@github.swarthmore.edu:cs75-s19/compiler-<teamname>

You will be maintaining this repository throughout the course of the semester; all of your compiler lab assignments will be submitted by modifying this code and pushing your updates to the GitHub server. You are permitted to manage this repository in most any way you like so long as the command-line interface for using the compiler remains the same (though you might want to check with your instructor before making sweeping changes!). You are permitted (and encouraged!) to create new modules for helper functions, split the existing modules into multiple parts, or even change how your compiler is built.

Getting To Know Your Code

Your repository starts with several files to get you started. Since you’ll be in charge of this code, we’ll begin by introducing you to it. Some of the files are critical to the compiler and you’ll likely be changing them every week; other files contain boilerplate for e.g. running clang to link object files and you probably won’t need to change it at all. Here’s a summary of what the starter code does for you:

Building Your Code

The starter code is ready to be compiled. You can build your compiler just by typing

make

This will create a file hatch.byte that contains your compiler. You can use this compiler to compile some code by typing e.g.:

./hatch.byte test_code/4.bird

This will fail, since we haven’t written any of the important compiler functions yet.

In order to compile your tests, you can use the similar

make tests

which makes a test program called tests.byte. You can run this program just with

./tests.byte

which will report two failing tests because, again, we haven’t done any of the work yet.

The Auklet Language

Each compiler lab will introduce you to the language you will be compiling; most of these languages build upon the previous language. Our first language, Auklet, only has a few features: variables, integers, and very basic operations on them.

For each language, we must define

Concrete Syntax

The concrete syntax of Auklet is:

<expression> ::=
  | <integer>
  | after(<expression>)
  | before(<expression>)
  | <expression> + <expression>
  | <expression> - <expression>
  | <expression> * <expression>
  | (<expression>)

Abstract Syntax

The abstract syntax of Auklet can be found in the file src/language/asts.ml: it is the expr data type. Note that the abstract syntax does not have the parenthetical expression (<expression>) above. This is because parentheticals like this only affect the shape of the AST, not its contents, and so can be handled by the parser.

show Functions

In examining the ASTs, you will likely notice an OCaml syntax we did not discuss. The unary_operator type, for instance, looks like this:

type unary_operator =
  | OpInc
  | OpDec
[@@deriving eq, ord, show]
;;

The [@@deriving ...] part of the declaration is called a PPX extension and works sort of like a template: it represents code that will be generated for you by the compiler. This PPX extension doesn’t affect the meaning of the type and you can ignore it if you wish.

The important part for you is that this PPX extension generates a show function for the type. For instance, the above declaration creates a function named show_unary_operator with type unary_operator -> string, which means you can write e.g.

print_endline (show_unary_operator OpAfter)

and the string “OpAfter” will be printed. This will be especially helpful in your unit tests. We didn’t use this extension for assembly code because the output is always in an OCaml-like form, which doesn’t match the concrete syntax for x86 assembly. It is useful for debugging and testing purposes, though!

Semantics

An Auklet program always evaluates to a single number, which the driver will print. The inc and dec built-in expressions increase and decrease (respectively) their argument by one. let expressions bind a new variable to the result of an expression within another expression; the variable bound by the let is only in scope in the second expression. Here are some examples of Auklet programs:

Concrete syntax Abstract syntax Evaluated result
4 EInt(4) 4
before(after(after(7))) EUnaryOp(OpBefore,EUnaryOp(OpAfter,EUnaryOp(OpAfter,EInt(7)))) 8
4 + 2 EBinaryOp(OpPlus,EInt(4),EInt(2)) 6
before(5) - 4 EBinaryOp(OpMinus, EUnaryOp(OpBefore,EInt(5)), EInt(4)) 0

Implementing the Auklet Compiler

In the starter code,

Your primary task, then, is to add this functionality. You may do so in any way you see fit, but you are recommended to take the following path:

  1. Start in assemblyLanguage.ml for a warm-up. This file contains some data types describing the AST for a small subset of Intel x86 assembly language. There are five functions here which can turn the assembly language data types into strings of assembly code; they are used by the compiler to generate the file given to nasm. Start by implementing these functions. You can write tests if you like.

    Note that a Guide to x86 Assembly from the University of Virginia may be helpful here. In particular, the binary operator instructions will be add, sub, and imul. The mul instruction does not behave the way that add and sub do; the imul instruction will be much more familiar.

  2. Next, move to compiler.ml. The compile_expression function is used to transform an Auklet AST into a list of assembly instructions. Complete the generation of those instructions for the integer, after, and before expressions. For now, skip the other expressions. Your compiler should now work on programs with just integers, after, and before. Try it out!

  3. Once we have unary operators working, we need think about binary operators. As we discussed in class, binary operators have two subexpressions – the left subexpression and the right subexpression – and we need somewhere to store the result of the left subexpression while we are working on the right subexpression. To solve this problem, we will edit environment.ml.

    In this setting, the “environment” refers to where information will be stored when our compiled program is running. For instance, we need to be able to allocate a space to store the result of the left subexpression while we work on the right subexpression. To do this, you will want to write a function in environment.ml called e.g. environment_add_temp which takes in an environment and gives back two things: a new environment and the stack memory offset at which the temporary variable is stored.

    For this lab, the environment can simply be an integer describing the next location in stack memory where a value can be stored, so the environment type in the environment.ml file is just an int. In later labs, we will change the environment to hold more information. You can change this type any way you like and add any other functions to this code that you find appropriate. You will, however, need some kind of environment information to handle binary operations correctly.

  4. Now that you have defined a data structure to keep track of your environment, go back to compiler.ml and implement code for the binary operators. Note that the compile_program function has already been written to pass an empty environment to the first call to compile_expression. You just need to use the environment passed to compile_expression to allocate stack memory correctly.

Testing the Auklet Compiler

You can test your compiler directly after you make the program by running hatch.byte on a .bird file (e.g. hatch.byte test_code/4.bird). All code that the compiler generates gets stored in the output directory. If something goes wrong in assembling or linking your code, you can see this in the files stored in the logs directory (which capture the output and error messages of the nasm and clang calls we make).

You can also unit test your compiler. The src/testUtils.ml file contains functions that will help you build OUnit tests. For this lab, test_success is the most useful. This function takes the name of a source file and the output that you expect that source to produce when it is compiled and run.

You can see some examples of how to use these functions in the provided tests.ml file. For this assignment only, you are required to write some tests that exercise the Auklet language. Specifically, you must write tests to verify the behavior of each of the expressions (integers, unary operators, and binary operators) as well as tests to make sure you correctly allocate temporary variables in binary operations.

You are encouraged to use the test_success function to create tests. Don’t let this stop you from writing other kinds of tests, though! For instance, you can write tests to determine if your environment type is behaving the way it should or if your assembly instructions convert correctly to a string. To do this, just write the same kind of unit test code as in your previous labs.

Future assignments will not require you to write unit tests, but you definitely should. As the semester progresses, the unit tests that you write now will save you time in each future assignment. You can do your future self a favor by getting into the habit of writing unit tests!

Debugging the Auklet Compiler

There are a variety of techniques you can use to debug your compiler:

Summary

To complete this assignment, you’ll need to

Submitting

To submit your lab, just commit and push your work. The most recent pushed commit on the appropriate branch as of the due date will be graded. For more information on when assignments are due, see the Policies page.

After each lab, you will complete a brief questionnaire found here. In most cases, the questionnaire should take less than a minute. This questionnaire is required and will be used as part of your participation grade.

If You Have Trouble…

…then please contact your instructor! Piazza is the preferred method, but you can reach out via e-mail as well. Good luck!