CS 75 Project 3a: Code Generation

Due: Monday April 4, by 11:59pm
Introduction

For this project you will add basic code generation to your C-- compiler. Your code generator will take two inputs: the name of a C-- program file and the name of the desired MIPS output file. To execute the MIPS program produced use either spim or xspim. For example:

$ python codeGenerator.py test.c-- test.mips
$ spim -f test.mips
$ sxpim test.mips

To learn more about MIPS and spim check out:

Of the four projects we will complete in this course, this project (consisting of parts a and b) is the most difficult, but also the most satisfying because you finally see how all the major pieces of the compiler fit together.

I strongly advise you to do a small section of code generation at a time and to test each section thoroughly before attempting additional sections. You should create a test suite of C-- programs to test your code generator. As you add new sections to the code generator you should re-test your compiler on your entire test suite to be sure that your didn't break something that was previously working.

To get the starting point code for this project do update75.

The Structure of a MIPS Assembly Program

In your cs75/projects/3 directory is a file called start.mips that shows you the basic structure of a MIPS assembly program. Try running this MIPS program using xspim. It should print the number 6 followed by a newline and end. I will describe the key sections of the code below.


Getting Started

The following is a suggested order for tackling code generation.

  1. First, fix any problems in your parser before beginning code generation.

  2. Before implementing any code, try writing some simple MIPS programs by hand, such as one that would be quivalent to the following C-- program:
    int main() {
      write 6+3*8;
      writeln;
    }
    
    To do this you can copy the MIPS program that I provided, called start.mips, and just modify the code between the calling sequence and the return sequence to do the additional arithmetic operations.

  3. You will need a data structure to store the assembly code as you generate it. I suggest using a python list of lists where each inner list represents one line of assembly code. Each inner list can contain a flag representing what type of MIPS instruction it is, such as: 'code', 'label', or 'directive'. Further positions in the list can represent the necessary operators and operands. Write all of the necessary methods needed to store and emit code using your data structure.

  4. Add a method to generate the .data section of the assembly code that will begin every MIPS program.

  5. Add methods to generate the generic parts of the function calling sequence and return sequence. For now, you will hard code this in for main, but in the next part of this project, you will make this work for any function.

  6. Start adding code to support simple C-- programs that just contain a main program with expressions involving numbers (no variables) and only the statements write and writeln. See the next section for more detail on how to do this.

Implementing Basic Functionality

In order to break down the process of code generation into small, incremental steps, we will initially ignore variables and function calls. We will assume that all programs only have a main function that deals directly with numbers.

Your code generator will traverse the AST produced by the parser to generate MIPS code. Structure the code generator so that it has methods that roughly match the different node types in the AST.

For each step described below, I have provided at least one test program.

  1. Because you will need write and writeln to test all aspects of code generation, you should begin by implementing them.
    int main() {
      write 5;
      writeln;
    }
    
    These should be implemented using system calls with registers $a0 and $v0 as discussed previously in the MIPS example.

  2. Implement the arithmetic operators.
    int main() {
      write 5+7-3;   //9
      writeln;
      write 10*8/4;  //20
      writeln;
    }
    
    To generate code for expressions you need to use registers to store temporary results. For example, the AST for the first write statement above looks something like this:
             write
               |
              sub
             /   \
           add   num
           / \    |
        num  num  3
         |    |
         5    7 
    
    The value of 5 needs to be loaded into a register, the value of 7 needs to be loaded into a register, they need to be added together and the result stored in a register. Then the value of 3 needs to be stored in a register and subtracted from the register storing 5+7, and that result needs to be stored in the $a0 register, which will be used to write the value out.

    You should use the temporary registers $t0-$t7 for evaluating expressions. Your code generator will need to implement a register allocation scheme for these registers so that when a temporary register is needed to store an operand or result, one can be found and when a temporary register is no longer needed it can be made available for generating other instructions. Make sure that you can handle expressions that have more operands then there are temporary registers, such as:

    write 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10;
    

  3. Implement the relational operators.
    int main() {
      write 4 == 5;  //0
      write 4 >= 5;  //0
      write 4 > 5;   //0
      write 4 < 5;   //1
      write 4 != 5;  //1
      write 4 <= 5;  //1
      writeln;
    }
    

  4. Implement the boolean operators.
    // tests or
    int main() {
      write (4 == 5) || (4 > 5);  //0
      write 1 || 0;               //1
      write (1 == 2) || (5 < 6);  //1
      writeln;
    }
    
    // tests and
    int main() {
      write (4 != 5) && (5 != 6); //1
      write 1 && 0;               //0
      write 0 && 1;               //0
      writeln;
    }
    
    // tests not
    int main() {
        write !(3 < 4); //0
        write !(4 < 3); //1
        write !!1;      //1
        writeln;
    }
    
    Remember that in the C-- language and and or are short-circuit operators. In other words, if the first operand of an and is false, then false is returned and the second operand is never evaluated. Similarly, if the first operand of an or is true, then true is returned and the second operand is never evaluated.

    To do this you need to implement unique labels to which the short-circuiting code will branch allowing it to skip over evaluating the rest of the expression once the final result is known. You can use a simple label notation, like .Li where i increases for each unique label generated.

  5. Implement if statements.
    int main() {
      if (5 < 3) 
        write 1;
      else
        write -1;
      writeln;
      if (5 < 6)
        write 1;
      else
        write -1;
      writeln;
    }
    
    An if statement alters the control flow of a program. You need to generate branch instructions to unique labels to implement these correctly. Your code generator should implement somewhat meaningful unique labels to make your assembly code for if statments more readable. For example, labels for an if statement might look like:
        ...                   # instructions for the if part
        ...
    .L0_else:                 # a unique label for the start of the else part
        ...
        ...
    .L1_ifdone:               # a unique label for the end of the if-else
    

  6. Implement while loops, but don't worry about break statements yet.
    // Should go into an infinite loop printing 2's
    int main() {
      while (1)
        write 2;
    }
    
    /*
       Should output:
       1
       3
       While loop should never be executed.
    */
    int main() {
      write 1;
      writeln;
      while (0) {
        write 2;
        writeln;
      }
      write 3;
      writeln;
    }
    
    Similar to if statements, while loops will also require the use of branching instructions.

Submit
Run handin75 to turn in your code generator by the deadline. Make sure that your cs75/projects/3 directory contains all of the code necessary to test your code generator, which should be implemented in the file codeGenerator.py.