CS75 Lab 3: Code Generator Part 1

Due Thursday April 9 before 2am: (late Wednesday night)
Problem Introduction
For this project you will add the first part of the code generator to your C-- compiler. You will add code generation for parts of the C-- language that do not involve variables or functions; C-- code with a single main function that deals directly with numbers. The code generator part of your compiler will take as input the AST produced by the parser, and will traverse the AST to generate MIPS assembly code.

Your compiler executable should take two command line arguments, the first is the C-- code source file and the second is the name of the MIPS assembly code output file:

$ ./mycc  foo.c--  foo.mips
You can then run your MIPS assembly program using spim or xspim:
$ spim  foo.mips
$ xspim foo.mips
Getting Started

Grab the starting point code and add it to your svn repository

One of you or your partner, should do the following:
  1. From you project subdirectory, copy over the setup3 file from my directory containing the project 3 starting point code:
    $ cd cs75/CS75_repositoryname/project
    $ cp /home/newhall/public/cs75/proj3_startingpt/setup3 .
    
  2. run setup3:
    $ ./setup3
    
    The setup3 script will copy over project 3 starting point files into the symtab, parser, and includes subdirectories, call svn add to add these files to your repository, and then call svn to commit the new files.
  3. At this point your partner should be able to do an "svn update" to grab the project3 starting point files from your repository.

Starting Point Code Details

There is not much starting point code for this assignment. Most of the files are empty, and are just provided as a starting point that can be built with the supplied Makefile. You may want to add additional .c and .h files as you implement code generation functionality. If you do so, make sure to edit the Makefile to include your .c files in building the compiler executable.
include/[codegen.h, symtab.h]: shell of header file for code generator and symbol table
codegen/codegen.c: starting point for code generator module, the codegen function 
                   should take the AST and generate MIPS code to a codetable 
codegen/main.c: the main program for the full C-- compiler, you may need to change
                some of the function calls in here, but it is basically complete
codegen/Makefile: builds the mycc C-- compiler, modify this if you add more .c files
symtab/symtab.c: starting point for the symbol table module, you won't need 
                 this until the next assignment
Project Details:
As a first step, read over some of the MIPS references available off the "On-line Resources" part of the CS75 home page.

For this part of the code generator, you will ignore all variables and function calls. Assume that all C-- code will simply be a main program that deals directly with numbers.

Things to keep in mind as you add code generation

Suggestions for order in which to implement code generation

The following is a suggested order of implementation and testing:
  1. Implement a codetable data structure and functions for adding instructions to the code table, for modifying instructions in the code table, and for writing the codetable to the resulting MIPS assembly code output file.

    The code table contains instruction entries. You should define a struct that can store all types of MIPS instructions, labels, and directives. Make sure to use enum or #defines for field values to make your code readable.

  2. Add a function to generate the start of every MIPS assembly code file to the code table. The beginning of every MIPS assembly file should look like this:
               .data
    _newline_:
               .asciiz "\n"
               .text
               .globl main
    
    _newline_ is used for writeln and the only global symbol is main.
  3. Add functions to generate all the generic parts of function preamble and function epilog code to the code table.

    Assume that the callee will always save all callee saved registers in the prolog. As an extra credit part of the next assignment, you can think about saving only the ones the callee actually needs to use.

    Because you are not handling code generation for functions in this assignment, you should hard-code in calls to generate main's prolog before you handle generating code for the parts of C-- that you are handling, and then hard-code in a call to generate main's epilog after you are done generation code for the parts of C-- you are handling. In the next assignment you will remove these calls and handle generating code for main's prolog and epilog just like you handle for any other function definition in the AST.

  4. Start adding code to support simple C-- programs with a single main function with expressions involving numbers only and write and writeln statements. Here is a suggestion for an order in which to add and test functionality:
    1. writeln;
    2. write num;
    3. write expression;
    4. if else statements
    5. while loops (at this point you can only test loops that never execute or loops that are infinite, since you cannot yet handle assignment statements)

An example C-- program that you should be able to compile is the following (you do not need to handle the variable declaration list part of a block for this assignment):

int main() {
    if(4 != 5 || 6 == 8) {
        write 1;
        writeln;
        write 2 + 6 * 3 - 'a';
        writeln;
    } else {
        write 0;
        writeln;
    }
    while(8 < 3) {
        write 8;
    }
}

write and writeln

write and writeln are should be implemented using SPIM's support for system calls (print_int and print_string). Registers $a0 and $v0 are used for system calls.

Expressions

To generate code for expressions, start with C-- programs that contain simple arithmetic expressions like:
int main() {
  write 9;
  write 9 + 10;
  write 10 - 3;
}
Then try more and more complicated and longer arithmetic expressions and test that your generated code is correct.

To generate code for expressions you need to use registers to store temporary results. For example, if the AST for a write stmt looks like:

 write
   |
   +
  / \
 9   *
    / \
   10   3
The value of 10 needs to be loaded into a register, the value of 3 needs to be loaded into a register, they need to be multiplied together and the result stored in a register, then the value of 9 needs to be stored in a register and added to the register storing 10*3 and that result needs to be stored in a result register.

You should use the caller saved registers $t0-$t7 for evaluating expressions. Your code generator will need to implement a register allocation scheme for these registers so that when a temporary register is needed to store an operand or result, one can be found and when a temporary register is no longer needed it can be made available for generating other instructions. Make sure that you can handle expressions that have more operands then there are temporary registers:

write 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10;

Once you have stress tested arithmetic expressions, implement relational expressions:

write 9 > 13;    // evaluates to 0  (false)
write 9 <= 13;   // evaluates to 1  (true)
write 9 == 13;      // evaluates to 0

Once you have thoroughly tested relational operators, then implement code generation for expressions with logical operators:

write !2;                          // evaluates to 0
write 0 || 1;                      // evaluates to 1
write 8 < 9 && 7 != 6 && 2 > 10;   // evaluates to 0
You must implement short-circuiting for (expr || expr) and ( expr && expr) To do this you need to implement unique labels to which the short-circuiting code will branch to skip over evaluating the rest of the expression once the final answer is known. You can use a simple label notation, like .Li where i increases for each unique label generated. sprintf may be a useful function for producing the label string.

if-else and while

if-else and while are C-- constructs that alter the control flow of a program. You need to generate branch instructions to unique labels to implement these correctly. Your code generator should implement somewhat meaningful unique labels to make your assembly code for if-else and while control flow more readable. For example, here is what my labels for an if-else might look like:
    ...                   # instructions for the if part
    ...
.L0_else:                 # a unique label for the start of the else part
    ...
    ...
.L1_ifdone:               # a unique label for the end of the if-else stmt
What to hand in
A tar file to be submitted using cs75handin containing:
  1. a tar file of your project directory (i.e. all the source, header, and Makefile files I need to build and run your parser). You should do a 'make clean' before taring up your project directory, and make sure that all debug output is turned off (i.e. your compiler should have no output to stdout).
  2. A README file with:
    1. Your name and your partner's name
    2. The number of late days you have used on this assignment
    3. The total number of late days you have used so far

Project Demo
You and your partner will sign-up for a 20 minute demo of your compiler. Prior to your demo, you should come up with some example C-- programs that demonstrate both the correctness of your solution and that demonstrate any parts of your code generation that are particularly interesting, tricky, etc. You should also show me some error handling.