CS75 Lab 4: Code Generator Part 2

Due Thursday April 30 before 2am: (late Wednesday night)
You may use at most one lated day on this project (you must submit it by Tuesday May 5 before 2am).
Project FAQ
I'll post answers to questions I get about this assignment here:

Problem Introduction
For this project you will complete the code generator for your C-- program. You will add support for variables (globals, locals, parameters, temporaries) and function calls. Your completed compiler should be able to generate code for any valid C-- program and should correctly identify and handle errors in invalid C-- programs.

Start with your lab3 solution and add more functionality to your existing code generator; there is no starting point code for this assignment

Project Details:
To complete the code generation for C--, you need to add a symbol table and then add support for variables and functions. Again, use good modular design and incremental implementation and testing.

symbol table

You should first implement a symbol table data structure and add functions for accessing and manipulating symbol table entries. In symtab/symtab.c and include/symtab.h are starting point files for this.

As you traverse the AST and encounter a new nesting scope and definitions, you may add new symbol table entries, as you leave a nesting scope, you will remove all entries in the scope you are leaving.

As you encounter name definitions in the AST, you must determine the type, and location for these names. Also, for names whose location is on the stack or in global data area, you need to ensure that the addresses are aligned appropriately for the type. Here are the details of where different types of values should be allocated at runtime (you do not need to handle space allocation for these all yet, but as you add more and more code generation functionality for names, add corresponding symtab functionality for determining their location):

  1. ints that are stored on the stack or are globals, must be aligned on a 4 byte boundary (their addresses must be a multiple of 4).
  2. chars are byte aligned (any address is valid)
  3. the address stored in $sp and $fp must be 4 byte aligned, thus character arrays allocated on the stack may need padding to fill up a full multiple of 4 number of bytes.
  4. the first four parameters are passed in registers $a0 - $a3. For this assignment you do not need to handle functions with more than 3 parameters, however, if the values in $a0-$a3 need to be saved around a function call, they should be saved to the stack. Regardless of the size of the value stored in a register (i.e. an int or a char), always save the full 4-bytes of the register to the stack.
  5. The first 8 non-array local and temporary variables are allocated to registers $s0-$s7. Again, if these need to be saved to the stack, push the 4-bytes of register space even if the value stored in the register is only 1 byte.

    Array locals are always allocated on the stack, and functions that have more than 8 locals and temporaries allocate space for additional ones on the stack.

    Temporaries for non-overlapping scope, can share the same storage location.

Your symbol table entries for functions should contain enough information so that your compiler can detect errors in function calls with too few or too many arguments for the function.

code generation for variables and functions

I strongly suggest that you do a little bit of code generation at a time and test thoroughly after each new phase. The following is a suggested order of implementation and testing:

everything except arrays:

  1. Add support for some assignment statements (make sure to store the correct number of bytes for the variable type):
    	x = expr;   // x is a global   
    	y = expr;   // y is a local variable
    	
    You should detect invalid l-value errors:
    	x + 1 = 3;	// x + 1 is not a valid l-value
    	

  2. Add support for read:
      read x;    // x is an int or a char, a  global, local or parameter
    

  3. Add support for break

  4. Add support for function definitions:
    1. generate function prolog and epilog code for multiple functions
    2. generate code for simple functions without local variables and parameters

  5. add support for function calls
    1. functions with no arguments and no return values
    2. functions that return a value
    3. functions with 1, 2, or 3 arguments, try arguments that are numeric expressions, locals, globals, parameters
    4. function calls where $ti or $ai registers need to be saved and restored around the call
    5. functions that call functions that call functions...try a recursive function

arrays:

  1. array references on rhs of assignment stmt:
  2. array refs on lhs of assignment stmt:
       x = a[i] + 3;
       x = a[3 + 4 - x];
       a[i] = a[j] + 3; 
       a[i] = a[a[i] + 1] + 6;
    
    the array can be an int or char array, a global or a local, or a parameter (you may want to come back and handle parameters later)

  3. array parameters
      int foo(int a[]) { 
        a[1] = a[2] + 7;
        return a[3];
      } 
    
  4. array arguments:
       foo(array);
    
    You need to handle all possible cases for the type that array can be: a local variable, a global variable, a parameter.
Extra Credit
Once you successfully complete the basic C-- compiler, you can try adding optimizations to your compiler and/or additional features to C-- for extra credit points. Do not do extra credit part before your initial solution is complete and correct; an incomplete implementation of the required functionality plus an added feature will result in a lower grade than a complete implementation of the required functionality without any additional feature added.

If you add extra credit parts to your compiler, you will submit two versions of your compiler: the pre-extra credit version; and the extra credit version. One reason for this is that you may add extra credit features that would normally be considered errors in the regular version and I need to test your error handling. Another is that if your extra credit features result in breaking something that you had correct in your initial version, then I can test your working pre-extra credit version.

Before starting the extra credit part, I suggest checking in all your project code to svn, and then check out your project into a new directory for your extra credit part (there is a way in svn to create a separate branch for your extra credit parts).

Adding Optimizations to your compiler

There are two points when you can add optimizations to your compiler:
  1. before the first pass of code generation: here your optimizer will take as input the AST from the parser, and produce an optimized AST.

    Some examples of optimizations that you can perform are:

    • eliminate redundant expressions
    • simplify expressions
    • constant propagation
    • constant folding
    • eliminating dead code
    • eliminating unreachable code
    • moving invariant code from within a loop
    • loop unrolling

  2. after the code generation pass that generates code to a codetable: your optimizer will take the codetable as input and re-write it by applying peephole optimizations.
When adding optimizations it is important to test thoroughly to ensure that the optimized code and the non-optimized code do the same thing; your optimizations should not change what the program does. the

Adding features to C--

Here are some additions to C-- you might want to consider (please talk to me about any additional ideas you have for extra features before attempting them): Some of these are quite involved, while others are relatively minor additions. Extra credit points will be awarded based on difficulty of the functionality added, and are awarded on an "all or nothing" basis (only for complete and correct implementations of the added functionality).
What to hand in
A tar file to be submitted using cs75handin containing:
  1. your project directory (i.e. all the source, header, and Makefile files I need to build and run your parser). You should do a 'make clean' before taring up your project directory, and make sure that all debug output is turned off (i.e. your compiler should have no output to stdout).
  2. A README file with:
    1. Your name and your partner's name
    2. The number of late days you have used on this assignment
    3. The total number of late days you have used so far
  3. if you did an extra credit feature(s), then a project directory (proj_extra_credit) containing your extra credit version. In the directory should be:
    1. a README file that describes the extra credit features you added and how to test them.
    2. a testprogs directory that contains test programs that demonstrate your extra features (your README should describe what features these demonstrate and how to run them for extra features, or where in the MIPS code I should look to see the optimizations)

Project Report

Submitted with your tar file or as a hard copy, you must include a project report. Using at most two pages, give an overview of the design of your compiler. Explain the overall approach of each major component. Also, include a description of any additional features or optimizations you have added. If there are any parts of that have not fully been implemented or are not working properly, you should discuss them. Finally, explain which part(s) of the course project you found the most challenging and why/in what way.

Project Demo
You and your partner will sign-up for a 30 minute demo of your compiler. Prior to your demo, you should come up with some example C-- programs that demonstrate both the correctness of your solution and that demonstrate any parts of your code generation that are particularly interesting, tricky, etc. You should demo any extra credit features you added, and you should show some error handling.