Project FAQ
I'll post answers to questions I get about this assignment here:
Problem Introduction
For this project you will complete the code generator for your C--
program. You will add support for variables (globals, locals, parameters,
temporaries) and function calls. Your completed compiler should be able
to generate code for any valid C-- program and should correctly identify
and handle errors in invalid C-- programs.
Start with your lab3 solution and add more functionality to your existing
code generator; there is no starting point code for this
assignment
Project Details:
To complete the code generation for C--, you need to add a symbol table
and then add support for variables and functions. Again, use good modular
design and incremental implementation and testing.
symbol table
You should first implement a symbol table data structure and add functions
for accessing and manipulating symbol table entries. In symtab/symtab.c and
include/symtab.h are starting point files for this.
As you traverse the AST and encounter a new nesting scope and definitions,
you may add new symbol table entries, as you leave a nesting scope,
you will remove all entries in the scope you are leaving.
As you encounter name definitions in the AST, you must determine the type,
and location for these names. Also, for names whose location is on
the stack or in global data area, you need to ensure that the addresses
are aligned appropriately for the type. Here are the details of where
different types of values should be allocated at runtime (you do not need
to handle space allocation for these all yet, but as you add more and more
code generation functionality for names, add corresponding symtab
functionality for determining their location):
- ints that are stored on the stack or are globals, must be aligned
on a 4 byte boundary (their addresses must be a multiple of 4).
- chars are byte aligned (any address is valid)
- the address stored in $sp and $fp must be 4 byte aligned, thus
character arrays allocated on the stack may need padding to fill up
a full multiple of 4 number of bytes.
- the first four parameters are passed in registers $a0 - $a3. For this
assignment you do not need to handle functions with more than 3 parameters,
however, if the values in $a0-$a3 need to be saved around a function call,
they should be saved to the stack. Regardless of the size of the value
stored in a register (i.e. an int or a char), always save the full 4-bytes
of the register to the stack.
- The first 8 non-array local and temporary variables are allocated
to registers $s0-$s7. Again, if these need to be saved to the stack, push
the 4-bytes of register space even if the value stored in the register
is only 1 byte.
Array locals are always allocated on the stack, and functions that
have more than 8 locals and temporaries allocate space for additional
ones on the stack.
Temporaries for non-overlapping scope, can share the same storage
location.
Your symbol table entries for functions should contain enough information
so that your compiler can detect errors in function calls with too few
or too many arguments for the function.
code generation for variables and functions
I strongly suggest that you do a little bit of code generation at a time
and test thoroughly after each new phase. The following is a suggested
order of implementation and testing:
everything except arrays:
- Add support for some assignment statements (make sure to store the
correct number of bytes for the variable type):
x = expr; // x is a global
y = expr; // y is a local variable
You should detect invalid l-value errors:
x + 1 = 3; // x + 1 is not a valid l-value
- Add support for read:
read x; // x is an int or a char, a global, local or parameter
- Add support for break
- Add support for function definitions:
- generate function prolog and epilog code for multiple functions
- generate code for simple functions without local variables and
parameters
- add support for function calls
- functions with no arguments and no return values
- functions that return a value
- functions with 1, 2, or 3 arguments, try arguments that are
numeric expressions, locals, globals, parameters
- function calls where $ti or $ai registers need to be saved
and restored around the call
- functions that call functions that call functions...try a recursive
function
arrays:
- array references on rhs of assignment stmt:
- array refs on lhs of assignment stmt:
x = a[i] + 3;
x = a[3 + 4 - x];
a[i] = a[j] + 3;
a[i] = a[a[i] + 1] + 6;
the array can be an int or char array, a global or a local, or a
parameter (you may want to come back and handle parameters later)
- array parameters
int foo(int a[]) {
a[1] = a[2] + 7;
return a[3];
}
- array arguments:
foo(array);
You need to handle all possible cases for the type that array can be: a local
variable, a global variable, a parameter.
Once you successfully complete the basic C-- compiler, you can try adding
optimizations to your compiler and/or additional features to C-- for
extra credit points.
Do not do extra credit part
before your initial solution is complete and correct; an incomplete
implementation of the required functionality plus an added feature
will result in a lower grade than a complete implementation of the
required functionality without any additional feature added.
If you add extra credit parts to your compiler, you will
submit two versions of your compiler: the pre-extra
credit version; and the
extra credit version. One reason for this is that you may add extra
credit features that would normally be considered errors in the regular
version and I need to test your error handling. Another is that if your
extra credit features result in breaking something that you had correct
in your initial version, then I can test your working pre-extra credit
version.
Before starting the extra credit part, I suggest checking in all
your project code to svn, and then check out your project into
a new directory for your extra credit part (there is a way in svn
to create a separate branch for your extra credit parts).
Adding Optimizations to your compiler
There are two points when you can add optimizations to your compiler:
- before the first pass of code generation: here your optimizer will
take as input the AST from the parser, and produce an optimized AST.
Some examples of optimizations that you can perform are:
- eliminate redundant expressions
- simplify expressions
- constant propagation
- constant folding
- eliminating dead code
- eliminating unreachable code
- moving invariant code from within a loop
- loop unrolling
- after the code generation pass that generates code to a codetable:
your optimizer will take the codetable as input and re-write it by applying
peephole optimizations.
When adding optimizations it is important to test thoroughly to ensure that
the optimized code and the non-optimized code do the same thing; your
optimizations should not change what the program does.
the
Adding features to C--
Here are some additions to C-- you might want to consider (please talk to
me about any additional ideas you have for extra features before
attempting them):
- for loops
- functions with more than 4 parameters
- floats
- strings
- function prototypes (allows two functions to call each other)
- pass by reference (passing int and char variables by reference)
You must implement C style pass by reference, not C++ style.
An example of C style pass by reference:
int foo(int *x) { // x declared as a pointer to int
*x = 6; // dereference the pointer to modify the arg's value
return (*x + 10);
}
int main() {
int x;
write foo(&x); // pass x by reference to foo
}
- supporting assignment statements as expressions on the right-hand-side
of statements. For example:
a = b = c = d; // b, c, and d are both l-values and r-values
x = (y = 6) + 8; // (y = 6) assigns 6 to l-value y, then use
// the r-value of y to add to 8
- support for pointers and dynamic memory.
Some of these are quite involved, while others are relatively minor additions.
Extra credit points will be awarded based on difficulty of the functionality
added, and are awarded on an "all or nothing" basis (only for complete and
correct implementations of the added functionality).
What to hand in
A tar file to be submitted using cs75handin containing:
- your project directory (i.e. all the source, header,
and Makefile files I need to build and run your parser). You should do a
'make clean' before taring up your project directory, and make sure that
all debug output is turned off (i.e. your compiler should have no
output to stdout).
- A README file with:
- Your name and your partner's name
- The number of late days you have used on this assignment
- The total number of late days you have used so far
- if you did an extra credit feature(s), then a project directory
(proj_extra_credit)
containing your extra credit version. In the directory should be:
- a README file that describes the extra credit features
you added and how to test them.
- a testprogs directory that contains test programs that demonstrate
your extra features (your README should describe what features
these demonstrate and how to run them for extra features, or where
in the MIPS code I should look to see the optimizations)
Project Report
Submitted with your tar file or as a hard copy, you must include a project
report. Using at most two pages, give an overview of the design of your
compiler. Explain the overall approach of each major component. Also,
include a description of any additional features or optimizations you have
added. If there are any parts of that have not fully been implemented or
are not working properly, you should discuss them. Finally, explain which
part(s) of the course project you found the most challenging and why/in
what way.
Project Demo
You and your partner will sign-up for a 30 minute demo of your compiler.
Prior to your demo, you should come up with some example C-- programs
that demonstrate both the correctness of your solution and that demonstrate
any parts of your code generation that are particularly interesting, tricky,
etc. You should demo any extra credit features you added, and you should
show some error handling.