For this project you will complete the code generator for your C-- compiler. You will add support for variables (globals, locals, parameters), arrays, and function calls. Your completed compiler should be able to generate code for any valid C-- program and should correctly identify and handle errors in invalid C-- programs. Start with your solution from the previous project (cs75/projects/3/codeGenerator.py), and add more functionality to it.
You should continue to work incrementally on the code generator. I suggest you follow the progression given below for adding the remaining functionality.
The symbol table will now play a much more significant role. It will need to be updated to include information about the various kinds of identifiers (keywords, parameters, locals, globals, functions), their types (int, char, int array, char array), and their locations. Identifiers can be located in a register or offset from some pointer to memory. It would be useful to have methods in your symbol table to test whether a particular identifier is inRegister or inMemory.
As your code generator encounters a new function or a new block containing local variable declarations, it should initialize a new scope in the symbol table. As you leave the function or block, it should finalize the current scope in the symbol table. Each parameter or local variable should be entered into the symbol table as you encounter it. Your code should be able to detect duplicate definitions of the same name within the same scope, and report an error.
For functions, the symbol table should maintain information about the return type, parameter types, and parameter names. This information should be entered into the symbol table by the parser. The function information should be inserted at the global scope and remain in tact throughout the code generation process.
Parameters will be stored in the argument registers, but may need to be moved to save registers or spilled to the stack.
For now we are assuming that we have simple assignment statements of the form: lvalue = rvalue where the rvalue is the value of an expression and the lvalue is the address of an identifier or an array reference. Initially, focus on simple identifiers.
evaluate the right-hand side to get an rvalue lookup the id from the left-hand side in the symbol table if no entry exists report an error if entry is in register move rvalue into this register if entry is in memory store the rvalue into this location
The read statement only accepts simple identifiers, and not array references. Reading in an integer is done with a system call. It will put the value into register $v0. Then you must transfer this value into the proper storage location using a very similar method to assignment.
The break statement is only appropriate when used within a loop. To implement this feature you will need to add a label argument to your methods that handle statements and blocks. In most cases, this label will not be used, so should have a default value of None. In your code that processes loops, you will need to pass the loop ending label into the code that handles each of the loop's statements.
A return statement should evaluate its expression and put the result into register $v1. It should then do an unconditional branch to the exit label of the function which marks the return sequence.
Each function definition will have a boiler plate opening, which saves the 10 registers that must be preserved across calls, and then sets the frame pointer to the beginning of the activation record. Each function will also have a boiler plate closing, which restores the stack pointer to the head of the saved registers, restores the registers, and then returns. The frame pointer should stay fixed throughout the execution of a function, while the stack pointer may change as additional space is needed on the stack.
If the body of a function definition does not contain a function call, then the function's arguments can remain in the argument registers. Otherwise we will immediately move the function's arguments out of the argument registers before evaluating the function body. We can move them either to save registers or to the stack frame. You can use the helper function below to determine if a nested list ls contains a particular item x. In our case the nested list will be the AST of the function body and the particular item we will be searching for is a token like 'funcall'.
In either case the symbol table should be updated appropriately to reflect the locations of the parameters.
def deepMember(x, ls): if len(ls) == 0: return False if type(ls[0]) == type(ls): return deepMember(x, ls[0]) or deepMember(x, ls[1:]) if ls[0] == x: return True return deepMember(x, ls[1:])
Here are the steps needed to handle function calls with pass by value arguments. This will need to be modified once you add the ability to pass arrays.
lookup the name of the callee in the symbol table if the name is not declared or is not a function report an error if number of arguments do not match the number of parameters report an error if currently using any temporary registers spill them to the stack for each argument evaluate the argument expression move the result into an argument register jump to the callee if temporaries were spilled restore them result of function call is in $v1
In the symbol table you should keep track of the size and type of the array.
Whether you are assigning a value to an array reference or simply accessing an array reference you will need to go through the following steps.
load the address of the base of the array into a register evaluate the index expression multiply the index value by 4 (using a shift left by 2 operation) if the array is global add the offset to the base address else subtract the offset from the base address return the register containing the addressThen you can offset this address by 0 to store or load values.
Once you have successfully completed the basic C-- compiler you can try adding some of these additional extra credit features:
a = b = c = d; // b, c, and d are both l-values and r-values x = (y = 6) + 8; // (y = 6) assigns 6 to l-value y, then use // the r-value of y to add to 8
int foo(int *x) { // x declared as a pointer to int *x = 6; // dereference the pointer to modify x's value return 1; } int main() { int y; foo(&y); // pass y by reference to foo write y; }
Run handin75 to turn in your code generator by the deadline. Make sure that your cs75/projects/3 directory contains all of the code necessary to test your code generator, which should be implemented in the file codeGenerator.py.