For this project you will add basic code generation to your C-- compiler. Your code generator will take two inputs: the name of a C-- program file and the name of the desired MIPS output file. To execute the MIPS program produced use either spim or xspim. For example:
$ python codeGenerator.py test.c-- test.mips $ spim -f test.mips $ sxpim test.mips
To learn more about MIPS and spim check out:
Of the four projects we will complete in this course, this project (consisting of parts a and b) is the most difficult, but also the most satisfying because you finally see how all the major pieces of the compiler fit together.
I strongly advise you to do a small section of code generation at a time and to test each section thoroughly before attempting additional sections. You should create a test suite of C-- programs to test your code generator. As you add new sections to the code generator you should re-test your compiler on your entire test suite to be sure that your didn't break something that was previously working.
To get the starting point code for this project do update75.
In your cs75/projects/3 directory is a file called start.mips that shows you the basic structure of a MIPS assembly program. Try running this MIPS program using xspim. It should print the number 6 followed by a newline and end. I will describe the key sections of the code below.
.data
_newline_:
.asciiz "\n"
.text
.globl main
The data section includes the newline character, which will be used
for writeln statements, and the global text
symbol main, which will be the label that corresponds to the
first instruction of the main program.
main: addi $sp, $sp, -40 sw $fp, 36($sp) sw $ra, 32($sp) sw $s0, 28($sp) sw $s1, 24($sp) sw $s2, 20($sp) sw $s3, 16($sp) sw $s4, 12($sp) sw $s5, 8($sp) sw $s6, 4($sp) sw $s7, 0($sp) addi $fp, $sp, 40The stack grows towards lower addresses, so the first step moves the stack pointer down 40 so that the frame can store 10 items that are each 4 bytes long. In the frame we store the previous frame pointer, return address, and the 8 saved registers. Finally, we move the frame pointer to the beginning of the frame.
li $a0, 6 li $v0, 1 syscall la $a0, _newline_ li $v0, 4 syscallFor C-- we will be using 3 different MIPS system calls:
_main_exit: addi $sp, $fp, -40 lw $fp, 36($sp) lw $ra, 32($sp) lw $s0, 28($sp) lw $s1, 24($sp) lw $s2, 20($sp) lw $s3, 16($sp) lw $s4, 12($sp) lw $s5, 8($sp) lw $s6, 4($sp) lw $s7, 0($sp) addi $sp, $sp, 40 jr $raThe first step is to move the stack pointer back to the section of the frame that contains all of the saved data that needs to be restored. Remember that when generating code for the body of the function we may have added more data to the stack. Next we restore the frame pointer, the return address, and all of the saved registers. Finally, we move the stack pointer back to where it was before the function was executed and then jump to the return address.
The following is a suggested order for tackling code generation.
int main() {
write 6+3*8;
writeln;
}
To do this you can copy the MIPS program that I provided,
called start.mips, and just modify the code between the
calling sequence and the return sequence to do the additional
arithmetic operations.
In order to break down the process of code generation into small, incremental steps, we will initially ignore variables and function calls. We will assume that all programs only have a main function that deals directly with numbers.
Your code generator will traverse the AST produced by the parser to generate MIPS code. Structure the code generator so that it has methods that roughly match the different node types in the AST.
For each step described below, I have provided at least one test program.
int main() {
write 5;
writeln;
}
These should be implemented using system calls with registers $a0 and
$v0 as discussed previously in the MIPS example.
int main() {
write 5+7-3; //9
writeln;
write 10*8/4; //20
writeln;
}
To generate code for expressions you need to use registers to store
temporary results. For example, the AST for the first write
statement above looks something like this:
write
|
sub
/ \
add num
/ \ |
num num 3
| |
5 7
The value of 5 needs to be loaded into a register, the value of 7
needs to be loaded into a register, they need to be added together and
the result stored in a register. Then the value of 3 needs to be
stored in a register and subtracted from the register storing 5+7, and
that result needs to be stored in the $a0 register, which will be used
to write the value out.
You should use the temporary registers $t0-$t7 for evaluating expressions. Your code generator will need to implement a register allocation scheme for these registers so that when a temporary register is needed to store an operand or result, one can be found and when a temporary register is no longer needed it can be made available for generating other instructions. Make sure that you can handle expressions that have more operands then there are temporary registers, such as:
write 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10;
int main() {
write 4 == 5; //0
write 4 >= 5; //0
write 4 > 5; //0
write 4 < 5; //1
write 4 != 5; //1
write 4 <= 5; //1
writeln;
}
// tests or
int main() {
write (4 == 5) || (4 > 5); //0
write 1 || 0; //1
write (1 == 2) || (5 < 6); //1
writeln;
}
// tests and
int main() {
write (4 != 5) && (5 != 6); //1
write 1 && 0; //0
write 0 && 1; //0
writeln;
}
// tests not
int main() {
write !(3 < 4); //0
write !(4 < 3); //1
write !!1; //1
writeln;
}
Remember that in the C-- language and and or are
short-circuit operators. In other words, if the first operand of
an and is false, then false is returned and the second
operand is never evaluated. Similarly, if the first operand of
an or is true, then true is returned and the second operand
is never evaluated.
To do this you need to implement unique labels to which the short-circuiting code will branch allowing it to skip over evaluating the rest of the expression once the final result is known. You can use a simple label notation, like .Li where i increases for each unique label generated.
int main() {
if (5 < 3)
write 1;
else
write -1;
writeln;
if (5 < 6)
write 1;
else
write -1;
writeln;
}
An if statement alters the control flow of a program. You
need to generate branch instructions to unique labels to implement
these correctly. Your code generator should implement somewhat
meaningful unique labels to make your assembly code for if
statments more readable. For example, labels for an if
statement might look like:
... # instructions for the if part
...
.L0_else: # a unique label for the start of the else part
...
...
.L1_ifdone: # a unique label for the end of the if-else
// Should go into an infinite loop printing 2's
int main() {
while (1)
write 2;
}
/*
Should output:
1
3
While loop should never be executed.
*/
int main() {
write 1;
writeln;
while (0) {
write 2;
writeln;
}
write 3;
writeln;
}
Similar to if statements, while loops will also
require the use of branching instructions.