For this project you will add basic code generation to your C-- compiler. Your code generator will take two inputs: the name of a C-- program file and the name of the desired MIPS output file. To execute the MIPS program produced use either spim or xspim. For example:
$ python codeGenerator.py test.c-- test.mips $ spim -f test.mips $ sxpim test.mips
To learn more about MIPS and spim check out:
Of the four projects we will complete in this course, this project (consisting of parts a and b) is the most difficult, but also the most satisfying because you finally see how all the major pieces of the compiler fit together.
I strongly advise you to do a small section of code generation at a time and to test each section thoroughly before attempting additional sections. You should create a test suite of C-- programs to test your code generator. As you add new sections to the code generator you should re-test your compiler on your entire test suite to be sure that your didn't break something that was previously working.
To get the starting point code for this project do update75.
In your cs75/projects/3 directory is a file called start.mips that shows you the basic structure of a MIPS assembly program. Try running this MIPS program using xspim. It should print the number 6 followed by a newline and end. I will describe the key sections of the code below.
.data _newline_: .asciiz "\n" .text .globl mainThe data section includes the newline character, which will be used for writeln statements, and the global text symbol main, which will be the label that corresponds to the first instruction of the main program.
main: addi $sp, $sp, -40 sw $fp, 36($sp) sw $ra, 32($sp) sw $s0, 28($sp) sw $s1, 24($sp) sw $s2, 20($sp) sw $s3, 16($sp) sw $s4, 12($sp) sw $s5, 8($sp) sw $s6, 4($sp) sw $s7, 0($sp) addi $fp, $sp, 40The stack grows towards lower addresses, so the first step moves the stack pointer down 40 so that the frame can store 10 items that are each 4 bytes long. In the frame we store the previous frame pointer, return address, and the 8 saved registers. Finally, we move the frame pointer to the beginning of the frame.
li $a0, 6 li $v0, 1 syscall la $a0, _newline_ li $v0, 4 syscallFor C-- we will be using 3 different MIPS system calls:
_main_exit: addi $sp, $fp, -40 lw $fp, 36($sp) lw $ra, 32($sp) lw $s0, 28($sp) lw $s1, 24($sp) lw $s2, 20($sp) lw $s3, 16($sp) lw $s4, 12($sp) lw $s5, 8($sp) lw $s6, 4($sp) lw $s7, 0($sp) addi $sp, $sp, 40 jr $raThe first step is to move the stack pointer back to the section of the frame that contains all of the saved data that needs to be restored. Remember that when generating code for the body of the function we may have added more data to the stack. Next we restore the frame pointer, the return address, and all of the saved registers. Finally, we move the stack pointer back to where it was before the function was executed and then jump to the return address.
The following is a suggested order for tackling code generation.
int main() { write 6+3*8; writeln; }To do this you can copy the MIPS program that I provided, called start.mips, and just modify the code between the calling sequence and the return sequence to do the additional arithmetic operations.
In order to break down the process of code generation into small, incremental steps, we will initially ignore variables and function calls. We will assume that all programs only have a main function that deals directly with numbers.
Your code generator will traverse the AST produced by the parser to generate MIPS code. Structure the code generator so that it has methods that roughly match the different node types in the AST.
For each step described below, I have provided at least one test program.
int main() { write 5; writeln; }These should be implemented using system calls with registers $a0 and $v0 as discussed previously in the MIPS example.
int main() { write 5+7-3; //9 writeln; write 10*8/4; //20 writeln; }To generate code for expressions you need to use registers to store temporary results. For example, the AST for the first write statement above looks something like this:
write | sub / \ add num / \ | num num 3 | | 5 7The value of 5 needs to be loaded into a register, the value of 7 needs to be loaded into a register, they need to be added together and the result stored in a register. Then the value of 3 needs to be stored in a register and subtracted from the register storing 5+7, and that result needs to be stored in the $a0 register, which will be used to write the value out.
You should use the temporary registers $t0-$t7 for evaluating expressions. Your code generator will need to implement a register allocation scheme for these registers so that when a temporary register is needed to store an operand or result, one can be found and when a temporary register is no longer needed it can be made available for generating other instructions. Make sure that you can handle expressions that have more operands then there are temporary registers, such as:
write 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10;
int main() { write 4 == 5; //0 write 4 >= 5; //0 write 4 > 5; //0 write 4 < 5; //1 write 4 != 5; //1 write 4 <= 5; //1 writeln; }
// tests or int main() { write (4 == 5) || (4 > 5); //0 write 1 || 0; //1 write (1 == 2) || (5 < 6); //1 writeln; }
// tests and int main() { write (4 != 5) && (5 != 6); //1 write 1 && 0; //0 write 0 && 1; //0 writeln; }
// tests not int main() { write !(3 < 4); //0 write !(4 < 3); //1 write !!1; //1 writeln; }Remember that in the C-- language and and or are short-circuit operators. In other words, if the first operand of an and is false, then false is returned and the second operand is never evaluated. Similarly, if the first operand of an or is true, then true is returned and the second operand is never evaluated.
To do this you need to implement unique labels to which the short-circuiting code will branch allowing it to skip over evaluating the rest of the expression once the final result is known. You can use a simple label notation, like .Li where i increases for each unique label generated.
int main() { if (5 < 3) write 1; else write -1; writeln; if (5 < 6) write 1; else write -1; writeln; }An if statement alters the control flow of a program. You need to generate branch instructions to unique labels to implement these correctly. Your code generator should implement somewhat meaningful unique labels to make your assembly code for if statments more readable. For example, labels for an if statement might look like:
... # instructions for the if part ... .L0_else: # a unique label for the start of the else part ... ... .L1_ifdone: # a unique label for the end of the if-else
// Should go into an infinite loop printing 2's int main() { while (1) write 2; }
/* Should output: 1 3 While loop should never be executed. */ int main() { write 1; writeln; while (0) { write 2; writeln; } write 3; writeln; }Similar to if statements, while loops will also require the use of branching instructions.