CS75 Project Parts 3 and 4: A C-- Compiler

Part 3: Code Generation of basic functionality (10% of your grade)
Due: Thursday April 14, BEFORE 2:00 am (very late Wednesday night)

For the checkpoint you will implement the "Basic Functionality" part of code generation (described below).

Part 4: Complete C-- Compiler, Code, Project Report, and Demo (90%)
Due: Thursday April 28, BEFORE 2:00 am

You may use only 1 late day on this assignment; I will not accept assignments after 2:00 am Tuesday May 3.

Demo: sign-up for a 20 minute demo slot after you submit your code and project report


Problem Introduction and Specifications
Details
What to Hand in including a Project Demo.

Introduction

For this project you will add code generation to complete the C-- compiler. Your compiler will take the name of a C-- program input file and produce a stack machine code output file. It should be able to handle all of the C-- language specifications described in Project One, including function calls, parameter transmission of both arrays and non-arrays, and global and local variables.

Your compiler should take two command line arguments, the first is the C-- code source file and the second is the name of the stack machine output file:

% ./mycc  foo.c--  foo.stk  

Once you successfully complete the basic C-- compiler, you can try adding additional features for extra credit points. Do not add additional features before your initial solution is complete and correct; an incomplete implementation of the required functionality plus an added feature will result in a lower grade than a complete implementation of the required functionality without any additional feature added.

Here are some additions you might want to consider (please talk to me about any additional ideas you have for extra features before attempting them):

Some of these are quite involved, while others are relatively minor additions.

Details/Getting Started

Of the three projects, this is definitely the most difficult, but also the most interesting because you can start to see how it all fits together. You should start by reviewing the ISA of the Stack Machine and run some example stack code. The smach description and code is available here: ~newhall/public/cs75/smach/

I recommend that you use some type of revision control software (CVS would be the best choice if you are working with a partner) to keep versions of your working compiler as you add functionality. Information about using CVS and RCS is available off the cs75 homepage.

You should create a test suite of C-- programs to test your compiler (when you write a new one, first try parsing it with your parser to make sure it is valid C--). Think carefully about programs you add to this suite to make sure they completely test all aspects of your code generator. As you add features to your compiler, re-test your compiler on your test suite to make sure you didn't break something that you previously had working.

How to add code generation

I strongly advise you to do a little bit of code generation at a time and test thoroughly after each new phase. Below is one possible set of steps to take.

  1. Define a codetable in global.h that can be used to store the stack machine code as you generate it. You do not want to write each instruction to the output file as you generate it because you will have some back patching to do. Instead, generate stack machine code to a codetable data structure and output the codetable to the target output file before your compiler exits.
  2. Define a function called codegen, that writes lines of stack machine code to this table and use a variable to keep track of the current output line number.
  3. Update main.c to open an output file, and after parsing and code generation, write the contents of the codetable to the output file.
  4. Add fields to your symbol table data structure. Think about what type of attributes you need to keep for each type of symbol table entry.

    Now you are ready to actually generate some code.

    Basic Functionality: due as Part 3

  5. Initially, ignore variables and function calls. Assume that all C-- code will simply be a main program that deals directly with numbers. So for now your first line of output code will always be: INC 0 3 and your last line will always be RTN 0 0.
  6. In parser.c, add calls to codegen for all basic arithmetic operations and numbers. Test with programs such as:
    int main() {
      write 3+4*7; // should be 31
      writeln;
    }
    
  7. Then add calls to codegen for all basic boolean operations. Remember to use the concept of back patching to implement || and && We need to branch to some line, but at the time we need to make the call to codegen we don't know which line. Save the current output line number, so that when you do know the branch target, you can go back and update it in the code table. Test with programs such as:
    int main() {
      write (4 == 5) || (5 < 6);  // should be 1 for true
      write (4 == 5) || (5 < 4);  // should be 0 for false
      write (4 == 5) && (5 < 6);  // should be 0 for false
      write (7 >= 7) && (8 <= 8); // should be 1 for true
      writeln;
    }
    
  8. Once these are working correctly, add calls to codegen for if statements. Test with programs such as:
    int main() {
      if (!1)  // should output -1
        write 1;
      else
        write -1;
      writeln;
      if (1 == 1) // should output 1
        write 1;
      else
        write -1;
      writeln;
    }
    
  9. Then add calls to codegen for while loops. Initially don't worry about break statements. Test with programs such as:
    int main() {
      while (1)  // this will create an infinite loop
        write 1;
    }
    int main() {
      write 1;
      writeln;
      while (0) { // this loop should never get executed
        write 2;
        writeln;
      }
      write 3;
      writeln;
    }
    

The parts described above are what you must get working for Part 3

Part 4: Once this basic functionality is in place, then move on to the more complicated aspects of code generation.


What to Hand in

  1. Submit the following using cs75handin:
    A tar file containing:
    1. All the source header files and makefile I need to build and run your compiler.
    2. A README file with:
      1. Your name and your partner's name
      2. The number of late days you have used on this assignment
      3. The total number of late days you have used so far
    3. A sample of your parser's output on a good (and short) test C-- program that you devised. Annotate the output listing the parts (statement(s) of the C-- program) that are being parsed at different points in the output. Look at my unix help pages about using script to capture terminal I/O to a file.
    4. A copy of the test C-- program for which you submitted sample output.

  2. In addition you should submit a project report (may be submitted on-line as part of your tar file or as a hardcopy in class). If you submit it on-line it must be in either postscript, pdf, or ascii (with line length of < 80 chars) format.

    In 1 to 2 pages give an overview of the design of your compiler based on the approach to compiler construction we've adopted in this course. You should describe any special features of your compiler. If there are any aspects that have not been fully implemented or are buggy, you should discuss them here. And, include a copy of your LL(1) grammar.

  3. After submitting your solution, you and your partner will sign up for a 20 minute demo where you will run your compiler for me and show me all the cool things it can do. In preparing for your demo think about coming up with some good test programs that you can use to demonstrate your compiler's features. This is a good time to show off places where you are generating efficient code for C-- constructs, and to show off any extra credit features you implemented.