CS 75 Project 2b: Recursive-descent parser for C--

Due: Friday, March 4 by 11:59pm
Introduction

Run update75 to get the directory cs75/projects/2b/. Put your files for this project here.

For this project you will implement your LL(1) grammar as a recursive descent parser. Before you begin, you should correct any problems in your LL(1) grammar and copy the final version (which should be called LL1grammar) into the cs21/projects/2b directory. Here is an overview of the key steps you'll need to complete:

  1. Create a parser called parser.py that exactly mimics your LL(1) grammar. For each non-terminal in the grammar your parser should have a method that implements its production rules. Verify that your parser is correct by trying it on a number of valid as well as invalid C-- programs.
  2. Modify your parser so that it generates an abstract syntax tree (AST) represented as a list of lists. Using the formatting tools provided in the file formatAST.py, convert your list AST into a string, and then draw it using the nltk library. Verify that your AST is correct and summarizes the key features of the program being parsed.
  3. Finally, once your parser is working, and can generate an appropriate AST, add error recovery mechanisms.

1. Using the LL(1) grammar as the basis for a parser

In class, we discussed a simple expression parser that demonstrates how to implement an LL(1) grammar as a series of methods in a python class.

Notice that this parser uses a match method whenever a terminal is encountered in a production rule. This method attempts to match the desired terminal with the current token, and when a match is made it advances the scanner to the next token.

This parser also uses a debugging flag to easily monitor the program's progress as it parses a particular set of expressions.

You should include both of these features in your parser. Then use the debugging feature to verify that your parser is working correctly using both valid and invalid C-- programs. I have provided a number of test programs for you in the directory cs75/projects/tests. You should create additional test programs of your own and add them here.


2. Generating an abstract syntax tree while parsing

In class, we discussed how to modify the original expression parser given above to also return an abstract syntax tree. This parser with AST demonstrates how you can construct a python list, while parsing, to represent a tree that reflects the structure of the program being compiled.

You should structure your abstract syntax tree as follows:


Example of a basic program

Consider the following program:

int x;

int double(int a) {
   return a * 2;
}

int main() {
   int i;

   read x;
   i = 0;
   while (i < 10) {
     write double(x);
     writeln;
     i = i + 1;
   }
}
My parser returns the following AST, which is shown in pretty print format:
[ 'program',
  ['globals', [['int', 'x'], ['line', 1]]],
  [ 'functions',
    [ 'double',
      'int',
      ['parameters', ['int', 'a']],
      [ 'block',
        ['locals'],
        ['return', ['mul', ['id', 'a'], ['num', 2]], ['line', 4]]]],
    [ 'main',
      'int',
      ['parameters'],
      [ 'block',
        ['locals', [['int', 'i'], ['line', 8]]],
        ['read', 'x', ['line', 10]],
        ['assign', ['id', 'i'], ['num', 0], ['line', 11]],
        [ 'while',
          ['ls', ['id', 'i'], ['num', 10]],
          [ 'block',
            ['locals'],
            [ 'write',
              ['funcall', 'double', ['arguments', ['id', 'x']]],
              ['line', 13]],
            ['writeln', ['line', 14]],
            [ 'assign',
              ['id', 'i'],
              ['add', ['id', 'i'], ['num', 1]],
              ['line', 15]]],
          ['line', 12]]]]]]
When drawn this produces the following tree:



Example of a program that uses arrays
Consider the following program (test_27.c-- in your test cases):
int main() {
  int vals[2];
  vals[0] = 6;
  vals[1] = 5;
  printArray(vals);
}

int printArray(int a[]) {
  write a[0];
  writeln;
  write a[1];
  writeln;
}
My parser returns the following AST, which is shown in pretty print format:
[ 'program',
  ['globals'],
  [ 'functions',
    [ 'main',
      'int',
      ['parameters'],
      [ 'block',
        ['locals', [['array', 2, 'int', 'vals'], ['line', 2]]],
        [ 'assign',
          ['arrayref', 'vals', ['num', 0]],
          ['num', 6],
          ['line', 3]],
        [ 'assign',
          ['arrayref', 'vals', ['num', 1]],
          ['num', 5],
          ['line', 4]],
        [ 'funcall',
          'printArray',
          ['arguments', ['id', 'vals']],
          ['line', 5]]]],
    [ 'printArray',
      'int',
      ['parameters', ['array', 'int', 'a']],
      [ 'block',
        ['locals'],
        ['write', ['arrayref', 'a', ['num', 0]], ['line', 9]],
        ['writeln', ['line', 10]],
        ['write', ['arrayref', 'a', ['num', 1]], ['line', 11]],
        ['writeln', ['line', 12]]]]]]
When drawn this produces the following tree:



3. Error Recovery

Once your parser works and is returning an appropriate AST, add some simple error recovery. You should implement basic panic and phrase-level recovery schemes for a single missing or incorrect token. In addition, you should integrate existing error recovery from your scanner in cases where there are missing or incorrect characters in the input that can be easily handled.

When your compiler detects an error it should print out an error message and line number as shown in these examples. If possible, it should recover from the error and continue parsing. Otherwise it should exit and end the program. You are welcome to try implementing more complicated error recovery as well.


Submit
Run handin75 to turn in your parser by the deadline.