CS 75 Project 1: Scanner

Parts 1-3: Due by 11:59pm Monday, Jan. 31
Part 4: Due by 11:59pm Monday, Feb. 7
Introduction

Do an update75 to get the starting point files for this project. This will create the directories cs75/projects/1a and cs75/projects/1b.

Implementing the code for the lexical analyzer is straightforward once you have developed a correct DFA that accepts all of the tokens in the language. Therefore, you will complete parts 1-3 (given below) and turn them in before beginning the implementation in part 4.

  1. List the set of tokens to be returned by your lexical analyzer in the file:
    cs75/projects/1a/tokens
  2. Define regular expressions for this set of tokens in the file:
    cs75/projects/1a/regular-expressions
  3. Derive a single DFA from your regular expressions. Turn in a hand-written drawing of your DFA in the slot outside my office door or provide a computer-generated drawing in the 1a directory.
  4. Implement the DFA in a Python program similar in structure to the scanner example provided for the infix-to-postfix translator we discussed in class. Specifically, each state of your DFA should be implemented as a method in your LexicalAnalyzer class. Your implementation should be in the file: cs75/projects/1b/scanner.py
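
As a rough illustration of the state-per-method structure described in part 4, here is a minimal sketch. It is not the scanner example from class and not a complete solution: the method names (state_start, state_id, state_num), the tuple representation of tokens, and the helper methods peek and advance are all assumptions made for illustration. Keywords, comments, operators, and line tracking are deliberately left out.

```python
class LexicalAnalyzer:
    """Hypothetical skeleton: each DFA state is one method."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def peek(self):
        # Return the next character without consuming it ("" at end of input).
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def advance(self):
        # Consume and return the next character.
        ch = self.peek()
        self.pos += 1
        return ch

    def next_token(self):
        # Every token begins in the start state.
        return self.state_start()

    def state_start(self):
        # Skip whitespace, then branch on the first character of the token.
        while self.peek().isspace():
            self.advance()
        ch = self.peek()
        if ch == "":
            return ("done", None)
        if ch.isalpha() or ch == "_":
            return self.state_id(self.advance())
        if ch.isdigit():
            return self.state_num(self.advance())
        return ("err", "unexpected character " + repr(self.advance()))

    def state_id(self, lexeme):
        # Stay in this state while letters, digits, or underscores follow.
        while self.peek().isalnum() or self.peek() == "_":
            lexeme += self.advance()
        return ("id", lexeme)

    def state_num(self, lexeme):
        # Stay in this state while digits follow.
        while self.peek().isdigit():
            lexeme += self.advance()
        return ("num", lexeme)
```

Calling next_token() repeatedly on LexicalAnalyzer("_abc 8") would yield an id token, a num token, and then done. A real scanner would add one method for each remaining state in your DFA.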
Specifications
Testing your scanner

Below is an example of C-- code that can be used to test your scanner. Note that your actual token names do not need to be the same as the ones shown in this example.

// this will be ignored
int main() {
  x = 0;
  _abc = 8;

  /* this too 
     will be
     ignored */

  if (x & 10) 
    write x;
  else
    ;

  return 0000042
}

Notice that this code contains a number of errors: some can be caught by the scanner, some by the parser, and some only during code generation. For example, the last statement is missing a semicolon. The parser can recognize this error, but the scanner cannot. Also, variables must be declared at the top of a block in C--, but the variables x and _abc have not been declared. This error can be caught in code generation, but not by the scanner. However, the scanner can recognize that a single & is invalid, because the && operator must consist of two ampersands. In addition, the scanner can recognize that an integer may not begin with leading zeros. In both of these cases, the scanner returns an err token with an associated message.
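
The two scanner-level errors described above could be detected along the following lines. This is only a sketch under stated assumptions: the function names scan_number and scan_ampersand, the (token, new_position) return shape, and the token name "and" are hypothetical, and in your own scanner this logic would live in the state methods of your LexicalAnalyzer class.

```python
def scan_number(text, pos, line):
    """Scan the run of digits starting at text[pos].

    Returns (token, new_pos). A lone "0" is a valid number; any longer
    digit string that starts with "0" is reported as an err token.
    """
    end = pos
    while end < len(text) and text[end].isdigit():
        end += 1
    lexeme = text[pos:end]
    if len(lexeme) > 1 and lexeme[0] == "0":
        msg = f"Line {line}: integers cannot have leading zeros {lexeme}"
        return ("err", msg), end
    return ("num", lexeme), end


def scan_ampersand(text, pos, line):
    """text[pos] is '&'; the operator is valid only if another '&' follows."""
    if pos + 1 < len(text) and text[pos + 1] == "&":
        return ("and", None), pos + 2  # "and" is an assumed token name
    return ("err", f"Line {line}: missing ampersand in and"), pos + 1
```

Both functions consume the offending characters before returning, so scanning can continue with the next token after an error is reported.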

Here is the type of output that should be produced by the scanner for the above test file.

int
id main
lparen
rparen
lbrace
id x
assign
num 0
semi
id _abc
assign
num 8
semi
if
lparen
id x
err Line 10: missing ampersand in and
num 10
rparen
write
id x
semi
else
semi
return
err Line 15: integers cannot have leading zeros 0000042
rbrace
done
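
One possible way to print tokens in the format shown above is sketched here. The helper name format_token and the idea of passing the attribute as an optional second argument are assumptions for illustration, not a required interface.

```python
def format_token(name, value=None):
    """Return one output line: the token name, then its attribute if any."""
    return name if value is None else f"{name} {value}"


# Example: a few of the tokens from the listing above.
for tok in [("id", "main"), ("lparen", None), ("num", "0"), ("done", None)]:
    print(format_token(*tok))
```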
Submit

Run handin75 by the end of Monday, Jan. 31 to turn in parts 1-3. Run it again by the end of Monday, Feb. 7 to turn in part 4, the implemented scanner.