sekcproject is more than just a compiler. It's a doorway into the vast, dare a say limitless reaches of meta-programming. Why bother writing your program at all, when you can write a program to write your program? But more on that later. The
sekcproject is Sean Finney and Erik Osheim's semester-long project for CS 75. The end goal of the project as outlined in CS 75 is to implement a more-or-less (more less and less more) fully functional compiler for a given simple LL(1) programming language. The end goal of the
sekcproject, on the other hand, is to do the same but for any given language.
the lexical analyzer is the module of the compiler that reads an input file (containing the programmer's source code) and translates what it reads in into a series of meaningful tokens, which can then be parsed later on by the next stage of the compilation process.
before we could begin coding, we needed a Deterministic Finite Automaton (DFA), which is the most straightforward method to translate a tokenizing process for a language from a conceptual level to something that can be easily, reliably, and efficiently coded. This is where the seeds of meta-programming were sowed in our heads. The method for constructing a DFA is itself a straightforward algorithm, which too, can be easily (haha) implemented in C-code. But at the time we hadn't fully grasped this concept, and did this by hand like everyone else.
before we could make a DFA, we needed a Non-deterministic Finite Automaton (NFA). here's what we made, following the algorithm we learned in class word-for-word, no shortcuts taken:
and then, following the epsilon closure algorithm (again, about as strictly as possible) we came up with the resulting DFA:
next we took tia's grammar, and did all kinds of horrible things to it like removing left factoring and lurking left recursion, amoung other things. Again, we realized that this wouldn't necesarily have to be done by hand, if we could come up with a way to write a program to do it. So, Erik took it upon himself to write up a small java program to take in a plaintext file specifying the grammar in a simple and straightforward format, and would do everything necessary to determine and output the grammar in an LL(1) form.
now we have a parsing transition table based on the language, and from that we wrote a top-down parser.
blah blah blah