I’ve always been fascinated by compilers. They seem almost magical, translating some source language to a target language or object code. And even though I understand how they work, it’s still looks like magic.
Compilation relies on parsing, which is one of the more difficult parts of a typical compiler. There are several ways to do parsing, most of them using some automated code generation tools (the classics being lex (techinally the scanner, or tokenizer) and yacc (the parser), with later versions fondly named flex and bison).
One way to write a parser is known as a recursive-descent parser, which is probably the only viable method for hand writing a parser. To this end, I’ve created a simple and useful (in my humble opinion) console calculator project on codeplex, that allows parsing of simple mathematical expressions with support for functions and variables. This demonstrates how a typical recursive-descent parser can be written (with a simple scanner to “tokenize” the input strings).
Here’s a typical interaction with the console calculator:
>> 2+3*4+5
ans = 19
>> sin(30)
ans = -0.988031624092862
>> set deg
Ok.
>> sin(30)
ans = 0.5
>> 2*pi
ans = 6.28318530717959
>> x=cos(60)*10
ans = 5
>> sqrt(x)
ans = 2.23606797749979
>> atan(1)*4 - pi
ans = 176.85840734641
You can type help at the prompt to get the list of supported functions.
The Parser class does the heavy lifting of parsing the expressions provided. The Scanner class tokenizes the input string, so the parser doesn’t have to deal with every single character. The Calculator class wraps it up in a nice package.
Technically, this is not a compiler, but an interpreter. It parses the input and executes it as it goes along.
I created it as a single Console Application in C# with Visual Studio 2010 project for simplicity sake, although it should have been two projects: the one as a library of the actual parser etc., and the other a console client, opening the possibility for other clients.
You can download the project source code here.