Friday, October 21, 2011

The Weekly Source Code 59 - An Open Source Treasure: Irony .NET Language Implementation Kit

Select Grammars Dialog in Irony filled with grammarsOne of the best, if not the best way to sharpen the saw and keep your software development skills up to date is by reading code. Sure, write lots of code, but don't forget to explore other people's brains code. There's always fifteen different ways to create a "textboxes over data" application, and while it's interesting to take a look at whatever the newest way to make business software, sometimes it's nice to relax by looking at some implementations of classic software issues like parsers, lexers, and abstract syntax trees. If you didn't go to school or failed to take a compilers class at least knowing that this area of software engineering exists and is accessible to you is very important.

It's so nice to discover open source projects that I didn't know existed. One such project I just stumbled upon while doing research for a customer is "Irony," a .NET language implementation kit. From the CodePlex site:

Irony is a development kit for implementing languages on .NET platform. Unlike most existing yacc/lex-style solutions Irony does not employ any scanner or parser code generation from grammar specifications written in a specialized meta-language. In Irony the target language grammar is coded directly in c# using operator overloading to express grammar constructs. Irony's scanner and parser modules use the grammar encoded as c# class to control the parsing process. See the expression grammar sample for an example of grammar definition in c# class, and using it in a working parser.

Irony includes "simplified grammars for C#, Scheme, SQL, GwBasic, JSON and others" to learn from. There are different kinds of parsers that are grammar generators you might be familiar with. For example, ANTLR is a what's called a LL(*) grammar generator, while Irony is a LALR (Look Ahead Left to Right) grammar generator.

Here's a very basic SQL statement for getting a show from my Podcast database:

SELECT ID, Title FROM Shows WHERE ID = 1

Here's the Irony Parse Tree as viewed in the irony Grammar Explorer:

A complete parse tree of the SQL statement with every node expanded

Typically, in my experience, when creating a parser one will use a DSL (Domain Specific Language) like the GOLD Meta Language building on the BNF (Backus-Naur Form) expression grammar. These domain specific languages are tightly optimized to express exactly how a language is structured and how it should be parsed. You learn a language to create languages.

Remember in the Irony intro text earlier? Let me repeat:

Unlike most existing yacc/lex-style solutions Irony does not employ any scanner or parser code generation from grammar specifications written in a specialized meta-language. In Irony the target language grammar is coded directly in c# using operator overloading to express grammar constructs.

What Roman from Irony has done here is use C# language constructs as if it's a DSL. A fluent parser, as it were. So he's using C# classes and methods to express the language grammar. It's a very interesting and powerful idea if you are interested in creating DSLs but not interested in learning other parsers like GOLD. Plus, it's just fun.

The Irony Grammar Explorer

He has a very rich bass class called Grammar that you derive from, like:

[Language("SQL", "89", "SQL 89 grammar")]
public class SqlGrammar : Grammar {
public SqlGrammar() : base(false) { //SQL is case insensitive
...

But instead of a grammar language like this (simplified by me) to express a SQL SELECT Statement:

! =============================================================================
! Select Statement
! =============================================================================

<Select Stm> ::= SELECT <Columns> <Into Clause> <From Clause> <Where Clause> <Group Clause> <Having Clause> <Order Clause>

<Columns> ::= <Restriction> '*'
| <Restriction> <Column List>

...snip for clarity...

<Restriction> ::= ALL
| DISTINCT
|

<Aggregate> ::= Count '(' '*' ')'
| Count '(' <Expression> ')'
| Avg '(' <Expression> ')'
| Min '(' <Expression> ')'
| Max '(' <Expression> ')'
| StDev '(' <Expression> ')'
| StDevP '(' <Expression> ')'
| Sum '(' <Expression> ')'
| Var '(' <Expression> ')'
| VarP '(' <Expression> ')'

<Into Clause> ::= INTO Id
|

<From Clause> ::= FROM <Id List> <Join Chain>

<Join Chain> ::= <Join> <Join Chain>
|

...snip for clarity...

You'd have something like this instead, again, simplified so this doesn't turn into a giant listing of code rather than a blog post.

//Select stmt
selectStmt.Rule = SELECT + selRestrOpt + selList + intoClauseOpt + fromClauseOpt + whereClauseOpt +
groupClauseOpt + havingClauseOpt + orderClauseOpt;
selRestrOpt.Rule = Empty | "ALL" | "DISTINCT";
selList.Rule = columnItemList | "*";
columnItemList.Rule = MakePlusRule(columnItemList, comma, columnItem);
columnItem.Rule = columnSource + aliasOpt;
aliasOpt.Rule = Empty | asOpt + Id;
asOpt.Rule = Empty | AS;
columnSource.Rule = aggregate | Id;
aggregate.Rule = aggregateName + "(" + aggregateArg + ")";
aggregateArg.Rule = expression | "*";
aggregateName.Rule = COUNT | "Avg" | "Min" | "Max" | "StDev" | "StDevP" | "Sum" | "Var" | "VarP";
intoClauseOpt.Rule = Empty | INTO + Id;
fromClauseOpt.Rule = Empty | FROM + idlist + joinChainOpt;
joinChainOpt.Rule = Empty | joinKindOpt + JOIN + idlist + ON + Id + "=" + Id;
joinKindOpt.Rule = Empty | "INNER" | "LEFT" | "RIGHT";
whereClauseOpt.Rule = Empty | "WHERE" + expression;
groupClauseOpt.Rule = Empty | "GROUP" + BY + idlist;
havingClauseOpt.Rule = Empty | "HAVING" + expression;
orderClauseOpt.Rule = Empty | "ORDER" + BY + orderList;

Here the variables and terms that are being use to build the grammar were defined earlier like this, as an example:

var SELECT = ToTerm("SELECT"); 
var FROM = ToTerm("FROM");
var AS = ToTerm("AS");

You might immediately declare, Dear Reader, that this is blasphemy!  How can C# compete with a specialized DSL like the BNF? This is a C# shaped peg being shoved into a round hold. Well, maybe, but it's interesting to point out that the SQL GOLD Grammar is 259 lines and the C# version of essentially the same thing is 247 lines. Now, I'm not pointing out line numbers to imply that this is a better way or that this is even a valid 1:1 comparison. But, it's interesting that the C# class is even close. You might have assumed it would be much much larger. I think it's close because Roman, the Irony developer, has a very well factored and specialized base class for the derived class to "lean on." Each of his sample grammars are surprisingly tight. For example:

  • "Mini" Python - ~140 lines
  • Java - ~130 lines
  • Scheme - ~200 lines
  • JSON - 39 lines

To conclude, here's the JSON grammar generator. 

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Irony.Parsing;

namespace Irony.Samples.Json {
[Language("JSON", "1.0", "JSON data format")]
public class JsonGrammar : Grammar {
public JsonGrammar() {
//Terminals
var jstring = new StringLiteral("string", "\"");
var jnumber = new NumberLiteral("number");
var comma = ToTerm(",");

//Nonterminals
var jobject = new NonTerminal("Object");
var jobjectBr = new NonTerminal("ObjectBr");
var jarray = new NonTerminal("Array");
var jarrayBr = new NonTerminal("ArrayBr");
var jvalue = new NonTerminal("Value");
var jprop = new NonTerminal("Property");

//Rules
jvalue.Rule = jstring | jnumber | jobjectBr | jarrayBr | "true" | "false" | "null";
jobjectBr.Rule = "{" + jobject + "}";
jobject.Rule = MakeStarRule(jobject, comma, jprop);
jprop.Rule = jstring + ":" + jvalue;
jarrayBr.Rule = "[" + jarray + "]";
jarray.Rule = MakeStarRule(jarray, comma, jvalue);

//Set grammar root
this.Root = jvalue;
MarkPunctuation("{", "}", "[", "]", ":", ",");
this.MarkTransient(jvalue, jarrayBr, jobjectBr);

}//constructor
}//class
}//namespace

Pretty clever stuff, and a well put together project and solution that is well structured. I could myself using this in a C# or Compiler class to teach some of these concepts.

It's also a great little tool for creating small languages of your own. Perhaps you have a Wiki-dialect that's specific to your company and you want to get rid of all that nasty manual parsing? Or many you have an old custom workflow engine or custom expression system embedded in your application and never got around to changing all your parsing to a proper grammar? Maybe now is the time to get that little language you've been thinking about off the ground!

I encourage you, Dear Reader, to support open source projects like this. Why not go leave a comment today on your favorite open source project's site and just let them know you appreciate what they're doing?

Related Links



� 2011 Scott Hanselman. All rights reserved.


Source: http://feedproxy.google.com/~r/ScottHanselman/~3/SKh6688vA1g/TheWeeklySourceCode59AnOpenSourceTreasureIronyNETLanguageImplementationKit.aspx

Palm HP Palm Pre HP Touchpad

No comments:

Post a Comment