On Sat, 5 Feb 2011 13:29:39 -0800, "Scott L. Burson" Scott@sympoiesis.com wrote:
Boolean grammars solve both these problems. Let's look at how they work.
The key insight is that we consider nonterminal definitions to be operations on predicates over strings. The definition
A ::= B | C
is fundamentally a disjunction: a string is an A if it's a B, or if it's a C. Why not also allow the other boolean operations of conjunction and negation? This is what boolean grammars do. So you can have rules like
A ::= B & ~C
which says that a string is an A if it's a B _and_ it's not a C.
If you consider the input to consist not of single characters but of pairs of (a character and its following character), you can express the longest-match rule. In practice, rather than making you do this explicitly, implementations like SBP provide a "followed by" notation that allows you to constrain the success of a production on the immediately following character. Thus you can write a production for "`if' followed by a token break character" fairly succinctly.
I'm surprised that nobody has mentioned OMeta/ometa2 as of yet.
It has this boolean functionality you speak of:
http://subvert-the-dominant-paradigm.net/blog/?p=38
It's unfortunate that the author wasn't very experienced with CL, so his implementation doesn't feel idiomatic and the code isn't always clean.
Besides this, and a somewhat idiosynractic syntax of OMeta grammars (which is, to degree, by necessity) it's a very interesting framework.
I've got modifications adding match tracing and source location support, which I've never got around to submit to the author.
I'd guess the next step would be converting it to use SEXP's for grammar descriptions.