Dear mailing list member,
There have been a few exciting developments since the previous progress report.
First, we (me, Brian Mastenbrook, Christophe Rhodes) have a paper about Climacs syntax protocols accepted at the ILC 2005. In it, we explain the difference between per-buffer and per-window syntax updating and give examples of modules that use these methods. When reading what my co-authors had written, I was struck by the number of syntax modules that already exist for Climacs and how varied they are: a very complete syntax module for ISO Prolog, a module for Tabcode notation used for Lute scores, a module for Testing and Test Control Notation 3 (TTCN3), and of course existing but incomplete syntax modules for Common Lisp and HTML.
Second, due to difficulties achieving good performance with the Earley parser, I have written an alternative parser based on incremental LR parsing technology. I also wrote the definitions necessary for this module to parse Common Lisp. This syntax module is named "Lisp" to distinguish it from the one called "Common Lisp". It is experimental for now, so both are available. The incremental LR parser keeps a parse tree for the entire buffer up to date. It is thus a per-buffer module, which would be impossible with the current Earley parser. Parsing all of lisp-syntax.lisp (around 1000 lines) takes around 2 seconds on my machine, but this is only done when the file is initially loaded, or when edits to the buffer are made that makes the contents parse entirely differently from last time, such as when a closing parenthesis is introduced at the beginning of the buffer. In almost all cases, the incremental LR parser is able to patch the existing parse tree. As an example, typing a function definition at the beginning of a buffer containing lisp-syntax. lisp requires less than 10ms per keystroke to update the entire parse tree.
Current feature of the "Lisp" syntax module include C-M-b and C-M-f (backward and forward expression), C-M-x (eval defun), and of course syntax highlighting of strings, keywords, and comments.
One interesting feature of the LR parser that I implemented is that the lexer is a generic function that depends on the state of the parser. This method is slower than the one used for the Earley parser, but we could afford this since the parser is incremental. The consequences of this lexer organization is that the parser could enter a state in which a totally different language, with different lexemes, is to be parsed. This feature opens the door to modular parsers allowing us to parse a string of PHP code inside an HTML buffer, or to check the grammar of strings and comments of a program.
Plans are to factor out code that could be common for more LR-based syntax modules, either into syntax.lisp or to a new file, but probably the former since there are definitions like that of a parse tree that look very similar in the Earley parser and in the LR parser. I also haven't given up on trying to make the Earley parser truly incremental in the same way that the LR parser now is.
Plans also include writing a new LALR parser generator, simply because the one in Climacs uses generic functions instead of tables for the traditional `action' and `goto' (called `new-state' in Climacs) functions of LR parsing. The new parser generator would have to discover inheritance relations between parser states. Once such a generator exists, I might rewrite the HTML syntax module to take advantage of it.
For a detailed list of updates, I refer you to the CVS archives.
Take care,
climacs-announce@common-lisp.net