Update of /project/climacs/cvsroot/papers/ilc2005/syntax In directory common-lisp.net:/tmp/cvs-serv2490
Modified Files: climacssyntax.bib climacssyntax.tex Log Message: Mostly describe Prolog syntax. One or two fixups
Date: Mon May 23 12:21:41 2005 Author: crhodes
Index: papers/ilc2005/syntax/climacssyntax.bib diff -u papers/ilc2005/syntax/climacssyntax.bib:1.6 papers/ilc2005/syntax/climacssyntax.bib:1.7 --- papers/ilc2005/syntax/climacssyntax.bib:1.6 Sat May 21 18:32:16 2005 +++ papers/ilc2005/syntax/climacssyntax.bib Mon May 23 12:21:41 2005 @@ -153,4 +153,18 @@ publisher = {Springer-Verlag}, year = {1991, 1999--}, note = {http://www.finseth.com/craft%7D -} \ No newline at end of file +} + +@Manual{ISOProlog, + title = "{Information technology -- Programming languages -- Prolog -- Part 1: General core}", + key = {ISO/IEC 13211-1:1995}, + OPTauthor = {}, + organization = {ISO}, + OPTaddress = {}, + OPTedition = {}, + OPTmonth = {}, + year = {1995}, + OPTnote = {}, + OPTannote = {} +} +
Index: papers/ilc2005/syntax/climacssyntax.tex diff -u papers/ilc2005/syntax/climacssyntax.tex:1.15 papers/ilc2005/syntax/climacssyntax.tex:1.16 --- papers/ilc2005/syntax/climacssyntax.tex:1.15 Mon May 23 06:06:39 2005 +++ papers/ilc2005/syntax/climacssyntax.tex Mon May 23 12:21:41 2005 @@ -61,10 +61,13 @@ history and an apparent ability to cause passionate argument. Climacs is philosophically part of the Emacs editor tradition, which has spawned many variants with many different approaches to buffer -management, incremental redisplay, and syntax analysis. Emacs itself +management, incremental redisplay, and syntax analysis. Emacs itself traces its lineage to TECO, where Emacs was originally implemented as -a set of TECO macros. Climacs is compared to several interesting -variants of Emacs in Table \ref{table:editorcompare}. +a set of TECO macros. Climacs is compared to several interesting +variants of Emacs in Table \ref{table:editorcompare}; more information +about text editing in general, and some editors we shall not discuss +further, can be found in \cite{FinsethCraft,greenberg,Pike94,woodZ} +and references therein.
\begin{figure*} \begin{center} @@ -107,17 +110,17 @@ font-lock deficiencies.
The Climacs text editor is a combination of frameworks for buffer -representation and parsing, loosely coupled with a CLIM-based display -engine. It includes the Flexichain library \cite{flexichain}, which -provides an editable sequence representation and mark (cursor) -management using a simple linked list for implementing the buffer -protocol. Climacs also includes an implementation of a slight -modification of the Earley parsing algorithm \cite{earley}, to assist -in the creation of syntax-aware editing modes, though such modes can -use any appropriate parsing algorithm. An application can combine a -particular implementation of the buffer protocol, the syntax protocol, -and its own display methods to produce a sophisticated editor for a -particular language. +representation and parsing, loosely coupled with a CLIM-based +\cite{mckayacm} display engine. It includes the Flexichain library +\cite{flexichain}, which provides an editable sequence representation +and mark (cursor) management using a simple linked list for +implementing the buffer protocol. Climacs also includes an +implementation of a slight modification of the Earley parsing +algorithm \cite{earley}, to assist in the creation of syntax-aware +editing modes, though such modes can use any appropriate parsing +algorithm. An application can combine a particular implementation of +the buffer protocol, the syntax protocol, and its own display methods +to produce a sophisticated editor for a particular language.
The Climacs buffer protocol, which provides a standard interface to common text editor buffer operations, uses the Flexichain library; we @@ -182,8 +185,8 @@ point of edit before an edit operation is allowed. When the buffer is large and edit operations occur frequently at multiple locations in the buffer, this requires a substantial amount of copying between -edits. In this situation single-gap-buffer editors like Emacs will -noticably pause between edits to move the gap. A structure which +edits. In this situation single-gap-buffer editors such as GNU Emacs +will noticably pause between edits to move the gap. A structure which contains a sequence of lines and keeps the most recently used lines open as gap buffers can operate as a multi-gap buffer with automatic gap placement, while not providing pessimistic performance when @@ -193,14 +196,14 @@ implementation of the buffer protocol that is used. Space efficiency will also depend on the implementation of Common Lisp which is used to run Climacs. In Steel Bank Common Lisp, the type character is -represented with the full 22 bits used by the Unicode character -space. External character encodings other than Unicode are converted -to Unicode when a file is read in to memory. If the characters are -stored in a specialized array, this will net a worst case space -efficiency of four bytes of every byte in the file. However, the time -advantages of this representation outweigh the space -inefficiency. Searching for an individual character in a sequence of -characters encoded in UTF-8 is $O(n)$, because each individual +represented with the full 21 bits used by the Unicode character space. +External character encodings other than Unicode are converted to +UTF-32 when a file is read in to memory. If the characters are stored +in a specialized array, this will net a worst case space efficiency of +four bytes of every byte in the file. However, the time advantages of +this representation outweigh the space inefficiency. Searching for an +individual character in a sequence of characters encoded in UTF-8 (or +other variable-length encoding) is $O(n)$, because each individual character must be examined to determine the number of octets which are stored to represent that character.
@@ -208,7 +211,7 @@ which uses one machine word per element, either as an immediate value or as a pointer to a larger element. Climacs buffers can contain any object, so in a suitably complex syntax and buffer protocol -implementation an object in a buffer may +implementation an object in a buffer may ... may what? FIXME
\section{Syntax Protocol} \label{sec:syntax} @@ -243,13 +246,6 @@ \ref{fig:syntaxclasses} shows the organization of classes in the TTCN3 grammar.
-Climacs provides an Earley parser \cite{earley} which is sufficient to -parse most context-free grammars. Earley parser: discussion of -generality and asymptotic efficiency (in general and for usual cases). -Possibility of replacement in case of complicated grammars? (What was -that parsing algorithm that someone mentioned on \verb+#lisp+? -\textit{I have no idea.}) - The syntax analysis can be applied either in a per-window or per-buffer function. The per-window approach is best suited to analysis of text where the parse tree will be used only for display @@ -286,18 +282,19 @@ implementation of the Earley algorithm that we provide will turn out to be sufficiently fast for most Climacs syntax modules. Other possibilities include the Tomita parsing algorithm which provides more -generality than LR, but which is nearly as fast in most cases. +generality than LR, but which is nearly as fast in most cases.
\section{Syntaxes} \label{sec:syntaxes}
We describe two different approaches to syntax analysis in the Climacs editor. Per-window parsing is used by the provided modes for HTML, -Common Lisp, Prolog, and TTCN3. Each of these syntaxes is implemented -with the provided Earley parser \cite{earley}. The lute tablature -editor uses a per-buffer function for its syntax analysis and -implements a simple recursive-descent parsing algorithm. (FIXME: is -this true? I don't know enough about parsing) +Common Lisp, Prolog, and a Testing Control Notation (TTCN3). Each of +these syntaxes is implemented with the provided Earley parser +\cite{earley}. The lute tablature editor uses a per-buffer function +for its syntax analysis and implements a simple recursive-descent +parsing algorithm. (FIXME: is this true? I don't know enough about +parsing)
\subsection{Per-Window Syntaxes}
@@ -312,25 +309,35 @@ rules to accommodate the level of analysis to parse the language's grammar.
-Christophe might want to describe the Prolog syntax a bit here. In -particular any details having to do with the implementation of -operator precedence as specified in the ISO Prolog standard might be -interesting. - -[ Maybe just the \textit{priority} stuff? I think that's -the only interesting bit of non-context-freeness; user-defined -operators is OK because you can't actually define and use a new -operator in the same directive. ] - -Float tokenizer (not interesting, but not yet done) - -Need for inheritable grammars / tokenizers: most Prologs deviate from -ISO slightly, mostly in surface syntax rather than in the engine. -Examples: \verb+"ab\x011"+, \texttt{0'''}, \texttt{[H|T]} when -\texttt{|} is an operator. - -Possibility (probably ``Future Work'') to make nice integrated Prolog -environment. +Implementing the Prolog syntax proved a good test of the established +framework. Firstly, and most importantly, ISO Prolog \cite{ISOProlog} +is not a context-free grammar: \textit{terms} have an implicit +priority affecting their parse.\footnote{Formally, the grammar could + be made context-free by introducing 1200 new production rules.} The +implementation of Earley's algorithm, however, was able to address +this additional complexity with no difficulty. + +Another area of difficulty is the fact that parsing a Prolog text can +change the grammar itself through the use of the \texttt{op/3} +directive: including +\begin{verbatim} + :- op(100,xfy,<>). +\end{verbatim} +in a Prolog text means that, after parsing this directive, the token +\texttt{<>} must be recognized as an right-associative operator with +priority 100 in the grammar. This is achievable by keeping a cache of +parsed \texttt{op/3} directives, and maintaining and invalidating it +in parallel with the parse of the buffer itself. + +While the Prolog syntax included with Climacs is a demonstration of +the potential of strongly syntax-aware editors, it is not yet a +practical tool for Prolog programmers, as most implementations of +Prolog deviate from strict ISO compliance. On the level of surface +syntax, interpretation of quoting rules and escape sequences in +strings are very variable, while additionally treatment of operators +in currently-available Prologs can differ markedly from the standard +requirements. Nevertheless, work is underway to use Climacs' syntax +analysis to provide a front-end for a Prolog development environment.
At present, one parse error implies the invalidation of the rest of the file. This adds a burden on the mode implementor that the syntax @@ -410,20 +417,20 @@ \end{center} \end{figure*}
-\TabCode\ is a textual format for description of lute tablature. In -its simplest form, it is a sequence of whitespace-delimited -independent words, where each word represents either a set of frets to -depress and strings to be sounded, or alternatively some element of -musical notation (such as a barline); figure \ref{fig:besfantlach} -demonstrates a fragment of manuscript, and its \TabCode\ encoding. It -is also possible to encode more complex elements of lute tablature -notation in \TabCode: ornaments, fingering marks, beaming, connecting -lines and other complex elements can all be accommodated (see figure -\ref{fig:barley} for examples of more these more complex elements). - -\TabCode\ has been used to produce scholarly editions of lute works -\cite{Weiss} and to computer-based musicological studies (as in -\cite{ecolm-graz} for example). +\TabCode\ \cite{tabcode} is a textual format for description of lute +tablature. In its simplest form, it is a sequence of +whitespace-delimited independent words, where each word represents +either a set of frets to depress and strings to be sounded, or +alternatively some element of musical notation (such as a barline); +figure \ref{fig:besfantlach} demonstrates a fragment of manuscript, +and its \TabCode\ encoding. It is also possible to encode more +complex elements of lute tablature notation in \TabCode: ornaments, +fingering marks, beaming, connecting lines and other complex elements +can all be accommodated (see figure \ref{fig:barley} for examples of +some of these more complex elements). \TabCode\ has been used to +produce scholarly editions of lute works \cite{Weiss} and to assist in +computer-based musicological studies (as in \cite{ecolm-graz} for +example).
\TabCode\ has a rather ad-hoc cobbled together grammar. Tokenising is easy; determining which elements of the notation are best captured is @@ -462,13 +469,14 @@ Given the relatively small amount of work (only a few person-months) that has been put into Climacs so far, it is already a very competent and stable editor. Using CLIM (and in particular the McCLIM -implementation) as the display engine has allowed the project to -progress much more rapidly than what would have been possible -otherwise. However, Climacs has also revealed some serious -limitations and performance problems of the McCLIM library. We -maintain that using CLIM and McCLIM was the best choice, and in fact -advantageous to other McCLIM users as well, since work is being done -to correct and improve it for use with Climacs. +\cite{ilc2002-moore} implementation) as the display engine has allowed +the project to progress much more rapidly than would otherwise have +been possible. However, Climacs development has also revealed some +serious limitations and performance problems of the McCLIM library. +Nevertheless, we maintain that using CLIM and McCLIM was the best +choice, and in fact advantageous to other McCLIM users as well, as the +deficiencies in McCLIM implementation are being addressed and other +improvements made for use with Climacs.
Due to its reliance on fairly well-defined protocols, the Climacs text editor framework is flexible enough to allow for different future @@ -483,7 +491,7 @@ us to improve those protocols as well as their corresponding implementations.
-Another future direction high up on the list of priorites is the +Another future direction high up on the list of priorities is the planned implementation of the buffer protocol. Representing a line being editied as a flexichain can greatly improve the performance of some crucial operations that currently require looping over each @@ -491,7 +499,14 @@ that are currently prohibitive include knowing the line- and column number of a given mark.
-However, our plans for Climacs go further than creating an improved +One disadvantage of the current parsing scheme is that a single parse +error prevents analysis of the rest of the buffer, which is +potentially disturbing to a user's workflow. For relatively simple +grammars such as \TabCode, it is simple enough to resynchronize at the +next token, whereas for more complex grammars the resolution is less +clear. + +Our plans for Climacs go further than creating an improved implementation of Emacs. We intend to make Climacs a fully-integrated CLIM application. This implies among other things using the presentation-type system of CLIM to exchange objects between Climacs @@ -502,19 +517,20 @@
We are often asked whether applications such as VM and Gnus for GNU Emacs will be available for Climacs. Our opinion is that such -applications currenltly run as GNU Emacs subsystems, simply because -GNU Emacs does not have an independent substrate such as CLIM for -creating user interfaces. Climacs is itself a CLIM application, and +applications currently run as GNU Emacs subsystems simply because GNU +Emacs does not have an independent substrate such as CLIM for creating +user interfaces. Climacs is itself a CLIM application, and applications such as mail readers and news readers that do not require editable buffers should instead be implemented directly as CLIM applications, perhaps calling Climacs to write messages.
-\nocite{*} +%\nocite{*}
\section*{Acknowledgments}
-C.R. is supported by EPSRC grant GR/-S84750/-01. B.M. acknowledges -Motorola's generous support. +The authors wish to thank Aleksandr Bakic for fruitful discussions. +B.M.\ acknowledges Motorola's generous support. C.R.\ is supported by +EPSRC grant GR/-S84750/-01.
\bibliographystyle{acm} \bibliography{climacssyntax}