On Fri, 16 Feb 2007 12:58:42 +0100, Edi Weitz <edi@agharta.de> wrote:
Of course, if you're really adventurous, you could look at the source code of CREATE-SCANNER-AUX in CL-PPCRE and think about efficient variants of ADVANCE-FN for searching backwards. My guess (from looking at the Emacs C code for two minutes) is that this is more or less what Emacs is doing as well.
I forgot: In an empty Emacs *scratch* buffer, type "aaaaaaaa" (eight #\a's) and put point in the middle (after the fourth #\a). Then evaluate (using eval-expression) the following:
(re-search-forward "a+")
This should give you 9 (the position right after the last #\a), which is what one would expect - the regex engine matches the four #\a's after point.
Now put point back in the middle of the string and evaluate
(re-search-backward "a+")
That'll give you 4, i.e. the engine matches (only) the fourth #\a - a string of length one.
I think this confirms my point that Emacs somehow has to go backwards step by step while the regular expressions themselves still "match forwards", so to speak. It also shows that scanning backwards somehow destroys the semantics of some of the regex constituents - "*" or "+" are supposed to mean "longest possible match", but is a string of length one really the longest match?
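
To make the "step backwards, match forwards" idea concrete, here is a small model in Python (not Emacs Lisp, and not what the Emacs C code or CL-PPCRE actually do). The function names and the 1-based position handling are my own assumptions, chosen only to mimic buffer positions. The model tries each start position from point leftwards, runs an ordinary forward match there, and does not let the match run past point:

```python
import re

def search_forward(pattern, text, point):
    """Model of a forward search: return the 1-based position after the match."""
    m = re.compile(pattern).search(text, point - 1)
    return None if m is None else m.end() + 1

def search_backward(pattern, text, point):
    """Model of a backward search: return the 1-based position where the match starts.

    Start positions are tried from point going left; at each one the regex is
    matched forwards, and the match may not extend beyond point (assumption).
    """
    regex = re.compile(pattern)
    limit = point - 1                      # 0-based index corresponding to point
    for start in range(limit, -1, -1):     # step backwards, one position at a time
        m = regex.match(text, start, limit)
        if m is not None:
            return m.start() + 1           # first start position that matches wins
    return None

text = "aaaaaaaa"
print(search_forward("a+", text, 5))   # 9
print(search_backward("a+", text, 5))  # 4
```

In this model the backward search stops at the first start position where a forward match succeeds, which is why "a+" only gets the single #\a just before point instead of all four.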