Hi,
I'm using CL-PPCRE to develop a character-at-a-time lexer. I'm puzzled, though, about how to handle regexes like the common notation for hexadecimal literals:
"^0(?:x[0-9A-Fa-f]+)?$"
This should match the string "0", between positions 0 and 1, as a bare literal zero, and should also match things like "0xa6", between positions 0 and 4, but should not match "0x" on its own. I want the longest possible match, though, so (for example) I'd like to know that while "0x" didn't match, part of the regex *did* match and might still produce a "real" match depending on what comes after the "x".
So, in succession, if the input is "0xa6 ", my scanner gets called thus:
1. Input: "0".
   a) A match.
   b) But it *could* possibly match more, depending on what comes next.
2. Input: "0x".
   a) Not a match.
   b) But, once again, the possibility exists that more input could still produce a longer match than "0".
3. Input: "0xa".
   a) A match.
   b) Because of the "+" attached to the character class, a longer match is still possible.
4. Input: "0xa6".
   a) A match.
   b) As above.
5. Input: "0xa6 ".
   a) Not a match.
   b) Will *never* match no matter how much more input you add to it.
CL-PPCRE just tells me a), and I also want to know b). Is there any way to get this information (if it even exists) out of the scanner?
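To pin down what I mean by b), here is a minimal sketch of the answers I'm after for this one regex. It is in Python rather than Lisp, purely to illustrate the desired behaviour, and it is not anything CL-PPCRE provides. The second pattern, VIABLE_PREFIX, and the function name classify are my own hypothetical names; I derived the pattern by hand to match every string that is a prefix of some valid literal, which is exactly the step I'd like the scanner to do for me.

```python
import re

# The regex from above, without the anchors: fullmatch() anchors both ends.
HEX_LITERAL = re.compile(r"0(?:x[0-9A-Fa-f]+)?")

# Hand-derived for this one regex (an assumption, not scanner output):
# matches every string that is a prefix of some valid literal.
VIABLE_PREFIX = re.compile(r"(?:0(?:x[0-9A-Fa-f]*)?)?")


def classify(text):
    """Return (is_match, could_match_longer) for the input seen so far."""
    # a) Is the input, as it stands, a complete match?
    is_match = HEX_LITERAL.fullmatch(text) is not None
    # b) Could more input still yield a match? For this particular regex
    # every viable prefix can be extended by at least one more character,
    # so "is a viable prefix" and "could match longer" coincide.
    could_match_longer = VIABLE_PREFIX.fullmatch(text) is not None
    return is_match, could_match_longer


for prefix in ["0", "0x", "0xa", "0xa6", "0xa6 "]:
    print(repr(prefix), classify(prefix))
```

With both answers available, the lexer could keep reading while b) holds, remember the last position at which a) held, and fall back to that position once b) becomes false.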
TIA,
-Dan
--
Dan Debertin | airboss@nodewarrior.org | www.nodewarrior.org