Here are two patches (attached, if mailman doesn't drop them) for CL-PPCRE that allow mixing parse trees and regex strings.
The first patch adds a (?.<name>) syntax, where <name> designates a keyword and is case-sensitive. The ?. was chosen to echo the idea of the #. reader macro. It splices in the parse tree synonym registered for the corresponding keyword while the regex is being parsed.
The second patch adds the (:REGEX <regex>) construct, which lets a regex string be used inside a parse tree. It is the converse of the first patch.
The rationale is that sometimes regex strings are preferable (for compactness), while at other times parse trees are better (when they are computed programmatically), but to my knowledge it was impossible to mix them easily.
I hope that these patches are not too "hackish". In particular, the reference to a synonym must be a keyword; it's not possible to specify symbols in other packages. Also, since cl-ppcre uses symbol properties, keywords are a worse choice (unless one takes care to avoid name clashes). I'm not sure how to fix that.
These patches are meant more to describe what I have in mind than to be production-quality patches.
What do you think about these suggestions?
Examples of use:
-=-=-
CL> (define-parse-tree-synonym :foo
      (:sequence #\a (:greedy-repetition 1 3 (:alternation #\b #\c))))
CL> (scan-to-strings "b(ar(?.FOO))a" "baracca")
"baracca"
#("aracc")
-=-=-
Mixing the other way:
-=-=-
CL> (scan-to-strings '(:sequence "b"
                                (:register (:sequence "ar" (:regex "a(?:b|c){1,3}")))
                                "a")
                    "baracca")
"baracca"
#("aracc")
CL>
-=-=-
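The name-clash concern with keyword synonyms mentioned above can be illustrated with a small sketch. It assumes CL-PPCRE is already loaded and that synonyms are stored per symbol, as described; the packages LIB-A and LIB-B are hypothetical and exist only for this illustration.

```lisp
;; Sketch only: assumes CL-PPCRE is already loaded.
;; LIB-A and LIB-B are hypothetical packages standing in for two libraries.
(defpackage :lib-a (:use :cl))
(defpackage :lib-b (:use :cl))

;; Both libraries pick the keyword :FOO. There is only one such symbol,
;; so the second definition silently replaces the first.
(cl-ppcre:define-parse-tree-synonym :foo (:sequence #\a #\b))
(cl-ppcre:define-parse-tree-synonym :foo (:alternation #\x #\y))

;; Inspect what is currently registered for :FOO.
(cl-ppcre:parse-tree-synonym :foo)

;; Symbols interned in separate packages keep their synonyms apart.
(cl-ppcre:define-parse-tree-synonym lib-a::foo (:sequence #\a #\b))
(cl-ppcre:define-parse-tree-synonym lib-b::foo (:alternation #\x #\y))
```

With package-local symbols each library sees only its own definition, which is exactly what a keyword-only (?.<name>) syntax cannot express.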
Hi!
On Sat, 01 Jul 2006 15:57:26 +0200, Frédéric Jolliton cl-ppcre-devel@frederic.jolliton.com wrote:
> The first patch adds a (?.<name>) syntax, where <name> designates a keyword and is case-sensitive. The ?. was chosen to echo the idea of the #. reader macro. It splices in the parse tree synonym registered for the corresponding keyword while the regex is being parsed.
Thanks for this patch, but as you wrote in your email, I think this one is a little bit too hackish. It also breaks compatibility with Perl syntax.
> The second patch adds the (:REGEX <regex>) construct, which lets a regex string be used inside a parse tree. It is the converse of the first patch.
And thanks for this one as well. I've made a new release (1.2.15) which incorporates your changes.
Cheers, Edi.
>> The first patch adds a (?.<name>) syntax, where <name> designates a keyword and is case-sensitive. The ?. was chosen to echo the idea of the #. reader macro. It splices in the parse tree synonym registered for the corresponding keyword while the regex is being parsed.
> Thanks for this patch, but as you wrote in your email, I think this one is a little bit too hackish. It also breaks compatibility with Perl syntax.
OK, then I have another suggestion. Let (:REGEX <string>) optionally take additional symbols, and use placeholders in <string> to insert the corresponding syntax trees. For example:
(dpts tree1 (:regex "a{2,5}"))
(dpts tree2 (:regex "b{1,3}"))
(dpts tree3 (:regex "foo((?~)-bar-(?~)+)baz" tree1 tree2))
Here (?~) is the placeholder, or it could be something else which doesn't break compatibility with Perl syntax.
Without such a feature, the last tree would have been:
(dpts tree3
  (:sequence "foo"
             (:register (:sequence tree1 "-bar-" (:greedy-repetition 1 nil tree2)))
             "baz"))
(Where dpts = ppcre:define-parse-tree-synonym)
Is that a better alternative?
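For reference, the plain parse-tree version above can be tried directly. This is a minimal sketch assuming CL-PPCRE 1.2.15 or later is loaded (it relies on (:REGEX ...)); the test string is made up for illustration.

```lisp
;; Sketch only: assumes CL-PPCRE 1.2.15 or later is loaded.
;; TREE1, TREE2 and TREE3 are interned in the current package.
(cl-ppcre:define-parse-tree-synonym tree1 (:regex "a{2,5}"))
(cl-ppcre:define-parse-tree-synonym tree2 (:regex "b{1,3}"))
(cl-ppcre:define-parse-tree-synonym tree3
  (:sequence "foo"
             (:register (:sequence tree1
                                   "-bar-"
                                   (:greedy-repetition 1 nil tree2)))
             "baz"))

;; Equivalent to the regex string "foo(a{2,5}-bar-(?:b{1,3})+)baz".
;; The made-up input below should match as a whole, with the register
;; capturing "aa-bar-b" (the last "b" belongs to "baz").
(cl-ppcre:scan-to-strings '(:sequence tree3) "fooaa-bar-bbaz")
```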
On Mon, 03 Jul 2006 16:26:51 +0200, Frédéric Jolliton cl-ppcre-devel@frederic.jolliton.com wrote:
> OK, then I have another suggestion. Let (:REGEX <string>) optionally take additional symbols, and use placeholders in <string> to insert the corresponding syntax trees. For example:
> (dpts tree1 (:regex "a{2,5}"))
> (dpts tree2 (:regex "b{1,3}"))
> (dpts tree3 (:regex "foo((?~)-bar-(?~)+)baz" tree1 tree2))
> Here (?~) is the placeholder, or it could be something else which doesn't break compatibility with Perl syntax.
(?~) is not special in Perl, so this /would/ break compatibility with Perl syntax. In fact, everything would break compatibility.
Apart from that, you'd have to change the parser accordingly, you'd have to check if the number of occurrences of (?~) is equal to the number of optional parameters, you'd have to check that (?~) is only used within (:REGEX ...), and so on.
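To give an idea of just one of these checks, here is a rough sketch of the placeholder-count test. COUNT-PLACEHOLDERS and CHECK-PLACEHOLDER-ARITY are hypothetical helpers, not part of CL-PPCRE, and the naive substring search deliberately ignores escaping and character classes, which a real parser change would have to handle.

```lisp
;; Hypothetical helpers, not part of CL-PPCRE. Plain Common Lisp only.
(defun count-placeholders (regex-string &optional (placeholder "(?~)"))
  "Count non-overlapping occurrences of PLACEHOLDER in REGEX-STRING.
Naive: does not account for backslash escapes or character classes."
  (loop with len = (length placeholder)
        for start = 0 then (+ pos len)
        for pos = (search placeholder regex-string :start2 start)
        while pos
        count t))

(defun check-placeholder-arity (regex-string trees)
  "Signal an error unless TREES has one entry per placeholder."
  (let ((expected (count-placeholders regex-string))
        (given (length trees)))
    (unless (= expected given)
      (error "Regex ~S contains ~D placeholder(s), but ~D parse tree(s) were supplied."
             regex-string expected given))
    t))

;; The example from the proposal has two placeholders and two trees.
(check-placeholder-arity "foo((?~)-bar-(?~)+)baz" '(tree1 tree2))
```

Even this small check says nothing about where the placeholder is allowed to appear, which is the part that would require touching the parser itself.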
> Without such a feature, the last tree would have been:
> (dpts tree3
>   (:sequence "foo"
>              (:register (:sequence tree1 "-bar-" (:greedy-repetition 1 nil tree2)))
>              "baz"))
> (Where dpts = ppcre:define-parse-tree-synonym)
> Is that a better alternative?
I don't think it's worth the trouble. My personal opinion is that for complicated regular expressions you should use the S-expression syntax anyway. YMMV, of course.
Cheers, Edi.
>> OK, then I have another suggestion. Let (:REGEX <string>) optionally take additional symbols, and use placeholders in <string> to insert the corresponding syntax trees. For example:
>> [...]
>> Is that a better alternative?
> I don't think it's worth the trouble. My personal opinion is that for complicated regular expressions you should use the S-expression syntax anyway. YMMV, of course.
OK. Indeed, it adds too much complexity. I will stick with s-expressions.
And thanks again for this great package!