The theory of regular expressions can be applied to any kind of sequence, not just strings. This would be potentially useful for pattern-matching applications, where current approaches make it very cumbersome to say things like, “Match a list containing between three and five integers”. This sort of thing is easy to express in CL-PPCRE tree-like notation as, e.g. (:repetition (:type integer) 3 5)
My question is: how hard would it be to adapt the CL-PPCRE code to handle things like this? Is there a sequence-type-agnostic core in CL-PPCRE that could be easily re-used for this purpose, or is the assumption that regexps only apply to strings woven deeply into the code?
Thanks, rg
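To make the idea of a regexp over arbitrary sequences concrete, here is a minimal sketch of a backtracking matcher over lists. It follows the tree-like notation used in the post and is not part of CL-PPCRE: the function names MATCH-PATTERN and MATCHES-LIST-P, the :type node, and the argument order of :repetition are all assumptions made for illustration, and both repetition bounds are required.

```lisp
;; Hypothetical sketch; none of these names are part of CL-PPCRE.
(defun match-pattern (pattern items k)
  "Match PATTERN against a prefix of the list ITEMS.
On success, call the continuation K with the unmatched tail and return
its value; return NIL if no way of matching satisfies K."
  (ecase (first pattern)
    ;; (:type TYPE) consumes exactly one element of the given type.
    (:type
     (and (consp items)
          (typep (first items) (second pattern))
          (funcall k (rest items))))
    ;; (:sequence P1 P2 ...) matches the sub-patterns one after another.
    (:sequence
     (labels ((match-all (subpatterns tail)
                (if (null subpatterns)
                    (funcall k tail)
                    (match-pattern (first subpatterns) tail
                                   (lambda (remaining)
                                     (match-all (rest subpatterns) remaining))))))
       (match-all (rest pattern) items)))
    ;; (:or P1 P2 ...) tries each alternative in order.
    (:or
     (some (lambda (alternative) (match-pattern alternative items k))
           (rest pattern)))
    ;; (:repetition P LO HI) matches P between LO and HI times, greedily.
    (:repetition
     (destructuring-bind (sub lo hi) (rest pattern)
       (labels ((try-repeat (n tail)
                  (or (and (< n hi)
                           (match-pattern sub tail
                                          (lambda (remaining)
                                            ;; Refuse zero-width iterations.
                                            (and (not (eq remaining tail))
                                                 (try-repeat (1+ n) remaining)))))
                      ;; Otherwise stop here if enough repetitions were seen.
                      (and (>= n lo)
                           (funcall k tail)))))
         (try-repeat 0 items))))))

(defun matches-list-p (pattern list)
  "True if PATTERN matches the whole of LIST."
  (match-pattern pattern list #'null))
```

With these definitions, (matches-list-p '(:repetition (:type integer) 3 5) '(1 2 3)) should succeed, while lists with two or six integers should be rejected. The continuation argument is what gives backtracking: a greedy repetition that consumed too much is retried with fewer iterations when the rest of the pattern fails.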
Ron,
are you sure that a general-purpose pattern matching library like Optima (https://github.com/m2ym/optima) would not be better than a generalized regular expression matching library for what you need to do?
-Hans
2014-03-19 1:15 GMT+01:00 Ron Garret ron@flownet.com:
The theory of regular expressions can be applied to any kind of sequence, not just strings. This would be potentially useful for pattern-matching applications, where current approaches make it very cumbersome to say things like, "Match a list containing between three and five integers". This sort of thing is easy to express in CL-PPCRE tree-like notation as, e.g. (:repetition (:type integer) 3 5)
My question is: how hard would it be to adapt the CL-PPCRE code to handle things like this? Is there a sequence-type-agnostic core in CL-PPCRE that could be easily re-used for this purpose, or is the assumption that regexps only apply to strings woven deeply into the code?
Thanks, rg
No, I’m not sure, but I strongly suspect that optima isn’t the right tool for my job.
My use-case is a macro I’m trying to write called DEFINE-RELATION, which, as one might suspect from a macro with this name, defines relations between instances of classes. For example:
(define-relation window <-> layout)
This means that every instance of a WINDOW is associated with an instance of a LAYOUT, and vice versa.
But not every relation can be defined simply by the classes. For example, one might want to define familial relationships among people:
(define-relation person as father <->> person as child)
This means that a single instance of a PERSON in the role of a father is associated with multiple instances of PERSON in the role of a child.
or…
(define-relation person as manager <->> person as employee)
(define-relation person as owner <->> animal as pet)
(define-relation person as owner <->> rock as pet)
(define-relation person as owner <->> rock as weapon)
The “pattern” is naturally expressed as a regex:
(define-relation ($class (:optional as $role)) (:or <-> <->> <<-> <<->>) ($class2 (:optional as $role2)))
I don’t see how to express this sort of thing in optima without enumerating all the possible cases, which rather defeats the purpose (one might as well just write a little parser at that point).
Then I want to be able to say things like this:
(define-relation user <->> time-period <->> goal)
which means that a user is associated with multiple time periods, and that each user-time-period pair is in turn associated with a number of goals.
This generalization is easily expressed as a minor tweak to the above regex:
(define-relation ($class (:optional as $role)) (:one-or-more (:or <-> <->> <<-> <<->>) ($classN (:optional as $roleN))))
but AFAICT this is entirely beyond what optima can do.
rg
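For comparison, the "little parser" alternative mentioned above could look roughly like the following sketch. It handles the grammar class [as role] followed by one or more arrow/class [as role] pairs. This is not the actual DEFINE-RELATION implementation: the names PARSE-ENDPOINT and PARSE-RELATION, the result format, and the choice to compare tokens by symbol name (assuming the default upcasing reader) are all assumptions for illustration.

```lisp
;; Hypothetical sketch of a hand-written parser for DEFINE-RELATION bodies.
(defparameter *relation-arrows* '("<->" "<->>" "<<->" "<<->>")
  "Names of the arrow tokens used in the post.")

(defun arrow-p (token)
  "True if TOKEN is a symbol named like one of the relation arrows."
  (and (symbolp token)
       (member (symbol-name token) *relation-arrows* :test #'string=)
       t))

(defun as-p (token)
  "True if TOKEN is a symbol named AS, in any package."
  (and (symbolp token)
       (string= (symbol-name token) "AS")))

(defun parse-endpoint (tokens)
  "Parse CLASS [as ROLE] from the front of TOKENS.
Returns (class role) and, as a second value, the remaining tokens."
  (let ((name (first tokens)))
    (when (or (null tokens) (not (symbolp name)) (arrow-p name) (as-p name))
      (error "Expected a class name but got ~S" name))
    (cond ((not (as-p (second tokens)))
           ;; No role given: the role slot is NIL.
           (values (list name nil) (rest tokens)))
          ((null (cddr tokens))
           (error "Missing role name after AS"))
          (t
           (values (list name (third tokens)) (cdddr tokens))))))

(defun parse-relation (tokens)
  "Parse ENDPOINT (ARROW ENDPOINT)+ and return the alternating list
   (endpoint arrow endpoint ...)."
  (multiple-value-bind (endpoint remaining) (parse-endpoint tokens)
    (unless remaining
      (error "Expected at least one arrow"))
    (let ((result (list endpoint)))
      (loop while remaining
            do (let ((arrow (pop remaining)))
                 (unless (arrow-p arrow)
                   (error "Expected an arrow but got ~S" arrow))
                 (multiple-value-bind (next tail) (parse-endpoint remaining)
                   (push arrow result)
                   (push next result)
                   (setf remaining tail))))
      (nreverse result))))
```

Under these assumptions, (parse-relation '(person as father <->> person as child)) should return ((PERSON FATHER) <->> (PERSON CHILD)), and the chained form (user <->> time-period <->> goal) yields three endpoints with NIL roles. The parser is short, but it encodes the grammar procedurally, which is the cost the regex formulation is meant to avoid.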
On Mar 19, 2014, at 2:40 AM, Hans Hübner hans.huebner@gmail.com wrote:
Ron,
are you sure that a general-purpose pattern matching library like Optima (https://github.com/m2ym/optima) would not be better than a generalized regular expression matching library for what you need to do?
-Hans
I'm pretty sure that someone already did this, i.e. they forked CL-PPCRE for arbitrary sequences. But I can't remember the details right now. You'll probably find a link hidden in the mailing list archives.
Cheers, Edi.
Quite some time ago I tried to turn the RE compilation into a macro, so that the _whole_ of the needed code would be visible to the compiler in one compilation unit.
That should have enabled quite a few optimizations, starting with specialized matching against a base-string, an (unsigned-byte 8) vector, or any other sequence ...
But I didn't get that far ... I'd have had to reimplement most of the existing code, or rather convert everything to return forms. That got a bit messy, too.
So later on I decided that the expected performance improvements would be reached faster by waiting 18 months (for the CPUs to catch up) than by trying to completely reinvent the wheel here.
Regards,
Phil
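As a toy illustration of the macro approach described above, the following sketch expands a fixed prefix pattern into a lambda at macroexpansion time, so the compiler sees the element type and every comparison in one place. It is far simpler than a regex compiler and is not taken from CL-PPCRE or from the attempt described in the message: the macro name COMPILED-PREFIX-MATCHER and its argument convention are assumptions for illustration only.

```lisp
;; Hypothetical sketch: a pattern "compiled" by a macro into inline code.
(defmacro compiled-prefix-matcher (element-type &rest elements)
  "Expand into a lambda that tests whether a simple vector with the given
ELEMENT-TYPE starts with the literal ELEMENTS."
  `(lambda (vec)
     ;; The type declaration lets the compiler specialize AREF and LENGTH.
     (declare (type (simple-array ,element-type (*)) vec))
     (and (>= (length vec) ,(length elements))
          ;; One inline comparison per pattern element, generated at
          ;; macroexpansion time.
          ,@(loop for element in elements
                  for index from 0
                  collect `(eql (aref vec ,index) ,element)))))
```

For example, (compiled-prefix-matcher (unsigned-byte 8) 71 73 70) expands into a function that checks whether an octet vector starts with the bytes 71, 73, 70. Extending this style to repetition and alternation means every regex node has to return a form rather than a closure, which is the kind of rewrite the message describes as getting messy.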
cl-ppcre-devel@common-lisp.net