Hi Sébastien!
On Mon, 11 Oct 2004 18:52:56 +0200, Sébastien Saint-Sevin <seb-cl-mailist@matchix.com> wrote:
I'm doing multi-line regex searches over big files that can't be converted to a single string. So I introduced a kind of buffer that I search in.
Now I need to add a constraint to scan, do-scans & others (in addition to (&key start end)): I want to be able to tell the engine that a match must start before a certain index in the string (to avoid searching for further results that will be discarded later because of my buffered multi-line matching process).
Logically, this :must-start-before value corresponds to the first line of my buffer. If nothing starts on the first line, I need to move the search one line forward, so everything that the engine would match later on in the string is wasted time.
How can I do it?
Have you considered using something like
(?s:(?=.{n}))<actual-regular-expression>
where n obviously is an integer computed from your constraints above? I don't know how this'll behave performance-wise but you could just try it... :)
Or have I misunderstood your question? Actually, I'm not sure why the END keyword parameter doesn't suffice. Can you give an example?
As far as I understand it, (?s:(?=.{n})) will only guarantee that at least n chars remain after match-start in the consumed string. This is not what I want. I want something that guarantees that match-start will be before index n (meaning the n'th char in the consumed string), whether match-end is before or after this index n.
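To make the difference from the END keyword concrete, here is a minimal sketch of the requested semantics, assuming CL-PPCRE is loaded. The name SCAN-STARTING-BEFORE is hypothetical and not part of CL-PPCRE. It only rejects late matches after the fact, so it does not save the scanning time that a real :must-start-before option inside the engine would.

```lisp
;; Assumes CL-PPCRE is already loaded, e.g. with (ql:quickload :cl-ppcre)
;; if Quicklisp is available.
;; SCAN-STARTING-BEFORE is a hypothetical helper, not a CL-PPCRE function.
(defun scan-starting-before (regex string limit
                             &key (start 0) (end (length string)))
  "Like CL-PPCRE:SCAN, but return NIL unless the match starts before LIMIT.
The match itself may still end at or after LIMIT."
  (multiple-value-bind (match-start match-end reg-starts reg-ends)
      (cl-ppcre:scan regex string :start start :end end)
    (when (and match-start (< match-start limit))
      (values match-start match-end reg-starts reg-ends))))

;; In "aaabbbccc" the first match of "b+" covers indices 3 to 6.
(scan-starting-before "b+" "aaabbbccc" 4) ; starts at 3 < 4: returns 3 and 6
(scan-starting-before "b+" "aaabbbccc" 3) ; starts at 3, not < 3: returns NIL
(cl-ppcre:scan "b+" "aaabbbccc" :end 4)   ; :END cuts the match short: 3 and 4
```

The last call shows why END is not enough here: it limits where the match may end, not just where it may start.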
PS: Edi, if you are back, my previous post is still an open question ;-) (the one with FILTER...)
Yes, I'm back but unfortunately I'm very busy with commercial stuff right now. Sorry, filters will have to wait some more.
Cheers, Edi.
Here is what I've got right now (it's ok for my needs actually).
(defclass filter (regex)
  ((num :initarg :num
        :accessor num
        :type fixnum
        :documentation "The number of the register this filter refers to.")
   (predicate :initarg :predicate
              :accessor predicate
              :documentation "The predicate to validate the register with"))
  (:documentation "FILTER objects represent the combination of a register and a predicate. This is not available in regex string, but only used in parse tree."))
(defmethod create-matcher-aux ((filter filter) next-fn)
  (declare (type function next-fn))
  ;; the position of the corresponding REGISTER within the whole
  ;; regex; we start to count at 0
  (let ((num (num filter)))
    (lambda (start-pos)
      (declare (type fixnum start-pos))
      (let ((reg-start (svref *reg-starts* num))
            (reg-end (svref *reg-ends* num)))
        ;; only bother to check if the corresponding REGISTER has
        ;; matched successfully already
        (and reg-start
             (funcall (predicate filter)
                      (subseq *string* reg-start reg-end))
             (funcall next-fn start-pos))))))
ADDED TO (defun convert-aux (parse-tree) ...
;; (:FILTER <number> <predicate>)
((:filter)
 (let ((backref-number (second parse-tree))
       (predicate (third parse-tree)))
   (declare (type fixnum backref-number))
   (when (or (not (typep backref-number 'fixnum))
             (<= backref-number 0))
     (signal-ppcre-syntax-error "Illegal back-reference: ~S" parse-tree))
   (unless (or (typep predicate 'symbol)
               (typep predicate 'function))
     (signal-ppcre-syntax-error "Illegal predicate: ~S" parse-tree))
   ;; stop accumulating into STARTS-WITH and increase
   ;; MAX-BACK-REF if necessary
   (setq accumulate-start-p nil
         max-back-ref (max (the fixnum max-back-ref) backref-number))
   (make-instance 'filter
                  ;; we start counting from 0 internally
                  :num (1- backref-number)
                  :predicate predicate)))
ADDED FOR MY PURPOSES...
(defmethod create-scanner-with-predicate ((regex-string string) predicate
                                          &key case-insensitive-mode
                                               multi-line-mode
                                               single-line-mode
                                               extended-mode
                                               destructive)
  (declare (optimize speed (safety 0) (space 0) (debug 0)
                     (compilation-speed 0)
                     #+:lispworks (hcl:fixnum-safety 0)))
  (declare (ignore destructive))
  ;; parse the string into a parse-tree and then call CREATE-SCANNER again
  (let* ((*extended-mode-p* extended-mode)
         (quoted-regex-string (if *allow-quoting*
                                  (quote-sections (clean-comments regex-string extended-mode))
                                  regex-string))
         (*syntax-error-string* (copy-seq quoted-regex-string))
         (parse-tree (parse-string quoted-regex-string)))
    ;; wrap the result with FILTER to check for predicate
    (create-scanner `(:sequence (:register ,(shift-back-reference parse-tree))
                                (:filter 1 ,predicate))
                    :case-insensitive-mode case-insensitive-mode
                    :multi-line-mode multi-line-mode
                    :single-line-mode single-line-mode
                    :destructive t)))
(defun shift-back-reference (tree)
  (cond ((and (consp tree) (eq (first tree) :back-reference))
         `(:back-reference ,(1+ (second tree))))
        ((atom tree) tree)
        (t (cons (shift-back-reference (car tree))
                 (shift-back-reference (cdr tree))))))
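A short sketch of why the shift is needed: wrapping the whole regex in a new outer register makes that register number 1, so every register the user wrote moves up by one and each back-reference has to follow it. The snippet repeats the definition so it can be evaluated on its own, and the final call assumes CL-PPCRE is loaded. The variable name *TREE* is made up for illustration.

```lisp
;; Same behaviour as the definition above, repeated so this can be pasted alone.
(defun shift-back-reference (tree)
  (cond ((and (consp tree) (eq (first tree) :back-reference))
         `(:back-reference ,(1+ (second tree))))
        ((atom tree) tree)
        (t (cons (shift-back-reference (car tree))
                 (shift-back-reference (cdr tree))))))

;; Parse tree for the regex string "(a)\\1"; *TREE* is a hypothetical name.
(defparameter *tree* '(:sequence (:register #\a) (:back-reference 1)))

(shift-back-reference *tree*)
;; => (:SEQUENCE (:REGISTER #\a) (:BACK-REFERENCE 2))

;; Once wrapped in an outer register, the inner (a) is register 2, so the
;; shifted back-reference still points at it and "aa" matches from 0 to 2.
(cl-ppcre:scan `(:register ,(shift-back-reference *tree*)) "aa")
```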