Hi Edi & list,
While using cl-ppcre:split recently, I discovered that when the regex matches at position 0, the function returns an empty string in the first position. I think it should not, as I do not consider the empty string to be a substring of the original one.
I don't know the Perl behaviour in this particular case, but I hope it is not a peculiar behaviour as the doc says :-)
Ex: (cl-ppcre:split "\\s+" " foo bar baz ") ==> ("" "foo" "bar" "baz")
I would prefer to get ("foo" "bar" "baz")
What do you think? Thanks, Sebastien.
Hi again,
I should have said "the empty string at pos 0" (I'm OK with empty strings in the middle of the string, when two consecutive matches occur with no characters in between). The same can be said for an empty string at the end (but this can't be seen, as trailing empty strings are removed). I hope this clarifies my thinking a bit...
Sébastien Saint-Sevin wrote:
I don't know the Perl behaviour in this particular case, but I hope it is not a peculiar behaviour as the doc says :-)
Ex: (cl-ppcre:split "\\s+" " foo bar baz ") ==> ("" "foo" "bar" "baz")
I would prefer to get ("foo" "bar" "baz")
bash$ perl -e 'print join(" ", map { "\"$_\"" } split(/\s+/, " foo bar baz ")), "\n"'
"" "foo" "bar" "baz"
CL-PPCRE is matching Perl's behavior here.
Matthew Sachs wrote:
CL-PPCRE is matching Perl's behavior here.
I'm not that surprised that Perl does it this way...
Thanks for the perl test, Matthew.
Cheers, Sebastien.
cl-ppcre-devel site list cl-ppcre-devel@common-lisp.net http://common-lisp.net/mailman/listinfo/cl-ppcre-devel
Sébastien Saint-Sevin <seb-cl-mailist@matchix.com> writes:
While using cl-ppcre:split recently, I discovered that when the regex matches at position 0, the function returns an empty string in the first position. I think it should not, as I do not consider the empty string to be a substring of the original one.
Ex: (cl-ppcre:split "\\s+" " foo bar baz ") ==> ("" "foo" "bar" "baz")
It is an interesting question, but I believe that the current split behavior of returning the leading empty string is the rational behavior. In my mind it comes down to the definition of split: "returns a list of the substrings between the matches". When the regex matches at position 0, the substring before that first match is the empty string.
Having said that I often have real-world needs to *not* have the leading string around. I wish there were explicit keyword args to omit any leading and trailing empty strings. If I get motivated, I might even make a patch! Perl's version of split doesn't have keyword args so it tries to fit several behavior changes into its arguments.
Here's some more practical advice: If you know your problem domain well, you can try the inverse match trick. Instead of calling SPLIT, call ALL-MATCHES-AS-STRINGS with the inverse regex. In this case:
(all-matches-as-strings "\\S+" " foo bar baz ") => ("foo" "bar" "baz")
(This will skip internal empty strings in the general case, but doesn't matter for your example case.)
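To make that caveat concrete, here is a small sketch comparing the two calls on a comma-separated string, where an internal empty field actually occurs. It assumes CL-PPCRE is installed and loadable through Quicklisp; the sample string "a,,b" is made up for illustration.

```lisp
;; Assumption: Quicklisp is available to load CL-PPCRE.
(ql:quickload :cl-ppcre)

;; SPLIT keeps the empty field between the two adjacent commas.
(cl-ppcre:split "," "a,,b")
;; => ("a" "" "b")

;; The inverse regex only ever matches non-empty runs,
;; so the empty field silently disappears.
(cl-ppcre:all-matches-as-strings "[^,]+" "a,,b")
;; => ("a" "b")
```

So the inverse match is only a drop-in replacement when empty fields carry no meaning in your data.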
It's also easy to write your own split that does what you want. An untested version is below.
Cheers, Chris Dean
(defun simple-split (regex target-string)
  "A simple version of split that doesn't handle registers in any
special way and discards leading and trailing empty matches.  Untested!"
  (let ((res nil)     ; The result
        (last-end 0)) ; The end position of the last match
    (cl-ppcre:do-matches (mstart mend regex target-string)
      (unless (zerop mstart)
        (push (subseq target-string last-end mstart) res))
      (setf last-end mend))
    (when (< last-end (length target-string))
      (push (subseq target-string last-end) res))
    (nreverse res)))
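As an alternative to the hand-written loop, one could also post-process the result of SPLIT itself. The following is only a sketch: SPLIT-SANS-LEADING-EMPTY is a hypothetical helper name, not part of CL-PPCRE, and it assumes CL-PPCRE can be loaded through Quicklisp. It drops a single leading empty string and leaves internal empty strings alone, which is the behaviour asked for earlier in the thread.

```lisp
;; Assumption: Quicklisp is available to load CL-PPCRE.
(ql:quickload :cl-ppcre)

(defun split-sans-leading-empty (regex target-string)
  "Hypothetical wrapper: like CL-PPCRE:SPLIT, but drops the empty
string produced when REGEX matches at position 0."
  (let ((parts (cl-ppcre:split regex target-string)))
    (if (and parts (string= (first parts) ""))
        (rest parts)   ; discard only the leading empty string
        parts)))

(split-sans-leading-empty "\\s+" " foo bar baz ")
;; => ("foo" "bar" "baz")
```

Trailing empty strings need no special handling here, because SPLIT already removes them when no limit is given.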
Thanks a lot Chris, very interesting feedback.
Cheers, sebastien.