Sébastien Saint-Sevin seb-cl-mailist@matchix.com writes:
While using cl-ppcre:split recently, I discovered that when the regex matches at position 0, the function returns an empty string in the first position, which I think it should not, as I do not consider the empty string to be a substring of the original one.
Ex: (cl-ppcre:split "\\s+" " foo bar baz ") ==> ("" "foo" "bar" "baz")
It is an interesting question, but I believe that the current split behavior of returning the leading empty string is the rational behavior. In my mind it comes down to the definition of split: it "returns a list of the substrings between the matches". When the regex matches at position 0, the substring between the start of the string and that first match is the empty string.
Having said that, I often have real-world needs to *not* have the leading empty string around. I wish there were explicit keyword args to omit any leading and trailing empty strings. If I get motivated, I might even make a patch! Perl's version of split doesn't have keyword args, so it tries to fit several behavior changes into its arguments.
Here's some more practical advice: If you know your problem domain well, you can try the inverse match trick. Instead of calling SPLIT, call ALL-MATCHES-AS-STRINGS with the inverse regex. In this case:
(all-matches-as-strings "\\S+" " foo bar baz ") => ("foo" "bar" "baz")
(This will skip internal empty strings in the general case, but that doesn't matter for your example.)
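To make the caveat about internal empty strings concrete, here is a small sketch using a comma-separated string with an empty middle field. The input string and the two function names are made up for illustration; only SPLIT and ALL-MATCHES-AS-STRINGS come from CL-PPCRE.

```lisp
;; Assumes CL-PPCRE is already loaded, for example with
;; (asdf:load-system :cl-ppcre).

(defun fields-by-split (string)
  "Hypothetical helper: split STRING on commas with SPLIT."
  (cl-ppcre:split "," string))

(defun fields-by-inverse-match (string)
  "Hypothetical helper: collect the runs of non-comma characters."
  (cl-ppcre:all-matches-as-strings "[^,]+" string))

;; SPLIT keeps the empty field between the two adjacent commas.
(fields-by-split "a,,b")         ; => ("a" "" "b")

;; The inverse match only finds non-empty runs, so that field is lost.
(fields-by-inverse-match "a,,b") ; => ("a" "b")
```

So the trick is only safe when empty fields carry no meaning in your data.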
It's also easy to write your own split that does what you want. An untested version is below.
Cheers, Chris Dean
(defun simple-split (regex target-string)
  "A simple version of split that doesn't handle registers in any
special way and discards leading and trailing empty matches.  Untested!"
  (let ((res nil)      ; The result
        (last-end 0))  ; The end position of the last match
    (cl-ppcre:do-matches (mstart mend regex target-string)
      (unless (zerop mstart)
        (push (subseq target-string last-end mstart) res))
      (setf last-end mend))
    (when (< last-end (length target-string))
      (push (subseq target-string last-end) res))
    (nreverse res)))
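
If only the leading empty string is the problem, another option is to keep calling SPLIT and post-process its result. The sketch below is just that, a sketch: the function name is hypothetical and not part of CL-PPCRE. It relies on SPLIT's default behavior (no :limit argument) of already removing trailing empty strings.

```lisp
;; Assumes CL-PPCRE is already loaded, for example with
;; (asdf:load-system :cl-ppcre).

(defun split-drop-leading-empty (regex target-string)
  "Hypothetical helper, not part of CL-PPCRE: call SPLIT, then drop a
leading empty string if there is one."
  (let ((parts (cl-ppcre:split regex target-string)))
    (if (and parts (string= (first parts) ""))
        (rest parts)
        parts)))

(split-drop-leading-empty "\\s+" " foo bar baz ") ; => ("foo" "bar" "baz")
(split-drop-leading-empty "\\s+" "foo bar")       ; => ("foo" "bar")
```

Unlike the inverse match trick, this keeps any internal empty strings that SPLIT returns.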