* Raymond Toy [2010-10-01 15:18] writes:
CMUCL doesn't currently have a codePointCount function, but that's easy enough to add if slime wants it. Here's one:
(defun codepoint-count (string)
  "Return the number of code points in the string. The string MUST be a valid UTF-16 string."
  (do ((len (length string))
       (index 0 (1+ index))
       (count 0 (1+ count)))
      ((>= index len) count)
    (multiple-value-bind (codepoint wide)
        (lisp:codepoint string index)
      (declare (ignore codepoint))
      (when wide
        (incf index)))))
I hope this is faster than it looks :-).
What does read-sequence do if the input stream contains surrogate pairs? Swank uses code like
(let* ((buffer (make-string length))
       (count (read-sequence buffer stream)))
  buffer)
where length is the number of code points as computed by Emacs. If read-sequence also works on code units, then we can't send surrogate pairs from Emacs -> Lisp.
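
To make the mismatch concrete, here is a minimal sketch in portable Common Lisp. The name utf16-code-unit-count is a hypothetical helper, not part of CMUCL or Swank. Assuming the Lisp's strings are UTF-16, it computes how many code units a buffer would need for a given list of code points if read-sequence counts code units:

```lisp
;; Hypothetical helper for illustration; not part of CMUCL or Swank.
(defun utf16-code-unit-count (code-points)
  "Return the number of UTF-16 code units needed to encode CODE-POINTS,
a list of integer Unicode code points."
  (loop for cp in code-points
        ;; Code points above U+FFFF need a surrogate pair (2 code units).
        sum (if (> cp #xFFFF) 2 1)))

;; "a", U+1D11E (MUSICAL SYMBOL G CLEF, outside the BMP), "b"
(utf16-code-unit-count (list #x61 #x1D11E #x62))
```

Emacs would report a length of 3 for these three code points, while the call above yields 4, so a buffer made with (make-string 3) would be one code unit short.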
Helmut