On 10/1/10 1:46 AM, Helmut Eller wrote:
- Raymond Toy [2010-10-01 00:11] writes:
This has been happening for some time, and it's annoying enough that I want to fix it. With CMUCL 20b and slime 2010-09-20, try the following:
(defvar *s* (make-string 4))
*s*
(setf (lisp:codepoint *s* 0) #x10000)
Up to now, everything is ok. Now print the string:
*s*
At this point, the string is displayed, with a rectangular box for the codepoint #x10000 followed by two ^@ for the two null characters. (Recall that Unicode strings in CMUCL are UTF-16 strings, so the first two elements of *s* are the surrogate pair for #x10000.)
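For readers who want to check the surrogate-pair arithmetic, here is a minimal sketch in portable Common Lisp. The function name utf16-surrogates is made up for illustration and is not part of CMUCL. It works on integers only, so it does not depend on how any particular Lisp represents strings.

```lisp
;; Hypothetical helper (not a CMUCL function): split a code point
;; above #xFFFF into its UTF-16 high and low surrogate code units.
(defun utf16-surrogates (code)
  (check-type code (integer #x10000 #x10FFFF))
  (let ((offset (- code #x10000)))             ; 20-bit offset
    (values (+ #xD800 (ash offset -10))        ; high surrogate: top 10 bits
            (+ #xDC00 (logand offset #x3FF))))) ; low surrogate: low 10 bits

;; Returns the two values #xD800 and #xDC00.
(utf16-surrogates #x10000)
```

Under that arithmetic, the four code units of *s* after the setf would be #xD800, #xDC00, 0, 0, which matches the box followed by two null characters described above.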
What is the length of *s* or (prin1-to-string *s*) now? Should it be 3, not 4?
Good question. The answer now is 4, not 3. There are 4 code units in the string, so that is the length. Length would be really slow if it had to scan the whole string looking for surrogate pairs and counting them as one instead of two.
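To make the cost difference concrete, here is a rough sketch of what a codepoint-counting length would have to do. It is written against a plain vector of integer code units rather than a CMUCL string, and count-codepoints is a hypothetical name, not an existing function. The point is only that it must visit every element, whereas length can simply return the stored size.

```lisp
;; Hypothetical helper: count code points in a vector of UTF-16 code
;; units, treating a high surrogate followed by a low surrogate as one.
(defun count-codepoints (units)
  (let ((count 0)
        (i 0)
        (n (length units)))
    (loop while (< i n)
          do (let ((u (aref units i)))
               (if (and (<= #xD800 u #xDBFF)          ; high surrogate
                        (< (1+ i) n)
                        (<= #xDC00 (aref units (1+ i)) #xDFFF)) ; low surrogate
                   (incf i 2)                         ; pair: skip both units
                   (incf i 1))                        ; single unit
               (incf count)))
    count))

(length #(#xD800 #xDC00 0 0))           ; => 4 code units
(count-codepoints #(#xD800 #xDC00 0 0)) ; => 3 code points
```

The 4 versus 3 here is the same mismatch as in the question above: 4 code units on the Lisp side, 3 characters as displayed.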
Is that the reason for the problem? Confusion between emacs and lisp on the length of the string? It does appear that the string only has 3 characters, as displayed by emacs.
Doesn't ACL have this problem too? It also uses 16-bit strings, like CMUCL.
Ray