* Raymond Toy [2010-10-01 10:20] writes:
What is the length of *s* or (prin1-to-string *s*) now? Should it be 3 not 4?
Good question. The answer now is 4, not 3. There are 4 code units in the string, so that is the length. Length would be really slow if it had to scan the whole string looking for surrogate pairs and counting them as one instead of two.
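To illustrate the cost, here is a rough sketch in Java, which also uses 16-bit code units, of what a code-point-counting length would have to do. The class and method names are made up for this example, and the test string is only an assumption standing in for *s*; it is not CMUCL's implementation.

```java
public class SurrogateScan {
    // Hypothetical helper: counts code points by visiting the code units,
    // so the cost is O(n) instead of reading a stored length.
    static int countCodePoints(String s) {
        int count = 0;
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            boolean pair = Character.isHighSurrogate(c)
                    && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1));
            // A well-formed surrogate pair is one code point spanning two units;
            // anything else (including a lone surrogate) counts as one.
            i += pair ? 2 : 1;
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Assumed stand-in for *s*: 'a', U+10000 (as a surrogate pair), 'b'.
        String s = "a\uD800\uDC00b";
        System.out.println(s.length());         // 4 code units
        System.out.println(countCodePoints(s)); // 3 code points
    }
}
```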
Is that the reason for the problem? Confusion between emacs and lisp on the length of the string? It does appear that the string only has 3 characters, as displayed by emacs.
Very likely, Emacs uses something like UTF-8 internally and counts code points, not code units (except for line endings, which is probably a different issue).
Doesn't acl have this problem too? It also uses 16-bit strings like cmucl.
Allegro has no lisp:codepoint function, and (code-char #x10000) returns nil. The situation in ABCL is similar, except that it returns #\null.
In Java, strings have a length method which returns code units and a codePointCount method for the other use. Maybe CMUCL has something like that and we should use it in SWANK.
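For reference, the Java side looks roughly like this. String.length, String.codePointCount and Character.toChars are standard Java; the wrapper class and the sample string are only assumptions for the example, and whether CMUCL offers an equivalent is exactly the open question.

```java
public class LengthVsCodePoints {
    // Number of UTF-16 code units, which is what String.length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the whole string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // Assumed sample: 'a', U+10000, 'b'. U+10000 needs a surrogate pair.
        String s = "a" + new String(Character.toChars(0x10000)) + "b";
        System.out.println(codeUnits(s));  // 4
        System.out.println(codePoints(s)); // 3
    }
}
```

If SWANK sends lengths or offsets to Emacs, the second number is presumably the one Emacs expects.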
Helmut