On 10/2/10 3:45 AM, Helmut Eller wrote:
> * Raymond Toy [2010-10-01 19:49] writes:
>> Oh, that's a problem. In the example, the length is 3, but the string actually has 4 code units, so read-sequence only reads 3 code units, completely missing the last one.
> I think we have the following options:
> - Don't support code points beyond 16 bits. Clean and easy.
Yes. I only ever use code points outside the BMP when testing Unicode, but it is annoying that SLIME breaks.
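The mismatch described above is easy to reproduce outside Lisp. A Python sketch (purely illustrative; the two counts correspond to what Emacs sees versus what a UTF-16 Lisp string holds):

```python
# A 3-character string whose last code point, U+1D11E (MUSICAL SYMBOL
# G CLEF), lies outside the BMP and needs a UTF-16 surrogate pair.
s = "ab\U0001D11E"

print(len(s))                           # 3 code points: what Emacs counts
print(len(s.encode("utf-16-le")) // 2)  # 4 code units: what a UTF-16 string holds
```

A length header of 3 makes read-sequence stop one code unit short of the real end of the message.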
> - Introduce variants of length and read-sequence that use the same notion of character as Emacs. Kinda messy and probably slow, but relatively easy.
I don't know the SLIME internals, but wouldn't you only need special versions of length and read-sequence for CMUCL with Unicode? The normal length/read-sequence would be fine for everyone else.
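The counting logic such a CMUCL-specific variant would need is small: treat each surrogate pair as one character. A rough sketch in Python (hypothetical names, just to pin down the idea):

```python
def emacs_style_length(units):
    """Count characters in a sequence of UTF-16 code units,
    treating each surrogate pair as a single character."""
    n = 0
    i = 0
    while i < len(units):
        # A high (leading) surrogate, 0xD800-0xDBFF, is followed by a
        # low surrogate; the pair encodes one code point.
        if 0xD800 <= units[i] <= 0xDBFF and i + 1 < len(units):
            i += 2
        else:
            i += 1
        n += 1
    return n

# "ab" + U+1D11E as UTF-16 code units; the last two units form one pair.
print(emacs_style_length([0x0061, 0x0062, 0xD834, 0xDD1E]))  # 3
```

The read-sequence variant would be the same idea in reverse: keep reading code units until that count reaches the requested number of characters.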
> - Switch from character streams to binary streams so that we can use byte counts instead of character counts. This has several advantages:
>   - surrogate pairs are no problem
>   - don't need flexi-streams for LispWorks
Why does LispWorks need flexi-streams? Does this have to do with using read-byte on character streams, or read-char on binary streams?
>   - it would be easier to switch encodings after connecting
>   - read/write-sequence is probably faster on byte streams
> Disadvantages:
>   - more consing, and Emacs's GC isn't that good
>   - need a string-to/from-bytearray function for every backend
Doesn't every backend already have such a function? Of course, someone has to hook that up, but at least it doesn't have to be written from scratch.
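For what it's worth, such a conversion function plus byte-count framing is only a few lines in most languages. A Python sketch (the 6-hex-digit length header mirrors SLIME's wire format; counting octets instead of characters is exactly the change under discussion, and the function names here are mine):

```python
def frame(message: str) -> bytes:
    # Encode first, then measure: the header counts octets, so
    # surrogate pairs and astral characters cannot skew it.
    payload = message.encode("utf-8")
    return b"%06x" % len(payload) + payload

def unframe(data: bytes) -> str:
    # Read the 6-hex-digit octet count, then decode exactly that many bytes.
    n = int(data[:6], 16)
    return data[6 : 6 + n].decode("utf-8")

print(frame("ab\U0001D11E"))  # b'000006ab\xf0\x9d\x84\x9e'
```

With this scheme the receiver never has to agree with the sender about what a "character" is; both sides only count octets.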
> - breaks third-party backends
Sounds like a show stopper to me.
Ray