On 10/2/10 3:45 AM, Helmut Eller wrote:
> * Raymond Toy [2010-10-01 19:49] writes:
>> Oh, that's a problem. In the example, the length is 3, but the string actually has 4 code units, so read-sequence only reads 3 code units, completely missing the last one.
> I think we have the following options:
> - Don't support code points beyond 16 bits. Clean and easy.
Yes. I only ever use code points outside the BMP when testing Unicode, but it is annoying that SLIME breaks.
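The mismatch described above is easy to reproduce outside Lisp. A Python sketch (purely illustrative; the two counts correspond to what Emacs sees versus what a UTF-16 Lisp string holds):

```python
# A 3-character string whose last code point, U+1D11E (MUSICAL SYMBOL
# G CLEF), lies outside the BMP and needs a UTF-16 surrogate pair.
s = "ab\U0001D11E"

print(len(s))                           # 3 code points: what Emacs counts
print(len(s.encode("utf-16-le")) // 2)  # 4 code units: what a UTF-16 string holds
```

A length header of 3 makes read-sequence stop one code unit short of the real end of the message.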
> - Introduce variants of length and read-sequence that use the same notion of character as Emacs. Kinda messy and probably slow, but relatively easy.
I don't know the SLIME internals, but wouldn't you only need special versions of length and read-sequence for CMUCL with Unicode? The normal length/read-sequence would be fine for everyone else.
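The counting logic such a CMUCL-specific variant would need is small: treat each surrogate pair as one character. A rough sketch in Python (hypothetical names, just to pin down the idea):

```python
def emacs_style_length(units):
    """Count characters in a sequence of UTF-16 code units,
    treating each surrogate pair as a single character."""
    n = 0
    i = 0
    while i < len(units):
        # A high (leading) surrogate, 0xD800-0xDBFF, is followed by a
        # low surrogate; the pair encodes one code point.
        if 0xD800 <= units[i] <= 0xDBFF and i + 1 < len(units):
            i += 2
        else:
            i += 1
        n += 1
    return n

# "ab" + U+1D11E as UTF-16 code units; the last two units form one pair.
print(emacs_style_length([0x0061, 0x0062, 0xD834, 0xDD1E]))  # 3
```

The read-sequence variant would be the same idea in reverse: keep reading code units until that count reaches the requested number of characters.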
> - Switch from character streams to binary streams so that we can use byte counts instead of character counts. This has several advantages:
>   - surrogate pairs are no problem
>   - don't need flexi-streams for LispWorks
Why does LispWorks need flexi-streams? Does this have to do with using read-byte on character streams, or read-char on binary streams?
>   - it would be easier to switch encodings after connecting
>   - read/write-sequence is probably faster on byte streams
> Disadvantages:
>   - more consing, and Emacs's GC isn't that good
>   - need a string-to/from-bytearray function for every backend
Doesn't every backend already have such a function? Of course, someone has to hook that up, but at least it doesn't have to be written from scratch.
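For what it's worth, such a conversion function plus byte-count framing is only a few lines in most languages. A Python sketch (the 6-hex-digit length header mirrors SLIME's wire format; counting octets instead of characters is exactly the change under discussion, and the function names here are mine):

```python
def frame(message: str) -> bytes:
    # Encode first, then measure: the header counts octets, so
    # surrogate pairs and astral characters cannot skew it.
    payload = message.encode("utf-8")
    return b"%06x" % len(payload) + payload

def unframe(data: bytes) -> str:
    # Read the 6-hex-digit octet count, then decode exactly that many bytes.
    n = int(data[:6], 16)
    return data[6 : 6 + n].decode("utf-8")

print(frame("ab\U0001D11E"))  # b'000006ab\xf0\x9d\x84\x9e'
```

With this scheme the receiver never has to agree with the sender about what a "character" is; both sides only count octets.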
> - breaks third-party backends
Sounds like a show stopper to me.
Ray