* Raymond Toy [2010-10-01 19:49] writes:
Oh, that's a problem. In the example, length is 3, but the string actually has 4 code units, so read-sequence only reads 3 code units, completely missing the last code unit.
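To make the mismatch concrete, here is a small sketch in Python (used only for illustration). The string is a hypothetical stand-in, not the one from the example, and it assumes the receiving side stores strings as UTF-16 code units:

```python
# Hypothetical test string: 'a', U+1D11E (outside the 16-bit range), 'b'.
s = "a\U0001D11Eb"

# Python's len() counts code points, like the sender's notion of length.
code_points = len(s)

# A UTF-16 based receiver sees one code unit per 2 bytes of UTF-16 data.
units = s.encode("utf-16-le")
code_units = len(units) // 2

# If the receiver reads `code_points` code units, the tail is left unread.
consumed = units[: code_points * 2]
remainder = units[code_points * 2 :]

print(code_points, code_units, remainder.decode("utf-16-le"))
```

The character outside the 16-bit range takes two code units (a surrogate pair), so a count given in code points is one short for a reader that counts code units.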
I think we have the following options:
1) Don't support code points beyond 16 bits. Clean and easy.
2) Introduce variants of length and read-sequence that use the same notion of character as Emacs. Kinda messy and probably slow, but relatively easy.
3) Switch from character streams to binary streams so that we can use byte counts instead of character counts. This has several advantages:
   - surrogate pairs are no problem
   - don't need flexi-streams for Lispworks
   - it would be easier to switch encoding after connecting
   - read/write-sequence is probably faster on byte streams
   Disadvantages:
   - more consing, and Emacs's GC isn't that good
   - need a string-to/from-bytearray function for every backend
   - breaks third-party backends
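As a rough sketch of what option 3 could look like, here is byte-count framing over a binary stream, again in Python for illustration only. The function names, the 6-digit hex header, and the choice of UTF-8 are assumptions made for this example, not a description of the actual implementation:

```python
import io

HEADER_DIGITS = 6  # assumed header width: payload length in bytes, as hex


def write_message(stream, text):
    """Encode text and prefix it with its length in bytes (hypothetical helper)."""
    payload = text.encode("utf-8")
    header = ("%0*x" % (HEADER_DIGITS, len(payload))).encode("ascii")
    stream.write(header)
    stream.write(payload)


def read_message(stream):
    """Read one length-prefixed message and decode it (hypothetical helper)."""
    header = stream.read(HEADER_DIGITS)
    length = int(header.decode("ascii"), 16)
    payload = stream.read(length)
    if len(payload) != length:
        raise EOFError("stream ended before the full payload arrived")
    return payload.decode("utf-8")


# Round trip through an in-memory binary stream.
buf = io.BytesIO()
write_message(buf, "a\U0001D11Eb")
write_message(buf, "ok")
buf.seek(0)
first = read_message(buf)
second = read_message(buf)
```

Because both sides count bytes, the length no longer depends on how either side represents characters internally. The cost is the extra encode/decode step on each message, which is where the additional consing would come from.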
Helmut