On 10/6/10 3:34 AM, Helmut Eller wrote:
- Raymond Toy [2010-10-06 01:07] writes:
On 10/2/10 3:45 AM, Helmut Eller wrote:
- Raymond Toy [2010-10-01 19:49] writes:
Oh, that's a problem. In the example, length is 3, but the string actually has 4 code units, so read-sequence only reads 3 code units, completely missing the last code unit.
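To make the mismatch concrete, here is a small sketch (not CMUCL code; it assumes a Lisp whose characters are full code points, e.g. SBCL) showing how a 3-character string can occupy 4 UTF-16 code units once a character outside the BMP is involved:

  (defun utf-16-code-units (string)
    "Number of UTF-16 code units needed to encode STRING."
    (loop for ch across string
          sum (if (> (char-code ch) #xFFFF) 2 1)))

  ;; U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the BMP, so
  ;; UTF-16 encodes it as a surrogate pair (2 code units).
  (let ((s (coerce (list #\a #\b (code-char #x1D11E)) 'string)))
    (values (length s) (utf-16-code-units s)))
  ;; => 3, 4

A reader that trusts the 3 reported by LENGTH stops one code unit short of the end of the encoded string.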
I think we have the following options:
Do you have a preference for any of the options (besides option 1)? I'd like to make this work, because it's really annoying when SLIME crashes. I usually remember not to do these things, but when an error is thrown and SLIME brings up the debugger and displays the string in the backtrace, SLIME crashes, just when I really need to know what happened.
[Ideally the different Lisp implementations should have the same notion of "character". That CMUCL thinks of characters as Unicode code units while SBCL uses code points is IMO an unfortunate development. In Scheme (R6RS) they say that a Scheme character should correspond to one Unicode scalar value, which seems to mean the ranges [0, #xD7FF] and [#xE000, #x10FFFF]. Java and .NET use code units. It would not be the worst idea to adopt one standard; the earlier we do that, the less it costs.]
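For reference, the R6RS rule quoted above amounts to this predicate (just an illustration; SCALAR-VALUE-P is a made-up name):

  (defun scalar-value-p (code)
    "True if CODE is a Unicode scalar value, i.e. any code point
  outside the surrogate range: [0, #xD7FF] or [#xE000, #x10FFFF]."
    (or (<= 0 code #xD7FF)
        (<= #xE000 code #x10FFFF)))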
It was a tradeoff among space usage (16-bit strings vs. 32-bit strings), compiler complexity (managing 8-bit and 32-bit strings), and user complexity (base-strings vs. strings).
For now, option 2) is probably the simplest.
Ok. Can you give some hints on where to start looking at this?
In the long run, byte streams would be more flexible. In theory we could use something like HTTP chunking, if it's worth the complexity.
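Roughly, I imagine something like this (a hedged sketch only; WRITE-CHUNKED and WRITE-ASCII-LINE are made-up names, not part of SLIME or CMUCL): each chunk is a hex byte count on its own line followed by that many raw octets, and a zero-length chunk ends the message, so neither side ever has to agree on what a "character" is:

  (defun write-ascii-line (string stream)
    "Write STRING as ASCII octets plus CRLF to the binary STREAM."
    (loop for ch across string
          do (write-byte (char-code ch) stream))
    (write-byte 13 stream)   ; CR
    (write-byte 10 stream))  ; LF

  (defun write-chunked (octets stream &key (chunk-size 4096))
    "Send OCTETS as HTTP-style chunks: hex length line, payload,
  CRLF, and finally a zero-length chunk as terminator."
    (loop for start from 0 below (length octets) by chunk-size
          for end = (min (+ start chunk-size) (length octets))
          do (write-ascii-line (format nil "~X" (- end start)) stream)
             (write-sequence octets stream :start start :end end)
             (write-ascii-line "" stream))
    (write-ascii-line "0" stream)
    (write-ascii-line "" stream))

The receiver reads the length line, then exactly that many octets, and only decodes bytes into characters after the whole payload is in hand.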
If you ever start working on this approach, let me know and I'll try to help out.
Ray