Hi Anton,
sorry for the delay. I have to think about this a bit more, but here's some preliminary feedback:
On Sat, 21 Apr 2007 22:33:02 +0400, Vodonosov Anton <vodonosov@mail.ru> wrote:
Edi, as far as I understand, external formats with multibyte encodings and CR-LF style newlines are not optimized because it's difficult to predict the number of characters that will fit into the buffer.
We can solve this if we always keep a few reserved bytes in the buffer; 20 will be sufficient for any encoding. I.e., loop while we have at least 20 free bytes in the buffer.
This solution is implemented in the attached patch (against 0.11.2).
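For readers without the patch at hand, the reserved-bytes idea can be sketched roughly as follows. This is only an illustration in plain Common Lisp, not the code from the attached patch: the names *RESERVE*, ENCODE-UTF-8-CHAR and WRITE-STRING-BUFFERED are made up, the FLUSH callback stands in for whatever writes to the underlying stream, and only UTF-8 is handled (CR-LF expansion would fit into the same headroom).

```lisp
;; Hypothetical sketch, not the patch itself: all names are invented.
(defparameter *reserve* 20
  "Headroom kept free in the buffer; 20 is the figure suggested above.")

(defun encode-utf-8-char (char buffer pos)
  "Write the UTF-8 octets of CHAR into BUFFER starting at POS.
Return the index just past the last octet written."
  (let ((code (char-code char)))
    (flet ((put (octet)
             (setf (aref buffer pos) octet)
             (incf pos)))
      (cond ((< code #x80)
             (put code))
            ((< code #x800)
             (put (logior #xC0 (ash code -6)))
             (put (logior #x80 (logand code #x3F))))
            ((< code #x10000)
             (put (logior #xE0 (ash code -12)))
             (put (logior #x80 (logand (ash code -6) #x3F)))
             (put (logior #x80 (logand code #x3F))))
            (t
             (put (logior #xF0 (ash code -18)))
             (put (logior #x80 (logand (ash code -12) #x3F)))
             (put (logior #x80 (logand (ash code -6) #x3F)))
             (put (logior #x80 (logand code #x3F)))))
      pos)))

(defun write-string-buffered (string flush &key (buffer-size 64))
  "Encode STRING as UTF-8 through a fixed-size octet buffer.
FLUSH is called with the buffer and the number of valid octets whenever
fewer than *RESERVE* octets remain free, and once more at the end."
  (assert (> buffer-size *reserve*))
  (let ((buffer (make-array buffer-size :element-type '(unsigned-byte 8)))
        (pos 0))
    (loop for char across string
          do ;; At least *RESERVE* octets are free at this point, so one
             ;; character always fits and no per-octet check is needed.
             (setf pos (encode-utf-8-char char buffer pos))
             (when (< (- buffer-size pos) *reserve*)
               (funcall flush buffer pos)
               (setf pos 0)))
    (when (plusp pos)
      (funcall flush buffer pos))))
```

The point of the headroom is that the space check happens once per character rather than once per octet, and the loop never has to know in advance how many octets a given character will need.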
Thanks. The win for UTF-8 is impressive, but I'm a bit concerned that you'll lose a lot of performance for 8-bit encodings. I think it'd be better to leave the 8-bit version in there and only use your code for other strings.
The patch also contains some additions to the tests. As for separate tests for READ-SEQUENCE/WRITE-SEQUENCE, maybe they are useless: I noticed that, at least in CLISP, WRITE-LINE is implemented using WRITE-SEQUENCE. So in case of errors in WRITE-SEQUENCE, both the WRITE-LINE and the WRITE-SEQUENCE tests fail.
Thanks, more tests are always good.
Regarding the way I reused the existing WRITE-CHAR code in STREAM-WRITE-SEQUENCE: I do not much like working through a temporary stream; for example, we have redundant slot access in WRITE-BYTE*. I've tried to keep the changes small and not disturb other code. Maybe with some refactoring it will be possible to have cleaner and more efficient code.
Yes, the more we optimize for performance, the more of a mess it becomes... :(
BTW, I've started replacing STREAM-WRITE-SEQUENCE with the version provided below. It is more efficient, but it isn't thread-safe.
Hmmm...
As I said, I have to ponder this a bit more, but unfortunately I'm too busy right now. More later.
Cheers, Edi.