* Hugo Duncan [2011-11-07 04:04] writes:
On Sun, 06 Nov 2011 12:13:07 -0500, Helmut Eller <heller@common-lisp.net> wrote:
Counting characters was problematic, especially with Lisps that use UTF-16 internally (Allegro, CMUCL, JVM-based Lisps). Emacs counts the length of strings in Unicode code points, while in UTF-16 a single code point may occupy either one or two code units (string indexes), so CL:LENGTH may return something different from what Emacs expects. For the same reason we can't use READ-SEQUENCE to read a specified number of code points.
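[A minimal illustration of the mismatch, not from the original mail. U+1D11E MUSICAL SYMBOL G CLEF lies outside the BMP, so UTF-16 needs a surrogate pair for it:

;; On a Lisp whose strings hold full code points (e.g. SBCL):
(length (string (code-char #x1D11E)))  ; => 1, same as Emacs' count
;; On a UTF-16 Lisp the same character is stored as two code units
;; (a surrogate pair), so LENGTH returns 2 while Emacs counts 1.
]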
The new format looks like this:
| byte0 | 3 bytes length |
| ... payload ... |
The 3-byte length header specifies the length of the payload in bytes.
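[A sketch of a writer for this framing, for illustration only, not the actual Swank code. It assumes the tag octet byte0 is zero, uses the babel library for UTF-8 encoding, and puts the length in network byte order; the reply below suggests the implementation did not in fact use network byte order:

(defun write-frame (payload stream)
  ;; STREAM must be a binary stream of (unsigned-byte 8) elements.
  (let* ((octets (babel:string-to-octets payload :encoding :utf-8))
         (len (length octets)))
    (write-byte 0 stream)                      ; byte0: tag octet (assumed value)
    (write-byte (ldb (byte 8 16) len) stream)  ; length, most significant octet
    (write-byte (ldb (byte 8 8) len) stream)
    (write-byte (ldb (byte 8 0) len) stream)   ; length, least significant octet
    (write-sequence octets stream)))
]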
Is there a reason to start using a binary encoding of the message length?
No deep reason. We actually used binary encoding before we used hex-strings. That worked fine with latin-1 but not with utf-8. I guess it's just instinct; now that we explicitly work on a byte stream it's even more natural. Should probably have used network byte order.
This makes the messages less easy to inspect, and less easy to write integration tests for.
Only marginally. Shifting 3 bytes together is not exactly rocket science.
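[For reference, the reassembly in question on the reading side, under the same big-endian assumption as the sketch above:

(defun read-frame-length (stream)
  ;; CL evaluates arguments left to right, so the octets are read in order.
  (logior (ash (read-byte stream) 16)
          (ash (read-byte stream) 8)
          (read-byte stream)))
]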
The payload is an s-exp encoded as UTF-8 text.
Normalising on utf-8 and counting bytes sounds like it would solve the original issue without changing to a binary encoding of the message length.
Right. It would not be backward compatible, though.
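[For comparison, a sketch of Hugo's suggestion: keep a printable length header but make it count UTF-8 octets rather than characters. The six-hex-digit width is an assumption carried over from the old hex-string format:

(defun write-hex-framed (payload stream)
  ;; Header: six ASCII hex digits giving the payload length in octets.
  (let* ((octets (babel:string-to-octets payload :encoding :utf-8))
         (header (format nil "~6,'0x" (length octets))))
    (write-sequence (babel:string-to-octets header :encoding :utf-8) stream)
    (write-sequence octets stream)))
]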
Helmut