On Sun, 06 Nov 2011 12:13:07 -0500, Helmut Eller heller@common-lisp.net wrote:
> Counting characters was problematic, especially with Lisps that use
> UTF-16 internally (Allegro, CMUCL, JVM-based Lisps). Emacs counts the
> length of strings in Unicode code points, while in UTF-16 a single
> code point may occupy either 1 or 2 indexes (code units), so CL:LENGTH
> may return something different from what Emacs expects. For the same
> reason we can't use READ-SEQUENCE to read a specified number of code
> points.
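For example (an illustrative sketch in Python, not SLIME code; Python, like Emacs, measures string length in code points):

```python
# A character outside the Basic Multilingual Plane takes 2 UTF-16 code
# units (a surrogate pair), so code-point and code-unit counts diverge.
s = "a\U0001F600"  # 'a' plus U+1F600, which is outside the BMP

code_points = len(s)                           # counts code points: 2
utf16_units = len(s.encode("utf-16-le")) // 2  # 2 bytes per code unit: 3
utf8_bytes = len(s.encode("utf-8"))            # 1 + 4 bytes: 5

print(code_points, utf16_units, utf8_bytes)
```

A UTF-16-based Lisp whose CL:LENGTH returns code units would report 3 here, while Emacs would expect 2.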
> The new format looks like this:
> | byte0 | 3-byte length | ... payload ... |
> The 3-byte length header specifies the length of the payload in bytes.
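Concretely, the framing described above might be sketched like this (a hypothetical illustration, not SLIME code; the meaning of the tag byte and the big-endian byte order of the length are my assumptions, since the message doesn't specify them):

```python
def frame(payload_sexp: str, tag: int = 0) -> bytes:
    # One tag byte, a 3-byte (assumed big-endian) payload length in
    # bytes, then the payload itself as UTF-8 text.
    body = payload_sexp.encode("utf-8")
    assert len(body) < 1 << 24, "payload too large for a 3-byte length"
    return bytes([tag]) + len(body).to_bytes(3, "big") + body

def unframe(data: bytes) -> str:
    # Read the length from the header and decode that many bytes.
    length = int.from_bytes(data[1:4], "big")
    return data[4:4 + length].decode("utf-8")
```

Because the header counts bytes, both sides agree on where the payload ends regardless of whether their internal strings use code points or UTF-16 code units.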
Is there a reason to start using a binary encoding of the message length? It makes the messages harder to inspect by eye and harder to write integration tests for.
> The payload is an s-expression encoded as UTF-8 text.
Normalising on UTF-8 and counting bytes sounds like it would solve the original issue without switching to a binary encoding of the message length.
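That alternative might look like this (a sketch under the assumption that the existing protocol's header is a human-readable hex digit count; only the unit being counted changes, from characters to UTF-8 bytes):

```python
def frame_hex(payload_sexp: str) -> bytes:
    # Keep a textual length header (six hex digits here, as an assumed
    # stand-in for the existing format) but count UTF-8 *bytes* rather
    # than characters, so UTF-16-based Lisps and Emacs agree on the
    # payload boundary while the wire format stays readable.
    body = payload_sexp.encode("utf-8")
    assert len(body) < 16 ** 6, "payload too large for a 6-hex-digit length"
    return f"{len(body):06x}".encode("ascii") + body
```

The resulting messages are still plain text, so they remain easy to inspect and to construct by hand in tests.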