I just committed changes to the wire format. The new format is defined on bytes and not characters.
Counting characters was problematic, especially with Lisps that use UTF-16 internally (Allegro, CMUCL, JVM-based Lisps). Emacs counts the length of strings in Unicode code points, while in UTF-16 a single code point may occupy either 1 or 2 indexes (code units), so CL:LENGTH may return something different from what Emacs expects. For the same reason we can't use READ-SEQUENCE to read a specified number of code points.
The new format looks like this:
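To make the mismatch concrete, here is a small sketch in Python (chosen only because it is easy to run; it is not part of SLIME, and the function name `lengths` is made up for illustration). It counts the same string three ways: in code points, as Emacs does; in UTF-16 code units, as a Lisp with UTF-16 strings would; and in UTF-8 bytes.

```python
def lengths(s: str) -> tuple[int, int, int]:
    """Return (code points, UTF-16 code units, UTF-8 bytes) for s."""
    code_points = len(s)                           # Python strings index by code point
    utf16_units = len(s.encode("utf-16-le")) // 2  # each code unit is 2 bytes
    utf8_bytes = len(s.encode("utf-8"))
    return code_points, utf16_units, utf8_bytes


# "a" followed by U+1D11E (MUSICAL SYMBOL G CLEF), which lies outside the BMP
# and therefore needs a surrogate pair in UTF-16.
print(lengths("a\U0001D11E"))
```

For that string the three counts all differ, so a length counted on one side cannot be trusted on the other unless both agree on the unit.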
| byte0 | 3 bytes length |
|    ... payload ...     |
The 3-byte length header specifies the length of the payload in bytes. The payload is an s-exp encoded as UTF-8 text. byte0 is currently always 0; other values are reserved for future use. Robert Brown said he'd like to use compression, so byte0 might come in handy.
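As a rough illustration of the framing described above, here is a sketch in Python rather than the actual Lisp or Emacs Lisp code. The function names are hypothetical, and the byte order of the length field is an assumption (most significant byte first); this message does not state it, so check the committed code before relying on it.

```python
import io

HEADER_SIZE = 4          # byte0 plus the 3 length bytes
MAX_PAYLOAD = 0xFFFFFF   # largest value that fits in 3 bytes


def encode_message(sexp_text: str) -> bytes:
    """Frame an s-exp (given as text) as byte0 + 3-byte length + UTF-8 payload."""
    payload = sexp_text.encode("utf-8")
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload does not fit in a 3-byte length field")
    # byte0 is currently always 0. Big-endian length is an ASSUMPTION.
    return bytes([0]) + len(payload).to_bytes(3, "big") + payload


def read_exactly(stream, n: int) -> bytes:
    """Read exactly n bytes; streams such as sockets may return short reads."""
    buf = bytearray()
    while len(buf) < n:
        chunk = stream.read(n - len(buf))
        if not chunk:
            raise EOFError("stream closed in the middle of a message")
        buf.extend(chunk)
    return bytes(buf)


def decode_message(stream) -> str:
    """Read one framed message from a binary stream and return the s-exp text."""
    header = read_exactly(stream, HEADER_SIZE)
    if header[0] != 0:
        raise ValueError("byte0 values other than 0 are reserved")
    length = int.from_bytes(header[1:4], "big")  # same byte-order assumption
    return read_exactly(stream, length).decode("utf-8")


# Round trip through an in-memory stream; the s-exp is only sample text.
message = '(:emacs-rex "\U0001D11E")'
assert decode_message(io.BytesIO(encode_message(message))) == message
```

The point of the design is that the reader only ever counts bytes: it reads the 4 header bytes, then exactly as many payload bytes as the header announces, and decodes UTF-8 afterwards. How the implementation represents strings internally no longer matters.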
The change breaks backward compatibility. When upgrading, make sure that both Lisp and Emacs use the new format.
I did some light testing with most of the backends and provided a portable version for the utf8 encoding/decoding. I didn't test SCL and CormanCL.
Third-party backends, for Clojure etc., are obviously broken now. So if you need those, wait until somebody fixes them and the dust has settled.
Helmut