Helmut Eller e9626484@stud3.tuwien.ac.at writes:
> Christophe Rhodes csr21@cam.ac.uk writes:
>> I see this as an interim step towards what I would consider a proper solution: the variant of (1) which reads "the communication protocol between slime and lisp is defined over octets; where these octets are to be interpreted as forming character data, the encoding is ucs-4-bigendian". The point of this would be to allow those lisps supporting more characters than 256 to communicate these characters, while having those with just 256 (or fewer) characters simply deal with a slightly more space-wasteful protocol.
> Why do you want to use ucs-4? Why not utf-8? Is that SBCL's internal format? If we want to use a multibyte encoding at all, we should also consider emacs-mule, that's Emacs' internal encoding used for multibyte characters.
For the encoding across the wire, I don't really care what is used: all that really matters is that it be defined and one-to-one over the space of characters that both ends agree on. ucs-4 is fixed-width, which potentially makes it easier to write the conversion routines that take a stream of octets and return a string; that's all. If emacs-mule is a fixed, supported and documented internal encoding, then fine, that would work.
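To illustrate the fixed-width point, here is a minimal sketch in Python. It is not slime or swank code, and the names decode_ucs4_be and encode_ucs4_be are made up for the example. Because every character occupies exactly four octets, the decoder just walks the buffer in steps of four and never has to inspect a lead byte to find out how long a character is.

```python
def decode_ucs4_be(octets: bytes) -> str:
    """Decode big-endian UCS-4: every character is exactly 4 octets."""
    if len(octets) % 4 != 0:
        raise ValueError("UCS-4 data must be a multiple of 4 octets long")
    chars = []
    for i in range(0, len(octets), 4):
        # Each 4-octet group is one code point, most significant octet first.
        code_point = int.from_bytes(octets[i:i + 4], "big")
        chars.append(chr(code_point))
    return "".join(chars)


def encode_ucs4_be(text: str) -> bytes:
    """Encode a string as big-endian UCS-4, 4 octets per character."""
    return b"".join(ord(ch).to_bytes(4, "big") for ch in text)


if __name__ == "__main__":
    wire = encode_ucs4_be("(λ x)")
    print(len(wire))             # 4 octets for each of the 5 characters
    print(decode_ucs4_be(wire))
```

A utf-8 decoder would work just as well across the wire; it only needs the extra step of working out each character's length from its first octet.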
> In general, I don't like this multibyte character shit. It seems to me like a feature that 99% of the users don't need. I certainly don't need it.
Of course it's possible to do lots of interesting computations and make interesting user interfaces with only ascii characters. On the other hand, people writing applications may want to internationalize them; people describing geographical data may want to use localized names; I don't know. But if the tools don't support multibyte characters, it's no big surprise that users aren't using them.
If you don't need multibyte character stuff, then clearly it's unfair to ask you to spend your time on it. In that case, I suggest that for now it be documented that slime (or its underlying Lisp, at any rate) must be run in the POSIX locale -- or other implementation-defined locales where the external format for character streams is latin1-based -- and that it be left at that.
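The reason a latin1-based external format is enough for that interim arrangement is that latin1 maps each of the 256 octet values to exactly one character and back. A small Python sketch of that property (the function name latin1_round_trips is invented for the example, and this says nothing about how any particular Lisp implements its external formats):

```python
def latin1_round_trips() -> bool:
    """Check that latin1 is one-to-one between octets and characters 0..255."""
    all_octets = bytes(range(256))
    text = all_octets.decode("latin-1")
    # Octet n decodes to the character with code n, and encodes back unchanged.
    same_codes = [ord(ch) for ch in text] == list(range(256))
    same_octets = text.encode("latin-1") == all_octets
    return same_codes and same_octets


if __name__ == "__main__":
    print(latin1_round_trips())
    try:
        "λ".encode("latin-1")  # a character outside the 256 has no octet
    except UnicodeEncodeError as err:
        print("cannot encode:", err.reason)
```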
Cheers,
Christophe