Christophe Rhodes csr21@cam.ac.uk writes:
I see this as an interim step towards what I would consider a proper solution: the variant of (1) which reads "the communication protocol between slime and lisp is defined over octets; where these octets are to be interpreted as forming character data, the encoding is ucs-4-bigendian". The point of this would be to allow those lisps supporting more characters than 256 to communicate these characters, while having those with just 256 (or fewer) characters simply deal with a slightly more space-wasteful protocol.
Why do you want to use ucs-4? Why not utf-8? Is that SBCL's internal format? If we want to use a multibyte encoding at all, we should also consider emacs-mule, that's Emacs' internal encoding used for multibyte characters.
In general, I don't like this multibyte character shit. It seems to me like a feature that 99% of the users don't need. I certainly don't need it.
Helmut.
Helmut Eller e9626484@stud3.tuwien.ac.at writes:
Christophe Rhodes csr21@cam.ac.uk writes:
I see this as an interim step towards what I would consider a proper solution: the variant of (1) which reads "the communication protocol between slime and lisp is defined over octets; where these octets are to be interpreted as forming character data, the encoding is ucs-4-bigendian". The point of this would be to allow those lisps supporting more characters than 256 to communicate these characters, while having those with just 256 (or fewer) characters simply deal with a slightly more space-wasteful protocol.
Why do you want to use ucs-4? Why not utf-8? Is that SBCL's internal format? If we want to use a multibyte encoding at all, we should also consider emacs-mule, that's Emacs' internal encoding used for multibyte characters.
For the encoding across the wire, I don't really care what is used: all that really matters is that it be defined and one-to-one over the space of characters that both ends agree to agree on. ucs-4 is fixed-width, which potentially makes it easier to write the conversion routines to take a stream of octets and return a string, that's all. If emacs-mule is a fixed, supported and documented internal encoding, then fine, that would work.
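To illustrate the fixed-width point above, here is a minimal sketch in Python (not SLIME or SBCL code). The function names decode_ucs4_be and encode_ucs4_be are hypothetical. Because every character occupies exactly four octets, the decoder needs no state beyond a length check.

```python
import struct


def decode_ucs4_be(octets: bytes) -> str:
    """Decode big-endian UCS-4: each character is exactly four octets."""
    if len(octets) % 4 != 0:
        raise ValueError("UCS-4 data must be a multiple of 4 octets long")
    count = len(octets) // 4
    # ">" selects big-endian byte order, "I" is an unsigned 32-bit integer.
    code_points = struct.unpack(f">{count}I", octets)
    return "".join(chr(cp) for cp in code_points)


def encode_ucs4_be(text: str) -> bytes:
    """Encode a string as big-endian UCS-4, four octets per character."""
    return struct.pack(f">{len(text)}I", *(ord(ch) for ch in text))
```

The cost is the space overhead mentioned earlier in the thread: an ASCII-only string takes four times as many octets as it would in latin1 or utf-8.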
In general, I don't like this multibyte character shit. It seems to me like a feature that 99% of the users don't need. I certainly don't need it.
Of course it's possible to do lots of interesting computations and make interesting user interfaces with only ascii characters. On the other hand, people writing applications may want to internationalize them; people describing geographical data may want to use localized names; I don't know. But if the tools don't support multibyte characters, it's no big surprise that users aren't using them.
If you don't need multibyte character stuff, then clearly it's unfair to ask you to spend your time on it. In that case, I suggest that for now it be documented that slime (or its underlying Lisp, at any rate) must be run in the POSIX locale -- or other implementation-defined locales where the external format for character streams is latin1-based -- and that it be left at that.
Cheers,
Christophe
I added support for multibyte coding systems. The new magic variable is slime-net-coding-system. Currently there are 3 possible values for the coding system: iso-8859-1-unix, utf-8-unix, and emacs-mule-unix. At startup, Emacs tells the Lisp implementation which coding system to use and the same encoding is used for the rest of the session. Not all Lisp implementations support all coding systems. The situation is as follows:
CMUCL, OpenMCL: support only iso-8859-1-unix.
SBCL, CLISP: can be used with iso-8859-1-unix or utf-8-unix.
Allegro: supports all three.
LispWorks and ABCL support only iso-8859-1-unix because I didn't know how to set the external-format for socket streams in those Lisps.
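The support matrix above can be written down as a small lookup table. This is only a sketch in Python, not how SLIME itself stores or checks this information. The names SUPPORTED_CODING_SYSTEMS and check_coding_system are hypothetical, and the table reflects the state described in this message only.

```python
ISO = "iso-8859-1-unix"
UTF8 = "utf-8-unix"
MULE = "emacs-mule-unix"

# Coding systems each Lisp accepts, as listed in the message above.
SUPPORTED_CODING_SYSTEMS = {
    "CMUCL": {ISO},
    "OpenMCL": {ISO},
    "SBCL": {ISO, UTF8},
    "CLISP": {ISO, UTF8},
    "Allegro": {ISO, UTF8, MULE},
    "LispWorks": {ISO},
    "ABCL": {ISO},
}


def check_coding_system(lisp: str, coding_system: str) -> str:
    """Return coding_system if the given Lisp accepts it, else raise."""
    supported = SUPPORTED_CODING_SYSTEMS.get(lisp, set())
    if coding_system not in supported:
        raise ValueError(f"{lisp} does not support {coding_system}")
    return coding_system
```

For example, check_coding_system("SBCL", "utf-8-unix") returns the name unchanged, while check_coding_system("CMUCL", "utf-8-unix") raises ValueError.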
Of course, the Emacs version must support the coding system too. GNU Emacs 20 and 21 support emacs-mule. utf-8 is probably only available in CVS Emacs or CVS XEmacs. AFAIK, no XEmacs has support for emacs-mule.
If you use a multibyte encoding in GNU Emacs, it is important that you set default-enable-multibyte-characters to t. That's the default, so most people don't have to worry about it.
No change is needed for people who don't care about multibyte characters. Everything should work as before (modulo bugs).
Helmut.
Helmut Eller e9626484@stud3.tuwien.ac.at writes:
I added support for multibyte coding systems. The new magic variable is slime-net-coding-system. Currently there are 3 possible values for the coding system: iso-8859-1-unix, utf-8-unix, and emacs-mule-unix. At startup, Emacs tells the Lisp implementation which coding system to use and the same encoding is used for the rest of the session. Not all Lisp implementations support all coding systems. The situation is as follows:
CMUCL, OpenMCL: support only iso-8859-1-unix. SBCL, CLISP: can be used with iso-8859-1-unix or utf-8-unix. Allegro: supports all three.
Thank you. What Lisp support do you need to support other external formats? A lisp-side understanding of the emacs multibyte system?
If so, where is emacs-mule-unix documented?
Cheers,
Christophe
On 20 Nov 2004, Christophe Rhodes wrote:
Helmut Eller e9626484@stud3.tuwien.ac.at writes:
I added support for multibyte coding systems. The new magic variable is slime-net-coding-system. Currently there are 3 possible values for the coding system: iso-8859-1-unix, utf-8-unix, and emacs-mule-unix. At startup, Emacs tells the Lisp implementation which coding system to use and the same encoding is used for the rest of the session. Not all Lisp implementations support all coding systems. The situation is as follows:
CMUCL, OpenMCL: support only iso-8859-1-unix. SBCL, CLISP: can be used with iso-8859-1-unix or utf-8-unix. Allegro: supports all three.
Thank you. What Lisp support do you need to support other external formats? A lisp-side understanding of the emacs multibyte system?
If so, where is emacs-mule-unix documented?
`emacs-mule' is an internal coding system -- *not* something that you really want to use for communications between (X)Emacs and another process.
One key reason for this is that this encoding has been known to change from version to version of Emacs itself -- as the needs of the internal data storage system change, at least under XEmacs. I believe that GNU Emacs has done the same, notably with the addition of some Unicode support around the 21.3 release.
Supporting utf-8 is good, and is available (for many characters) through the mule-ucs package, or internal support, under recent Emacs and XEmacs versions.
This package provides a CCL based tool for translating utf-8 into the internal Mule encoding and vice-versa, insulating you from the details of the internal coding.
Supporting the various 8-bit ISO encodings (other than 8859-1) is probably also nice, but not that necessary with utf-8 support.
If you really want something that isn't Unicode, but that does support a variety of coding systems at once, one of the real ISO-2022 encodings is what you want -- something standard that allows selecting the active character set, without the random variation of the internal Mule encoding.
Regards, Daniel
Daniel Pittman daniel@rimspace.net writes:
On 20 Nov 2004, Christophe Rhodes wrote:
Helmut Eller e9626484@stud3.tuwien.ac.at writes:
CMUCL, OpenMCL: support only iso-8859-1-unix. SBCL, CLISP: can be used with iso-8859-1-unix or utf-8-unix. Allegro: supports all three.
Thank you. What Lisp support do you need to support other external formats? A lisp-side understanding of the emacs multibyte system?
If so, where is emacs-mule-unix documented?
`emacs-mule' is an internal coding system -- *not* something that you really want to use for communications between (X)Emacs and another process.
Maybe -- but, unless I read Helmut's work wrongly, it's also the only way of communicating the full space of characters to Emacs -- that is, Helmut's message strongly implied to me that current released versions of the emacsen do not support utf-8 communications in any useful way. Is this correct?
Cheers,
Christophe
Christophe Rhodes writes:
Maybe -- but, unless I read Helmut's work wrongly, it's also the only way of communicating the full space of characters to Emacs -- that is, Helmut's message strongly implied to me that current released versions of the emacsen do not support utf-8 communications in any useful way. Is this correct?
I don't think so. I've used emacs 21.3.1 with clisp 2.33.2 communicating in utf-8 with no problem (with the characters I used). Of course, that was with ilisp...
Christophe Rhodes csr21@cam.ac.uk writes:
Maybe -- but, unless I read Helmut's work wrongly, it's also the only way of communicating the full space of characters to Emacs -- that is, Helmut's message strongly implied to me that current released versions of the emacsen do not support utf-8 communications in any useful way. Is this correct?
I was wrong. According to the NEWS file the utf-8 coding system was already present in Emacs 21.1, i.e. the first release of the 21 series. Sorry for the confusion.
I don't know what the exact state in the XEmacs world is. My XEmacs 21.4 here had neither utf-8 nor emacs-mule support. But as Daniel said, utf-8 can be added with the mule-ucs package. I just tried it and it worked painlessly under Debian. mule-ucs works even for Emacs20.
One difference I observed between emacs-mule and utf-8 was that the same CL character can be mapped to different Emacs characters. E.g. in Allegro #\greek_small_letter_lamda (no, that's not a typo :) with char-code #x3bb, can be displayed as #x513bb or #xd34b. With my fonts, the glyph for the latter is a bit wider than a normal character. No idea what that means, though.
Helmut.
On 21 Nov 2004, Helmut Eller wrote:
Christophe Rhodes csr21@cam.ac.uk writes:
Maybe -- but, unless I read Helmut's work wrongly, it's also the only way of communicating the full space of characters to Emacs -- that is, Helmut's message strongly implied to me that current released versions of the emacsen do not support utf-8 communications in any useful way. Is this correct?
I was wrong. According to the NEWS file the utf-8 coding system was already present in Emacs 21.1, i.e. the first release of the 21 series. Sorry for the confusion.
I don't know what the exact state in the XEmacs world is. My XEmacs 21.4 here had neither utf-8 nor emacs-mule support. But as Daniel said, utf-8 can be added with the mule-ucs package. I just tried it and it worked painlessly under Debian. mule-ucs works even for Emacs20.
*nod* Basically, it installs a CCL driver that knows about the internal mule encoding on those platforms, and maps UTF-8 to and from it.
If you want to use `mule-unicode' yourself, you need to embed that same knowledge in your own Unicode<->MULE layer. If the Lisp vendor did that for you, of course, then it is their problem to do that work. ;)
One difference I observed between emacs-mule and utf-8 was that the same CL character can be mapped to different Emacs characters. E.g. in Allegro #\greek_small_letter_lamda (no, that's not a typo :) with char-code #x3bb, can be displayed as #x513bb or #xd34b. With my fonts, the glyph for the latter is a bit wider than a normal character. No idea what that means, though.
The GNU Emacs project allocated an internal section of the emacs-mule space for representing Unicode character ranges. These are the `mule-unicode-XXXX-YYYY' ranges, represented internally as `9C F0 XX XX' according to my Emacs 21.3.
These are used by the internal UTF-8 support, and can only display characters from fonts that have an X `iso10646-1' font encoding.
There is a very limited range of suitable fonts available for this, especially if you don't use a stock 75DPI monitor, and many of them are of variable or low quality.[1]
MULE-UCS, on the other hand, uses CCL to transform the UTF-8 byte stream into characters in the pre-assigned MULE space, so your Greek character should have turned up as something like `greek-iso8859-7', represented as `86 XX' internally.
This will then use a font in the relevant X ISO encoding to display. Since there have been a lot more years of using that encoding, many higher quality fonts are available.
Presumably, on your system, the font here matches the iso8859-1 font exactly, while the Unicode encoded font does not.
You can find out what internal codeset and font were used, on GNU Emacs at least, by moving the cursor over the character and entering `C-u C-x ='.
Finally, one other difference between the MULE-UCS and internal support is the handling of characters outside the supported range. While I have not tested this, I am assured that as of MULE-UCS 0.83 it is true:
When the current MULE-UCS release loads a character that it cannot map from UTF-8 to emacs-mule, it replaces it with a substitution character destructively. When you write it out, that information is lost.
The internal utf8 coding system stores the original bytes internally, and when you write it out, the original byte sequence is reproduced on output.
This, apparently, is a model that MULE-UCS is adopting on their next major release. I don't know if the `mule-ucs' package in Debian has this implemented yet, but the version number suggests a CVS release, probably after this feature was added.
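The difference between the two behaviours can be sketched with Python's codec error handlers. This is only an analogy, not the MULE-UCS or Emacs mechanism: Python applies these handlers to octets that are invalid UTF-8, whereas the case described above concerns valid UTF-8 that has no emacs-mule mapping. The sample data and variable names are made up for illustration.

```python
# Two octets encoding U+03BB, followed by one octet the decoder cannot handle.
original = b"\xce\xbb\xff"

# Destructive substitution: the bad octet becomes U+FFFD and is lost.
substituted = original.decode("utf-8", errors="replace")
lossy_output = substituted.encode("utf-8")

# Byte-preserving: the bad octet is kept as a lone surrogate and
# restored when the text is written out with the same handler.
preserved = original.decode("utf-8", errors="surrogateescape")
faithful_output = preserved.encode("utf-8", errors="surrogateescape")

print(lossy_output == original)     # False
print(faithful_output == original)  # True
```

With the substitution approach, simply reading and re-writing data changes it. With the preserving approach, untouched data round-trips unchanged.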
I don't actually use MULE-UCS at the moment, so can't comment beyond this.
Regards, Daniel
Footnotes: [1] This is my one real objection to the internal UTF-8 encoding in Emacs 21; it made it *really* hard to get decent Unicode font support, since my display is ~120DPI.
Helmut Eller e9626484@stud3.tuwien.ac.at writes:
Christophe Rhodes csr21@cam.ac.uk writes:
Maybe -- but, unless I read Helmut's work wrongly, it's also the only way of communicating the full space of characters to Emacs -- that is, Helmut's message strongly implied to me that current released versions of the emacsen do not support utf-8 communications in any useful way. Is this correct?
I was wrong. According to the NEWS file the utf-8 coding system was already present in Emacs 21.1, i.e. the first release of the 21 series. Sorry for the confusion.
Thanks. I can confirm that emacs21 (as distributed by Debian) works fine with SBCL 0.8.16.3x and slime-net-coding-system set to 'utf-8-unix; also, that it works "out of the box" provided that no character with code-point above 255 is encountered -- if it is, then SBCL's currently incredibly fragile encoding error reporting causes death of the running lisp. :-/
Cheers,
Christophe
On 21 Nov 2004, Christophe Rhodes wrote:
Daniel Pittman daniel@rimspace.net writes:
On 20 Nov 2004, Christophe Rhodes wrote:
Helmut Eller e9626484@stud3.tuwien.ac.at writes:
CMUCL, OpenMCL: support only iso-8859-1-unix. SBCL, CLISP: can be used with iso-8859-1-unix or utf-8-unix. Allegro: supports all three.
Thank you. What Lisp support do you need to support other external formats? A lisp-side understanding of the emacs multibyte system?
If so, where is emacs-mule-unix documented?
`emacs-mule' is an internal coding system -- *not* something that you really want to use for communications between (X)Emacs and another process.
Maybe -- but, unless I read Helmut's work wrongly, it's also the only way of communicating the full space of characters to Emacs
I suspect this to be true, at least for XEmacs. I last used that four or five months ago, so my knowledge is less strong after that, but at the time, IIRC, a couple of encodings like Ethiopic and some of the forms of CJK were not fully supported.
-- that is, Helmut's message strongly implied to me that current released versions of the emacsen do not support utf-8 communications in any useful way. Is this correct?
No. MULE-UCS, or Emacs 21.1, give you "good enough" UTF-8 support for most practical purposes. Obviously, since both are a mapping layer over the internal MULE encoding, they are not completely perfect.
I don't seriously expect anything to change here until the release of an internally Unicode Emacs, or XEmacs. At present, the GNU project seems to be closer, but both teams have working prototype code AFAIK.[1]
Regards, Daniel
Footnotes: [1] I don't follow internal XEmacs development any longer, since around four or five months ago, so things may have changed there.
Christophe Rhodes csr21@cam.ac.uk writes:
Thank you. What Lisp support do you need to support other external formats? A lisp-side understanding of the emacs multibyte system?
Basically yes. The front end only deals with character streams and char <-> code conversion is no longer used. The implementation dependent socket stream does all the encoding. We could probably also write our own stream classes, but I think it's easier to use the available support from the implementation.
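The division of labour described above, where the front end writes characters and the stream underneath turns them into octets, can be sketched in Python. This is an analogy and not the Lisp-side code: io.BytesIO stands in for the socket, the helper open_char_stream is hypothetical, and so is the mapping from the coding-system names in this thread to Python codec names.

```python
import io

# Hypothetical mapping from coding-system names to Python codec names.
CODECS = {"iso-8859-1-unix": "latin-1", "utf-8-unix": "utf-8"}


def open_char_stream(raw, coding_system):
    """Wrap a byte stream so callers read and write characters only."""
    # newline="\n" disables newline translation, which corresponds to
    # the "-unix" (LF line ending) part of the coding-system name.
    return io.TextIOWrapper(raw, encoding=CODECS[coding_system], newline="\n")


raw = io.BytesIO()                       # stands in for the socket
stream = open_char_stream(raw, "utf-8-unix")
stream.write("(lambda \u03bb)\n")        # the front end sees characters only
stream.flush()
wire_bytes = raw.getvalue()              # octets that would go on the wire
```

The code that writes to the stream never calls an encoder itself, so switching the coding system only changes how the stream is opened.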
If so, where is emacs-mule-unix documented?
I don't know. Maybe it's only documented in comments at the beginning of the source file[*].
But I don't know if it is worth spending time on other encodings.
Helmut.
[*] http://savannah.gnu.org/cgi-bin/viewcvs/emacs/emacs/src/coding.c?rev=1.307&a...