On 21 Nov 2004, Helmut Eller wrote:
Christophe Rhodes <csr21@cam.ac.uk> writes:
Maybe -- but, unless I read Helmut's work wrongly, it's also the only way of communicating the full space of characters to Emacs -- that is, Helmut's message strongly implied to me that current released versions of the emacsen do not support utf-8 communications in any useful way. Is this correct?
I was wrong. According to the NEWS file the utf-8 coding system was already present in Emacs 21.1, i.e. the first release of the 21 series. Sorry for the confusion.
I don't know what the exact state in the XEmacs world is. My XEmacs 21.4 here had neither utf-8 nor emacs-mule support. But as Daniel said, utf-8 can be added with the mule-ucs package. I just tried it and it worked painlessly under Debian. mule-ucs works even for Emacs 20.
*nod* Basically, it installs a CCL driver that knows about the internal mule encoding on those platforms, and maps UTF-8 to and from it.
If you want to use `mule-unicode' yourself, you need to embed that same knowledge in your own Unicode<->MULE layer. If the Lisp vendor did that for you, of course, then it is their problem to do that work. ;)
One difference I observed between emacs-mule and utf-8 was that the same CL character can be mapped to different Emacs characters. E.g. in Allegro #\greek_small_letter_lamda (no, that's not a typo :) with char-code #x3bb, can be displayed as #x513bb or #xd34b. With my fonts, the glyph for the latter is a bit wider than a normal character. No idea what that means, though.
The GNU Emacs project allocated an internal section of the emacs-mule space for representing Unicode character ranges. These are the `mule-unicode-XXXX-YYYY' ranges, represented internally as `9C F0 XX XX' according to my Emacs 21.3.
These are used by the internal UTF-8 support, and can only display characters from fonts that have an X `iso10646-1' font encoding.
There is a very limited range of suitable fonts available for this, especially if you don't use a stock 75DPI monitor, and many of them are of variable or low quality.[1]
MULE-UCS, on the other hand, uses CCL to transform the UTF-8 byte stream into characters in the pre-assigned MULE space, so your Greek character should have turned up as something like `greek-iso8859-7', represented as `86 XX' internally.
This will then use a font in the relevant X ISO encoding to display. Since there have been a lot more years of using that encoding, many higher quality fonts are available.
Presumably, on your system, the font used here matches the metrics of your iso8859-1 font exactly, while the Unicode-encoded font does not.
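To make the two byte-level views of that Greek character concrete, here is a small sketch in Python (chosen only for convenience; it does not model Emacs internals or the CCL driver). It shows the Unicode name, the UTF-8 bytes a converter would receive, and the single byte the same character occupies in ISO 8859-7. How that byte ends up inside the `86 XX' emacs-mule form is not something this sketch tries to reproduce.

```python
import unicodedata

# The character with char-code #x3bb discussed above.
lam = "\u03bb"

# The official Unicode name; the spelling LAMDA is what the standard uses.
name = unicodedata.name(lam)

# The UTF-8 byte stream that a UTF-8 -> MULE converter would be handed.
utf8_bytes = lam.encode("utf-8")

# The same character in the legacy ISO 8859-7 (Greek) encoding.
greek_byte = lam.encode("iso8859_7")

print(name)              # GREEK SMALL LETTER LAMDA
print(utf8_bytes.hex())  # cebb
print(greek_byte.hex())  # eb
```

The point is only that one abstract character has two unrelated byte representations, which is why the two Emacs code paths can land on different internal characters and different fonts.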
You can find out what internal codeset and font were used, on GNU Emacs at least, by moving the cursor over the character and entering `C-u C-x ='.
Finally, one other difference between MULE-UCS and the internal support is the handling of characters outside the supported range. I have not tested this myself, but I am assured that the following holds as of MULE-UCS 0.83:
When the current MULE-UCS release loads a character that it cannot map from UTF-8 to emacs-mule, it destructively replaces it with a substitution character. When you write the text out again, the original character is lost.
The internal utf-8 coding system instead keeps the original bytes, so when you write the text out, the original byte sequence is reproduced.
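The difference between the two strategies can be sketched as a toy model in Python. Everything here is hypothetical: the `SUPPORTED` range, the `?' substitution character and the function names are my own inventions for illustration, and neither Emacs nor MULE-UCS is implemented this way.

```python
# Hypothetical set of code points the editor can represent internally.
SUPPORTED = range(0x0000, 0x2500)

def decode_lossy(data: bytes) -> str:
    """Replace every unsupported character with '?' (destructive)."""
    text = data.decode("utf-8")
    return "".join(ch if ord(ch) in SUPPORTED else "?" for ch in text)

def encode_lossy(text: str) -> bytes:
    """Write the buffer back out; substituted characters stay substituted."""
    return text.encode("utf-8")

def decode_preserving(data: bytes) -> list:
    """Keep the raw UTF-8 bytes of every unsupported character."""
    text = data.decode("utf-8")
    return [ch if ord(ch) in SUPPORTED else ch.encode("utf-8") for ch in text]

def encode_preserving(items: list) -> bytes:
    """Write the buffer back out, emitting preserved bytes unchanged."""
    return b"".join(
        item if isinstance(item, bytes) else item.encode("utf-8")
        for item in items
    )

# U+03BB is inside the toy range, U+1D11E is outside it.
original = "\u03bb\U0001d11e".encode("utf-8")

lossy_roundtrip = encode_lossy(decode_lossy(original))
preserving_roundtrip = encode_preserving(decode_preserving(original))

print(lossy_roundtrip == original)       # False
print(preserving_roundtrip == original)  # True
```

With the destructive strategy a load followed by a save silently changes the file; with the preserving one the unsupported character may not display properly, but the file survives intact.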
This, apparently, is the model that MULE-UCS is adopting in its next major release. I don't know if the `mule-ucs' package in Debian has this implemented yet, but the version number suggests a CVS snapshot, probably taken after this feature was added.
I don't actually use MULE-UCS at the moment, so can't comment beyond this.
Regards, Daniel
Footnotes: [1] This is my one real objection to the internal UTF-8 encoding in Emacs 21; it made it *really* hard to get decent Unicode font support, since my display is ~120DPI.