On Mon, Apr 9, 2012 at 11:33, Douglas Crosher <dtc-asdf@scieneer.com> wrote:
> Attached is my suggestion for adding external-format support.
Thanks a lot for taking the time to express your suggestion as a patch. It is much appreciated.
You've convinced me that we should consider making asdf-encodings part of asdf itself, at the cost of ~200 lines. I can see arguments both ways, and I'm open to opinions.
However, I still think that ASDF needs a portable file encoding layer well distinguished from non-portable external-formats (see below).
If anyone here cares, please speak up. Unless I start seeing many people side against me, I'll stick to my plan.
> - A table of translations is included, based on asdf-encodings, but if not found then the external-format is passed through.
This may not be obvious from the sparse documentation, but asdf-encodings already passes through encoding names that it does not find.
> It is not the intention that the list be exhaustive or that any attempt be made to verify the encoding.
Unhappily, the "verification" step is the only way I've found to determine which of the many known aliases a given implementation uses, so as to portably map other, unsupported names to that alias. One implementation (say lispworks) insists on calling :latin-1 what another (say clisp) insists on calling charset:iso-8859-1, whereas yet another (say ecl) wrongly thinks :latin-6 is :iso-8859-6 when it is actually :iso-8859-10. Unless you're going to build a big table valid for 9 to 14 implementations and all past and future versions thereof, you're not going to win.
Detection works, and allows me to provide a list of encoding names that actually work portably. You want :latin1? I'll give it to you on sbcl, ccl, lispworks, clisp, ecl, scl, allegro, etc.
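To illustrate the idea, here is a minimal sketch, which is not the actual asdf-encodings code. The alias table and the names *ENCODING-ALIASES* and IMPLEMENTATION-ENCODING-NAME are hypothetical, and SUPPORTED-P stands in for whatever implementation-specific probe tells you that a name is accepted as an external-format.

```lisp
;; Hypothetical alias table: each entry is a portable name followed by
;; the spellings that various implementations might use for it.
(defparameter *encoding-aliases*
  '((:latin1 :latin1 :latin-1 :iso-8859-1 :iso8859-1)
    (:utf-8 :utf-8 :utf8)))

(defun implementation-encoding-name (portable-name supported-p)
  "Return the first alias of PORTABLE-NAME that SUPPORTED-P accepts.
If no alias is known or accepted, pass PORTABLE-NAME through unchanged."
  (let ((aliases (cdr (assoc portable-name *encoding-aliases*))))
    (or (find-if supported-p aliases)
        portable-name)))

;; Pretend the current implementation only knows these two names.
(implementation-encoding-name
 :latin1
 (lambda (name) (member name '(:iso-8859-1 :utf-8))))
;; => :ISO-8859-1
```

The point of the sketch is that the user writes one portable name, and the detection step picks whichever spelling the running implementation actually accepts.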
> - It uses :external-format. Users will be working with external-formats, perhaps for a foreign CL implementation but still
> external-formats. Introducing new terminology of 'encoding' seems a mistake.
That's one case where I don't see you convincing me, for the reason I detailed above: passing through encoding names basically requires the user to know a common name for all the implementations that he may want to target. Such a common name doesn't even exist in the common case of latin1. Therefore there MUST be a translation layer. Giving the same name to two very different things is NOT going to help anyone; it will only cause confusion.
Also, note that ASDF's encodings play a much more limited role than external-formats: they are not meant as a way to express arbitrary transformations an implementation may provide for a variety of input and output uses, but only as a way to *portably* specify a character encoding for the input-only reading of Lisp source files for the purposes of LOADing or COMPILE-FILEing.
Therefore, ASDF encodings are NOT meant as a full portable replacement for any implementation's external-format system. For that, you'll want flexi-streams, iolib (using babel), or some other library to be determined. They are only meant to allow the portable use of non-UTF8 code.
> - No attempt is made to verify the external format. This does not seem necessary, or even possible.
I proved in my code that detection is possible on all implementations that support multiple external-formats, except for abcl (bug filed): allegro, clozure, clisp, cmucl, ecl, sbcl, lispworks (mostly, based on documentation), scl (kind of; help required). The following remain unsupported (possibly forever): cormanlisp, gcl, genera, rmcl, xcl.
> - A declarative system definition can be used for both portable :utf-8 and implementation dependent (non-portable) external-formats.
The current system already allows that. asdf-encodings is already pass-through when it doesn't recognize an encoding. If I merge asdf-encodings into asdf, that will become the default behavior.
However, in the current setup where asdf-encodings is separate, I explicitly decided against making pass-through the default behavior. Otherwise many library authors would be confused into believing their code is portable when really it is not: their current implementation happens to recognize the name they use, other implementations won't, yet asdf-encodings could if it were loaded. One of my goals as asdf maintainer is to make its semantics more predictable in a portable way, rather than pushing the responsibility of portability upon the user.
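For concreteness, a declaration along the following lines is the kind of thing under discussion. This is only a sketch: the system and file names are made up, and it assumes the :encoding option and the :latin1 name as asdf-encodings understands them.

```lisp
;; Hypothetical system definition; all names are for illustration only.
(asdf:defsystem "my-legacy-library"
  ;; Load the separate asdf-encodings system before the rest is processed,
  ;; so that names other than :utf-8 can be translated portably.
  :defsystem-depends-on ("asdf-encodings")
  ;; One portable name; the translation layer picks the local spelling.
  :encoding :latin1
  :components ((:file "package")
               (:file "legacy-code")))
```

The library author states the encoding once, declaratively, and does not need to know what each implementation calls it.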
> There is no need to add code methods or extend asdf-encodings to use user defined or implementation dependent external formats. Supporting declarative definitions has many advantages over the alternative of requiring asdf-encodings code or asdf methods to support user defined or implementation dependent external-formats.
That's a valid argument for merging asdf-encodings into asdf, whatever we decide is the semantics of asdf-encodings.
> - The default is :default. The external-format support in ASDF would seem to be needed to write 'portable' libraries with UTF-8
> source files so it will not be possible until users have upgraded anyway. Portability is not gained now by making :utf-8 the default, so I just don't see the advantage of making :utf-8 the default when this would break backward compatibility and make migration problematic and run contrary to the ANSI CL standard.
Most library users are *already* using UTF-8, in a way that in practice works well in the common case. My goal with asdf-encodings is to (1) make this common case work *reliably*, and (2) (also reliably) support the uncommon case of non-UTF8 source code.
Yes, portability *would* be gained by making UTF-8 the official default, rather than requiring every user to somehow magically set up his Lisp environment before he starts invoking ASDF, in a way that makes libraries with contrary encoding assumptions mutually incompatible.
> - At less than 200 lines of code it is just included in asdf.lisp.
That's a valid argument for merging asdf-encodings into asdf, whatever we decide is the semantics of asdf-encodings.
—♯ƒ • François-René ÐVB Rideau •Reflection&Cybernethics• http://fare.tunes.org The truth of a proposition has nothing to do with its credibility. And vice versa. — Robert Heinlein, "Time Enough For Love"