Re: [asdf-devel] Update: encoding file options version.

19 Apr 2012

On 04/19/2012 10:12 AM, Faré wrote:
...
On Wed, Apr 18, 2012 at 10:24, Douglas Crosher <dtc-asdf@scieneer.com> wrote:
...
A revised version of ASDF replacing the system definition :encoding support with support for reading the encoding from file options
is now available at: http://www.scieneer.com/files/asdf.lisp
...
I understand this may be frustrating for you,
especially since for some reason you truly hate the idea of a
:encoding specification
(though I'm not sure I understand why).
Sure, let's keep the encoding code out of ASDF - after considering your points below it does seem clear that it is best to keep
this out of ASDF.   I was trying to address the need for a bundled solution, but this does not appear necessary.    However I
would ask you to reconsider the impact of releasing ASDF with the :encoding declaration bundled and of recommending its use.

If ASDF is released with the :encoding system definition declaration and further offered as the only solution to authors of portable
UTF-8 code then some will no doubt start using this and the :encoding declarations will be in system definition files for tools to
deal with and to be supported in future.

The encoding file option seems to do a better job of solving the problem and is usable by other code.  It solves the problem of the
system definition file encoding, which the :encoding declaration can't.  It can be used by automated recoding tools that are badly
needed - there is a path for Quicklisp to automatically recode projects to suit the CL implementation.  It can be used by CL
implementations for 'load and 'compile-file, and by editors.  This seems like the best path for solving the problem.
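As a rough illustration of why a file option is easy for other tools to consume, here is a minimal sketch in Python (not the ASDF code). It assumes the Emacs-style `-*- coding: ... -*-` convention on the first line and an ASCII-compatible encoding; the function name `read_coding_option` is hypothetical.

```python
import re

# Matches the text between the "-*-" markers of an Emacs-style
# file-variables line (assumed convention, ASCII-compatible bytes only).
_OPTIONS = re.compile(rb"-\*-(.*?)-\*-")


def read_coding_option(first_line: bytes):
    """Return the `coding` file option as a lowercase string, or None."""
    match = _OPTIONS.search(first_line)
    if match is None:
        return None
    # Options are "name: value" pairs separated by semicolons.
    for field in match.group(1).split(b";"):
        name, sep, value = field.partition(b":")
        if sep and name.strip().lower() == b"coding":
            return value.strip().decode("ascii", "replace").lower() or None
    return None


print(read_coding_option(b";;; -*- mode: lisp; coding: utf-8 -*-\n"))
```

Any tool that can read the first line of a file (a recoder, an editor, a CL implementation) could apply the same lookup without knowing anything about the system definition.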

Once the encoding file option is implemented, the :encoding declaration would seem to just be a liability.  Code that already uses
the :encoding declaration will not assist other tools that look for the encoding file option.  For example if there are some
Quicklisp projects using an :encoding option then recoding their source becomes more problematic or the tools much more complex.
Further there is the problem of what to do in future if there is a conflict between the :encoding declaration and an encoding file
option.  What if someone does recode files and this adds a coding file option but does not track down and update :encoding
declarations in scope?

For these reasons I think the :encoding declaration is a dead end and a liability, and that it should not be released, and that you
should not be encouraging its use.

I suggest that the encoding file option is a good plan, and that this be communicated to users, and that the social solution of
having everyone use UTF-8 be toned down.
...
However, keeping the encodings support separate, even temporarily, has
several advantages:
Sure, it's a chunk of code that does not seem to really belong in ASDF.

Are you sure that Quicklisp does not need any support bundled?  Perhaps it can bootstrap from just asdf.lisp, keep itself ASCII
clean, build itself, then download and install asdf-encodings which could be ASCII clean too.  Sounds like a good plan if it can all
work.

I was working on a bare minimum as an alternative to the :encoding declaration; there are hooks for the rest.  Perhaps even this is
unnecessary, in which case please do keep it all out of ASDF.  But please also consider removing the :encoding support too, as it
does seem like a liability - just keep the hooks and default to :default.  Sounds like a good plan.

Regarding the hooks, might it be better for them to be lists of functions to call in turn until successful, so that multiple
projects can add hooks and still work together?
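To make the suggestion concrete, here is a minimal sketch in Python of a hook list that is tried in order until one hook succeeds. All names (`detection_hooks`, `detect_encoding`, and the two toy hooks) are hypothetical and are not the ASDF hooks; the fallback value mirrors the idea of defaulting to :default.

```python
# Hypothetical shared registry: each project appends its own hook.
detection_hooks = []


def detect_from_bom(head: bytes):
    """Toy hook: recognise only a UTF-8 byte order mark."""
    return "utf-8" if head.startswith(b"\xef\xbb\xbf") else None


def detect_from_option(head: bytes):
    """Toy hook: recognise only one literal coding option."""
    return "latin-1" if b"coding: latin-1" in head else None


detection_hooks.append(detect_from_bom)
detection_hooks.append(detect_from_option)


def detect_encoding(head: bytes, hooks=detection_hooks, default="default"):
    """Call each hook in turn; the first non-None answer wins."""
    for hook in hooks:
        result = hook(head)
        if result is not None:
            return result
    return default  # no hook had an opinion


print(detect_encoding(b";; -*- coding: latin-1 -*-\n(foo)"))
```

With a single hook variable, the last project to set it would silently replace the others; with a list, each project only appends.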
...
* it allows this particular fast moving code to evolve and be refined
without burdening asdf,
 and without having to cast in stone design choices made before we
fully understand the issues.
Same could be said for the :encoding declaration.  You may regret releasing it and having to maintain it, deal with conflicts with
future file option solutions, and deal with authors who keep using it and with tools that don't work with it!
...
* it keeps ASDF small for most people, yet allows the extension code
to grow big.
Agreed, and I have been trying to strip it down to a bare minimum that could be bundled, and even this is in a hook that could be
replaced when asdf-encodings is loaded.
...
For instance, not only can the extension itself have many long
files and depend on many libraries,
 it is conceivable that a far future version of asdf-encodings
 could translate on the fly between encodings not supported by an implementation
 and utf-8 as supported by the implementation (this would require
additional hooks in ASDF).
Great, I am glad you can see a path to support non-UTF-8 too.
...
* It fits the minimalist design goal of ASDF, whereby
 ASDF tries to contain only the bare minimum to build Lisp software,
 together with enough extension hooks so that desired features can be
built on top.
 I believe this has been a relative success so far, with extensions including
 CFFI-GROVEL, HU.DWIM.ASDF, POIU, ASDF-DEPENDENCY-GROVEL,
 plus various things used at ITA that I know, and possibly more used
other places that I don't.
And yes, if and when we reach some stable, widely accepted
design and implementation in asdf-encodings,
it will be time to consider integrating it into ASDF itself.
However, unlike the case of the source-registry and the
output-translations layers,
where bootstrapping was an issue that made keeping those layers
separate a self-defeating non-starter,
I believe we can afford (at least for now) to require people who need
non-standard encodings
to load asdf-encodings separately and to merge that layer at a later
date (if ever).
As for the specific code you propose,
* I asked on #emacs for pointers to how Emacs identifies coding.
 I documented the results in comments in asdf-encodings.
 The Emacs way differs from your code in various ways.
 If we are going that way,
 is there any reason not to "just adopt" the Emacs code?
It's written in C, and is a big chunk of code, and puts a lot more weight on auto-detection.  My code does the bare minimum to read
the file options, and it has been tested on every encoding supported by my system that also supports the characters needed for CL
code (EBCDIC excluded, but all these also work with another 40 lines of code).
...
* Does it make sense for a file to have a UTF-16LE header that
 specifies coding: koi8-r ? I don't think so.
Yes, it is inconsistent, but it may be better to pass on the file option anyway so the error is detected.  There are only a few
cases that are detected from the BOM.  Keep in mind that a BOM can be added when not appropriate and most decoders will just ignore
it and keep working, so reading and returning the file option seems the best path.
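A minimal sketch in Python of that behaviour, under my own simplifying assumptions (not the code under discussion): check a handful of BOMs, use the BOM only to decode the first line, and return the declared coding alongside it even when the two disagree, so that the caller can report the inconsistency. The name `sniff` and the simplified `coding:` pattern are hypothetical.

```python
import codecs
import re

# UTF-32 is checked first because its little-endian BOM begins with the
# same two bytes as the UTF-16 little-endian BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Simplified pattern: does not require the surrounding "-*-" markers.
CODING = re.compile(r"coding:\s*([A-Za-z0-9._-]+)")


def sniff(head: bytes):
    """Return (bom_encoding, declared_coding); either may be None."""
    bom_encoding = None
    body = head
    for bom, name in BOMS:
        if head.startswith(bom):
            bom_encoding = name
            body = head[len(bom):]
            break
    # Without a BOM, latin-1 maps every byte to one character, which is
    # enough to find an ASCII option in any ASCII-compatible encoding.
    text = body.decode(bom_encoding or "latin-1", errors="replace")
    match = CODING.search(text.split("\n", 1)[0])
    declared = match.group(1).lower() if match else None
    return bom_encoding, declared


head = codecs.BOM_UTF16_LE + ";; -*- coding: koi8-r -*-\n".encode("utf-16-le")
print(sniff(head))
```

Returning both values leaves the policy to the caller: it can signal an error on a mismatch, or trust the file option when the BOM was added inappropriately.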
...
Or a pun file that in UTF-16BE says it's UTF-16LE,
 and the other way around (or a longer circuit)?
 I think your algorithm tries both too hard (as in this case),
 and too little (as in cases where Emacs finds a coding and your code doesn't).
I am not aware of any cases where my code fails to read the file options, and there is a big set of tests available to confirm this.

Keep in mind that detection is not 100% reliable, and there are often multiple encodings that match a file.  One concern is that if
people start trusting auto-detection and not adding a file option then the mechanism becomes less reliable - another tool may not
have the same detection algorithm or may make different assumptions.

Reading the file option is reliable, and would likely remain the first thing to check.
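A small Python illustration of the ambiguity (the helper name `decodes_as` and the choice of candidate encodings are mine): a single non-ASCII byte decodes without error under several single-byte encodings, each giving a different or indistinguishable result, so the bytes alone cannot settle which was intended.

```python
def decodes_as(data: bytes, encodings):
    """Map each encoding that decodes `data` without error to its result."""
    results = {}
    for name in encodings:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            pass  # this encoding is ruled out by the bytes themselves
    return results


data = "é".encode("latin-1")  # one byte, 0xE9
results = decodes_as(data, ["latin-1", "cp1252", "koi8-r", "utf-8"])
for name, text in results.items():
    print(name, repr(text))
```

Here only UTF-8 is ruled out; the remaining candidates all accept the byte, and a detector has to guess between them unless the file says which one it is.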
...
* All in all that doesn't mean your code is bad,
 but that probably means we should experiment with it and tweak it,
 before we declare ourselves satisfied with burning it into ASDF
 (which is somewhat less easy to upgrade than a casual library).
I am not suggesting releasing the code, just making the progress available.  The reading of the file options has been well tested
though.  Other areas needing work are the translations of the external-formats for each CL implementation, and compatibility with
the Emacs codings.
...
—♯ƒ
I'm a polyatheist — there are many gods I don't believe in. — Dan Fouts