On Mon, Apr 9, 2012 at 11:37, Douglas Crosher dtc-asdf@scieneer.com wrote:
Won't library authors need to wait until their user base has upgraded ASDF before they can start migrating to UTF-8?
No. Library authors have *already* largely adopted UTF-8. See previous analysis by Orivej Desh: "I did a check of quicklisp systems. There are 263 lisp files in 107 systems which assume non-ASCII, and only 31 of them in 20 systems assume non-UTF-8." That's out of 700 libraries in Quicklisp. Only 9 have been found to be an actual problem, and two are fixed already. https://github.com/orivej/asdf-encodings/wiki/Tracking-non-UTF-8-lisp-files-...
The only issue is to make the results *reliable* for these systems that depend on UTF-8.
I do see a concern that if developers are required to change their definitions to add :encoding :default then they will be forced to also make sure their user base has upgraded now. Further if their users do upgrade ASDF then it breaks again - there is no migration path for them.
Yes. No one in their right mind would use :encoding :default for a library. Each author knows what encoding he uses, say :latin1, :koi8-r, :mac-roman or :euc-jp, and would specify just that, not :default.
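For illustration, here is a minimal sketch of a library that declares its own encoding rather than :default. The system name "foo" and its file names are made up, and I am assuming that an encoding other than :utf-8 or :default needs asdf-encodings loaded through :defsystem-depends-on:

```lisp
;; foo.asd -- hypothetical system whose source files are written in Latin-1.
(asdf:defsystem "foo"
  ;; Assumption: ASDF by itself only understands :utf-8 and :default,
  ;; so asdf-encodings is pulled in to handle :latin1.
  :defsystem-depends-on ("asdf-encodings")
  ;; The author states the encoding actually used, not :default.
  :encoding :latin1
  :components ((:file "package")
               (:file "foo" :depends-on ("package"))))
```

An ASDF too old to know about :encoding may not accept this definition as written, which is the compatibility problem at issue in this thread.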
I was thinking of :default (1) because I hadn't written asdf-encodings yet and needed *some* way to support those setups, and (2) for full backwards compatibility: "if it's not backwards, it's not compatible".
Perhaps the difference is that portable UTF-8 source is new source and requires an upgrade of ASDF anyway, whereas making the default :utf-8 forces :encoding :default on current users and affects legacy code that is already written without a migration path.
UTF-8 is not just for new source. It doesn't require an upgrade of ASDF. There is plenty of UTF-8 source already, though mostly for comments (but not only for comments: see e.g. λ-reader). All modern implementations support UTF-8, though not always as the default. Let's just make it a reliable default so we can WORM (write once run everywhere). And the migration path is clear: recode l1..u8 foo.lisp
- thus, library developers can do nothing but wait for EVERYONE
to be using a recent ASDF before they can do anything.
Wouldn't this be the reality for portable libraries no matter which default is chosen?
Whatever the default encoding is, libraries can't use :encoding until all their users use a recent ASDF. But if :utf-8 becomes the default and they use it, they can already enjoy the benefits of deterministic encoding, and tell users who have encoding issues "just upgrade your ASDF".
- Therefore, no one will enjoy any benefit of :encoding for a year,
and when we do, it will cause massive backward incompatibility.
I don't appreciate the 'massive backward incompatibility' so perhaps do not understand your perspective? I see that future projects using UTF-8 source would need to declare this in the system definition, but this would not seem to qualify.
If the default is :default and you want to enjoy reliable utf-8, then you'll need to specify :encoding :utf-8, at which point your library ceases to be compatible with users who haven't upgraded ASDF. I call that massive backward incompatibility.
If the default is :utf-8 and your library has a latin1 character, you use recode, and your new code still works on old ASDFs as well as new ones. That's massive backward compatibility.
Choosing :default would seem to cause the least backward incompatibility as this is the current behaviour, and offers a migration path to get ASDF upgrades in place.
It's compatible for now, but setting us up for massive incompatibility later.
Admittedly, in either case, library developers could use reader conditionals such as #+asdf-unicode #+asdf-unicode :encoding :utf-8 or #+asdf-unicode :encoding #+asdf-unicode :latin1 to make their libraries safer in a backwards-compatible way.
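A minimal sketch of that reader-conditional trick in a system definition follows. The system name "bar" and its single file are hypothetical, and the sketch assumes that an ASDF with encoding support pushes :asdf-unicode onto *features*:

```lisp
;; bar.asd -- hypothetical system that wants reliable UTF-8 where available.
(asdf:defsystem "bar"
  ;; With :asdf-unicode present, both conditionals succeed and the reader
  ;; sees `:encoding :utf-8`. Without it, the outer conditional skips the
  ;; next form, which turns out to be the second following token because
  ;; the inner conditional produces nothing, so both tokens vanish and an
  ;; old ASDF never sees :encoding at all.
  #+asdf-unicode #+asdf-unicode :encoding :utf-8
  :components ((:file "bar")))
```

The doubled conditional is needed because each #+ guards only one form, while the keyword and its value are two.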
It would be great if some suggestions like this could be offered to ease the transition.
I inserted this suggestion in the ASDF documentation. I can't retroactively modify old ASDF installations to point people at precisely the paragraph they need to consult in the docs when they upgrade and things break for them, but I trust that Google will help them.
Most portable libraries are ASCII, and there would be some benefit in libraries needing UTF-8 support to declare this in the system definition.
ASCII libraries will work everywhere anyway, whatever we do about the default. That is, until some maniac writes a Lisp using EBCDIC; and even then, making UTF-8 the default will ensure he can still just download source from the net and use it without having to transcode it for his implementation. Of course, a lot of code that assumes ASCII or ASCII-like continuity of letter ranges will fail, but that's a given if he uses EBCDIC.
There may be a concern that their users would have to upgrade ASDF now.
No. Making :utf-8 the default means no one needs to upgrade ASDF now, but a few people may have to upgrade a few libraries when they upgrade ASDF.
Making :default the default and forcing people to use :encoding :utf-8 to enjoy any reliability means people who use libraries that want to be reliable will be forced to upgrade ASDF.
How can everyone enjoy reliable non-ASCII today, without the user base having upgraded ASDF?
Mostly, they can set up their system defaults to UTF-8 and enjoy most Lisp code already on most implementations. When they stray from this default setup, which I want to formalize, nothing works reliably today.
—♯ƒ • François-René ÐVB Rideau •Reflection&Cybernethics• http://fare.tunes.org Merely having an open mind is nothing; the object of opening the mind, as of opening the mouth, is to shut it again on something solid. — G.K. Chesterton