Hello.
How do I specify the source file encoding in ASDF2? I.e., what value should be passed as the :external-format argument of compile-file? In the old ASDF I did it with
(let ((asdf:*compile-file-external-format*
        #+:clisp charset:cp1251 #+:sbcl :cp1251))
  (asdf:operate 'asdf:load-op :mysystem))
But today the variable asdf:*compile-file-external-format* does not exist. Also, it would of course be better to specify the encoding in the .asd file (the system author knows the files' encoding and should take care of this, not the system user).
Could you advise me how to do it?
Best regards, - Anton
And, even worse, how do we asdf:load-op .asd files that contain non-ascii characters? (besides changing the locale, which, while it will work for a single .asd file, doesn't address the issue of how to load multiple ASDF systems from different locales).
thanks,
Cyrus
On 29 January 2011 11:18, Cyrus Harmon ch-lisp@bobobeach.com wrote:
And, even worse, how do we asdf:load-op .asd files that contain non-ascii characters? (besides changing the locale, which, while it will work for a single .asd file, doesn't address the issue of how to load multiple ASDF systems from different locales).
Probably one of the below makes sense:
1- we don't do it, and declare it non-portable.
2- we don't do it, and enforce loading in US-ASCII, a la SBCL
3- we say it's whatever trivial byte to character encoding the implementation provides, and use iso-8859-1 on all platforms that have it.
4- we embrace Unicode, and say it's UTF-8 by default wherever supported, falling back to iso-8859-1 or whatever on implementations that don't have it.
I'd vote for 4.
[ François-René ÐVB Rideau | Reflection&Cybernethics | http://fare.tunes.org ] Insanity is hereditary — you get it from your kids.
asdf:*load-external-format* perhaps?
29.01.2011, 19:36, "Faré" fahree@gmail.com:
Dear Anton,
Sorry, I see no trace of *compile-file-external-format* ... it seems to rely on some local patch to ASDF that was never merged upstream.
You are right! Now I remember, when I worked on that project several years ago I just opened asdf.lisp, found the compile-file call and introduced the *compile-file-external-format* there, and then passed the encoding via this variable.
I am not undertaking the patch now, because the project I am working on will only be started on my development machine and my server, and I can use some easy workaround, e.g. most lisps accept default encoding as a command line argument.
My first letter was to ensure I am not overlooking a standard way for specifying the encoding.
Anyway, thank you for the info, it's interesting to know.
Also, several notes which may be useful later, when someone eventually implements the patch.
In 99.9% of cases it is enough to specify the encoding for the whole system, not for separate files. Only in some extraordinary case would the system author choose to store source files in different encodings.
Also it might or might not be a good idea to store the external-format in a slot of cl-source-file, and to have a proper :initform in it with a valid default value to be used when upgrading ASDF.
How are the slots populated from the defsystem expression?
E.g. if I have
(:file "package" :enc :utf-8)
will the :enc :utf-8 be passed as initargs to (make-instance 'cl-source-file)?
Or for
(defsystem :mysystem :version "0.1.0" :serial t :enc :utf-8 ....
Are these attributes passed to the component instantiation as initargs?
The problem for you will be to reasonably support 11 existing implementations or so.
Actually, not a big problem. We will just create a mapping from the encoding specifications allowed in .asd files to the encoding specification of the underlying compiler.
Like
(defun enc (enc)
  (case enc
    ((:utf8 :utf-8)
     #+:clisp 'charset:utf-8 #+:sbcl :utf8 #+ccl :utf-8 ....)
    ((:cp1251 :cp-1251)
     #+:clisp 'charset:cp1251 #+:sbcl :cp1251 #+ccl :cp-1251)
    ...)
  ...)
Would you accept a patch supporting only the 7-10 most important encodings (all the Unicode variants + several of the most frequent single-byte encodings)?
29.01.2011, 20:15, "Cyrus Harmon" ch-lisp@bobobeach.com:
asdf:*load-external-format* perhaps?
Does the problem with national characters in .asd files really exist? Do you use non-ASCII characters in .asd files?
asdf:*load-external-format* would be more flexible than a hard-coded encoding, but it still doesn't solve the problem you mentioned: handling several .asd files with different encodings.
If we start making improvements, IMHO enforcing UTF-8 is a good start and should be enough (option 4 as listed by Faré).
If more is needed, a complete solution allowing a per-.asd encoding specification is better. We need to choose a good notation that allows a reasonably simple implementation.
It might be either an Emacs-style coding comment in the first line: ;;; -*- coding: utf-8; -*-
Or a special Lisp form: (asdf:asd-file-encoding :utf-8)
But interpreting that form would require switching the encoding of the Lisp reader stream, which I believe will be problematic on some Lisps. Therefore it would require feeding the reader from a custom input stream implementation, like flexi-streams. And it still would not be good enough, because only ASDF would create that special stream for .asd files; when you execute it from the REPL/SLIME, the meaning of that expression is unclear.
Another alternative is a naming convention for .asd files: mysystem.utf-8.asd. It's simple to implement, and after some thinking, it seems better than the two suggestions above.
But again, we should decide if the problem really exists and avoid solving problems that we don't have. I personally never use national characters in .asd files.
Best regards, - Anton
On 29 January 2011 15:42, Anton Vodonosov avodonosov@yandex.ru wrote:
You are right! Now I remember, when I worked on that project several years ago I just opened asdf.lisp, found the compile-file call and introduced the *compile-file-external-format* there, and then passed the encoding via this variable.
Since it's for both compile-op and load-source-op, a better name is required than compile-file-*. Maybe *cl-source-file-external-format* ?
I am not undertaking the patch now, because the project I am working on will only be started on my development machine and my server, and I can use some easy workaround, e.g. most lisps accept default encoding as a command line argument.
Makes sense.
Also, several notes which may be useful later, when someone eventually implements the patch.
In 99.9% of cases it is enough to specify the encoding for the whole system, not for separate files. Only in some extraordinary case would the system author choose to store source files in different encodings.
Yes, but as you note below, the decision is rightly done per-system, rather than globally. Whoever writes the files is who knows their encoding, not anyone else. ASDF2 follows the principle: he who knows is he who specifies.
Also it might or might not be a good idea to store the external-format in a slot of cl-source-file, and to have a proper :initform in it with a valid default value to be used when upgrading ASDF.
How are the slots populated from the defsystem expression?
E.g. if I have
(:file "package" :enc :utf-8)
will the :enc :utf-8 be passed as initargs to (make-instance 'cl-source-file)?
Or for
(defsystem :mysystem :version "0.1.0" :serial t :enc :utf-8 ....
Yes, except that :external-format or :encoding is to be used instead of :enc. New abbreviations are evil.
Are these attributes passed to the component instantiation as initargs?
Yes they are.
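For illustration, the system above might then be written like this (just a sketch: :encoding is one of the names suggested here, and the component list is invented):

(defsystem :mysystem
  :version "0.1.0"
  :serial t
  :encoding :utf-8                ; system-wide default
  :components ((:file "package")
               ;; hypothetical per-file override
               (:file "strings" :encoding :cp1251)))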
The problem for you will be to reasonably support 11 existing implementations or so.
Actually, not a big problem. We will just create a mapping from the encoding specifications allowed in .asd files to the encoding specification of the underlying compiler.
Not big, but painful to get right. At the very least, unsupported implementations should keep the previous behavior rather than be broken.
(defun enc (enc)
  (case enc
    ((:utf8 :utf-8)
     #+:clisp 'charset:utf-8 #+:sbcl :utf8 #+ccl :utf-8 ....)
    ((:cp1251 :cp-1251)
     #+:clisp 'charset:cp1251 #+:sbcl :cp1251 #+ccl :cp-1251)
    ...)
  ...)
Ouch. I'd rather we leave only the bare minimum in asdf itself, i.e. the default, utf-8 or whatever. Any such function, etc., should be imported in some asdf extension that itself uses the default. All .asd files should be using the default. Only lisp files can be customized.
Would you accept a patch supporting only the 7-10 most important encodings (all the Unicode variants + several of the most frequent single-byte encodings)?
Yes.
If we start making improvements, IMHO enforcing UTF-8 is a good start and should be enough (option 4 as listed by Faré).
Yes, I think UTF-8 is the way to go, these days.
If more is needed, a complete solution allowing a per-.asd encoding specification is better. We need to choose a good notation that allows a reasonably simple implementation.
I suppose you mean that the .asd file specifies per-system default encoding of lisp files. The .asd file itself will be loaded before the encoding may be specified, so will always be loaded with the default, which will presumably be UTF-8. People who don't like UTF-8 encoding for extra characters should stick to US-ASCII. They can have whatever other character sets in their Lisp files - just not the .asd file.
It might be either an Emacs-style coding comment in the first line: ;;; -*- coding: utf-8; -*-
Or a special Lisp form: (asdf:asd-file-encoding :utf-8)
But interpreting that form would require switching the encoding of the Lisp reader stream, which I believe will be problematic on some Lisps.
Therefore it would require feeding the reader from a custom input stream implementation, like flexi-streams. And it still would not be good enough, because only ASDF would create that special stream for .asd files; when you execute it from the REPL/SLIME, the meaning of that expression is unclear.
I will NEVER commit that to asdf. That's lots of crazy non-portable infrastructure for precious little gain.
Another alternative is a naming convention for .asd files: mysystem.utf-8.asd. It's simple to implement, and after some thinking, it seems better than the two suggestions above.
Still bad, requires asdf to know about all the potential encoding names. Crazy.
But again, we should decide if the problem really exists and avoid solving problems that we don't have. I personally never use national characters in .asd files.
Let's make that compulsory. You want it otherwise, fork ASDF. The only thing we need to do about it is document it.
[ François-René ÐVB Rideau | Reflection&Cybernethics | http://fare.tunes.org ] Many people believe "in the name of the (nation, poor, god, nature...)" like "simon says" justifies or damns political opinions when said or omitted.
On Sat, 29 Jan 2011, Faré wrote:
On 29 January 2011 11:18, Cyrus Harmon ch-lisp@bobobeach.com wrote:
And, even worse, how do we asdf:load-op .asd files that contain non-ascii characters? (besides changing the locale, which, while it will work for a single .asd file, doesn't address the issue of how to load multiple ASDF systems from different locales).
Probably one of the below makes sense:
...
4- we embrace Unicode, and say it's UTF-8 by default wherever supported, falling back to iso-8859-1 or whatever on implementations that don't have it.
I'd vote for 4.
I'd agree.
There are very few cases which justify using an encoding other than utf-8 (or occasionally utf-16) for source code. None should be relevant for ASD files.
Establishing one standard with fallbacks for less-capable implementations is better for portability than trying to accommodate every encoding.
- Daniel
Dear Anton,
How do I specify the source file encoding in ASDF2? I.e., what value should be passed as the :external-format argument of compile-file? In the old ASDF I did it with
(let ((asdf:*compile-file-external-format*
        #+:clisp charset:cp1251 #+:sbcl :cp1251))
  (asdf:operate 'asdf:load-op :mysystem))
Sorry, I see no trace of *compile-file-external-format* in 1.369, the version I inherited from Gary King, or 1.97, the version that Gary King inherited from Nikodemus and Xof.
I'm not saying what you have isn't a good idea, but it seems to rely on some local patch to ASDF that was never merged upstream.
But today the variable asdf:*compile-file-external-format* does not exist. Also, it would of course be better to specify the encoding in the .asd file (the system author knows the files' encoding and should take care of this, not the system user).
Could you advise me how to do it?
You need to send me a patch to ASDF that modifies (defmethod perform ((operation compile-op) (c cl-source-file)) ...) and (defmethod perform ((operation load-source-op) (c cl-source-file)) ...) to do something about external-format.
Also it might or might not be a good idea to store the external-format in a slot of cl-source-file, and to have a proper :initform in it with a valid default value to be used when upgrading ASDF.
Finally, you need to document that feature in the manual and explain that it will only be available starting with e.g. ASDF 2.013.
The problem for you will be to reasonably support 11 existing implementations or so.
Good luck!
[ François-René ÐVB Rideau | Reflection&Cybernethics | http://fare.tunes.org ] I'm a polyatheist — there are many gods I don't believe in. — Dan Fouts
Now, the topic of specifying source encodings is a year old. Should I have not replied to it and rather started a new one?
You need to send me a patch to ASDF that modifies (defmethod perform ((operation compile-op) (c cl-source-file)) ...) and (defmethod perform ((operation load-source-op) (c cl-source-file)) ...) to do something about external-format.
I propose the attached file.
Also it might or might not be a good idea to store the external-format in a slot of cl-source-file, and to have a proper :initform in it with a valid default value to be used when upgrading ASDF.
It stores the encoding in a property of the component, the component being a system or a source file. This allows for both per-system and per-source-file encoding, the latter taking precedence, without additional effort. In my implementation a default :initform would not have helped, because #'component-encoding chooses between the component's own encoding and its system's encoding depending on which one is specified. Hence the default (:default) is embedded in #'component-encoding.
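A minimal sketch of the lookup described above (not the actual patch; it assumes ASDF's component-property and component-system accessors):

(defun component-encoding (component)
  ;; The component's own :encoding property takes precedence, then the
  ;; enclosing system's property, and finally the :default external format.
  (or (asdf:component-property component :encoding)
      (let ((system (asdf:component-system component)))
        (and system (asdf:component-property system :encoding)))
      :default))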
The problem for you will be to reasonably support 11 existing implementations or so.
Since making a single specification portable requires comparing all the external formats of all supported implementations, I think it is reasonable to leave it to the author of a system definition to research under which name his preferred encoding is accessible in the different implementations he wants to support, and to specify appropriate read-time conditionals.
Finally, you need to document that feature in the manual and explain that it will only be available starting with e.g. ASDF 2.013.
I have not written documentation or tests yet, only tested manually and ensured that existing tests still pass with SBCL.
On Tue, Mar 20, 2012 at 20:04, Orivej Desh orivej@gmx.fr wrote:
Now, the topic of specifying source encodings is a year old. Should I have not replied to it and rather started a new one?
I think you did well to reply.
You need to send me a patch to ASDF that modifies (defmethod perform ((operation compile-op) (c cl-source-file)) ...) and (defmethod perform ((operation load-source-op) (c cl-source-file)) ...) to do something about external-format.
I propose the attached file.
Thanks.
I know that Stelian Ionescu was also working on it, so I'm giving him an opportunity to chime in before I merge that.
Also, I agree with Stelian that it's better to standardize on one default encoding for all files to be loaded by ASDF. If we do, then there's a chance that things will work without user configuration. If we don't, we're pushing configuration onto the user, and guaranteeing misery for newbies, and hard-to-debug situations even for seasoned users. These days, UTF-8 looks like the obvious encoding to standardize on. And on implementations that don't support UTF-8, some 8-bit-clean encoding that will at least accept UTF-8 encoded comments and has a chance of doing the right things with strings and symbols.
Therefore, we'd use something like this:
(defparameter *utf-8-external-format*
  #+sbcl :utf-8
  ...
  #-(or sbcl ...) :default
  "external-format argument to pass to CL:OPEN so that it accepts UTF-8 encoded source code")
Also it might or might not be a good idea to store the external-format in a slot of cl-source-file, and to have a proper :initform in it with a valid default value to be used when upgrading ASDF.
It stores the encoding in a property of the component, the component being a system or a source file. This allows for both per-system and per-source-file encoding, the latter taking precedence, without additional effort. In my implementation a default :initform would not have helped, because #'component-encoding chooses between the component's own encoding and its system's encoding depending on which one is specified. Hence the default (:default) is embedded in #'component-encoding.
I think you're doing the right thing, except that (1) we should probably use "external-format" instead of "encoding", since that's what the CLHS calls it, and that (2) the default should be *utf-8-external-format*. Then there's the whole horror of CR/LF that I'm trying to not think about.
The problem for you will be to reasonably support 11 existing implementations or so.
Since making a single specification portable requires comparing all the external formats of all supported implementations, I think it is reasonable to leave it to the author of a system definition to research under which name his preferred encoding is accessible in the different implementations he wants to support, and to specify appropriate read-time conditionals.
I think it's OK to require authors who want non-default settings to do their research on how to do it on each and every platform they want to support (or depend on a library that does it for them). But I think it's a mistake to fail to provide a sensible default, which in effect forces EVERYONE to do the research or face crazy error situations in some of their users.
—♯ƒ • François-René ÐVB Rideau •Reflection&Cybernethics• http://fare.tunes.org The Constitution may not be perfect, but it's a lot better than what we've got!
I agree with all that you've written. As for the last point, transformation of the external format may happen in #'(setf component-external-format); and here is a slightly updated patch, this time as a proper attachment.
On Tue, Mar 20, 2012 at 22:30, Orivej Desh orivej@gmx.fr wrote:
I agree with all that you've written. As for the last point, transformation of the external format may happen in #'(setf component-external-format); and here is a slightly updated patch, this time as a proper attachment.
OK, I pushed Orivej's patch, plus UTF-8 support for more than SBCL, as ASDF 2.20.1. Please test.
It also includes a fix for ECL, that I somehow broke in 2.019.9 without anyone noticing (or notifying me). Oops.
—♯ƒ • François-René ÐVB Rideau •Reflection&Cybernethics• http://fare.tunes.org Law is always law of the strongest. The only question is what feedback mechanism selects this strongest and controls his behavior.
I suggest interning the keyword external format into the CHARSET package on CLISP, in #'(setf component-external-format), as a sane default.
I think this is great, but I think it's critical that we get a test case or two before we try pushing it out into the world.
Orivej, can you provide a sample .asd file and some .lisp files with an exciting encoding? If so, I'm sure we could wrap them up in a test or two.
thanks, r
Here is a test case for SBCL and CLISP. (CLISP part needs my last patch applied.) This ASDF system is supposed to compile and load without warnings or errors.
That patch makes CLISP behave like this when a charset cannot be found:
Error while trying to load definition for system external-format-test from pathname /tmp/external-format-test/external-format-test.asd: INTERN("SHIFT_JIS"): #<PACKAGE CHARSET> is locked
If I replace #'intern with #'find-symbol in it the message is:
OPEN: Illegal :EXTERNAL-FORMAT argument :SHIFT_JIS
with a useful USE-VALUE restart. I think the latter is somewhat better, hence the updated patch.
I'd also like to express concern that this change has to be properly announced, because it will affect those who use ASDF for their local projects with a non-UTF-8-compliant encoding and previously assumed that other ASDF systems fit into ASCII.
Sorry --- I am catching up on this discussion after a long time doing other things. Looking things over, I was left with some questions (I hope these haven't been thoroughly discussed and I missed the answers):
Is there some reason why we must put the external-format into the property list instead of just giving it a slot in the component class definition?
Also, what sort of an entity are the external format values? Is it always a keyword symbol? Can we say that it should always be a keyword and that we will massage it to something else, if necessary, for the benefit of the implementation when reading a file?
In that case we could have an accessor that will do the implementation-specific massaging for us (e.g., we could store :utf-8, but on clisp we would present charset:utf-8 when reading...). That seems somehow tidier to me, rather than changing the value behind the programmer's back as we do here. OTOH, we do quietly change symbols to strings, so maybe I'm just talking through my hat.
I wasn't sure I understood the following paragraph, either:
"I'd also like to express concern that this change has to be properly announced because it will affect those who use ASDF for their local projects with non UTF-8 compliant encoding and previously assumed that other ASDF systems fit into ASCII."
Can you amplify?
Thanks for any enlightenment!
On Thu, Mar 22, 2012 at 10:31, Robert Goldman rpgoldman@sift.info wrote:
Sorry --- I am catching up on this discussion after a long time doing other things. Looking things over, I was left with some questions (I hope these haven't been thoroughly discussed and I missed the answers):
It's all a slow thing, anyway.
Note that I already merged Orivej's patch (with modifications) into 2.20.2.
NB: we need to add tests to the test suite before release — and not just for SBCL and CLISP.
Is there some reason why we must put the external-format into the property list instead of just giving it a slot in the component class definition?
Indeed that bugged me a little bit. At first, I thought the idea was to use :properties (:external-format :iso-8859-1) or some such, and let .asd files be backwards compatible with older versions of asdf. If the idea is instead to use :external-format :iso-8859-1 directly in the component form and require users of given systems to upgrade to asdf 2.21 or later (e.g. using :defsystem-depends-on ((:version :asdf "2.21"))), then it doesn't make sense to use properties, and instead we should use a slot just like for :around-compile (which has the same inheritance mechanism).
Also, what sort of an entity are the external format values? Is it always a keyword symbol? Can we say that it should always be a keyword and that we will massage it to something else, if necessary, for the benefit of the implementation when reading a file?
Yes, making it a symbol sounds like a good idea, especially since (1) defsystem forms are wholly read before any defsystem-depends-on is evaluated and is given the chance to create another package, much less define functions to use in #. (2) defsystem forms are directly processed by the do-defsystem function, which doesn't itself include any mechanism for macroexpansion or programmability.
Of course, every implementation has its own variants for accepted keywords (or not a keyword, in the case of clisp), and we do NOT want to pull a whole (big, and possibly ever-changing) external-format library into ASDF. Therefore, what we want is: (1) provide a sensible default. Probably :default, or :utf-8. (2) provide a hook onto which a library that is defsystem-depends-on'ed can define a translation layer between some standard keyword interface e.g. one or many of :latin1, :latin-1 or :iso-8859-1, and whatever the implementation actually wants, e.g. #+sbcl :latin1 #+clisp charset:iso-8859-1 ...
I'd say we're not ready for release 2.21 until we have
(1) a coherent interface, either through properties or its own slot.
(2) a good story on backwards-compatibility or lack thereof (so when there's breakage, error messages will clearly direct the user onto which component needs to be updated.)
(3) a sane default. I believe it should be :utf-8 on all modern platforms and whatever doesn't break too badly on legacy systems (genera, gcl, cormanlisp, mcl)
(4) tests that exercise said functionality and pass on all systems.
(5) a hook to allow for portable extension of the set of defined external-formats, by, e.g. babel.
(6) tests for this extension.
Note: I believe we should create a new test suite alongside the current one; the new test suite would use a master process and slaves processes (using inferior-shell), and hopefully a test tool that's portable to all our target platforms (at least all those covered by the current test suite).
—♯ƒ • François-René ÐVB Rideau •Reflection&Cybernethics• http://fare.tunes.org Classical liberalism is not an economic doctrine. It is a theory of Law.
This is just a brief supplement to Faré's answer. (Hopefully I won't be bothered anymore by common-lisp.net antispam machinery.)
Is there some reason why we must put the external-format into the property list instead of just giving it a slot in the component class definition?
It was my ignorance.
Also, what sort of an entity are the external format values? Is it always a keyword symbol? Can we say that it should always be a keyword and that we will massage it to something else, if necessary, for the benefit of the implementation when reading a file?
Maybe yes. Consider that e.g. some implementations accept more options (mostly to control line terminators) — CLISP as instances of ext:encoding, LispWorks as lists like '(:latin-1 :eol-style :lf); but then CLISP explicitly says that line terminators don't matter during input.
In that case we could have an accessor that will do the implementation-specific massaging for us (e.g., we could store :utf-8, but on clisp we would present charset:utf-8 when reading...). That seems somehow tidier to me, rather than changing the value behind the programmer's back as we do here. OTOH, we do quietly change symbols to strings, so maybe I'm just talking through my hat.
I'd appreciate it if you could explain in more detail what happens when and how. Is it like in the attached patch, but with the logic moved from the setf'er to the accessor?
I wasn't sure I understood the following paragraph, either
If one goes beyond ASCII, saves in something other than UTF-8 (as is to be expected on MS Windows, but even on Linux the LispWorks Personal IDE tried to save a file in Latin-1), manages local projects with ASDF, and upgrades ASDF, he will be affected.
On Sun, Mar 25, 2012 at 22:47, Orivej Desh c@orivej.org wrote:
Is there some reason why we must put the external-format into the property list instead of just giving it a slot in the component class definition?
It was my ignorance.
I thought it was to allow a backwards-compatible syntax of (:file "foo" :properties (:encoding :latin1)). If instead we want to encourage people to use (:file "foo" :encoding :latin1) and force users to upgrade ASDF, then it makes no sense to use properties instead of a slot.
Are we set on requiring that this new encoding specification will require ASDF 2.21 to work? If so, we should ask people to not actually start using it in libraries until a few months from now, when ASDF 2.21 is more widely available (i.e. has made it to Quicklisp, SBCL, etc.).
Note that in the end, I prefer :encoding if we're going to add an implicit translation layer between that and the actual :external-format option of CL:LOAD, so the user understands there's a difference. If we're NOT going to add a translation layer, and instead require users to use #.(foo:encoding-to-external-format :latin1), then :external-format is the more appropriate name.
Also, what sort of an entity are the external format values? Is it always a keyword symbol? Can we say that it should always be a keyword and that we will massage it to something else, if necessary, for the benefit of the implementation when reading a file?
Maybe yes. Consider that e.g. some implementations accept more options (mostly to control line terminators) — CLISP as instances of ext:encoding, LispWorks as lists like '(:latin-1 :eol-style :lf); but then CLISP explicitly says that line terminators don't matter during input.
Oh yeah, I had tried to blank out on line terminators. Hopefully, they won't matter much indeed, since ASDF only cares about input encoding, and line terminators are an output option. That's one more reason to call our thing :encoding instead of :external-format.
In that case we could have an accessor that will do the implementation-specific massaging for us (e.g., we could store :utf-8, but on clisp we would present charset:utf-8 when reading...). That seems somehow tidier to me, rather than changing the value behind the programmer's back as we do here. OTOH, we do quietly change symbols to strings, so maybe I'm just talking through my hat.
I'd appreciate it if you could explain in more detail what happens when and how. Is it like in the attached patch, but with the logic moved from the setf'er to the accessor?
I suppose we'll have something like that:
(defun trivial-encoding-to-external-format-hook (encoding)
  (declare (ignore encoding))
  *utf-8-external-format*)

(defvar *encoding-to-external-format-hook*
  #'trivial-encoding-to-external-format-hook)

...
(load ... :external-format
          (funcall *encoding-to-external-format-hook* encoding) ...)
...
Then you'll have to :defsystem-depends-on (:asdf-encodings) or some such to be able to use different encodings.
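For instance, such an extension might install something like the following into the hook (a sketch; the names follow the code above, and the encoding table is illustrative, not exhaustive):

(defun lookup-external-format (encoding)
  ;; Map a portable encoding keyword to whatever the running
  ;; implementation expects as an external format.
  (case encoding
    ((:utf-8 :utf8)
     #+clisp charset:utf-8 #-clisp :utf-8)
    ((:latin-1 :latin1 :iso-8859-1)
     #+clisp charset:iso-8859-1 #-clisp :iso-8859-1)
    (otherwise :default)))

(setf *encoding-to-external-format-hook* #'lookup-external-format)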
If one goes beyond ASCII, saves in something other than UTF-8 (as is to be expected on MS Windows, but even on Linux the LispWorks Personal IDE tried to save a file in Latin-1), manages local projects with ASDF, and upgrades ASDF, he will be affected.
Ouch. I'd say that in this case, LW personal has obsolete default settings. I still think that ASDF should assume utf-8 by default.
I've committed something along those lines as 2.20.3. Only minimal testing for no obvious breakage (make test, using sbcl).
—♯ƒ • François-René ÐVB Rideau •Reflection&Cybernethics• http://fare.tunes.org Time and money spent in helping men to do more for themselves is far better than mere giving. — Henry Ford
My suggestion is to just use :external-format plus some minimal translations for CLISP to help write portable code, for example :utf-8 and :iso-8859-1 or :latin1, which should cover most portable CL projects, and to make the default :default.
It is not possible to validate the external-format so the default action should be to pass it through. With a default of passing it through, it would seem most appropriate to name it :external-format, and to call any processing a translation.
The external-format is implementation dependent and user extensible, making it impossible for ASDF to validate. Requiring a custom ASDF hook to be installed seems an unnecessary burden with no utility.
CL implementations are required to recognise :default as an external-format, and could be expected to have a sensible default for their environment. Any other default value is problematic as it may not be supported. If a project really needs UTF-8 source files then it would not appear to be a big burden to require this to be specified. A lot of portable code can be written without needing UTF-8 source files, even code that supports UNICODE.
(:file "foo" :external-format :latin1))
Regards Douglas Crosher
On Thu, Mar 29, 2012 at 07:07, Douglas Crosher dtc-asdf@scieneer.com wrote:
My suggestion is to just use :external-format plus some minimal translations for CLISP to help write portable code, for example :utf-8 and :iso-8859-1 or :latin1, which should cover most portable CL projects, and to make the default :default.
Well, :external-format is not quite portable, and any translation layer means that reusing the same name will lead to confusion.
It is not possible to validate the external-format so the default action should be to pass it through. With a default of passing it through, it would seem most appropriate to name it :external-format, and to call any processing a translation.
The issue is that the only person qualified to specify the encoding is whoever writes (or repackages) the source code.
Pushing the responsibility back to whoever tries to use the source code is the current situation, and just doesn't work so well.
The external-format is implementation dependent and user extensible, making it impossible for ASDF to validate. Requiring a custom ASDF hook to be installed seems an unnecessary burden with no utility.
But that's precisely the point: whoever packages the source does not control which implementation is used, and needs to specify the encoding in an implementation-independent way. Sure, we could force packagers to use #. all over the place, but it's more declarative and better overall to move the translation to a hook in ASDF: the same work for whoever codes the translation library, up to trivially registering the hook, and much less work for whoever uses it.
CL implementations are required to recognise :default as an external-format, and could be expected to have a sensible default for their environment.
But the file encoding is inherent in the source code as distributed, not in the implementation's environment.
Any other default value is problematic as it may not be supported. If a project really needs UTF-8 source files then it would not appear to be a big burden to require this to be specified.
A lot of portable code can be written without needing UTF-8 source files, even code that supports UNICODE.
Whenever people restrict themselves to ASCII, they are already pretty much guaranteed to have their code work everywhere. I'd like for a similar guarantee to be available to everyone using some superset of ASCII; and UTF-8 is the obvious choice for a superset of ASCII that is suitable for everyone.
Obviously, some legacy 8-bit only implementations just can't do it, and that's where we'll use :default.
—♯ƒ • François-René ÐVB Rideau •Reflection&Cybernethics• http://fare.tunes.org What we Are is God's gift to us. What we Become is our gift to God.
Douglas' main point may be transformed as follows, which is a legitimate question: if the task is to extend the supported character set to UTF-8, isn't it solved by accepting an :encoding option and defining a default #'encoding-external-format which understands (nothing but) :utf-8? Given that, should the default be UTF-8 rather than :default? Answering `yes' might cause more or less trouble for some people; answering `no' will provide for a gradual transition. I think we should ask Zach Beane about issues with unspecified and discerned external formats.
Another issue which somewhat bothers me: is this kind of hook right? It seems to be inherently unmanaged (just like *macroexpand-hook*), i.e. setting it in a system affects systems loaded afterwards, unless it is set lexically in around-compile. But then, it might as well be another ASDF option (say, either a package designator which exports #'encoding-external-format, or a list of a package and a keyword symbol designating the desired function). (By the way, I wouldn't call a hooked function a hook, so that #'default-encoding-external-format-hook would be #'default-encoding-external-format.)
The last issue relates to the strictness of the default-encoding-external-format. Probably it's all right, but then wouldn't it be good to define a permissive alternative which behaves like in 2.20.2?
On Sat, Mar 31, 2012 at 10:38, Orivej Desh c@orivej.org wrote:
Douglas' main point may be transformed as follows, which is a legitimate question: if the task is to extend the supported character set to UTF-8, isn't it solved by accepting an :encoding option and defining a default #'encoding-external-format which understands (nothing but) :utf-8?
Yes, that's what we have now with 2.20.x.
Given that, should the default be UTF-8 rather than :default? Answering `yes' might cause more or less trouble for some people; answering `no' will provide for a gradual transition. I think we should ask Zach Beane about issues with unspecified and discerned external formats.
Source code that uses more than the ASCII character set wasn't portably supported previously, but in practice, utf-8 worked everywhere and was backhandedly enforced by a lot of people using SBCL and utf-8 and sending reports to authors so they make their packages compatible. This change therefore only formalizes a de facto standard, and allows for extension and customization where no such thing was previously possible.
In the future, maybe we should distinguish between :default that is :utf-8 where supported and falls back where not supported, and :utf-8 that means "I really really want utf-8", e.g. for lambda-reader? I think it'll be better solved by using :utf-8 in all cases and #-asdf-unicode (error ...) in the source code when it's not available.
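For instance, a source file that genuinely requires Unicode could guard itself like this (ASDF 2 pushes :asdf-unicode onto *features* on Unicode-capable implementations):

#-asdf-unicode
(error "This system requires a Lisp with Unicode support.")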
Another issue which somewhat bothers me: is this kind of hook right? It seems to be inherently unmanaged (just like *macroexpand-hook*), i.e. setting it in a system affects systems loaded afterwards, unless it is set lexically in around-compile. But then, it might as well be another ASDF option (say, either a package designator which exports #'encoding-external-format, or a list of a package and a keyword symbol designating the desired function).
Good suggestion: I've refactored the external-format extraction to happen inside the around-compile hook. But yes, the hook is intended as a global hook to be used once, by a global asdf extension called asdf-encodings, to be written. The reason to make it an extension rather than put it all in asdf is that I expect external-format support to be a long and painful thing to write to support all encodings on all implementations; I'd rather that be done outside of ASDF, because it's a lot of code I'd rather not put in ASDF, the development cycles are different, and it shouldn't matter for the vast majority of us who'll use the default settings (i.e. UTF-8).
(By the way, I wouldn't call a hooked function a hook, so that #'default-encoding-external-format-hook would be #'default-encoding-external-format.)
Good suggestion. Renamed in 2.20.7.
The last issue relates to the strictness of the default-encoding-external-format. Probably it's all right, but then wouldn't it be good to define a permissive alternative which behaves like in 2.20.2?
I'm not sure who's to gain what with that. If you're writing a .asd, you know what charset your code is using. If it's UTF-8 indeed, why would you want to reduce the number of cases in which your charset is correctly recognized? And if it's not UTF-8, you're probably having trouble with "bug" reports from all those SBCL + utf-8 users around today. Or maybe you don't have end-users, and want to force your local encoding; if such cases exist around the world, we might need a solution quicker than expected; and so you've convinced me to add support for an explicit :default as a valid encoding for backwards-compatibility purposes only.
—♯ƒ • François-René ÐVB Rideau •Reflection&Cybernethics• http://fare.tunes.org Apparently a government can prevent itself and its successors indefinite from doing bad things, just by writing a note to itself that says "don't do bad things." — Mencius Moldbug on constitutions
utf-8 worked everywhere and was backhandedly enforced by a lot of people using SBCL and utf-8 and sending reports to authors
I wasn't aware of this. Now the choice of UTF-8 is justified for me.
I did a check of Quicklisp systems. There are 263 lisp files in 107 systems which assume non-ASCII, and only 31 of them in 20 systems assume non-UTF-8:
./antik-20111105-git/physical-quantities/angle.lisp
./binary-types-20101006-git/example.lisp
./bknr-web-20111105-git/src/web/event-log.lisp
./cells-20101207-git/cells-test/build-sys.lisp
./cl-bibtex-20110110-cvs/cmp.lisp
./cl-pdf-20110418-svn/salza/compressor.lisp
./cl-png-0.6/._lisp-unit.lisp
./cl-typesetting-20110219-svn/examples/business-card.lisp
./clfswm-20120305-git/contrib/server/crypt.lisp
./clsql-20120305-git/examples/sqlite3/init-func/example.lisp
./linedit-20120208-git/terminfo.lisp
./lispbuilder-20110619-svn/lispbuilder-openrm/openrm/window.lisp
./lispbuilder-20110619-svn/lispbuilder-regex/regexp-test-suite.lisp
./mcclim-20110730-cvs/Examples/gadget-test-kr.lisp
./mcclim-20110730-cvs/Experimental/pixel-format.lisp
./mcclim-20110730-cvs/Tools/gilbert/clim-doc-convert.lisp
./mcclim-20120305-cvs/Examples/gadget-test-kr.lisp
./mcclim-20120305-cvs/Experimental/pixel-format.lisp
./mcclim-20120305-cvs/Tools/gilbert/clim-doc-convert.lisp
./metatilities-20101006-darcs/dev/contrib/mcl/appearance-mcl.lisp
./metatilities-20101006-darcs/dev/contrib/mcl/processes.lisp
./metatilities-20101006-darcs/dev/contrib/mcl/progress-indicator.lisp
./metatilities-20101006-darcs/dev/mcl/pop-up-menu.lisp
./montezuma-20120305-git/lucene-in-action/listing-2-1.lisp
./mtlisp-20110522-git/closstar.lisp
./mtlisp-20110522-git/ctrace.lisp
./mtlisp-20110522-git/mt-pkg.lisp
./phemlock-20120305-cvs/src/core/charmacs.lisp
./plain-odbc-20111105-svn/src/test/test-oracle.lisp
./regex-1/regexp-test-suite.lisp
./utils-kt-20101006-git/quad.lisp
I did a check of Quicklisp systems. There are 263 lisp files in 107 systems which assume non-ASCII, and only 31 of them in 20 systems assume non-UTF-8:
OK. Any volunteer to contact the authors of these 20 systems and get each issue fixed? Or even to split the job in 2, 3, 4, 5?
—♯ƒ • François-René ÐVB Rideau •Reflection&Cybernethics• http://fare.tunes.org The college idealists who fill the ranks of the environmental movement seem willing to do absolutely anything to save the biosphere, except take science courses and learn something about it. — P.J. O'Rourke
I set up a wiki to track this issue at https://github.com/orivej/asdf-encodings/wiki/Tracking-non-UTF-8-lisp-files-...
On Sun, Apr 1, 2012 at 13:44, Faré fahree@gmail.com wrote:
I did a check of Quicklisp systems. There are 263 lisp files in 107 systems which assume non-ASCII, and only 31 of them in 20 systems assume non-UTF-8:
OK. Any volunteer to contact the authors of these 20 systems and get each issue fixed? Or even to split the job in 2, 3, 4, 5?
OK, I contacted all the authors or maintainers of the offending software that Orivej identified as requiring a fix to compile.
In at least one case (regex), the original author's address Michael Parker mparker762@hotmail.com is invalid, and I contacted a maintainer instead Michael Weber michaelw@foldr.org who might not be the source that Xach uses judging from the version number in Quicklisp (michaelw uses darcs, but quicklisp has a version 1 instead of $date-darcs). I'm not convinced that portable-hemlock is actively maintained either.
I updated the status page at https://github.com/orivej/asdf-encodings/wiki/Tracking-non-UTF-8-lisp-files-...
Finally, I created a mostly useless asdf-encodings at: ssh://common-lisp.net/project/asdf/git/asdf-encodings.git git://common-lisp.net/projects/asdf/asdf-encodings.git I invite those people who want to use non-UTF-8 encodings to submit patches to this project.
—♯ƒ • François-René ÐVB Rideau •Reflection&Cybernethics• http://fare.tunes.org A computer is like an Old Testament god, with a lot of rules and no mercy. — Joseph Campbell
The portable-hemlock is still maintained and was updated a few months ago to avoid the use of non-ascii characters in the source so it builds cleanly with UTF-8 as the input external-format. The code is not in great shape, but is being improved. See: http://gitorious.org/hemlock/pages/Home
Even if you get all the Quicklisp projects converted to be UTF-8 clean, this still represents a subset of ASDF users. I wish you would reconsider these changes to ASDF because I fear they are divisive. It is not reasonable to expect users of ASDF to hack on external support code just to use non-UTF-8 external-formats, and the external library you plan for can never be complete because the external-format is user extensible. ASDF could easily be flexible regarding the external-format rather than a limited bastion of portable open source code. It would be very easy and workable to just name this :external-format, and to pass through encodings not recognised; all the Quicklisp projects would work just fine using :utf-8 and other CL users could use encodings as needed.
Regards Douglas Crosher
Abstract: I think requiring a few marginal hackers doing weird things to specify :encoding :default is a small price to pay for everyone to be able to specify their encoding in a portable way, with a sane default that is already almost universally accepted (i.e. :utf-8).
On Sun, Apr 8, 2012 at 07:31, Douglas Crosher dtc-asdf@scieneer.com wrote:
The portable-hemlock is still maintained and was updated a few months ago to avoid the use of non-ascii characters in the source so it builds cleanly with UTF-8 as the input external-format. The code is not in great shape, but is being improved. See: http://gitorious.org/hemlock/pages/Home
Oh, I hadn't noticed this new page for hemlock. Is CMUCL using the portable hemlock these days, or still including its own?
Even if you get all the Quicklisp projects converted to be UTF-8 clean, this still represents a subset of ASDF users. I wish you would reconsider these changes to ASDF because I fear they are divisive.
Well, I recognize that not all code is in Quicklisp and that there is a need for a backward compatibility mode. Putting :encoding :default in your defsystem will achieve just that.
At the same time, if :encoding :default rather than :encoding :utf-8 were the default, then we'd gain nothing, and it would still be a horrible mess to ascertain which system has been compiled with which encoding.
It is not reasonable to expect users of ASDF to hack on external support code just to use non-UTF-8 external-formats, and the external library you plan for can never be complete because the external-format is user extensible.
Well, on the one hand, for portability's sake, one should probably recode one's lisp files to a universally supported external format. On the other hand, where portability is not a problem, one can either use :encoding :default and be back to the current semantics, or extend asdf-encodings as one extends external formats.
ASDF could easily be flexible regarding the external-format and not a limited bastion of portable open source code.
Agreed. Currently, ASDF is not flexible at all -- rather it is uncontrolled.
It would be very easy and workable to just name this :external-format, and to pass through encodings not recognised - all the quicklisp projects would work just fine using :utf-8 and other CL users could use encodings as needed.
Unhappily, passing through external formats is not portable, if only for CLISP. But if you're doing non-portable things, you can keep doing whatever you were previously doing with :encoding :default, or you can now define methods on asdf::component-external-format to do whatever you want, to override the default behavior of checking *encoding-external-format-hook*. Or then again, you can extend asdf-encodings to make it smarter.
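A hedged sketch of that kind of override, assuming asdf::component-external-format is a generic function as described above (class and encoding choices invented):

(defclass latin9-file (asdf:cl-source-file) ())

(defmethod asdf::component-external-format ((c latin9-file))
  ;; Bypass the hook entirely for this component class.
  #+sbcl :latin-9
  #+clisp charset:iso-8859-15
  #-(or sbcl clisp) :default)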
In practice, how many people do you know who use a non-UTF-8 encoding, and how many of them will be majorly annoyed by having to either recode their source, explicitly specify their encoding, or add :encoding :default to preserve backwards compatibility?
—♯ƒ • François-René ÐVB Rideau •Reflection&Cybernethics• http://fare.tunes.org "I've finally learned what `upward compatible' means. It means we get to keep all our old mistakes." — Dennie van Tassel
On 8 April 2012 17:36, Faré fahree@gmail.com wrote:
I think requiring a few marginal hackers doing weird things to specify :encoding :default is a small price to pay for everyone to be able to specify
I disagree. Consider this:
X has a system that used to be in, say, LATIN-9. He uses latin-9 at home, and everything works fine. His users either use it as well, or at least another single-byte encoding.
ASDF is updated, and X's user reports breakage. Everything works fine for X, because he didn't update ASDF yet. So he updates ASDF, and X updates his system to specify :LATIN-9 (or :DEFAULT, or whatever).
Now another of his users reports breakage, because /they/ didn't update ASDF yet -- and their ASDF doesn't support :ENCODING, so things break. They update ASDF, which in turn breaks another :LATIN-N system they were using.
The potential cost is non-trivial, and I really don't pretend to know e.g. how many Japanese hackers use non-UTF encodings in their source.
IMO encouraging people to add :encoding :utf-8 is much saner.
Cheers,
-- Nikodemus
On Sun, Apr 8, 2012 at 15:28, Nikodemus Siivola nikodemus@random-state.net wrote:
On 8 April 2012 17:36, Faré fahree@gmail.com wrote:
I think requiring a few marginal hackers doing weird things to specify :encoding :default is a small price to pay for everyone to be able to specify
I disagree. Consider this:
X has a system that used to be in, say, LATIN-9. He uses latin-9 at home, and everything works fine. His users either use it as well, or at least another single-byte encoding.
ASDF is updated, and X's user reports breakage. Everything works fine for X, because he didn't update ASDF yet. So he updates ASDF, and X updates his system to specify :LATIN-9 (or :DEFAULT, or whatever).
Now another of his users reports breakage, because /they/ didn't update ASDF yet -- and their ASDF doesn't support :ENCODING, so things break. They update ASDF, which in turn breaks another :LATIN-N system they were using.
The potential cost is non-trivial, and I really don't pretend to know eg. how many Japanese hackers use non-UTF encodings in their source.
IMO encouraging people to add :encoding :utf-8 is much saner.
I agree that transition costs must be considered.
Let's examine the two scenarios, where the default is :default vs where the default is :utf-8.
In both cases, crucial points follow: (a) currently :encoding is NOT supported by ASDF. (b) therefore, whenever anyone modifies his defsystem to use :encoding, his system will NOT be backward compatible anymore. (c) we want to make most code as backward-compatible as possible. (d) the application programmer controls what version of ASDF is installed, the library developer doesn't.
If the default is :utf-8 (my recommendation), then
* A few programmers of non-UTF-8 applications may hit a snag upgrading ASDF;
* then they can either use asdf-encodings or use :encoding :default.
* Their code is then not compatible with older ASDFs anymore, but
* as application programmers, they fully control which ASDF they use, and
* even if they need to support old CL implementations, ASDF still supports them (the exception being GCL, which looks quite dead).
* Meanwhile, library authors can already start migrating to UTF-8, and everyone who upgrades ASDF can now reliably enjoy the benefits of non-ASCII, while preserving backwards compatibility.
If the default is :default (your recommendation), then
* library developers can't ensure their code uses a predictable encoding;
* this makes any attempt to actually use non-ASCII characters unreliable.
* Sure, they might be tempted to use :encoding :utf-8, but then their libraries will be gratuitously incompatible with anyone who hasn't upgraded his ASDF, which is a pain to users.
* Thus, library developers can do nothing but wait for EVERYONE to be using a recent ASDF before they can do anything.
* Therefore, no one will enjoy any benefit of :encoding for a year, and when we do, it will cause massive backward incompatibility.
Admittedly, in either case, library developers could use such conditional reading as in #+asdf-unicode #+asdf-unicode :encoding :utf-8 or #+asdf-unicode :encoding #+asdf-unicode :latin1 to make their libraries safer in a backwards-compatible way.
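For instance, a sketch of such a backwards-compatible system definition (the system and file names here are placeholders):

  ;; On an ASDF without the :asdf-unicode feature, the reader skips both
  ;; the :encoding keyword and its value, so older ASDFs still read this form.
  (defsystem :my-library
    #+asdf-unicode :encoding #+asdf-unicode :utf-8
    :components ((:file "source")))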
In both cases, library developers are encouraged to use UTF-8, which most people already do, if only because that tends to be the default these days for users of SBCL, and they send bug reports to library authors.
A default of :default allows a few odd non-UTF8 application developers to continue using unportable hacks for a few months, while forcing everyone else to wait to do anything and bringing massive backwards-incompatibility of libraries in the end.
A default of :utf-8 forces these few odd non-UTF8 application developers to do a documented step before they continue doing what they are doing, at their own upgrade pace (they control when to upgrade ASDF); they can then replace a lot of non-portable hacks with a portable :encoding. Meanwhile, everyone starts enjoying reliable non-ASCII today.
I agree there's no solution that makes everyone happy. I believe that a default of :utf-8 doesn't actually make anyone terribly unhappy, and empowers everyone to make the changes they need, without requiring anyone to wait for other people to make changes (except indeed for a few stray libraries). And so my plans are unchanged for now (but of course, please keep sending the complaints if you think it's wrongheaded; it's still time to not do it).
NB: I'd especially like the opinion of people who actually develop non-ASCII and non-UTF8 libraries or applications.
PS: I just made asdf-encodings much less dumb.
I added good support for: sbcl, clozure, clisp, ecl, cmu
I added dubious support for: abcl, allegro, scl, lispworks
I think these will remain 8-bit only: cormanlisp, gcl, genera, rmcl, xcl.
Precious little testing so far, except that it doesn't break everything on SBCL. Help welcome to test and expand it.
ssh://common-lisp.net/project/asdf/git/asdf-encodings.git
git://common-lisp.net/projects/asdf/asdf-encodings.git
http://common-lisp.net/gitweb?p=projects/asdf/asdf-encodings.git
—♯ƒ • François-René ÐVB Rideau •Reflection&Cybernethics• http://fare.tunes.org Every major horror of history was committed in the name of an altruistic motive. — Ayn Rand
On 4/8/12 Apr 8 -7:37 PM, Faré wrote:
On Sun, Apr 8, 2012 at 15:28, Nikodemus Siivola nikodemus@random-state.net wrote:
On 8 April 2012 17:36, Faré fahree@gmail.com wrote:
I think requiring a few marginal hackers doing weird things to specify :encoding :default is a small price to pay for everyone to be able to specify
I disagree. Consider this:
X has a system that used to be in, say, LATIN-9. He uses latin-9 at home, and everything works fine. His users either use it as well, or at least another single-byte encoding.
ASDF is updated, and X's user reports breakage. Everything works fine for X, because he didn't update ASDF yet. So he updates ASDF, and X updates his system to specify :LATIN-9 (or :DEFAULT, or whatever).
Now another of his users reports breakage, because /they/ didn't update ASDF yet -- and their ASDF doesn't support :ENCODING, so things break. They update ASDF, which in turn breaks another :LATIN-N system they were using.
The potential cost is non-trivial, and I really don't pretend to know eg. how many Japanese hackers use non-UTF encodings in their source.
IMO encouraging people to add :encoding :utf-8 is much saner.
I agree that transition costs must be considered.
This is somewhat OT, since it's really about general transition costs, but should we add a continuable error to parse-defsystem for handling unrecognized options?
I like beating people over the head that this might not do what they want, but I don't like leaving them with no way to proceed.
Possibly even better to have a continuable error that /remembers/ a defsystem option as something to be ignored. Then we wouldn't /keep/ complaining about :encoding over and over --- one continuable error, one continue, and you'd be done.
cheers, r
I agree that transition costs must be considered.
This is somewhat OT, since it's really about general transition costs, but should we add a continuable error to parse-defsystem for handling unrecognized options?
I like beating people over the head that this might not do what they want, but I don't like leaving them with no way to proceed.
Possibly even better to have a continuable error that /remembers/ a defsystem option as something to be ignored. Then we wouldn't /keep/ complaining about :encoding over and over --- one continuable error continue and you'd be done.
That's a great idea. Patch welcome :-/
—♯ƒ • François-René ÐVB Rideau •Reflection&Cybernethics• http://fare.tunes.org You think war is economically beneficial? Then why share those benefits with dirty foreigners? Let's have a civil war — A war we're sure to win!
On 4/8/12 Apr 8 -8:57 PM, Faré wrote:
I agree that transition costs must be considered.
This is somewhat OT, since it's really about general transition costs, but should we add a continuable error to parse-defsystem for handling unrecognized options?
I like beating people over the head that this might not do what they want, but I don't like leaving them with no way to proceed.
Possibly even better to have a continuable error that /remembers/ a defsystem option as something to be ignored. Then we wouldn't /keep/ complaining about :encoding over and over --- one continuable error continue and you'd be done.
That's a great idea. Patch welcome :-/
OK, I will try to get that done soon. I'm still up to my neck in work, but hope to squeeze in some ASDF time later this week.
cheers, r
On 04/09/2012 10:37 AM, Faré wrote:
On Sun, Apr 8, 2012 at 15:28, Nikodemus Siivola nikodemus@random-state.net wrote:
On 8 April 2012 17:36, Faré fahree@gmail.com wrote:
I think requiring a few marginal hackers doing weird things to specify :encoding :default is a small price to pay for everyone to be able to specify
I disagree. Consider this:
X has a system that used to be in, say, LATIN-9. He uses latin-9 at home, and everything works fine. His users either use it as well, or at least another single-byte encoding.
ASDF is updated, and X's user reports breakage. Everything works fine for X, because he didn't update ASDF yet. So he updates ASDF, and X updates his system to specify :LATIN-9 (or :DEFAULT, or whatever).
Now another of his users reports breakage, because /they/ didn't update ASDF yet -- and their ASDF doesn't support :ENCODING, so things break. They update ASDF, which in turn breaks another :LATIN-N system they were using.
The potential cost is non-trivial, and I really don't pretend to know eg. how many Japanese hackers use non-UTF encodings in their source.
IMO encouraging people to add :encoding :utf-8 is much saner.
I agree that transition costs must be considered.
Let's examine the two scenarios, where the default is :default vs where the default is :utf-8.
In both cases, crucial points follow: (a) currently :encoding is NOT supported by ASDF. (b) therefore, whenever anyone modifies his defsystem to use :encoding, his system will NOT be backward compatible anymore. (c) we want to make most code as backward-compatible as possible. (d) the application programmer controls what version of ASDF is installed, the library developer doesn't.
If the default is :utf-8 (my recommendation), then
- A few programmers of non-UTF-8 applications may hit a snag upgrading ASDF;
- then they can either use asdf-encodings or use :encoding :default.
- Their code is then not compatible with older ASDFs anymore, but
- as application programmers, they fully control which ASDF they use, and
- even if they need to support old CL implementations,
ASDF still supports them (the exception being GCL, that looks quite dead).
- Meanwhile, library authors can already start migrating to UTF-8,
and everyone who upgrades ASDF can reliably enjoy now the benefits of non-ASCII, while preserving backwards-compatibility.
Won't library authors need to wait until their user base has upgraded ASDF before they can start migrating to UTF-8? The external-format support helps write portable libraries using non-ASCII characters and is only available after an upgrade.
I do see a concern that if developers are required to change their definitions to add :encoding :default then they will be forced to also make sure their user base has upgraded now. Further if their users do upgrade ASDF then it breaks again - there is no migration path for them.
If the default is :default (your recommendation), then
- library developers can't ensure their code use a predictable encoding;
- this makes any attempt to actually use of non-ASCII characters unreliable.
- Sure, they might be tempted to use :encoding :utf-8, but then
their libraries will be gratuitously incompatible with anyone who hasn't upgraded his ASDF, which is a pain to users.
Perhaps the difference is that portable UTF-8 source is new source and requires an upgrade of ASDF anyway, whereas making the default :utf-8 forces :encoding :default on current users and affects legacy code that is already written without a migration path.
- thus, library developers can do nothing but wait for EVERYONE
to be using a recent ASDF before they can do anything.
Wouldn't this be the reality for portable libraries no matter which default is chosen?
- Therefore, noone will enjoy any benefit of :encoding for a year,
and when we do, it will cause massive backward incompatibility.
I don't appreciate the 'massive backward incompatibility' so perhaps do not understand your perspective? I see that future projects using UTF-8 source would need to declare this in the system definition, but this would not seem to qualify.
Choosing :default would seem to cause the least backward incompatibility as this is the current behaviour, and offers a migration path to get ASDF upgrades in place.
Admittedly, in either case, library developers could use such conditional reading as in #+asdf-unicode #:asdf-unicode :encoding :utf-8 or #+asdf-unicode :encoding #:asdf-unicode :latin1 to make their libraries safer in a backwards-compatible way.
It would be great if some suggestions like this could be offered to ease the transition.
In both cases, library developers are encouraged to use UTF-8, which already most people do, if only that tends to be the default these days for users of SBCL and they send bug reports to library authors.
A default of :default allows a few odd non-UTF8 application developers to continue using unportable hacks for a few months, while forcing everyone else to wait to do anything and bringing massive backwards-incompatibility of libraries in the end.
What 'massive backwards-incompatibility' would be caused by making :default the default?
Most portable libraries are ASCII, and there would be some benefit in libraries needing UTF-8 support to declare this in the system definition.
A default of :utf-8 forces these few odd non-UTF8 application developers to do a documented step before they continue doing what they are doing, at their own upgrade pace (they control when to upgrade ASDF); they can then replace a lot of non-portable hacks with a portable :encoding. Meanwhile, everyone starts enjoying reliable non-ASCII today.
There may be a concern that their users would have to upgrade ASDF now.
How can everyone enjoy reliable non-ASCII today, without the user base having upgraded ASDF?
I agree there's no solution that makes everyone happy. I believe that a default of :utf-8 doesn't actually make anyone terribly unhappy, and empowers everyone to make the changes they need, without requiring for anyone to wait for other people to make changes (except indeed for a few stray libraries). And so my plans are unchanged for now (but of course, please keep sending the complaints if you think it's wrongheaded; it's still time to not do it).
NB: I'd especially like the opinion of people who actually develop non-ASCII and non-UTF8 libraries or applications.
PS: I just made asdf-encodings much less dumb. I added good support for: sbcl, clozure, clisp, ecl, cmu I added dubious support for: abcl, allegro, scl, lispworks I think these will remain 8-bit only: cormanlisp, gcl, genera, rmcl, xcl. Precious little testing so far, except that it doesn't break everything on SBCL. Help welcome to test and expand it. ssh://common-lisp.net/project/asdf/git/asdf-encodings.git git://common-lisp.net/projects/asdf/asdf-encodings.git http://common-lisp.net/gitweb?p=projects/asdf/asdf-encodings.git
—♯ƒ • François-René ÐVB Rideau •Reflection&Cybernethics• http://fare.tunes.org Every major horror of history was committed in the name of an altruistic motive. — Ayn Rand
On Tue, 2012-04-10 at 01:37 +1000, Douglas Crosher wrote: [...]
Won't library authors need to wait until their user base has upgraded ASDF before they can start migrating to UTF-8? The external-format support helps write portable libraries using non-ASCII characters and is only available after an upgrade.
[...]
- thus, library developers can do nothing but wait for EVERYONE
to be using a recent ASDF before they can do anything.
As a library writer, I treat ASDF as any other dependency: if I think that I like/need some new ASDF feature I just use it; if users don't want to or can't upgrade, that's not my problem
[...]
Admittedly, in either case, library developers could use such conditional reading as in #+asdf-unicode #:asdf-unicode :encoding :utf-8 or #+asdf-unicode :encoding #:asdf-unicode :latin1 to make their libraries safer in a backwards-compatible way.
It would be great if some suggestions like this could be offered to ease the transition.
There may be a concern that their users would have to upgrade ASDF now.
How can everyone enjoy reliable non-ASCII today, without the user base having upgraded ASDF?
Not possible. Even then, the "#+asdf-unicode" solution would only preserve backwards-compatibility at the cost of not providing "reliable non-ASCII".
On Mon, Apr 9, 2012 at 11:37, Douglas Crosher dtc-asdf@scieneer.com wrote:
Won't library authors need to wait until their user base has upgraded ASDF before they can start migrating to UTF-8?
No. Library authors have *already* largely adopted UTF-8. See previous analysis by Orivej Desh: "I did a check of quicklisp systems. There are 263 lisp files in 107 systems which assume non-ASCII, and only 31 of them in 20 systems assume non-UTF-8" That's out of 700 libraries in Quicklisp. Only 9 have been found to be an actual problem, and two are fixed already. https://github.com/orivej/asdf-encodings/wiki/Tracking-non-UTF-8-lisp-files-...
The only issue is to make the results *reliable* for these systems that depend on UTF-8.
I do see a concern that if developers are required to change their definitions to add :encoding :default then they will be forced to also make sure their user base has upgraded now. Further if their users do upgrade ASDF then it breaks again - there is no migration path for them.
Yes. No one in their right mind would use :encoding :default for a library. Each author knows what encoding he uses, say :latin1, :koi8-r, :mac-roman or :euc-jp, and would specify just that, not :default.
I was thinking of :default 1- because I hadn't written asdf-encodings yet, and needed *some* way to support those setups 2- for full backwards compatibility: "if it's not backwards, it's not compatible"
Perhaps the difference is that portable UTF-8 source is new source and requires an upgrade of ASDF anyway, whereas making the default :utf-8 forces :encoding :default on current users and affects legacy code that is already written without a migration path.
UTF-8 is not just for new source. It doesn't require an upgrade of ASDF. There is plenty of UTF-8 source already, though mostly for comments (but not only for comments: see e.g. λ-reader). All modern implementations support UTF-8, though not always as the default. Let's just make it a reliable default so we can WORM (write once run everywhere). And the migration path is clear: recode l1..u8 foo.lisp
- thus, library developers can do nothing but wait for EVERYONE
to be using a recent ASDF before they can do anything.
Wouldn't this be the reality for portable libraries no matter which default is chosen?
Whatever the default encoding is, libraries can't use :encoding until all their users use a recent ASDF. But if :utf-8 becomes the default and they use it, they can already enjoy the benefits of deterministic encoding, and tell users who have encoding issues "just upgrade your ASDF".
- Therefore, noone will enjoy any benefit of :encoding for a year,
and when we do, it will cause massive backward incompatibility.
I don't appreciate the 'massive backward incompatibility' so perhaps do not understand your perspective? I see that future projects using UTF-8 source would need to declare this in the system definition, but this would not seem to qualify.
If the default is :default and you want to enjoy reliable utf-8, then you'll need to specify :encoding :utf-8, at which point your library ceases to be compatible with users who haven't upgraded ASDF. I call that massive backward incompatibility.
If the default is :utf-8 and your library has a latin1 character, you use recode, and your new code still works on old ASDFs as well as new ones. That's massive backward compatibility.
Choosing :default would seem to cause the least backward incompatibility as this is the current behaviour, and offers a migration path to get ASDF upgrades in place.
It's compatible for now, but setting us up for massive incompatibility later.
Admittedly, in either case, library developers could use such conditional reading as in #+asdf-unicode #:asdf-unicode :encoding :utf-8 or #+asdf-unicode :encoding #:asdf-unicode :latin1 to make their libraries safer in a backwards-compatible way.
It would be great if some suggestions like this could be offered to ease the transition.
I inserted this suggestion in the ASDF documentation. I can't retroactively modify old ASDF installations to point people at precisely the paragraph they need to consult in the docs when they upgrade and things break for them, but I trust that Google will help them.
Most portable libraries are ASCII, and there would be some benefit in libraries needing UTF-8 support to declare this in the system definition.
ASCII libraries will work everywhere anyway whatever we do about the default. That is, until some maniac writes a Lisp using EBCDIC; and even then, making UTF-8 the default will ensure he can still just download source from the net and use it without having to transcode it for his implementation. Of course, a lot of code that assumes ASCII or ASCII-like continuity of letter ranges will fail, but that's a given if he uses EBCDIC.
There may be a concern that their users would have to upgrade ASDF now.
No. Making :utf-8 the default means no one needs to upgrade ASDF now, but a few people may have to upgrade a few libraries when they upgrade ASDF.
Making :default the default and forcing people to use :encoding :utf-8 to enjoy any reliability means people who use libraries that want to be reliable will be forced to upgrade ASDF.
How can everyone enjoy reliable non-ASCII today, without the user base having upgraded ASDF?
Mostly, they can set up their system defaults to UTF-8 and enjoy most Lisp code already on most implementations. When they stray from this default setup that I want to formalize, nothing works reliably today.
—♯ƒ • François-René ÐVB Rideau •Reflection&Cybernethics• http://fare.tunes.org Merely having an open mind is nothing; the object of opening the mind, as of opening the mouth, is to shut it again on something solid. — G.K. Chesterton
Let me give an example so we can all test your idea:
1. Library developer upgrades ASDF, and starts adding UTF-8 characters. Let's say the developer assumes a default of UTF-8 so does not add a declaration, which I think is your suggestion. The library is intended to be portable.
2. Users download the library, but have not yet upgraded ASDF. They start up an arbitrary CL implementation, which does not default to UTF-8. The code may fail to compile, or may have incorrect characters.
I hope this can be accepted and that it is clear that library developers will need to wait until the user base has upgraded before adding UTF-8 to portable libraries.
This is why we need the external-format support in ASDF - to make this reliable.
Regards Douglas Crosher
On 04/10/2012 08:05 AM, Faré wrote:
On Mon, Apr 9, 2012 at 11:37, Douglas Crosher dtc-asdf@scieneer.com wrote:
Won't library authors need to wait until their user base has upgraded ASDF before they can start migrating to UTF-8?
No. Library authors have *already* largely adopted UTF-8. See previous analysis by Orivej Desh: "I did a ckeck of quicklisp systems. There are 263 lisp files in 107 systems which assume non-ASCII, and only 31 of them in 20 systems assume non-UTF-8" That's out of 700 libraries in Quicklisp. Only 9 have been found to be an actual problem, and two are fixed already. https://github.com/orivej/asdf-encodings/wiki/Tracking-non-UTF-8-lisp-files-...
The only issue is to make the results *reliable* for these systems that depend on UTF-8.
I do see a concern that if developers are required to change their definitions to add :encoding :default then they will be forced to also make sure their user base has upgraded now. Further if their users do upgrade ASDF then it breaks again - there is no migration path for them.
Yes. No one in their right mind would use :encoding :default for a library. Each author knows what encoding he uses, say :latin1, :koi8-r, :mac-roman or :euc-jp, and would specify just that, not :default.
I was thinking of :default 1- because I hadn't written asdf-encodings yet, and needed *some* way to support those setups 2- for full backwards compatibility: "if it's not backwards, it's not compatible"
Perhaps the difference is that portable UTF-8 source is new source and requires an upgrade of ASDF anyway, whereas making the default :utf-8 forces :encoding :default on current users and affects legacy code that is already written without a migration path.
UTF-8 is not just for new source. It doesn't require an upgrade of ASDF. There is plenty of UTF-8 source already, though mostly for comments (but not only for comments: see e.g. λ-reader). All modern implementations support UTF-8, though not always as the default. Let's just make it a reliable default so we can WORM (write once run everywhere). And the migration path is clear: recode l1..u8 foo.lisp
- thus, library developers can do nothing but wait for EVERYONE
to be using a recent ASDF before they can do anything.
Wouldn't this be the reality for portable libraries no matter which default is chosen?
Whatever the default encoding is, libraries can't use :encoding until all their users use a recent ASDF. But if :utf-8 becomes the default and they use it, they can already enjoy the benefits of deterministic encoding, and tell users who have encoding issues "just upgrade your ASDF".
- Therefore, noone will enjoy any benefit of :encoding for a year,
and when we do, it will cause massive backward incompatibility.
I don't appreciate the 'massive backward incompatibility' so perhaps do not understand your perspective? I see that future projects using UTF-8 source would need to declare this in the system definition, but this would not seem to qualify.
If the default is :default and you want to enjoy reliable utf-8, then you'll need to specify :encoding :utf-8, at which point your library ceases to be compatible with users who haven't upgraded ASDF. I call that massive backward incompatibility.
If the default is :utf-8 and your library has a latin1 character, you use recode, and your new code still works on old ASDFs as well as new ones. That's massive backward compatibility.
Choosing :default would seem to cause the least backward incompatibility as this is the current behaviour, and offers a migration path to get ASDF upgrades in place.
It's compatible for now, but setting us up for massive incompatibility later.
Admittedly, in either case, library developers could use such conditional reading as in #+asdf-unicode #:asdf-unicode :encoding :utf-8 or #+asdf-unicode :encoding #:asdf-unicode :latin1 to make their libraries safer in a backwards-compatible way.
It would be great if some suggestions like this could be offered to ease the transition.
I inserted this suggestion in the ASDF documentation. I can't retroactively modify old ASDF installations to point people at precisely the paragraph they need to consult in the docs when they upgrade and things break for them, but I trust that Google will help them.
Most portable libraries are ASCII, and there would be some benefit in libraries needing UTF-8 support to declare this in the system definition.
ASCII libraries will work everywhere anyway whatever we do about the default. That is, until some maniac writes a Lisp using EBCDIC; and still making UTF-8 the default will ensure he can still just download source from the net and use it without having to transcode it for his implementation. Of course, a lot of code that assumes ASCII or ASCII-like continuity of letter ranges with fail, but that's a given if he uses EBCDIC.
There may be a concern that their users would have to upgrade ASDF now.
No. Making :utf-8 the default means no one needs to upgrade ASDF now, but a few people may have to upgrade a few libraries when they upgrade ASDF.
Making :default the default and forcing people to use :encoding :utf-8 to enjoy any reliability means people who use libraries that want to be reliable will be forced to upgrade ASDF.
How can everyone enjoy reliable non-ASCII today, without the user base having upgraded ASDF?
Mostly, they can setup their system defaults to UTF-8 and enjoy most Lisp code already on most implementations. When they stray from this default setup I want to formalize, nothing works reliably today.
—♯ƒ • François-René ÐVB Rideau •Reflection&Cybernethics• http://fare.tunes.org Merely having an open mind is nothing; the object of opening the mind, as of opening the mouth, is to shut it again on something solid. — G.K. Chesterton
On Mon, Apr 9, 2012 at 19:53, Douglas Crosher dtc-asdf@scieneer.com wrote:
Let me given an example so we can all test your idea:
- Library developer upgrades ASDF, and starts adding UTF-8 characters.
Lets say the developer assumes a default of UTF-8 so does not add a declaration, which I think is your suggestion. The library is intended to be portable.
No need to upgrade ASDF to use UTF-8. It's already working in most places. A hundred different systems already use UTF-8 without any declaration. It's mostly portable. Except when it's not. Only when it's not do you want to recommend your users to upgrade ASDF.
- Users download the library, but have not yet upgraded ASDF. They start
up an arbitrary CL implementation, which does not default to UTF-8. The code may fail to compile, or may have incorrect characters.
That's the current situation already.
I hope this can be accepted and that it is clear that library developers will need to wait until the user base has upgraded before add UTF-8 to portable libraries.
It can, and has, been accepted.
The answer to users for which it breaks is currently "sucks to be you, good luck configuring your system".
The answer to users for which it breaks will instead be "upgrade ASDF". And if that still fails, the answer will be "Wow, I didn't know anyone still used CormanLisp, GCL, Genera, RMCL, or XCL. Good luck adding UTF-8 support to it."
This is why we need the external-format support in ASDF - to make this reliable.
Indeed. That, and to make it possible at all to mix such UTF-8 libraries and decidedly non-UTF8 code and get desired results.
Library authors have *already* largely adopted UTF-8. See previous analysis by Orivej Desh: "I did a ckeck of quicklisp systems. There are 263 lisp files in 107 systems which assume non-ASCII, and only 31 of them in 20 systems assume non-UTF-8" That's out of 700 libraries in Quicklisp. Only 9 have been found to be an actual problem, and two are fixed already. https://github.com/orivej/asdf-encodings/wiki/Tracking-non-UTF-8-lisp-files-...
—♯ƒ • François-René ÐVB Rideau •Reflection&Cybernethics• http://fare.tunes.org "Don't worry about people stealing your ideas. If your ideas are any good, you'll have to ram them down people's throats." — Howard Aiken
Re. specifying encodings in backwards-compatible manner:
(defmethod system-external-format ((sys (eql (find-system :my-system)))) :ebcdic-us)
Cheers,
-- Nikodemus
On 4/9/12 3:05 PM, Faré wrote:
On Mon, Apr 9, 2012 at 11:37, Douglas Crosher dtc-asdf@scieneer.com wrote:
Won't library authors need to wait until their user base has upgraded ASDF before they can start migrating to UTF-8?
No. Library authors have *already* largely adopted UTF-8. See previous analysis by Orivej Desh: "I did a ckeck of quicklisp systems. There are 263 lisp files in 107 systems which assume non-ASCII, and only 31 of them in 20 systems assume non-UTF-8"
I saw those statistics. I have no idea what "assume non-ASCII" means. That there are files that have non-ascii characters in them? And that only 31 files are not in utf-8 already?
Ray
No. Library authors have *already* largely adopted UTF-8. See previous analysis by Orivej Desh: "I did a ckeck of quicklisp systems. There are 263 lisp files in 107 systems which assume non-ASCII, and only 31 of them in 20 systems assume non-UTF-8"
I saw those statistics. I have no idea what "assume non-ASCII" means. That there are files that have non-ascii characters in them? And that only 31 files are not in utf-8 already?
Yes, of which only 13 files were actually managed by ASDF as opposed to examples, one is a MCL-only file that doesn't support UTF-8 anyway, two have already been fixed, and the rest are only latin1 or such in comments. Bugs filed for all the other systems (but no response so far).
IOW, I believe we're mostly arguing about a non-issue.
—♯ƒ • François-René ÐVB Rideau •Reflection&Cybernethics• http://fare.tunes.org
It may be significant that a number of the quicklisp releases use non-ascii in the system definition files. Can this be addressed in ASDF alone? Should an attempt be made to add an encoding argument to 'find-system, and to have quicklisp record the encoding in its release database and use this when calling 'find-system? If so then perhaps this could be stored as a default encoding for a system.
Looking at non-ascii usage in quicklisp releases shows that the UTF-8 usage is not that significant.
Releases considered: 716
Releases with UTF-8 lisp source files: 86 (12%)
Releases with UTF-8 in comments only: 34
Releases using UTF-8 in their system definitions: 21
Releases for which all the UTF-8 could be recoded to ISO-8859-1: 59
Releases with other non-ascii source files: 21 (3%)
Releases with other non-ascii source files in comments only: 12
Releases using non-ascii characters from only the ISO-8859-1 set: 59 + 12? = 71? (10%)
Releases using only ASCII in source files: 716 - 86 - 21 = 609 (85%)
Some of the UTF-8 is rather gratuitous and if portability was a concern there would have been suitable ASCII substitutes. There does not appear to be much respect for portability in some of these releases, so even adding encoding support to ASDF system definition files may not help for some of these releases.
If you accept that library authors will choose their encoding, even for the system definition files, then the only solution seems to be to add an encoding option to 'find-system and suggest this be used to load the system definition.
Regards Douglas Crosher
On 04/12/2012 12:43 AM, Faré wrote:
No. Library authors have *already* largely adopted UTF-8. See previous analysis by Orivej Desh: "I did a ckeck of quicklisp systems. There are 263 lisp files in 107 systems which assume non-ASCII, and only 31 of them in 20 systems assume non-UTF-8"
I saw those statistics. I have no idea what "assume non-ASCII" means. That there are files that have non-ascii characters in them? And that only 31 files are not in utf-8 already?
Yes, of which only 13 files were actually managed by ASDF as opposed to examples, one is a MCL-only file that doesn't support UTF-8 anyway, two have already been fixed, and the rest are only latin1 or such in comments. Bugs filed for all the other systems (but no response so far).
IOW, I believe we're mostly arguing about a non-issue.
—♯ƒ • François-René ÐVB Rideau •Reflection&Cybernethics• http://fare.tunes.org
Dealing with non-ASCII encoding in system definition files does look easy to solve. It does not seem practical to just extend 'find-system to accept the encoding because 'find-system can in turn attempt to load other systems, and there are other entry points.
The only practical solution seems to be to detect the encoding from the file. I could write portable code for ASDF to read an ASCII header line and look for encoding declarations, and handle a few common headers (emacs has 'coding', LispWorks seems to use 'encoding' or 'external-format'). Auto-detection could handle some of the common codings, but could be a big chunk of code. The quicklisp project may be prepared to patch in headers to system definition file using non-ASCII encodings, and this could be largely automated.
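A minimal sketch of the header-reading idea (this is not the draft code; the function name and the exact parsing rules are invented for illustration):

  ;; Read the first line and look for an Emacs-style "coding:" declaration.
  ;; Returns the declared coding as a keyword, or NIL if none is found.
  ;; Assumes the first line is plain ASCII, so the default external format
  ;; is good enough to read it.
  (defun sniff-coding-declaration (pathname)
    (with-open-file (stream pathname :element-type 'character)
      (let* ((line (or (read-line stream nil nil) ""))
             (pos (search "coding:" (string-downcase line))))
        (when pos
          (let* ((tail (string-left-trim " " (subseq line (+ pos (length "coding:")))))
                 (end (or (position-if-not
                           (lambda (c) (or (alphanumericp c) (find c "-_.")))
                           tail)
                          (length tail))))
            (when (plusp end)
              (intern (string-upcase (subseq tail 0 end)) :keyword)))))))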
If infrastructure is added for the system definition files then it would be only a small step to also use this for the lisp source files. Perhaps this suggests an alternative path to address the coding issues.
Lispworks appears to be able to automatically detect file coding, and it would be interesting to know if the ASDF encoding problems are not an issue for LispWorks users? If so then this would appear to add more support to making the default :default. http://www.lispworks.com/documentation/lw61/LW/html/lw-659.htm#39723
It seems the issue could be dealt with by the CL implementations adding file external-format detection.
Regards Douglas Crosher
On 04/12/2012 06:51 PM, Douglas Crosher wrote:
It may be significant that a number of the quicklisp releases use non-ascii in the system definition files. Can this be addresses in ASDF alone? Should an attempt be made to add an encoding argument to 'find-system, and to have quicklisp record the encoding in its release database and use this when calling 'find-system? If so then perhaps this could be stored as a default encoding for a system.
Looking at non-ascii usage in quicklisp releases shows that the UTF-8 usage is not that significant.
Releases considered: 716 Releases with UTF-8 lisp source files: 86 (12%) Releases with UTF-8 in comments only : 34 Releases using UTF-8 in their system definitions: 21 Releases for which all the UTF-8 could be recoded to ISO-8859-1: 59 Releases with other non-ascii source files: 21 (3%) Releases with other non-ascii source files in comments only: 12
Releases using non-ascii characters from only the ISO-8859-1 set: 59 + 12? = 71? (10%)
Releases using only ASCII in source files: 716 - 86 - 21 = 609 (85%)
Some of the UTF-8 is rather gratuitous and if portability was a concert there would have been suitable ASCII substitutes. There does not appear to be much respect for portability in some of these releases, so even adding encoding support to ASDF system definitions files many not help for some of these releases.
If you accept that library authors will choose their encoding, even for the system definition files, then the only solution seems to be to add an encoding option to 'find-system and suggest this be used to load the system definition.
Regards Douglas Crosher
On 04/12/2012 12:43 AM, Faré wrote:
No. Library authors have *already* largely adopted UTF-8. See previous analysis by Orivej Desh: "I did a ckeck of quicklisp systems. There are 263 lisp files in 107 systems which assume non-ASCII, and only 31 of them in 20 systems assume non-UTF-8"
I saw those statistics. I have no idea what "assume non-ASCII" means. That there are files that have non-ascii characters in them? And that only 31 files are not in utf-8 already?
Yes, of which only 13 files were actually managed by ASDF as opposed to examples, one is a MCL-only file that doesn't support UTF-8 anyway, two have already been fixed, and the rest are only latin1 or such in comments. Bugs filed for all the other systems (but no response so far).
IOW, I believe we're mostly arguing about a non-issue.
—♯ƒ • François-René ÐVB Rideau •Reflection&Cybernethics• http://fare.tunes.org
I agree. Encoding is a per-file property, not a per-system property. If it is defined per system, then maintaining the information also becomes an overhead.
For example, I have source files that are shared by different system definitions. If I would now want to change the encoding of that file, I would also have to update the system definition(s) that it is part of. I don't think that's a good idea.
Pascal
On 13 Apr 2012, at 08:44, Douglas Crosher wrote:
Dealing with non-ASCII encoding in system definition files does look easy to solve. It does not seem practical to just extend 'find-system to accept the encoding because 'find-system can in turn attempt to load other systems, and there are other entry points.
The only practical solution seems to be to detect the encoding from the file. I could write portable code for ASDF to read an ASCII header line and look for encoding declarations, and handle a few common headers (emacs has 'coding', LispWorks seems to use 'encoding' or 'external-format'). Auto-detection could handle some of the common codings, but could be a big chunk of code. The quicklisp project may be prepared to patch in headers to system definition file using non-ASCII encodings, and this could be largely automated.
If infrastructure is added for the system definition files then it would be only a small step to also use this for the lisp source files. Perhaps this suggests an alternative path to address the coding issues.
Lispworks appears to be able to automatically detect file coding, and it would be interesting to know if the ASDF encoding problems are not an issue for LispWorks users? If so then this would appear to add more support to making the default :default. http://www.lispworks.com/documentation/lw61/LW/html/lw-659.htm#39723
It seems the issue could be dealt with by the CL implementations adding file external-format detection.
Regards Douglas Crosher
On 04/12/2012 06:51 PM, Douglas Crosher wrote:
It may be significant that a number of the quicklisp releases use non-ascii in the system definition files. Can this be addresses in ASDF alone? Should an attempt be made to add an encoding argument to 'find-system, and to have quicklisp record the encoding in its release database and use this when calling 'find-system? If so then perhaps this could be stored as a default encoding for a system.
Looking at non-ascii usage in quicklisp releases shows that the UTF-8 usage is not that significant.
Releases considered: 716 Releases with UTF-8 lisp source files: 86 (12%) Releases with UTF-8 in comments only : 34 Releases using UTF-8 in their system definitions: 21 Releases for which all the UTF-8 could be recoded to ISO-8859-1: 59 Releases with other non-ascii source files: 21 (3%) Releases with other non-ascii source files in comments only: 12
Releases using non-ascii characters from only the ISO-8859-1 set: 59 + 12? = 71? (10%)
Releases using only ASCII in source files: 716 - 86 - 21 = 609 (85%)
Some of the UTF-8 is rather gratuitous and if portability was a concert there would have been suitable ASCII substitutes. There does not appear to be much respect for portability in some of these releases, so even adding encoding support to ASDF system definitions files many not help for some of these releases.
If you accept that library authors will choose their encoding, even for the system definition files, then the only solution seems to be to add an encoding option to 'find-system and suggest this be used to load the system definition.
Regards Douglas Crosher
On 04/12/2012 12:43 AM, Faré wrote:
No. Library authors have *already* largely adopted UTF-8. See previous analysis by Orivej Desh: "I did a ckeck of quicklisp systems. There are 263 lisp files in 107 systems which assume non-ASCII, and only 31 of them in 20 systems assume non-UTF-8"
I saw those statistics. I have no idea what "assume non-ASCII" means. That there are files that have non-ascii characters in them? And that only 31 files are not in utf-8 already?
Yes, of which only 13 files were actually managed by ASDF as opposed to examples, one is a MCL-only file that doesn't support UTF-8 anyway, two have already been fixed, and the rest are only latin1 or such in comments. Bugs filed for all the other systems (but no response so far).
IOW, I believe we're mostly arguing about a non-issue.
—♯ƒ • François-René ÐVB Rideau •Reflection&Cybernethics• http://fare.tunes.org
-- Pascal Costanza
On Fri, Apr 13, 2012 at 15:36, Pascal Costanza pc@p-cos.net wrote:
I agree. Encoding is a per-file property, not a per-system property. If it is defined per system, then maintaining the information also becomes an overhead.
As for many other things (the :around-compile hook, for instance), ASDF allows you to specify inherited per-system or per-module defaults that can be overridden in individual file components.
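For example, a sketch of what that could look like with the :encoding option discussed in this thread (the system and file names are placeholders, and per-component :encoding is assumed to be inheritable like other such options):

  (defsystem :my-system
    :encoding :utf-8                                   ; default for the whole system
    :components ((:file "main")                        ; inherits :utf-8
                 (:file "legacy" :encoding :latin-1))) ; per-file override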
For example, I have source files that are shared by different system definitions. If I would now want to change the encoding of that file, I would also have to update the system definition(s) that it is part of. I don't think that's a good idea.
Tough for you, but it's a very small inconvenience, and not a violation of ASDF 2's design principle for configuration: the same people who decide and know are those who specify the information.
How often do you not only share a source file between multiple systems (usually a bad idea, because of how caching and change detection works), but also change its encoding?
PS: did you try the latest ASDF, with logical-pathname fixes just for you?
—♯ƒ • François-René ÐVB Rideau •Reflection&Cybernethics• http://fare.tunes.org The uploaded man cut himself from the world. The optimizer saw the simulation would have no more side-effects and optimized it all away.
On Fri, Apr 13, 2012 at 02:44, Douglas Crosher dtc-asdf@scieneer.com wrote:
The only practical solution seems to be to detect the encoding from the file. I could write portable code for ASDF to read an ASCII header line and look for encoding declarations, and handle a few common headers (emacs has 'coding', LispWorks seems to use 'encoding' or 'external-format'). Auto-detection could handle some of the common codings, but could be a big chunk of code. The quicklisp project may be prepared to patch in headers to system definition file using non-ASCII encodings, and this could be largely automated.
Yes, this is a valid approach, though it is somewhat heavy in coding and will grow ASDF by a few hundred more lines of code. Don't forget to support the way Emacs detects encoding, etc. It is certainly more than I am willing to code, and making the semantics of loading more complex than I am comfortable with. Before you code it yourself, I'd like to hear about other users here what they think.
An additional small thing I don't like about the approach is that you have to open a file twice, once to detect encoding, the other time to load or compile-file it, which is not atomic and can be slightly nasty (if e.g. the file is actually a URL or mounted on a weird filesystem or whatnot). But that's secondary.
Also, I'm not sure how big the market for such support is. There again, I'd like to hear from potential users.
If infrastructure is added for the system definition files then it would be only a small step to also use this for the lisp source files.
Indeed.
Alternatively, this could be an :automatic mode added to asdf-encodings, rather than a part of ASDF itself, at which point it would be available to source files, but not system files.
Lispworks appears to be able to automatically detect file coding, and it would be interesting to know if the ASDF encoding problems are not an issue for LispWorks users? If so then this would appear to add more support to making the default :default. http://www.lispworks.com/documentation/lw61/LW/html/lw-659.htm#39723
If you want your code to be portable, you can't rely on users using LispWorks. Deterministic well-defined semantics require that the meaning of your code should not depend on magic that may or may not happen.
PS: This long discussion on a relatively minor topic reminds me of Parkinson's Law of Triviality. What color should the bikeshed be painted?
—♯ƒ • François-René ÐVB Rideau •Reflection&Cybernethics• http://fare.tunes.org Classical Liberalism: the only truly subversive ideology.
Faré fahree@gmail.com writes:
On Fri, Apr 13, 2012 at 02:44, Douglas Crosher dtc-asdf@scieneer.com wrote:
The only practical solution seems to be to detect the encoding from the file. I could write portable code for ASDF to read an ASCII header line and look for encoding declarations, and handle a few common headers (emacs has 'coding', LispWorks seems to use 'encoding' or 'external-format'). Auto-detection could handle some of the common codings, but could be a big chunk of code. The quicklisp project may be prepared to patch in headers to system definition file using non-ASCII encodings, and this could be largely automated.
Yes, this is a valid approach, though it is somewhat heavy in coding and will grow ASDF by a few hundred more lines of code. Don't forget to support the way Emacs detects encoding, etc. It is certainly more than I am willing to code, and making the semantics of loading more complex than I am comfortable with. Before you code it yourself,
There's no need to code it yourself, I've already done it: https://gitorious.org/com-informatimago/com-informatimago/blobs/master/tools...
I'd like to hear about other users here what they think.
An additional small thing I don't like about the approach is that you have to open a file twice, once to detect encoding, the other time to load or compile-file it, which is not atomic and can be slightly nasty (if e.g. the file is actually a URL or mounted on a weird filesystem or whatnot). But that's secondary.
You have to do what you have to do. FILE-LENGTH has to read the whole file to compute the number of characters in a UTF-8 file.
Also, I'm not sure how big the market for such support is. There again, I'd like to hear from potential users.
Having written the above, I'd tend to be in favor of this approach. On the other hand, nowadays I put -*- coding:utf-8 -*- in all my files…
PS: This long discussion on a relatively minor topic reminds me of Parkinson's Law of Triviality. What color should the bikeshed be painted?
It's not entirely trivial, you can spend quite some time on encoding problems.
A draft version adding support for reading the encoding from the file options header is available at: http://www.scieneer.com/files/asdf-encoding-file-option.lisp
It has a bias towards UTF-8 which is used if other encodings are not detected or declared in the file options and if the file is valid UTF-8 with UTF-8 specific sequences. I don't expect too many false positives from the UTF-8 detector. I would not suggest trying to detect any further encodings.
For UTF-8 files, no action needs to be taken when upgrading ASDF - ASDF will reliably detect them and load and compile them as UTF-8 rather than using the :default CL external-format. Files with other encodings that are not detected will load and compile using the :default external-format as is currently the case - library authors can add file options headers in order for such files to load and compile reliably across systems with a range of default external-formats. There would not appear to be any migration loss or inconvenience for anyone, except if there are incorrect encoding file options that need to be fixed.
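For instance, a first line like the following (the Emacs-style convention already mentioned in this thread) would declare the file's encoding for the header reader:

  ;;; -*- Mode: Lisp; coding: utf-8 -*-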
For 8 bit CL implementations, the encoding detection and file options reading could probably just be disabled, or perhaps it could remain and issue warnings for clearly incompatible encodings.
This may offer a solution to the problem of defining the system definition file encoding, is convenient for UTF-8 users, and provides a reliable mechanism for writing portable libraries in other encodings.
An encoding file option could also be handy for other tools, such as editors, web servers, tools for recoding lisp source files, etc. I think it warrants some consideration.
You do a lot for ASDF and deserve thanks.
Regards Douglas Crosher
On 04/15/2012 11:00 AM, Faré wrote:
On Fri, Apr 13, 2012 at 02:44, Douglas Crosher dtc-asdf@scieneer.com wrote:
The only practical solution seems to be to detect the encoding from the file. I could write portable code for ASDF to read an ASCII header line and look for encoding declarations, and handle a few common headers (emacs has 'coding', LispWorks seems to use 'encoding' or 'external-format'). Auto-detection could handle some of the common codings, but could be a big chunk of code. The quicklisp project may be prepared to patch in headers to system definition file using non-ASCII encodings, and this could be largely automated.
Yes, this is a valid approach, though it is somewhat heavy in coding and will grow ASDF by a few hundred more lines of code. Don't forget to support the way Emacs detects encoding, etc. It is certainly more than I am willing to code, and making the semantics of loading more complex than I am comfortable with. Before you code it yourself, I'd like to hear about other users here what they think.
An additional small thing I don't like about the approach is that you have to open a file twice, once to detect encoding, the other time to load or compile-file it, which is not atomic and can be slightly nasty (if e.g. the file is actually a URL or mounted on a weird filesystem or whatnot). But that's secondary.
Also, I'm not sure how big the market for such support is. There again, I'd like to hear from potential users.
If infrastructure is added for the system definition files then it would be only a small step to also use this for the lisp source files.
Indeed.
Alternatively, this could be an :automatic mode added to asdf-encodings, rather than a part of ASDF itself, at which point it would be available to source files, but not system files.
Lispworks appears to be able to automatically detect file coding, and it would be interesting to know if the ASDF encoding problems are not an issue for LispWorks users? If so then this would appear to add more support to making the default :default. http://www.lispworks.com/documentation/lw61/LW/html/lw-659.htm#39723
If you want your code to be portable, you can't rely on users using LispWorks. Deterministic well-defined semantics require that the meaning of your code should not depend on magic that may or may not happen.
PS: This long discussion on a relatively minor topic reminds me of Parkinson's Law of Triviality. What color should the bikeshed be painted?
—♯ƒ • François-René ÐVB Rideau •Reflection&Cybernethics• http://fare.tunes.org Classical Liberalism: the only truly subversive ideology.
OK, so I think it's worth it adding automatic detection to ASDF and making it the default. However, before I merge anything into ASDF itself, I'd like to see it implemented as part of asdf-encodings. This will also make it easier to experiment with various approaches and see which works best while leading to maintainable code.
I admit I like pjb's approach of heeding the emacs declaration of coding: it's consistent with the Lisp tradition, compatible with an existing productivity tool (emacs), and can be re-used for more features.
From a cursory glance, I'm not convinced by his specific implementation;
it seems to me the thing could be cleaner. When no encoding is specified, autodetection can be used as per Douglas's code, with latin-1 and/or whatever 8-bit default is available as a fallback if it's not UTF-8.
Then, maybe the plan should be for 2.21 to add the :encoding keyword, but keep :default as the default for now, to be changed to either :utf-8 or :autodetect when the dust has settled. People who want to get deterministic utf-8 behavior can then use #+asdf-unicode #+asdf-unicode :encoding :utf-8 for now.
—♯ƒ • François-René ÐVB Rideau •Reflection&Cybernethics• http://fare.tunes.org Wealth, like happiness, is never attained when sought after directly. It comes as a by-product of providing a useful service. — Henry Ford
I'm a bit out of it and haven't followed this conversation closely... I think the situation is roughly
- people need non-ASCII characters in files
- the ASDF definition file handles charset info for source files
- the new "charset-detection" patch attempts to do the same for ASD files
- non-ASCII is mostly used in ASD files for names and documentation
- this is a portability issue
I'm doubtful that automatic charset detection is practical. I've never seen a web browser that did it right in the cases I cared about (mostly Asian language sites). A "compatible" charset may parse without error but render completely bogus characters, clearly wrong even to one who doesn't know the language.
It seems safer to make the charset explicit in an ASD file rather than require the end user to somehow know and specify the charset, or to require ASDF to guess the charset.
One proposal was to use a special sequence at the start of the file, like Emacs or other tools use for specifying configuration settings. However, this needs to then be compatible with those tools. This implies that ASDF needs to know a wide variety of such formats or has to introduce a sufficiently unique format to coexist with others.
What if ASDF allowed one ASD file to set up a dynamic scope when including another ASD file? I think this might also clean up some places where people are currently invoking CLOS.
Something like
system.asd:
(defsystem system
  (asdf-let ((charset :some-charset))
    (load-system system-impl)))
system-impl.asd:
;; non-ASCII characters here
(defsystem ...)
Then all primary ASD files could be ASCII-only.
Later, Daniel
On 4/15/12 Apr 15 -3:46 PM, Daniel Herring wrote: ....
One proposal was to use a special sequence at the start of the file, like Emacs or other tools use for specifying configuration settings. However, this needs to then be compatible with those tools. This implies that ASDF needs to know a wide variety of such formats or has to introduce a sufficiently unique format to coexist with others.
I disagree. Since way back in the 1970s the mode line format that emacs uses has been standard in this community. It would be fine to use that. If people with alternative tools want to provide extensions to handle different specification techniques, that's fine, and we could incorporate them. But using the mode line should be acceptable; we don't need to overthink this.
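For reference, the emacs mode line is just a specially formatted comment on the first line of the file; a hypothetical UTF-8 source file might start with:

;;; -*- Mode: Lisp; coding: utf-8 -*-

Emacs reads the -*- ... -*- cookie to choose the buffer's coding system, and the proposal is for ASDF (or asdf-encodings) to parse the same cookie when choosing what :external-format to pass to LOAD and COMPILE-FILE.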
What if ASDF allowed one ASD file to set up a dynamic scope when including another ASD file? I think this might also clean up some places where people are currently invoking CLOS.
I think that suggestion won't work for a couple of reasons:
1. As Faré points out, it's not the person who loads the system who knows what the charset should be, it's the author of that system. So it's wrong to use this kind of mechanism to allow the loader to control the charset of someone else's code: we need to allow the AUTHOR (rather than the loader) of the system to specify the charset.
2. It's not at all clear what that dynamic scoping would mean. Remember that the defsystem is not necessarily parsed inside the loading, and that there is the whole confounding issue of the construction and then execution of the operation plan.
On Sun, 15 Apr 2012, Robert Goldman wrote:
On 4/15/12 Apr 15 -3:46 PM, Daniel Herring wrote:
...
What if ASDF allowed one ASD file to set up a dynamic scope when including another ASD file? I think this might also clean up some places where people are currently invoking CLOS.
I think that suggestion won't work for a couple of reasons:
- As Faré points out, it's not the person who loads the system who
knows what the charset should be, it's the author of that system. So it's wrong to use this kind of mechanism to allow the loader to control the charset of someone else's code: we need to allow the AUTHOR (rather than the loader) of the system to specify the charset.
My suggestion was to have the author distribute two system files. The first is portably readable using ASCII, and it specifies that the second uses a different character set.
- It's not at all clear what that dynamic scoping would mean.
Remember that the defsystem is not necessarily parsed inside the loading, and that there is the whole confounding issue of the construction and then execution of the operation plan.
Well, this isn't a problem when tweaking reader code; but I agree that it could be a problem for most other ASDF operations.
Oh well, it was an idea. Given that it does not generalize as I had hoped, the mode lines do look simpler.
- Daniel
On Thu, Apr 12, 2012 at 04:51, Douglas Crosher dtc-asdf@scieneer.com wrote:
If you accept that library authors will choose their encoding, even for the system definition files, then the only solution seems to be to add an encoding option to 'find-system and suggest this be used to load the system definition.
Unhappily, things like :depends-on (:this-system) cannot all be updated by users to specify a correct encoding, even less so when said encoding can change. Therefore it is a bad idea to have user-specified encodings for system files, unlike system-specified encodings for source files, since the system is maintained by the same people as part of the same project.
The principle behind ASDF 2 has always been: those who know and those who specify should be the same person.
So far, asdf 2.20.13 doesn't specify an encoding for asd files, so it's always :default, which in practice means that it's not portable to use non-ASCII in component names or file names. So far, that's what common practice is, and the diversity of filesystems makes it not portable to use anything but a subset of ASCII for filenames, anyway. I'm tempted to enforce a non-configurable UTF-8 for .asd files, too, so that it becomes possible to name components beyond ASCII, and so that, if and when filesystems become more standardized or some user is daring, ASDF is ready to take advantage of that. That's a mostly independent change though, and my current take is to see with ASDF 2.21 how this change goes, and if it's successful, push for UTF-8 in system files in 2.22 -- and if it's a disaster, back out and issue a 2.22 immediately with :default as default.
—♯ƒ • François-René ÐVB Rideau •Reflection&Cybernethics• http://fare.tunes.org It is not recognized in the full amplitude of the word that all freedom is essentially self-liberation — that I can have only so much freedom as I procure for myself by my ownness. — Max Stirner
On 4/8/12 12:28 PM, Nikodemus Siivola wrote:
On 8 April 2012 17:36, Faré fahree@gmail.com wrote:
I think requiring a few marginal hackers doing weird things to specify :encoding :default is a small price to pay for everyone to be able to specify
I disagree. Consider this:
X has a system that used to be in, say, LATIN-9. He uses latin-9 at home, and everything works fine. His users either use it as well, or at least another single-byte encoding.
ASDF is updated, and X's user reports breakage. Everything works fine for X, because he didn't update ASDF yet. So he updates ASDF, and X updates his system to specify :LATIN-9 (or :DEFAULT, or whatever).
Now another of his users reports breakage, because /they/ didn't update ASDF yet -- and their ASDF doesn't support :ENCODING, so things break. They update ASDF, which in turn breaks another :LATIN-N system they were using.
The potential cost is non-trivial, and I really don't pretend to know e.g. how many Japanese hackers use non-UTF encodings in their source.
IMO encouraging people to add :encoding :utf-8 is much saner.
I agree with this. If the library needs a special encoding, let the library specify it. ASDF won't break any existing definitions and will support systems just fine.
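Concretely, "let the library specify it" means a line in the library's own .asd file, something like this hypothetical system (it assumes an ASDF new enough to understand :encoding, and asdf-encodings loaded for anything beyond :utf-8):

(defsystem :my-latin-9-library
  ;; hypothetical system whose source files are encoded in latin-9
  :encoding :latin-9
  :components ((:file "strings")))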
I think the fact that the current asdf behavior has worked without too many complaints about encodings is a good sign that whatever the default is works pretty well as the default. (Being illiterate, ASCII is all I need, except when I want to play with other encodings on purpose.)
Ray
Attached is my suggestion for adding external-format support.
* A table of translations is included, based on asdf-encodings, but if not found then the external-format is passed through. The intention is to increase the range of aliases supported to make it easier to write portable system definitions. It is not the intention that the list be exhaustive or that any attempt be made to verify the encoding.
* It uses :external-format. Users will be working with external-formats, perhaps for a foreign CL implementation but still external-formats. Introducing new terminology of 'encoding' seems a mistake.
* No attempt is made to verify the external format. This does not seem necessary, and does not even seem possible.
* A declarative system definition can be used for both portable :utf-8 and implementation-dependent (non-portable) external-formats (see the sketch after this list). There is no need to add code methods or extend asdf-encodings to use user-defined or implementation-dependent external formats. Supporting declarative definitions has many advantages over the alternative of requiring asdf-encodings code or asdf methods to support user-defined or implementation-dependent external-formats.
* The default is :default. The external-format support in ASDF is needed to write 'portable' libraries with UTF-8 source files, so that will not be possible until users have upgraded anyway. Portability is not gained now by making :utf-8 the default, so I just don't see the advantage of making :utf-8 the default when it would break backward compatibility, make migration problematic, and run contrary to the ANSI CL standard.
* At less than 200 lines of code it is just included in asdf.lisp.
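As an illustration only, under the proposed patch (not part of released ASDF, which uses :encoding instead), a declarative definition would presumably look something like this hypothetical system, with the given external-format either translated via the alias table or passed through unchanged:

(defsystem :my-system
  ;; hypothetical; :external-format is the keyword from the proposed patch
  :external-format :utf-8
  :components ((:file "main")))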
Regards Douglas Crosher
On 04/09/2012 12:36 AM, Faré wrote:
Abstract: I think requiring a few marginal hackers doing weird things to specify :encoding :default is a small price to pay for everyone to be able to specify their encoding in a portable way, with a sane default that is already almost universally accepted (i.e. :utf-8).
On Sun, Apr 8, 2012 at 07:31, Douglas Crosher dtc-asdf@scieneer.com wrote:
The portable-hemlock is still maintained and was updated a few months ago to avoid the use of non-ascii characters in the source so it builds cleanly with UTF-8 as the input external-format. The code is not in great shape, but is being improved. See: http://gitorious.org/hemlock/pages/Home
Oh, I hadn't noticed this new page for hemlock. Is CMUCL using the portable hemlock these days, or still including its own?
Even if you get all the Quicklisp projects converted to be UTF-8 clean, this still represents a subset of ASDF users. I wish you would reconsider these changes to ASDF because I fear they are divisive.
Well, I recognize that not all code is in Quicklisp and that there is a need for a backward compatibility mode. Putting :encoding :default in your defsystem will achieve just that.
At the same time, if :encoding :default rather than :encoding :utf-8 were the default, then we'd gain nothing, and it would still be a horrible mess to ascertain which system has been compiled with which encoding.
It is not reasonable to expect users of ASDF to hack on external support code just to use non-UTF-8 external-formats, and the external library you plan for can never be complete because the external-format is user extensible.
Well, on the one hand, for portability's sake, one should probably stick to a universally supported external format for one's lisp files. On the other hand, where portability is not a problem, one can either use :encoding :default and be back to the current semantics, or extend asdf-encodings as one extends external formats.
ASDF could easily be flexible regarding the external-format and not a limited bastion of portable open source code.
Agreed. Currently, ASDF is not flexible at all -- rather it is uncontrolled.
It would be very easy and workable to just name this :external-format, and to pass through encodings not recognised - all the quicklisp projects would work just fine using :utf-8 and other CL users could use encodings as needed.
Unhappily, passing through external formats is not portable, if only for CLISP. But if you're doing non-portable things, you can keep doing whatever you were previously doing with :encoding :default, or you can now define methods on asdf::component-external-format to do whatever you want, to override the default behavior of checking *encoding-external-format-hook*. Or then again, you can extend asdf-encodings to make it smarter.
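As a rough sketch of that hook route (the :cp1251 keyword and the function name my-encoding-external-format are made up; *encoding-external-format-hook* is the variable mentioned above, and asdf::default-encoding-external-format is assumed here to be its default value, so check your ASDF version before relying on either):

(defun my-encoding-external-format (encoding)
  ;; map a hypothetical :cp1251 encoding keyword to an implementation-specific
  ;; external-format, deferring everything else to ASDF's default mapping
  (if (eq encoding :cp1251)
      #+clisp charset:cp1251 #-clisp :cp1251
      (asdf::default-encoding-external-format encoding)))

(setf asdf::*encoding-external-format-hook* 'my-encoding-external-format)

A system could then say :encoding :cp1251 in its defsystem and have it mapped through this hook.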
In practice, how many people do you know who use a non-UTF-8 encoding, and how many of them will be majorly annoyed by having to either recode their source, explicitly specify their encoding, or add :encoding :default to preserve backwards compatibility?
—♯ƒ • François-René ÐVB Rideau •Reflection&Cybernethics• http://fare.tunes.org "I've finally learned what `upward compatible' means. It means we get to keep all our old mistakes." — Dennie van Tassel
On Mon, Apr 9, 2012 at 11:33, Douglas Crosher dtc-asdf@scieneer.com wrote:
Attached is my suggestion for adding external-format support.
Thanks a lot for taking the time to express your suggestion as a patch. It is much appreciated.
You've convinced me that we should consider making asdf-encodings part of asdf itself, at the cost of ~200 lines. I can see arguments both ways, and I'm open to opinions.
However, I still think that ASDF needs a portable file encoding layer well distinguished from non-portable external-formats (see below).
If anyone here cares, please speak. Unless I start seeing many people concur against me, I'll stick to my plan.
- A table of translations is included, based on asdf-encodings, but if not found then the external-format is passed through.
Note that this may not be obvious from the lacking documentation, but asdf-encodings already passes through code encodings that are not found.
It is not the intention that the list be exhaustive or that any attempt be made to verify the encoding.
Unhappily, the "verification" step is the only way I've found to determine which of many many known aliases a given implementation uses, so as to portably map other, unsupported, names, to that alias. Since some implementation (say lispworks) insist in calling :latin-1 what another implementation (say clisp) insists in calling charset:iso-8859-1, whereas yet another implementation (say ecl) wrongfully thinks :latin-6 is :iso-8859-6 when it is actually :iso-8859-10, unless you're going to build a big table valid for 9 to 14 implementations and all past and future versions thereof, you're not going to win.
Detection works, and allows me to provide a list of encoding names that actually work portably. You want :latin1? I'll give it to you on sbcl, ccl, lispworks, clisp, ecl, scl, allegro, etc.
- It uses :external-format. Users will be working with external-formats, perhaps for a foreign CL implementation but still
external-formats. Introducing new terminology of 'encoding' seems a mistake.
That's one case where I don't see you convincing me, for the reason I detailed above: passing through encoding names basically requires the user to know a common name for all the implementations that he may want to target. Such a common name doesn't even exist in the common case of latin1. Therefore there MUST be a translation layer. Giving the same name to two very different things is NOT going to help anyone, but only to cause confusion.
Also, note that ASDF's encodings play a much more limited role than external-formats: they are not meant as a way to express arbitrary transformations an implementation may provide for a variety of input and output uses, but only as a way to *portably* specify a character encoding for the input-only reading of Lisp source files for the purposes of LOADing or COMPILE-FILEing.
Therefore, ASDF encodings are NOT meant as a full portable replacement for any implementation's external-format system. For that, you'll want flexi-streams, iolib (using babel), or some other library to be determined. They are only meant to allow the portable use of non-UTF8 code.
- No attempt is made to verify the external format. This does not seem necessary, and does not even seem possible.
I proved in my code that detection is possible on all implementations that support multiple external-formats, except for abcl (bug filed): allegro, clozure, clisp, cmucl, ecl, sbcl, lispworks (mostly, based on documentation), scl (kind of; help required). Remain unsupported (possibly forever): cormanlisp, gcl, genera, rmcl, xcl
- A declarative system definition can be used for both portable :utf-8 and implementation-dependent (non-portable) external-formats.
The current system already allows that. asdf-encodings is already pass-through when it doesn't recognize an encoding. If I merge asdf-encodings into asdf, this will become the default behavior.
However, in the current setup where asdf-encodings is separate, I explicitly decided against making the default behavior pass-through, otherwise many library authors will be confused into believing their code is portable when really it is not, just because their current implementation recognizes the name they use, when other implementations won't, but asdf-encodings could if it were loaded. One of my goals as asdf maintainer is to make its semantics more predictable in a portable way, rather than pushing the responsibility of portability upon the user.
There is no need to add code methods or extend asdf-encodings to use user-defined or implementation-dependent external formats. Supporting declarative definitions has many advantages over the alternative of requiring asdf-encodings code or asdf methods to support user-defined or implementation-dependent external-formats.
That's a valid argument for merging asdf-encodings into asdf, whatever we decide is the semantics of asdf-encodings.
- The default is :default. The external-format support in ASDF is needed to write 'portable' libraries with UTF-8
source files, so that will not be possible until users have upgraded anyway. Portability is not gained now by making :utf-8 the default, so I just don't see the advantage of making :utf-8 the default when it would break backward compatibility, make migration problematic, and run contrary to the ANSI CL standard.
Most library users are *already* using UTF-8, in a way that in practice works well in the common case. My goal with asdf-encodings is to (1) make this common case work *reliably*, and (2) (also reliably) support the uncommon case of non-UTF8 source code.
Yes, portability *would* be gained by making UTF-8 the official default, rather than requiring every user to somehow magically set up his Lisp environment before he starts invoking ASDF, in a way that makes libraries with contrary encoding assumptions become mutually incompatible.
- At less than 200 lines of code it is just included in asdf.lisp.
That's a valid argument for merging asdf-encodings into asdf, whatever we decide is the semantics of asdf-encodings.
—♯ƒ • François-René ÐVB Rideau •Reflection&Cybernethics• http://fare.tunes.org The truth of a proposition has nothing to do with its credibility. And vice versa. — Robert Heinlein, "Time Enough For Love"
On 4/9/12 1:46 PM, Faré wrote:
Also, note that ASDF's encodings play a much more limited role than external-formats: they are not meant as a way to express arbitrary transformations an implementation may provide for a variety of input and output uses, but only as a way to *portably* specify a character encoding for the input-only reading of Lisp source files for the purposes of LOADing or COMPILE-FILEing.
Just out of curiosity how will your :encoding handle files with, say Mac end-of-line, which is just CR? If the first line is a comment, and my lisp expects LF, the entire file is just a comment, I think.
Well, I don't think there are too many files nowadays like that, but I have run into an occasional file with CR as the end-of-line.
Ray
Just out of curiosity how will your :encoding handle files with, say Mac end-of-line, which is just CR? If the first line is a comment, and my lisp expects LF, the entire file is just a comment, I think.
I don't do anything special for line endings, which is the same thing that ASDF doesn't do currently. People who want portable code should probably assume the Unix LF convention, but whatever works now will keep working and whatever doesn't will keep not working. If we find that extending ASDF enables more people to share more code, we'll do it. But I have no such plan for now.
Well, I don't think there are too many files nowadays like that, but I have run into an occasional file with CR as the end-of-line.
Same here, though most are MCL-specific and going away.
—♯ƒ • François-René ÐVB Rideau •Reflection&Cybernethics• http://fare.tunes.org
On 4/11/12 Apr 11 -9:48 AM, Faré wrote:
Just out of curiosity how will your :encoding handle files with, say Mac end-of-line, which is just CR? If the first line is a comment, and my lisp expects LF, the entire file is just a comment, I think.
I don't do anything special for line endings, which is the same thing that ASDF doesn't do currently. People who want portable code should probably assume the Unix LF convention, but whatever works now will keep working and whatever doesn't will keep not working. If we find that extending ASDF enables more people to share more code, we'll do it. But I have no such plan for now.
Well, I don't think there are too many files nowadays like that, but I have run into an occasional file with CR as the end-of-line.
Same here, though most are MCL-specific and going away.
I encountered this several times working with people on older Macs that pushed stuff into a shared repository with bad line endings. Gives very odd errors, since everything after the first ";" gets dropped by the reader!
But those seemed better to fix with the various line-ending-massaging techniques of revision control systems than by enshrining anything in the ASDF systems....
best, r