This topic came up at the Boston Lisp meeting last night. Here was my takeaway from the discussion.
ASDF does not need to support arbitrary encodings in system definition files. UTF-8 is the simplest and most portable solution for non-ASCII characters in system definitions.
Thus ASDF can simply declare that all system definitions are read using UTF-8. On systems where Unicode support is not available, system definitions will be read as "extended ASCII", and the non-ASCII characters will simply print incorrectly. Filenames should only be specified using ASCII characters to avoid portability issues.
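To illustrate the "print incorrectly" fallback, here is a minimal Python sketch. It assumes that "extended ASCII" means a one-byte-per-character reading such as Latin-1; the metadata line is a made-up example, not taken from any real system definition.

```python
# Sketch: a UTF-8 system definition read byte-for-byte as Latin-1.
# Assumption: "extended ASCII" behaves like Latin-1 (one byte = one character).
text = "Author: José Müller"  # hypothetical metadata line

utf8_bytes = text.encode("utf-8")

# A reader without Unicode support maps each byte to one character.
misread = utf8_bytes.decode("latin-1")

# ASCII characters survive; each accented letter turns into two wrong characters.
print(misread)  # prints "Author: JosÃ© MÃ¼ller"

# The bytes themselves are untouched, so the original text is recoverable.
recovered = misread.encode("latin-1").decode("utf-8")
print(recovered == text)  # prints "True"
```

The ASCII portion, which is where the symbols and pathnames live, reads identically either way; only the non-ASCII text is displayed wrongly.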
Authors do need a way to support arbitrary encodings in other source files, but this should be restricted to cases where there is an actual need to use a non-UTF-8 character set. Example use cases are locale-specific libraries, a toolchain that converts between character sets, or possibly a character-set-detection library. Again, other character sets can be a major portability issue.
There are only a handful of libraries that use something other than ASCII or UTF-8. These should be easy to fix, given approval from their authors.
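One rough way to find such libraries is to classify each source file by its raw bytes. The sketch below is only a heuristic under stated assumptions: `classify_encoding` is a hypothetical helper name, and a file in a legacy encoding can occasionally happen to be valid UTF-8 and be misclassified.

```python
def classify_encoding(data: bytes) -> str:
    """Return 'ascii', 'utf-8', or 'other' for the raw bytes of a source file."""
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8: some other encoding is in use and needs a closer look.
        return "other"


# Hypothetical file contents, standing in for bytes read with open(path, "rb").
print(classify_encoding(b"(defsystem :foo)"))                # prints "ascii"
print(classify_encoding('(:author "José")'.encode("utf-8")))  # prints "utf-8"
print(classify_encoding("José".encode("latin-1")))            # prints "other"
```

Files reported as "other" are the ones that would need converting, with their authors' approval.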
Hopefully that's an accurate record of the general consensus and this writeup will be helpful to others in some way.
- Daniel
On 4/20/12 6:43 PM, Daniel Herring wrote:
This topic came up at the Boston Lisp meeting last night. Here was my takeaway from the discussion.
ASDF does not need to support arbitrary encodings in system definition files. UTF-8 is the simplest and most portable solution for non-ASCII characters in system definitions.
This seems mostly true. The only exception I can really think of is for metadata. What if the author's name is not in UTF-8?
But it's not a perfect world, and we should probably learn to live with this.
best, r
On Fri, 20 Apr 2012, Robert Goldman wrote:
On 4/20/12 6:43 PM, Daniel Herring wrote:
ASDF does not need to support arbitrary encodings in system definition files. UTF-8 is the simplest and most portable solution for non-ASCII characters in system definitions.
This seems mostly true. The only exception I can really think of is for metadata. What if the author's name is not in UTF-8?
What character in the author's name cannot be encoded in UTF-8?
Yes, there are some hieroglyphs and other ancient scripts that Unicode (and therefore UTF-8) does not cover. Are there any gaps in coverage for current languages that have other character sets?
This will be an inconvenience for those who usually use a different locale-specific encoding, but I think it is less than the inconvenience to the whole community if we try to support other encodings in the definition files.
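On the coverage question, a small Python sketch may help. UTF-8 can encode every Unicode code point except the surrogate range reserved for UTF-16, so any gap is in what Unicode has assigned, not in UTF-8 itself. The names below are arbitrary examples chosen for illustration, and `utf8_encodable` is a hypothetical helper.

```python
# Arbitrary example names in several scripts (illustrative only).
names = ["Daniel", "François", "Владимир", "松本", "محمد"]

# Every name survives a round trip through UTF-8 unchanged.
round_trips = all(n.encode("utf-8").decode("utf-8") == n for n in names)
print(round_trips)  # prints "True"


def utf8_encodable(code_point: int) -> bool:
    """True if Python's strict UTF-8 codec accepts this code point."""
    try:
        chr(code_point).encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


# Scan the whole Unicode code space, U+0000 through U+10FFFF.
rejected = [cp for cp in range(0x110000) if not utf8_encodable(cp)]

# Only the 2048 surrogate code points (U+D800..U+DFFF) are rejected.
print(len(rejected), hex(rejected[0]), hex(rejected[-1]))  # prints "2048 0xd800 0xdfff"
```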
- Daniel