On 04/22/2012 09:38 AM, Faré wrote:
Dear all,
asdf-encodings now includes a version of Douglas Crosher's encoding detection algorithm. It is also automatically disabled on implementations without Unicode support.
As compared to Douglas's version, the detection algorithm uses :ascii or :latin1 instead of :default as a fallback when no declaration was found and no UTF-n encoding was detected. It also uses a 1024-byte buffer rather than a 320-byte buffer, to imitate what Emacs does with respect to the beginning of a file.
Increasing this to 1024 bytes seems a good idea, just in case the coding declaration is a few lines down.
Note that a file having all octets below #x80 does not ensure it is ASCII, just that it does not have any UTF-8 specific codes.
The default should still be :default because it may be an encoding the CL implementation is aware of.
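To make the fallback question concrete, here is a minimal sketch in Python of this style of detection. It is not the asdf-encodings code nor the original algorithm; the function name, the simplified declaration pattern, and the "default" return value are assumptions made for illustration, and only the UTF-8 member of the UTF-n family is checked.

```python
import codecs
import re

BUFFER_SIZE = 1024  # number of leading bytes examined, as discussed above

# Simplified, hypothetical pattern for a file option such as
# ";;; -*- coding: utf-8 -*-" on a single line.
CODING_RE = re.compile(rb"-\*-.*?coding:\s*([A-Za-z0-9_.-]+).*?-\*-")


def detect_encoding(head: bytes, fallback: str = "default") -> str:
    """Guess an encoding name from the first bytes of a source file."""
    head = head[:BUFFER_SIZE]

    # 1. An explicit coding declaration wins.
    match = CODING_RE.search(head)
    if match:
        return match.group(1).decode("ascii").lower()

    # 2. A UTF-8 byte-order mark (other UTF-n marks are omitted here).
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8"

    # 3. Octets >= #x80 that form valid UTF-8 suggest UTF-8. The incremental
    #    decoder tolerates a multi-byte sequence cut off by the buffer limit.
    if any(octet >= 0x80 for octet in head):
        decoder = codecs.getincrementaldecoder("utf-8")()
        try:
            decoder.decode(head, final=False)
            return "utf-8"
        except UnicodeDecodeError:
            pass

    # 4. Nothing detected. All octets below #x80 does not prove ASCII,
    #    so the caller chooses what the fallback means.
    return fallback
```

Passing fallback="latin1" would imitate the asdf-encodings choice, while the default argument keeps the decision with the implementation.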
I suggest that asdf-encodings is now ready for testing, and invite you to test it.
An example package that uses it is my lambda-reader, heavily modified from Brian Mastenbrook's original to make it hopefully portable to all implementations with or without utf-8 support.
From the lambda-reader source code:
;;; Note that this file uses UTF-8.
;;; But if you use an implementation that does not recognize UTF-8,
;;; and instead has 8-bit characters, it should still work,
;;; and other files should still be able to use its functionality, provided
;;; (1) you do NOT transcode either this file or the files that use it
;;; (2) you do not care that lambdas be read a sequence of characters CEBB
;;; or CE9B for uppercase lambda rather than a single character.
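The octet values mentioned in that comment can be checked with a short Python sketch. The helper name read_as_8bit is hypothetical; it only imitates what a reader with 8-bit characters would see, using ISO-8859-1 as the stand-in 8-bit encoding.

```python
LAMBDA = "\u03bb"          # Greek small letter lambda
CAPITAL_LAMBDA = "\u039b"  # Greek capital letter lambda


def read_as_8bit(text: str) -> str:
    """Imitate an 8-bit reader: each UTF-8 octet becomes one character."""
    return text.encode("utf-8").decode("latin-1")


for ch in (LAMBDA, CAPITAL_LAMBDA):
    octets = ch.encode("utf-8")
    # Prints the UTF-8 octets in hex, then the character count as seen by
    # a UTF-8 reader and by an 8-bit reader.
    print(octets.hex().upper(), len(ch), len(read_as_8bit(ch)))
# prints "CEBB 1 2" then "CE9B 1 2"
```

The same source text therefore reads as one character or as two, depending on the external format in effect, which is why the file and its users must not be transcoded independently.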
Making code dependent on the file encoding is not recommended, and writing a library that requires code that uses it to be in the same encoding is hardly defensible. If another author decides to do this then the Quicklisp releases become fractured. Please do not let this into a Quicklisp release.
A tool to automatically add the coding file option has been written. There is no need to contact library authors any further, requesting them to recode their files, as I am confident we can work with their code as it is. The tool can also recode files to UTF-8 or attempt to recode to ISO-8859-1.
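As a rough illustration of what such recoding involves, here is a minimal Python sketch. It is not the tool mentioned above; the function names, the 1024-byte check, and the exact form of the inserted comment line are assumptions made for the example.

```python
from typing import Optional


def recode(data: bytes, source: str, target: str) -> bytes:
    """Recode file contents from one encoding to another."""
    return data.decode(source).encode(target)


def try_recode_to_latin1(data: bytes, source: str = "utf-8") -> Optional[bytes]:
    """Attempt a recode to ISO-8859-1; return None if a character does not fit."""
    try:
        return recode(data, source, "iso-8859-1")
    except UnicodeEncodeError:
        return None


def add_coding_option(data: bytes, encoding: str) -> bytes:
    """Prepend a coding file option unless one already appears near the top."""
    if b"coding:" in data[:1024]:
        return data
    return b";;; -*- coding: " + encoding.encode("ascii") + b" -*-\n" + data
```

A recode to ISO-8859-1 can only be an attempt: a file containing a lambda character has no ISO-8859-1 representation, so try_recode_to_latin1 returns None for it, whereas accented Latin characters recode cleanly.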
Having libraries in Quicklisp that are sensitive to the source encoding makes such recoding fragile.
Using the ASDF :encoding declaration will likely break the reading of files recoded with this tool, because it is not practical to update the system definitions. The :encoding declaration is therefore not recommended; it really is a liability.
With substitutions and ignoring UTF-8 in comments, the percentage of UTF-8 files required in Quicklisp is below 0.8% and much of this is concentrated into an even smaller number of releases.
Regards,
Douglas Crosher