Ever since ABCL raised its CHAR-CODE-LIMIT from 256 to #x10000, two tests have been failing: char-upcase.1 and char-upcase.2.
These 2 tests iterate through all integers between 0 and CHAR-CODE-LIMIT. While doing so, they test for the property that upcasing and downcasing returns the same character again ("round-tripping"). This property of characters is specified in section 13.1.4.3 (http://www.lispworks.com/documentation/lw51/CLHS/Body/13_adc.htm) "Characters with case". In short: characters with case are defined in pairs; additional characters with case have to be defined in pairs too.
The spec provides char-upcase and char-downcase to convert characters-with-case to their 'other-case equivalent'.
However, in section 13.1.10, there seems to be an escape hatch: "Documentation of implementation-defined scripts". A script is a subtype of CHARACTER, nothing more nothing less. An implementation-defined script gets to document the effect on CHAR-UPCASE and CHAR-DOWNCASE.
Now, if I were to define our Unicode script to be every character except those in the base set, char-upcase and char-downcase may have different semantics, except for the standard characters. That way, there's no need to have the round-tripping requirement apply to most of Unicode, where it can't be expected to hold; see latin-small-letter-dotless-i for an example.
In light of the above, is it really portable for the tests to assume all characters must be round-tripped? I think it's not.
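To make the dotless-i case concrete, here is a rough sketch in Python, whose built-in case mappings follow Unicode. It is not the test suite's code and not ABCL's implementation; the helper names (simple_upcase, simple_downcase, round_trips) are made up for illustration, and falling back to the character itself when a mapping expands to several characters is an assumption of this sketch.

```python
def simple_upcase(ch: str) -> str:
    """Single-character upcase; keep ch when the Unicode mapping is not 1:1."""
    up = ch.upper()
    return up if len(up) == 1 else ch


def simple_downcase(ch: str) -> str:
    """Single-character downcase; keep ch when the Unicode mapping is not 1:1."""
    down = ch.lower()
    return down if len(down) == 1 else ch


def round_trips(ch: str) -> bool:
    """True if converting ch to its other case and back yields ch again."""
    up = simple_upcase(ch)
    down = simple_downcase(ch)
    if up != ch:        # ch behaves like a lowercase character
        return simple_downcase(up) == ch
    if down != ch:      # ch behaves like an uppercase character
        return simple_upcase(down) == ch
    return True         # no case conversion at all


DOTLESS_I = "\u0131"    # LATIN SMALL LETTER DOTLESS I

# Code points below #x10000 that break the property (surrogates skipped).
failures = [cp for cp in range(0x10000)
            if not 0xD800 <= cp <= 0xDFFF and not round_trips(chr(cp))]
```

With these definitions, simple_upcase(DOTLESS_I) is "I", but downcasing "I" gives the ordinary dotted "i", so #x131 ends up in failures.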
What are your opinions?
Bye,
Erik.
On 4/3/10, Erik Huelsmann ehuels@gmail.com wrote:
> However, in section 13.1.10, there seems to be an escape hatch: "Documentation of implementation-defined scripts". A script is a subtype of CHARACTER, nothing more nothing less. An implementation-defined script gets to document the effect on CHAR-UPCASE and CHAR-DOWNCASE.
I don't think this gives you a license to discard the round-tripping invariant.
> there's no need to have the round-tripping requirement apply to most of unicode - as can't be expected, see latin-small-letter-dotless-i for an example.
Why not make it its own upper case? This is not exactly correct from the Unicode point of view, but I think it is better than the alternative. This round-tripping requirement is, I think, pretty important in symbol I/O.
Hi Sam,
On Sun, Apr 4, 2010 at 10:58 AM, Sam Steingold sds@gnu.org wrote:
> On 4/3/10, Erik Huelsmann ehuels@gmail.com wrote:
>> However, in section 13.1.10, there seems to be an escape hatch: "Documentation of implementation-defined scripts". A script is a subtype of CHARACTER, nothing more nothing less. An implementation-defined script gets to document the effect on CHAR-UPCASE and CHAR-DOWNCASE.
> I don't think this gives you a license to discard the round-tripping invariant.
I read the same section again and on second reading I think the section indeed does not allow that freedom.
>> there's no need to have the round-tripping requirement apply to most of unicode - as can't be expected, see latin-small-letter-dotless-i for an example.
> why not make it its own upper case? this is not exactly correct from the unicode pov, but, I think, it is better than the alternative. this round-tripping requirement is, i think, pretty important in symbol i/o.
I hadn't thought about the reader and printer behaviours regarding *readtable-case* and *print-case*. However, it would be logical by analogy that if a string doesn't get recoded in a round-trip, then the symbol name won't either.
But I now agree this isn't CLHS compliant. Does clisp handle this by making it non-alphabetical or by making it a character without case (i.e. a character which up/lowercases to itself)?
I now think that the tests are correct, but that the requirement in the CLHS is outdated. However, that's something to address in the implementation itself. I'll discuss it on the ABCL list.
Thanks for your time!
Bye,
Erik.
Hi Erik,
On 4/4/10, Erik Huelsmann ehuels@gmail.com wrote:
> Does clisp handle this by making it non-alphabetical or by making it a character without case (ie a character which up/lowercases to itself)?
it is an alphabetical character without case in clisp.
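As an illustration of what "alphabetical but without case" can mean, here is a small Python sketch. It is not clisp's code; char_upcase and char_downcase are hypothetical names, and the rule used (only convert when the result converts straight back) is an assumption chosen to keep the round-tripping invariant.

```python
def _single(mapped: str, ch: str) -> str:
    """Use the mapping only if it is a single character, else keep ch."""
    return mapped if len(mapped) == 1 else ch


def char_upcase(ch: str) -> str:
    """Upcase ch only when downcasing the result gives ch back."""
    up = _single(ch.upper(), ch)
    if _single(up.lower(), up) != ch:
        return ch       # would not round-trip: treat ch as caseless
    return up


def char_downcase(ch: str) -> str:
    """Downcase ch only when upcasing the result gives ch back."""
    down = _single(ch.lower(), ch)
    if _single(down.upper(), down) != ch:
        return ch       # would not round-trip: treat ch as caseless
    return down


DOTLESS_I = "\u0131"    # LATIN SMALL LETTER DOTLESS I
```

Under this rule the dotless i is still alphabetic (DOTLESS_I.isalpha() is true) but char_upcase returns it unchanged, while "i" and "I" keep converting to each other as usual.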
On 4 April 2010 15:04, Erik Huelsmann ehuels@gmail.com wrote:
> I think now that the tests are correct, but that the requirement in the CLHS is out-dated. However, that's something to address in the implementation itself. I'll discuss on the ABCL list.
I would suggest that it is better to provide
EXT:UNICODE-UPCASE
or similar, instead of making standard functions behave in a manner not consistent with the spec.
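A rough Python sketch of that split, purely to show the shape of the idea: standard_char_upcase stands in for a spec-conforming CHAR-UPCASE and unicode_upcase for the suggested extension. Both names, and the choice to let the extension work on whole strings, are assumptions for illustration, not ABCL's API.

```python
def standard_char_upcase(ch: str) -> str:
    """Spec-style upcase: convert only when the result maps back to ch."""
    up = ch.upper()
    if len(up) == 1 and up.lower() == ch:
        return up
    return ch


def unicode_upcase(text: str) -> str:
    """Extension-style upcase: full Unicode mapping, no round-trip promise."""
    return text.upper()
```

Here standard_char_upcase("\u0131") leaves the dotless i alone, while unicode_upcase("\u0131") yields "I" and unicode_upcase("\u00df") yields "SS", which callers who want Unicode behaviour can opt into explicitly.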
Cheers,
-- Nikodemus