On Sat, 11 Apr 2009, Luís Oliveira wrote:
On Fri, Apr 10, 2009 at 11:56 PM, Dan Weinreb dlw@itasoftware.com wrote:
I don't understand why. If code-char is allowed to return nil, explicitly, in the CL standard, why consider that to be a babel test failure?
Suppose (code-char 237) returned NIL instead of #\í. That's allowed by the CL standard, but I'm positive some Babel test should fail because of that.
Assuming that the implementation in question used Unicode (or some subset of it) and that CHAR-CODE-LIMIT was > 237, it's hard to see how this case (where Unicode associates a character with the code) is analogous to the case that we're discussing (where Unicode says that no character is or ever can be associated with a particular code).
The spec does quite clearly say that CODE-CHAR is allowed to return NIL if no character with the specified code attribute exists or can be created. CCL's implementation of CODE-CHAR returns NIL in many (unfortunately not all) cases where the Unicode standard says that no character corresponds to its code argument; other implementations currently do not return NIL in this case. There are a variety of arguments in favor of and against either behavior, ANSI CL allows either behavior, and code can't portably assume either behavior.
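To see which behavior a given implementation has, something like the following sketch can be used. CODES-WITHOUT-CHARACTERS is a hypothetical helper name, not part of ANSI CL or of any library; it only relies on CODE-CHAR and CHAR-CODE-LIMIT from the standard. The UTF-16 surrogate range (#xD800 through #xDFFF) is one place where implementations are likely to differ, but what the call returns depends entirely on the implementation.

```lisp
;; Hypothetical helper (not part of ANSI CL or any library).
;; Collects the codes in [START, END) for which CODE-CHAR returns NIL.
;; END is clamped to CHAR-CODE-LIMIT so that CODE-CHAR is only ever
;; called with a valid character code.
(defun codes-without-characters (start end)
  (loop for code from (max start 0) below (min end char-code-limit)
        unless (code-char code)
          collect code))

;; Example probe of the surrogate range. The result may be an empty
;; list or a list of codes, depending on the implementation.
(codes-without-characters #xD800 #xE000)
```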
I believe that it's preferable for CODE-CHAR to return NIL in cases where it can reliably and efficiently detect that its argument doesn't denote a character, and CCL does this. Other implementations behave differently, and there may be reasons that I can't think of for finding that behavior preferable. I'm not really sure that I understand the point of this email thread and I'm sure that I must have missed some context, but some part of it seems to be an attempt to convince me (or someone) that CODE-CHAR should never return NIL because of some combination of:
- in other implementations, it never returns NIL
- there is some otherwise useful code which fails (or its test suite fails) because it assumes that CODE-CHAR always returns a non-NIL value.
If I understand this much correctly, then I can only say that I didn't personally find these arguments persuasive when I was trying to decide how CODE-CHAR should behave in CCL a few years ago and don't find them persuasive now.
If there were a lot of otherwise useful code out there that made the same non-portable assumption and if it was really hard to write character-encoding utilities without assuming that all codes between 0 and CHAR-CODE-LIMIT denote characters, then I'd be less dismissive of this than I'm being. As it is, I'm sorry that I can't say anything more constructive than "I hope that you or someone will have the opportunity to change your code to remove non-portable assumptions that make it less useful with CCL than it would otherwise be."
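As a rough illustration of what removing that assumption might look like in a decoding utility, here is a minimal sketch. CODES-TO-STRING and its REPLACEMENT parameter are hypothetical names made up for this example and do not describe how Babel or any other library actually works. The only point is that the NIL case is handled explicitly instead of being assumed away.

```lisp
;; Hypothetical helper (not taken from Babel or any other library).
;; Builds a string from a sequence of integer codes, substituting
;; REPLACEMENT wherever CODE-CHAR has no character to offer, either
;; because the code is out of range or because CODE-CHAR returns NIL.
(defun codes-to-string (codes &optional (replacement #\?))
  (map 'string
       (lambda (code)
         (or (and (integerp code)
                  (< -1 code char-code-limit)
                  (code-char code))
             replacement))
       codes))

;; Usage: codes that denote characters pass through unchanged;
;; anything else becomes the replacement character.
(codes-to-string (list (char-code #\H) (char-code #\i) char-code-limit))
```

Whether substituting a replacement character, signaling an error, or skipping the code is the right policy is a design choice for the caller; the sketch picks substitution only because it is the shortest to show.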
If the point of this email thread is something else ... well, I'm sorry to have missed that point and will try to say something more responsive if/when I understand what that point is.