Hello parenthetical crowd
Is there a consensus about how to "name" Unicode characters, or does every implementation do whatever it likes (thus breaking otherwise perfectly portable code)?
Cf., #\INFINITY
All the best
MA
PS Do not even think of using the "hey, it is an implementation-dependent thing" argument!
I always thought these names were standardised within the Unicode standard, for example:
"LATIN CAPITAL LETTER U WITH OGONEK" for Ų
The rest is a matter of porting these names to symbols, which just means replacing spaces with underscores in a CL character literal:
#\LATIN_CAPITAL_LETTER_U_WITH_OGONEK
[a]
PS. Both characters work in SBCL, but neither works in LispWorks, although I might be running a limited version of it in my OpusModus environment.
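A minimal sketch of that name mapping, using only standard CL functions. The helper names UNICODE-NAME->LISP-NAME and PROBE-CHARACTER-NAME are made up for illustration. Whether NAME-CHAR recognises a given Unicode name is still up to the implementation, so the probe returns NIL for an unknown name instead of failing at read time the way a #\ literal would.

```lisp
;; Hypothetical helpers, not part of any library.

(defun unicode-name->lisp-name (unicode-name)
  "Return a copy of UNICODE-NAME with every space replaced by an underscore."
  (substitute #\_ #\Space unicode-name))

(defun probe-character-name (unicode-name)
  "Return the character this implementation associates with UNICODE-NAME,
or NIL if the implementation does not know the name."
  (name-char (unicode-name->lisp-name unicode-name)))
```

Calling (probe-character-name "LATIN CAPITAL LETTER U WITH OGONEK") should give the character on implementations that know the name and NIL on the others, which makes it usable as a portable runtime check.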
On 13 Oct 2024, at 19:48, Marco Antoniotti marco.antoniotti@unimib.it wrote:
-- Marco Antoniotti, Professor tel. +39 - 02 64 48 79 01 DISCo, University of Milan-Bicocca U14 2043 http://dcb.disco.unimib.it Viale Sarca 336 I-20126 Milan (MI) ITALY
Yep. "It works in SBCL". I think I wrote a rant about it on my blog some time ago.
Cheers
MA
On Sun, Oct 13, 2024 at 10:39 PM Antoni Grzymała antoni@grzymala.info wrote:
It does work in ECL, too. Does not in ABCL.
Don’t have other impls handy at the moment to check.
On 13 Oct 2024, at 22:48, Marco Antoniotti marco.antoniotti@unimib.it wrote:
In my opinion ANSI CL would'a could'a should'a standardized on Unicode (at least the 8 and/or 16-bit ranges) except that Unicode was not available yet. Since virtually the entire computing universe has since standardized on Unicode, it would be insane to do anything else if an updated CL standard could somehow be established.
Since Unicode standardizes all character names, CHAR-NAME and NAME-CHAR should be defined to use those standard names, described here in Wikipedia https://en.wikipedia.org/wiki/Unicode_character_property#Name_and_alias.
The only obvious ugliness from the perspective of CL is that a char name really wants to be something the reader will swallow as a string designator, but standard Unicode names can and do contain spaces and hyphens (but not underscores). An obvious solution is that CL would translate space into underscore inside char names. (This is what Allegro does.) One could also escape spaces with backslash, but that is unbearably ugly: #\Latin\ Capital\ letter\ A.
Unfortunately, the current ANS gives implementations freedom not to support names for graphic (printing) chars. That should also be reconsidered in a revised Unicode-cognizant standard.
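To see what a given implementation currently does with CHAR-NAME and NAME-CHAR, something like the following could be tried at the REPL. It is only a sketch: CHAR-NAME-REPORT is a hypothetical helper, and the only results one can count on are those for the standard names such as "Space". What comes back for a graphic character like #\A is exactly the freedom the ANS leaves open.

```lisp
;; Hypothetical helper, not part of any library.

(defun char-name-report (char)
  "Return a list (CHAR NAME ROUND-TRIP-P).
NAME is whatever CHAR-NAME returns for CHAR, possibly NIL.
ROUND-TRIP-P is true when NAME-CHAR maps NAME back to CHAR."
  (let ((name (char-name char)))
    (list char
          name
          (and name (eql char (name-char name))))))

;; #\Space is required to have the name "Space".
(char-name-report #\Space)

;; A graphic character may or may not have a name, depending on the implementation.
(char-name-report #\A)
```

Running the same two calls on several implementations gives a quick picture of how far their naming schemes diverge.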
On Sun, Oct 13, 2024 at 1:54 PM Antoni Grzymała antoni@grzymala.info wrote:
Hi Marco, SMH, and the rest of the list,
I assume you are familiar with Edi Weitz's CL-UNICODE:
"It also provides the ability to replace the standard syntax for reading Lisp characters with one that is Unicode-aware and is used to enhance CL-PPCRE with Unicode properties"
http://edicl.github.io/cl-unicode/#enable-alternative-character-syntax
I would hope that it would allow for some level of portable code.
Cheers,
Elliott
On 10/13/24 10:48 AM, Marco Antoniotti wrote:
Hi Elliott
yes I am familiar with Saint Edi's (always blessed be his parentheses!) CL-UNICODE.
After a few wrong turns it appears that the incantation you need in LispWorks to make it swallow Unicode names is
(eval-when (:load-toplevel :compile-toplevel :execute)
  (cl-unicode:enable-alternative-character-syntax)
  (setf cl-unicode:*try-lisp-names-p* t))
You need both, otherwise you get funny errors like #= is not a character.
Unicode names with spaces can be read using an underscore in their place.
CL-USER 16 > #\LATIN_SMALL_LETTER_E_GRAVE
#\è
All the best
MA
Marco
On Mon, Oct 14, 2024 at 6:02 AM Elliott Johnson elliott@elliottjohnson.net wrote:
Marco,
His libraries have definitely been inspirational. Glad it's working for you!
Best,
Elliott
On 10/15/24 11:28 AM, Marco Antoniotti wrote: