Rather than allowing any CL S-expression (except that all symbols must be keywords), I propose that you allow only a more restricted variety of S-expressions, so that they can be readily read and processed in any other language while remaining compatible with the read (without full validation) and print/write functions of a whole bunch of Lisps.
Specifically, I propose POSE < https://github.com/s-expressions/pose/blob/master/README.adoc#specification%..., which allows only a restricted set of symbols, arbitrary integers, floats, strings, and lists. (Most of that page is rationale, which you need not read unless it interests you.)
Code is provided to parse POSE in many languages, but the CL version is not up to date: it does not yet handle keywords, only colon-free identifiers. (But of course there is no reason not to use your own code if you want.)
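To make the proposal concrete, here is a rough Python sketch of a reader for a POSE-like subset: lists, strings, integers, floats, and symbols, with anything else rejected. It is not one of the POSE reference implementations; the symbol alphabet and number syntax below are simplifying assumptions, and `Symbol`, `read_all`, `tokenize`, and `parse_atom` are hypothetical names.

```python
import re

class Symbol(str):
    """Marks a symbol so it stays distinct from a string (hypothetical type)."""
    def __repr__(self):
        return f"Symbol({str.__repr__(self)})"

# Tokens: whitespace, parentheses, strings whose only escapes are \" and \\,
# and bare atoms. A simplification of the POSE grammar, not a copy of it.
TOKEN = re.compile(r'''
    (?P<space>\s+)
  | (?P<open>\()
  | (?P<close>\))
  | (?P<string>"(?:[^"\\]|\\["\\])*")
  | (?P<atom>[^\s()"]+)
''', re.VERBOSE)

INT = re.compile(r'[-+]?[0-9]+')
FLOAT = re.compile(r'[-+]?[0-9]+\.[0-9]+(?:[eE][-+]?[0-9]+)?')
# Assumed symbol alphabet; the real spec's character set may differ.
SYM = re.compile(r'[A-Za-z!$%&*+\-./:<=>?@^_~][A-Za-z0-9!$%&*+\-./:<=>?@^_~]*')

def parse_atom(text):
    """Classify a bare atom as integer, float, or symbol; reject the rest."""
    if INT.fullmatch(text):
        return int(text)
    if FLOAT.fullmatch(text):
        return float(text)
    if SYM.fullmatch(text):
        return Symbol(text)
    raise SyntaxError(f"not a valid atom: {text!r}")

def tokenize(text):
    """Yield (kind, token) pairs, skipping whitespace."""
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at index {pos}")
        pos = m.end()
        if m.lastgroup != "space":
            yield m.lastgroup, m.group()

def read_all(text):
    """Read every top-level datum in text; lists become Python lists."""
    stack = [[]]
    for kind, tok in tokenize(text):
        if kind == "open":
            stack.append([])
        elif kind == "close":
            if len(stack) == 1:
                raise SyntaxError("unbalanced ')'")
            finished = stack.pop()
            stack[-1].append(finished)
        elif kind == "string":
            # Drop the quotes and undo the two permitted escapes.
            stack[-1].append(re.sub(r'\\(["\\])', r'\1', tok[1:-1]))
        else:
            stack[-1].append(parse_atom(tok))
    if len(stack) != 1:
        raise SyntaxError("missing ')'")
    return stack[0]
```

For example, `read_all('(:name "demo" :version 1)')` yields one list holding the symbol `:name`, the string `"demo"`, the symbol `:version`, and the integer 1, while input such as `#t` or an unbalanced parenthesis is rejected outright rather than guessed at.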
John Cowan <cowan@ccil.org> writes:
Hi John, thanks for the suggestion! I do agree that CLPI doesn't need the full expressivity of CL sexps; in fact, giving it that would be a bad idea. Who wants to deal with reader macros or improper lists?
That being said, I see some potential issues with POSE that would be good to get your thoughts on.
First, I believe this is a typo; otherwise keywords wouldn't be very useful:
colonsym = ':' | signsym
Second, there seems to be some disagreement between the spec and the implementations when it comes to reading strings. The implementations all seem to treat the character sequences #\\ #\n and #\\ #\t as #\Newline and #\Tab, respectively. Additionally, a #\\ not followed by #\n, #\t, #\" or #\\ will signal an error. This is at odds with CL:READ (and the POSE spec) and could cause issues if something like a system description contains any of those characters. Which behavior is intended for POSE?
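To illustrate why the difference matters, here is a small Python sketch of the two ways of decoding the body of a string literal. The function names are hypothetical: `cl_read_style` follows the CL:READ rule that a backslash simply makes the next character literal, and `c_style` mimics the \n/\t handling described above, rejecting any other escape.

```python
def cl_read_style(body):
    """Decode like CL:READ: a backslash makes the next character literal."""
    out, i = [], 0
    while i < len(body):
        if body[i] == "\\":
            i += 1
            if i == len(body):
                raise ValueError("trailing backslash")
        out.append(body[i])
        i += 1
    return "".join(out)

# The escapes the implementations reportedly accept.
C_ESCAPES = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}

def c_style(body):
    """Decode with C-like escapes; any other backslash sequence is an error."""
    out, i = [], 0
    while i < len(body):
        if body[i] == "\\":
            nxt = body[i + 1:i + 2]
            if nxt not in C_ESCAPES:
                raise ValueError(f"unsupported escape at index {i}")
            out.append(C_ESCAPES[nxt])
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)
```

With a Windows-style path such as C:\temp, the first decoding gives C:temp while the second silently inserts a tab, which is exactly the kind of surprise a system description could run into.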
Third, I'm not sure how to deal with the case sensitivity of symbols. Is there any plan to add a :case argument or similar to POSE:READ? If not, it seems we'd have to either mandate that all symbols be PRINTed in uppercase in CLPI, or add a post-processing step on the POSE output to upcase everything, because writing |version|, |name|, etc. everywhere in the CL code would get real tired, real fast.
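The post-processing option could look roughly like this Python sketch, assuming the reader returns symbols as a distinct `Symbol` type and lists as Python lists (`Symbol` and `upcase_symbols` are hypothetical names, not part of any POSE library):

```python
class Symbol(str):
    """Hypothetical marker type: a symbol as returned by a POSE reader."""

def upcase_symbols(form):
    """Return a copy of form with every symbol name upcased.

    Lists are walked recursively; strings and numbers are returned unchanged.
    """
    if isinstance(form, Symbol):
        return Symbol(form.upper())
    if isinstance(form, list):
        return [upcase_symbols(item) for item in form]
    return form
```

Only symbols are touched, so a string such as "demo" keeps its case. The flip side is that symbols differing only in case, such as Foo and foo, both come out as FOO.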
-Eric
On Sat, Sep 11, 2021 at 4:56 PM Eric Timmons <etimmons@mit.edu> wrote:
> colonsym = ':' | signsym
Definitely a thinko on my part: fixed.
> The implementations all seem to treat the character sequences #\\ #\n and #\\ #\t as #\Newline and #\Tab, respectively.
Other people's thinkos: the spec prevails. If you could make a PR for the CL version to eliminate them, that would be handy.
> Additionally a #\\ not followed by an #\n #\t #\" or #\\ will signal an error.
I think it's appropriate to signal an error on every backslash not followed by a quote or another backslash, as they are POSE syntax violations, just as a stray ) or a # outside a string would be.
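A minimal Python sketch of that rule: inside a string, a backslash must be followed by a double quote or another backslash, and anything else is a syntax error. `read_pose_string` and `PoseSyntaxError` are hypothetical names, not part of any POSE implementation.

```python
class PoseSyntaxError(ValueError):
    """Hypothetical error type for POSE syntax violations."""

def read_pose_string(text, start=0):
    """Read a string literal beginning at text[start], which must be '"'.

    Returns (decoded string, index just past the closing quote).
    """
    if text[start:start + 1] != '"':
        raise PoseSyntaxError("expected a double quote")
    out, i = [], start + 1
    while i < len(text):
        c = text[i]
        if c == '"':
            return "".join(out), i + 1
        if c == "\\":
            nxt = text[i + 1:i + 2]
            # Only \" and \\ are legal; \n, \t, or a trailing \ are errors.
            if nxt not in ('"', "\\"):
                raise PoseSyntaxError(f"invalid escape sequence at index {i}")
            out.append(nxt)
            i += 2
            continue
        out.append(c)
        i += 1
    raise PoseSyntaxError("unterminated string")
```

Returning the end index lets a surrounding tokenizer resume right after the literal, and reporting the position makes a stray backslash as easy to locate as a stray ).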
> Third, I'm not sure how to deal with the case sensitivity of symbols. Is there any plan to add a :case argument or similar to POSE:READ?
The problem is deeper than that: if CL's POSE parser receives (Foo foo), it will lose the distinction. I am arguing (in #s-expressions on the libera.chat IRC network, if you want to join in) that since symbols are basically labels for things, we don't really lose anything by making them case-insensitive.