Well-formed XML may not include character references to code points that are not legal XML characters, according to the definition of character references in section 4.1 of the XML 1.0 specification:
[66] CharRef ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'
Well-formedness constraint: Legal Character
Characters referred to using character references MUST match the production for Char.
The production for Char is defined in section 2.2:
[2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
Strings with any other characters ought to be base64-encoded. I just encountered a problem when lisppaste sent an XML-RPC response with a paste containing form feed characters to a client with an extremely strict XML parser.
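As a rough illustration of the Char production quoted above, the check can be written as a small predicate. This is only a sketch: XML-CHAR-CODE-P and FIRST-ILLEGAL-XML-CHAR are hypothetical names, not part of s-xml, and the code assumes that CHAR-CODE returns Unicode code points, which is implementation-dependent.

```lisp
;; Hypothetical helpers, not part of s-xml or s-xml-rpc.

(defun xml-char-code-p (code)
  "True if the integer CODE matches production [2] Char of XML 1.0."
  (or (= code #x9) (= code #xA) (= code #xD)
      (<= #x20 code #xD7FF)
      (<= #xE000 code #xFFFD)
      (<= #x10000 code #x10FFFF)))

(defun first-illegal-xml-char (string)
  "Return the index of the first character of STRING that is not a
legal XML Char, or NIL if every character is legal.
Assumes CHAR-CODE yields Unicode code points."
  (position-if (complement #'xml-char-code-p) string :key #'char-code))
```

For example, (first-illegal-xml-char (coerce (list #\a (code-char #x0C) #\b) 'string)) returns 1, while (first-illegal-xml-char "plain text") returns NIL.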
Hi Taylor,
Thanks for the feedback!
Could you please be a bit more specific? It has been a while since I last read these specs ;-)
The function s-xml:print-string-xml does a bit of escaping and is used from s-xml-rpc::encode-xml-rpc-value. What exactly are they doing wrong? Could you give a concrete example, a CL listener session maybe? I know that s-xml-rpc is used by a couple of other projects/people, so changing the string encoding must be done carefully.
Regards,
Sven
On 26 Jul 2006, at 23:08, Taylor R Campbell wrote:
> Well-formed XML may not include character references to code points that are not legal XML characters, according to the definition of character references in section 4.1 of the XML 1.0 specification:
> [66] CharRef ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'
> Well-formedness constraint: Legal Character
> Characters referred to using character references MUST match the production for Char.
> The production for Char is defined in section 2.2:
> [2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
> Strings with any other characters ought to be base64-encoded. I just encountered a problem when lisppaste sent an XML-RPC response with a paste containing form feed characters to a client with an extremely strict XML parser.
> _______________________________________________
> s-xml-rpc-devel site list
> s-xml-rpc-devel@common-lisp.net
> http://common-lisp.net/mailman/listinfo/s-xml-rpc-devel
Date: Fri, 28 Jul 2006 15:18:16 +0200 From: Sven Van Caekenberghe scaekenberghe@common-lisp.net
> Could you please be a bit more specific? It has been a while since I last read these specs ;-)
> The function s-xml:print-string-xml does a bit of escaping and is used from s-xml-rpc::encode-xml-rpc-value. What exactly are they doing wrong?
S-XML:PRINT-STRING-XML does do certain escaping, but some code points are invalid even if escaped as character references. The particular problem I had was that lisppaste was sending me the contents of a paste with an ASCII form feed character, i.e. U+000C, which is not allowed in well-formed XML, even as a character reference -- that is, the XML specification forbids &#xC;. To get around this, any XML-RPC message containing such a character (or any other character below U+0020 that is not U+0009, U+000A, or U+000D) must be base64-encoded first.
> Could you give a concrete example, a CL listener session maybe? I know that s-xml-rpc is used by a couple of other projects/people, so changing the string encoding must be done carefully.
CL-USER> (s-xml:print-xml-string (string (code-char #x0C)))
""
CL-USER> (s-xml-rpc::encode-xml-rpc-value (string (code-char #x0C)) t)
<value><string></string></value>
"</value>"
CL-USER>
S-XML:PRINT-XML-STRING should signal an error if any such characters are encountered, and S-XML-RPC::ENCODE-XML-RPC-VALUE should instead base64-encode the string and generate this output:
<value><base64>DA==</base64></value>
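The behavior proposed above could look roughly like the following. It is a self-contained sketch, not the s-xml-rpc implementation: all function names are hypothetical, the base64 payload is assumed to be the UTF-8 encoding of the string (the thread does not specify an encoding), and CHAR-CODE is assumed to return Unicode code points.

```lisp
;; Hypothetical sketch; none of these functions exist in s-xml-rpc.

(defun xml-char-p (char)
  "True if CHAR matches production [2] Char of XML 1.0."
  (let ((code (char-code char)))
    (or (member code '(#x9 #xA #xD))
        (<= #x20 code #xD7FF)
        (<= #xE000 code #xFFFD)
        (<= #x10000 code #x10FFFF))))

(defun string-to-utf8 (string)
  "Return the UTF-8 encoding of STRING as a list of octets."
  (loop for char across string
        for code = (char-code char)
        append (cond ((< code #x80) (list code))
                     ((< code #x800)
                      (list (logior #xC0 (ash code -6))
                            (logior #x80 (logand code #x3F))))
                     ((< code #x10000)
                      (list (logior #xE0 (ash code -12))
                            (logior #x80 (logand (ash code -6) #x3F))
                            (logior #x80 (logand code #x3F))))
                     (t
                      (list (logior #xF0 (ash code -18))
                            (logior #x80 (logand (ash code -12) #x3F))
                            (logior #x80 (logand (ash code -6) #x3F))
                            (logior #x80 (logand code #x3F)))))))

(defparameter *base64-alphabet*
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/")

(defun base64-encode (octets)
  "Encode the list OCTETS as a base64 string with = padding."
  (with-output-to-string (out)
    (loop while octets
          do (let* ((b0 (pop octets))
                    (b1 (pop octets))   ; NIL once the input has run out
                    (b2 (pop octets))
                    (bits (logior (ash b0 16) (ash (or b1 0) 8) (or b2 0))))
               (write-char (char *base64-alphabet* (ldb (byte 6 18) bits)) out)
               (write-char (char *base64-alphabet* (ldb (byte 6 12) bits)) out)
               (write-char (if b1 (char *base64-alphabet* (ldb (byte 6 6) bits)) #\=) out)
               (write-char (if b2 (char *base64-alphabet* (ldb (byte 6 0) bits)) #\=) out)))))

(defun escape-xml (string)
  "Escape the markup characters &, < and > in STRING."
  (with-output-to-string (out)
    (loop for char across string
          do (case char
               (#\& (write-string "&amp;" out))
               (#\< (write-string "&lt;" out))
               (#\> (write-string "&gt;" out))
               (t (write-char char out))))))

(defun encode-string-value (string)
  "Return an XML-RPC <value> for STRING, falling back to <base64>
when STRING contains a character that is not a legal XML Char."
  (if (every #'xml-char-p string)
      (format nil "<value><string>~A</string></value>" (escape-xml string))
      (format nil "<value><base64>~A</base64></value>"
              (base64-encode (string-to-utf8 string)))))
```

With these definitions, (encode-string-value (string (code-char #x0C))) returns "<value><base64>DA==</base64></value>", and (encode-string-value "a<b") returns "<value><string>a&lt;b</string></value>". Note that the receiver then sees a base64 value rather than a string, so both sides would have to agree on how to interpret it.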
On 28 Jul 2006, at 18:10, Taylor R Campbell wrote:
> S-XML:PRINT-STRING-XML does do certain escaping, but some code points are invalid even if escaped as character references. The particular problem I had was that lisppaste was sending me the contents of a paste with an ASCII form feed character, i.e. U+000C, which is not allowed in well-formed XML, even as a character reference -- that is, the XML specification forbids &#xC;. To get around this, any XML-RPC message containing such a character (or any other character below U+0020 that is not U+0009, U+000A, or U+000D) must be base64-encoded first.
> CL-USER> (s-xml:print-xml-string (string (code-char #x0C)))
> ""
> CL-USER> (s-xml-rpc::encode-xml-rpc-value (string (code-char #x0C)) t)
> <value><string></string></value>
> "</value>"
> CL-USER>
> S-XML:PRINT-XML-STRING should signal an error if any such characters are encountered, and S-XML-RPC::ENCODE-XML-RPC-VALUE should instead base64-encode the string and generate this output:
> <value><base64>DA==</base64></value>
Where exactly in the XML-RPC spec does it say this? I never saw anything like that. From my point of view, the XML-RPC spec is softer and more down to earth than others, which is both a strength and a weakness. I also find its use of XML to be simplified.
If what you say is correct (and it looks that way), then I like XML even less than before ;-) This would make serialization of strings overly complex, for any language or encoding. Applying this change would kill some of my code in border cases. Maybe we could add something like a 'strict' flag to toggle the behavior you suggest.
Sven
Date: Fri, 28 Jul 2006 20:49:26 +0200 From: Sven Van Caekenberghe scaekenberghe@common-lisp.net
> Where exactly in the XML-RPC spec does it say this? I never saw anything like that. From my point of view, the XML-RPC spec is softer and more down to earth than others, which is both a strength and a weakness. I also find its use of XML to be simplified.
It's not the XML-RPC specification that mandates this; it is simply what must be done to accommodate clauses in the XML specification. That is, <value><string>&#xC;</string></value> is malformed XML, while <value><base64>DA==</base64></value> is well-formed, and since XML-RPC is encoded in XML, the only way to encode the value of the Lisp code (STRING (CODE-CHAR #X0C)) in XML-RPC is to use base64.
> If what you say is correct (and it looks that way), then I like XML even less than before ;-) This would make serialization of strings overly complex, for any language or encoding. Applying this change would kill some of my code in border cases. Maybe we could add something like a 'strict' flag to toggle the behavior you suggest.
I'm no big fan of XML myself, but the XML specification is very specific about this, and it's much more stringent than its predecessors about these details. Much as I dislike XML, I think it is important at least to try to comply with it, or at least to follow the universal internet guideline of being lenient in what is accepted and conservative in what is produced. (I had to fix MIT Scheme's XML-RPC support, by the way -- it, too, was failing to encode strings properly, although its XML parser refused the form feed from lisppaste.)
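One way the 'strict' flag discussed above might be shaped, purely as a sketch: the special variable, the condition type, and CHECK-XML-STRING are all hypothetical names and not part of s-xml, and CHAR-CODE is again assumed to return Unicode code points. With the flag off, strings pass through unchanged as they do today; with it on, the first character outside the Char production signals an error.

```lisp
;; Hypothetical sketch of a strict mode; not part of s-xml.

(defparameter *strict-xml-characters* nil
  "When true, characters that are not legal XML Chars signal an error.")

(define-condition illegal-xml-character (error)
  ((code :initarg :code :reader illegal-xml-character-code)
   (index :initarg :index :reader illegal-xml-character-index))
  (:report (lambda (condition stream)
             (format stream "Code point U+~4,'0X at index ~D is not a legal XML character."
                     (illegal-xml-character-code condition)
                     (illegal-xml-character-index condition)))))

(defun check-xml-string (string &key (strict *strict-xml-characters*))
  "Return STRING unchanged. In strict mode, signal ILLEGAL-XML-CHARACTER
at the first character outside production [2] Char of XML 1.0."
  (when strict
    (loop for char across string
          for index from 0
          for code = (char-code char)
          unless (or (member code '(#x9 #xA #xD))
                     (<= #x20 code #xD7FF)
                     (<= #xE000 code #xFFFD)
                     (<= #x10000 code #x10FFFF))
            do (error 'illegal-xml-character :code code :index index)))
  string)
```

A caller that wants the conservative behavior could bind *STRICT-XML-CHARACTERS* to T around serialization and handle ILLEGAL-XML-CHARACTER, for instance by switching to base64 for that value; existing callers that leave the flag at NIL would see no change.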
s-xml-rpc-devel@common-lisp.net