So I'm trying to parse a schema that has the following in it:
property-attach = element attach {
element parameters { fmttypeparam? & encodingparam? }?,
value-uri | value-binary }
In order to get this to work with cxml-rng, I have to do the following:
property-attach = element attach {
element parameters { fmttypeparam? & encodingparam? }?,
( value-uri | value-text ) }
Is the former invalid RNC or is cxml-rng's parser barfing where it shouldn't? A complete, minimal-ish example is shown below and attempting to parse it gives:
failed to parse compact syntax at char 552, file://+/Users/sly/projects/cl-vcard/foo.rnc: Unexpected terminal CXML-RNG::||| (value CXML-RNG::|||). Expected one of: (NIL CXML-RNG::} CXML-RNG::DOCUMENTATION-LINE CXML-RNG::[ :INCLUDE CXML-RNG::IDENTIFIER :START :DIV CXML-RNG::|)| CXML-RNG::CNAME CXML-RNG::|,|) [Condition of type CXML-RNG:RNG-ERROR]
Thanks,
Cyrus
# Hacked together from the RELAX NG Schema for iCalendar in XML
default namespace = "urn:ietf:params:xml:ns:icalendar-2.0-hack"
value-text = element text { xsd:string }
value-binary = element binary { xsd:string }
value-uri = element uri { xsd:anyURI }
encodingparam = element encoding { element text { "8BIT" | "BASE64" } }
fmttypeparam = element fmttype { value-text }
property-attach = element attach {
element parameters { fmttypeparam? & encodingparam? }?,
value-uri | value-text }
start = element icalendar { property-attach+ }
Quoting Cyrus Harmon (ch-lisp@bobobeach.com):
Is the former invalid RNC or is cxml-rng's parser barfing where it shouldn't? A complete, minimal-ish example is shown below and attempting to parse it gives:
The RNC is invalid:
| There is no notion of operator precedence. It is an error for patterns
| to combine the |, &, , and - operators without using parentheses to
| make the grouping explicit. For example, foo | bar, baz is not allowed;
| instead, either (foo | bar), baz or foo | (bar, baz) must be used. A
| similar restriction applies to name classes and the use of the | and -
| operators.
The fun part: The spec has not one but two BNFs.
The first BNF is incorrect and you're supposed to basically ignore it. The second BNF is correct but much less readable:
| These restrictions are not expressed in the above EBNF but
| they are made explicit in the BNF in Section 1.
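The no-precedence rule quoted above is mechanical enough to sketch. Below is a hypothetical Python helper (`mixed_operators` is not part of cxml-rng or any RELAX NG tool) that walks a compact-syntax pattern body and reports every nesting level where |, & and , are combined. It is a simplification under stated assumptions: it ignores the - operator (a hyphen can also be part of a name such as value-uri), skips only simple quoted literals, and expects balanced brackets.

```python
# Hypothetical helper (not part of cxml-rng): a rough check for the RELAX NG
# compact-syntax rule that |, & and , may not be combined at one nesting level
# without parentheses. Simplifying assumptions: the "-" operator is ignored
# (hyphens also occur inside names such as value-uri), only simple quoted
# literals are skipped, and the brackets in `body` are balanced.

def mixed_operators(body: str) -> list[set[str]]:
    """Return the operator set of every nesting level that mixes operators."""
    stack: list[set[str]] = [set()]  # operators seen at each open level
    problems: list[set[str]] = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch in "\"'":
            # Skip a literal such as "8BIT" so its contents are not scanned.
            end = body.find(ch, i + 1)
            i = len(body) if end == -1 else end + 1
            continue
        if ch in "({":
            stack.append(set())  # ( ... ) and { ... } both open a new level
        elif ch in ")}":
            level = stack.pop()
            if len(level) > 1:
                problems.append(level)
        elif ch in "|&,":
            stack[-1].add(ch)
        i += 1
    # With balanced brackets only the top-level set is left on the stack.
    if len(stack[-1]) > 1:
        problems.append(stack[-1])
    return problems


invalid = "element parameters { fmttypeparam? & encodingparam? }?, value-uri | value-binary"
valid = "element parameters { fmttypeparam? & encodingparam? }?, ( value-uri | value-text )"
print(mixed_operators(invalid))  # a single set containing "," and "|"
print(mixed_operators(valid))    # []
```

Run on the two versions of property-attach from the original question, the first reports , and | mixed at the top level of the element body, while the parenthesized version reports nothing; the & inside the parameters braces is fine because it is alone at its level.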
d.
Wow. Thanks!
On Mar 6, 2012, at 10:20 AM, David Lichteblau wrote:
...
While we're at it… Any idea why the following gives an error saying "restriction on string sequences violated"?
# Hacked together from the RELAX NG Schema for iCalendar in XML
default namespace = "urn:ietf:params:xml:ns:icalendar-2.0-hack"
type-weekday = ( "SU" | "MO" | "TU" | "WE" | "TH" | "FR" | "SA" )
type-byday = element byday { xsd:integer?, type-weekday }
start = element icalendar { type-byday+ }
thanks again,
Cyrus
Quoting Cyrus Harmon (ch-lisp@bobobeach.com):
While we're at it… Any idea why the following gives an error saying "restriction on string sequences violated"?
...
type-byday = element byday { xsd:integer?, type-weekday }
IIUC, the restrictions on string sequences are mainly designed to annoy the user (or equivalently, to simplify implementation :-)) and enforce a particularly narrow-minded idea of what XML is for.
In this case, the restriction is (sort of) understandable though, I think.
Relax NG checks that the tree of XML nodes in a document conforms to a certain grammar. Compare this to an ordinary regex, which checks that the sequence of characters in a string conforms to a certain grammar.
These are two different levels of thinking. You can, in fact, work at both levels and have Relax NG check that the string contained in an XML text node conforms to a certain regex.
What you cannot do is mix both features, i.e. you cannot describe the regex for a text node's string using schema patterns. And that is what the fragment above, if Relax NG supported it, would do: It would take the single child of the byday element (a text node) and check that this text node's string starts with a substring matching something like /-?[0-9]/ and continues with something like /SU|MO|TU|WE|TH|FR|SA/.
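To make the difference between the two levels concrete, here is a hypothetical Python sketch of what the rejected pattern would ask for: take the byday element's single text node as one string and look for a split point where the prefix matches an (optional) integer and the suffix matches a weekday. The two regexes are assumed stand-ins for xsd:integer and type-weekday; this illustrates the idea and is not how cxml-rng or any other validator works.

```python
import re

# Assumed stand-ins for the two halves of
#   element byday { xsd:integer?, type-weekday }
INTEGER = re.compile(r"[+-]?[0-9]+")           # roughly the lexical form of xsd:integer
WEEKDAY = re.compile(r"SU|MO|TU|WE|TH|FR|SA")  # the values allowed by type-weekday


def matches_sequence(text: str) -> bool:
    """Would `xsd:integer?, type-weekday` accept `text` if both patterns could
    share a single text node? Try every way of splitting the string in two."""
    for split in range(len(text) + 1):
        head, tail = text[:split], text[split:]
        # The integer is optional, so an empty head is acceptable too.
        head_ok = head == "" or INTEGER.fullmatch(head) is not None
        if head_ok and WEEKDAY.fullmatch(tail) is not None:
            return True
    return False


for value in ["1MO", "-2FR", "MO", "MO1", "1"]:
    print(value, matches_sequence(value))
```

This per-string search is the kind of matching that RELAX NG leaves to datatype patterns (such as a pattern facet) rather than to schema patterns like group and choice.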
As a workaround, I believe you need to check for both parts together using an ad-hoc regex:
type-byday = element byday { xsd:string { pattern='-?[0-9]+(SU|MO|TU|WE|TH|FR|SA)' } }
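One way to sanity-check a pattern like this outside the schema is Python's `re.fullmatch`, since an XML Schema pattern facet has to match the whole value (it is implicitly anchored). This is a sketch assuming the pattern uses only syntax that XSD regexes and Python's `re` read the same way, which holds here. Note that the pattern above requires the number, whereas `xsd:integer?` in the original schema made it optional; `OPTIONAL_NUMBER` is a hypothetical variant that restores that and also accepts a leading +, which xsd:integer allows.

```python
import re

# The pattern from the workaround above, and a hypothetical variant that keeps the
# numeric prefix optional (as xsd:integer? did) and also allows a leading "+".
WORKAROUND = r"-?[0-9]+(SU|MO|TU|WE|TH|FR|SA)"
OPTIONAL_NUMBER = r"([+\-]?[0-9]+)?(SU|MO|TU|WE|TH|FR|SA)"


def xsd_pattern_ok(pattern: str, value: str) -> bool:
    # XML Schema pattern facets are implicitly anchored: the whole value must match.
    # re.fullmatch behaves the same way for this syntax-compatible pattern.
    return re.fullmatch(pattern, value) is not None


for value in ["1MO", "-2FR", "+3TU", "MO", "MONDAY"]:
    print(value, xsd_pattern_ok(WORKAROUND, value), xsd_pattern_ok(OPTIONAL_NUMBER, value))
```

With these definitions, "MO" and "+3TU" pass only the variant, "1MO" and "-2FR" pass both, and "MONDAY" passes neither.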
(It's been years since I looked at Relax NG, so I could be completely wrong about all of this. I have no idea why that relatively recent RFC would include a grammar that doesn't work. Maybe there are newer versions of Relax NG that I am not aware of. The homepage mentions something about a version 2 that I can't find any information on.)
d.