On Dec 1, 2009, at 15:55, David Lichteblau wrote:
Quoting Marco Antoniotti (marcoxa@cs.nyu.edu):
2 - since this is not CXML's default behavior, is there a way to get CXML to do the "obvious" thing?
There is no single obvious thing. You need to define which kind of whitespace stripping you want.
AFAIU, the DTD specifies how whitespace should be handled. The examples in the documentation seem to say so.
a. Strip all text nodes, including those that have non-whitespace in them?
b. Strip all text nodes that are made up of whitespace exclusively?
c. Take text nodes that have non-whitespace and whitespace, and remove the whitespace from them while keeping the non-whitespace?
d. Same as c, but "compress" such whitespace rather than removing it entirely?
e. Choose between c and d depending on what the parent element is?
f. Do b only depending on what the parent element is?
Case study:
XSLT basically does b, with a couple of customization features.
HTML does e.
The DTD-based thing is f.
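For concreteness, options a to d can be sketched as functions over the data of a single text node. This is a hypothetical Python illustration (cxml itself is Common Lisp, and none of these names come from cxml); c is read literally as deleting every whitespace run, and d as collapsing each run to a single space. Options e and f additionally need the parent element's name, so they cannot be decided from the string alone.

```python
import re

XML_WS = " \t\r\n"                    # the four XML whitespace characters
WS_RUN = re.compile(r"[ \t\r\n]+")    # one run of XML whitespace

# Each function maps a text node's data to new data; None means "drop the node".

def option_a(text):
    """a. Strip every text node, whatever it contains."""
    return None

def option_b(text):
    """b. Strip only text nodes made up exclusively of whitespace."""
    return None if not text.strip(XML_WS) else text

def option_c(text):
    """c. Keep the node but delete all of its whitespace."""
    return WS_RUN.sub("", text)

def option_d(text):
    """d. Keep the node but compress each whitespace run to one space."""
    return WS_RUN.sub(" ", text)

sample = "  Hello,\n   world  "
for f in (option_a, option_b, option_c, option_d):
    print(f.__name__, repr(f(sample)))
```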
I know that I could remove the TEXT elements by hand after building the internal structure, but it does not feel right.
There are two technical approaches to normalizing whitespace with cxml's APIs:
- Do it on the fly, either in a SAX handler or a KLACKS source
- Do it after the fact in the object model or application
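A minimal sketch of the second approach (after the fact), assuming option b and using Python's xml.dom.minidom purely as an illustration; this is not cxml's or STP's API, and the function name strip_blank_text_nodes is hypothetical. It walks the finished tree and removes whitespace-only text nodes in place.

```python
from xml.dom import minidom

XML_WS = " \t\r\n"  # the four XML whitespace characters

def strip_blank_text_nodes(node):
    """Remove whitespace-only text nodes below NODE, in place (option b)."""
    # Iterate over a copy, because removeChild mutates childNodes.
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE:
            if not child.data.strip(XML_WS):
                node.removeChild(child)
        elif child.nodeType == child.ELEMENT_NODE:
            strip_blank_text_nodes(child)
    return node

doc = minidom.parseString(
    '<sbml level="1">\n  <model name="m">\n'
    '    <note>  keep me  </note>\n  </model>\n</sbml>')
strip_blank_text_nodes(doc)
print(doc.documentElement.toxml())
```

Text nodes that contain any non-whitespace, such as the content of note, are left untouched, including their leading and trailing spaces.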
The DTD-based thing is implemented as a SAX handler (the first approach); see cxml/xml/space-normalizer.lisp.
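The on-the-fly variant can be sketched as a SAX filter implementing option f. This Python version (xml.sax, not cxml) is a hypothetical stand-in: where a DTD-driven normalizer would learn which elements have element-only content from the element declarations, the sketch takes that set as a plain argument (element_only). Character data is buffered because a SAX parser may deliver one text node in several characters() calls.

```python
import io
from xml.sax import make_parser
from xml.sax.saxutils import XMLFilterBase, XMLGenerator

XML_WS = " \t\r\n"  # the four XML whitespace characters

class ElementContentSpaceFilter(XMLFilterBase):
    """Drop whitespace-only text whose parent element is in element_only.

    element_only is a hypothetical stand-in for DTD element declarations."""

    def __init__(self, parent, element_only):
        super().__init__(parent)
        self.element_only = set(element_only)
        self.stack = []    # names of the currently open elements
        self.pending = []  # buffered character chunks of the current text node

    def _flush(self):
        text = "".join(self.pending)
        self.pending = []
        if not text:
            return
        in_element_content = bool(self.stack) and self.stack[-1] in self.element_only
        if in_element_content and not text.strip(XML_WS):
            return  # whitespace in element-only content: drop it
        super().characters(text)

    def characters(self, content):
        self.pending.append(content)

    def startElement(self, name, attrs):
        self._flush()            # text before this tag belongs to the parent
        self.stack.append(name)
        super().startElement(name, attrs)

    def endElement(self, name):
        self._flush()            # text before this end tag belongs to NAME
        self.stack.pop()
        super().endElement(name)

    def processingInstruction(self, target, data):
        self._flush()            # keep text and PIs in document order
        super().processingInstruction(target, data)

def normalize(xml_text, element_only):
    out = io.StringIO()
    f = ElementContentSpaceFilter(make_parser(), element_only)
    f.setContentHandler(XMLGenerator(out))
    f.parse(io.BytesIO(xml_text.encode("utf-8")))
    return out.getvalue()

result = normalize('<sbml><model>\n  <note> keep </note>\n  <p>\n  </p>\n</model></sbml>',
                   {"sbml", "model"})
print(result)
```

Whitespace inside p survives because p is not listed as element-only, which is exactly the parent-dependent behavior that distinguishes f from plain b.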
XSLT-style normalization is available in Xuriella XSLT, implemented using STP; see the function STRIP-STYLESHEET in xuriella/space.lisp.
Note that both implementation types I listed above are done entirely in user code. You don't need to change cxml to implement yet another variety of whitespace stripping.
Just copy and paste the code and change it to suit your needs, or rewrite it. STRIP-STYLESHEET is only 23 lines long in total, I think.
OK, AFAIAC that is a lot of work on my part. I think I understand the mechanics of what you are saying, but you are not answering my question.
I gave you the first two lines of the SBML document. SBML comes with an XML Schema (XSD) definition. I am assuming that having the XSD is equivalent to having the DTD (I think I am right on this), and that it therefore indicates correctly what is what and how the document should be parsed.
<?xml version="1.0" encoding="UTF-8"?>
<sbml xmlns="http://www.sbml.org/sbml/level1" level="1" version="1">
...
</sbml>
is what I have.
Can CXML be coerced into accessing xmlns="http://www.sbml.org/sbml/level1" (with DRAKMA), understanding it, and using it? (Thus, hopefully, stripping the TEXT elements automatically.)
If yes, how?
IMHO, it would be quite a plus to be able to deal with a case like this (i.e., SBML) automatically, without much user intervention, especially as a post-processing step.
Cheers
-- Marco Antoniotti