Quoting Marco Antoniotti (marcoxa@cs.nyu.edu):
> 2 - since this is not CXML default behavior, is there a way to get CXML to do the "obvious" thing?
There is no single obvious thing. You need to define which kind of whitespace stripping you want.
a. Strip all text nodes, including those that have non-whitespace in them?
b. Strip all text nodes that are made up of whitespace exclusively?
c. Take text nodes that have non-whitespace and whitespace, and remove the whitespace from them while keeping the non-whitespace?
d. Same as c, but "compress" such whitespace rather than removing it entirely?
e. Choose between c and d depending on what the parent element is?
f. Do b only depending on what the parent element is?
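
To make the difference between b, c, and d concrete, here is a small sketch of the three text-level operations. It is plain Python rather than cxml code, the function names are made up for illustration, and it assumes "whitespace" means the four XML whitespace characters and that "removing" in c means deleting every whitespace character (trimming only the ends would be another reading).

```python
import re

XML_WS = " \t\r\n"  # the four characters XML 1.0 treats as whitespace

def is_whitespace_only(text):
    """Test used by option b: does the text node contain only whitespace?"""
    return all(ch in XML_WS for ch in text)

def remove_whitespace(text):
    """Option c (one reading): delete every whitespace character."""
    return "".join(ch for ch in text if ch not in XML_WS)

def compress_whitespace(text):
    """Option d: collapse each run of whitespace into a single space."""
    return re.sub(r"[ \t\r\n]+", " ", text)
```

Options e and f then come down to deciding, per parent element, which of these to apply to a given text node.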
Case study:
- XSLT basically does b, with a couple of customization features.
- HTML does e.
- The DTD-based thing is f.
> I know that I could possibly remove the TEXT elements by hand, after having built the internal structure; but it does not feel right.
There are two technical approaches to normalize whitespace with cxml's APIs:
- Do it on the fly, either in a SAX handler or a KLACKS source
- Do it after the fact in the object model or application
The DTD-based thing is implemented as a SAX handler (first approach); see cxml/xml/space-normalizer.lisp.
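
As a rough illustration of the on-the-fly approach, here is a sketch of a filter that sits between the parser and the consumer and drops whitespace-only text under certain parents (option f). It uses Python's standard xml.sax module, not cxml, and it is not a port of space-normalizer.lisp. The class name, the helper strip_on_the_fly, and the element_only_names parameter are hypothetical; the set of names stands in for what a DTD would declare as element-only content.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator

class WhitespaceStripper(XMLFilterBase):
    """Drop whitespace-only text whose parent is in element_only_names."""

    def __init__(self, parent, element_only_names):
        super().__init__(parent)
        self._names = set(element_only_names)
        self._stack = []     # names of the currently open elements
        self._pending = []   # chunks of the text node being collected

    def _flush(self):
        # The parser may deliver one text node in several chunks, so the
        # decision is made on the joined text, not on each chunk.
        text = "".join(self._pending)
        self._pending = []
        if not text:
            return
        strippable = bool(self._stack) and self._stack[-1] in self._names
        if strippable and not text.strip(" \t\r\n"):
            return           # whitespace-only text under a listed parent
        super().characters(text)

    def characters(self, content):
        self._pending.append(content)

    def startElement(self, name, attrs):
        self._flush()
        self._stack.append(name)
        super().startElement(name, attrs)

    def endElement(self, name):
        self._flush()
        self._stack.pop()
        super().endElement(name)

    def processingInstruction(self, target, data):
        self._flush()
        super().processingInstruction(target, data)

def strip_on_the_fly(xml_text, element_only_names):
    """Parse xml_text through the filter and return the re-serialized XML."""
    out = io.StringIO()
    reader = WhitespaceStripper(xml.sax.make_parser(), element_only_names)
    reader.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    reader.parse(io.BytesIO(xml_text.encode("utf-8")))
    return out.getvalue()
```

The consumer at the end of the chain (here a serializer, but it could just as well be a tree builder) never sees the stripped text, which is the point of doing it on the fly.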
XSLT-style normalization is available in Xuriella XSLT, implemented using STP; see the function STRIP-STYLESHEET in xuriella/space.lisp.
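
The after-the-fact approach can be sketched the same way: build the tree first, then walk it and delete whitespace-only text nodes (option b), with an exception list as the customization hook. This is Python with the standard xml.dom.minidom module, not STP, and it does not claim to reproduce what STRIP-STYLESHEET does in detail. The function names and the preserve parameter are assumptions made for the example.

```python
from xml.dom import minidom

XML_WS = " \t\r\n"

def strip_whitespace_nodes(node, preserve=frozenset()):
    """Remove whitespace-only text children, except under elements in preserve."""
    for child in list(node.childNodes):   # copy, since we remove while looping
        if child.nodeType == child.TEXT_NODE:
            if node.nodeName not in preserve and not child.data.strip(XML_WS):
                node.removeChild(child)
        elif child.nodeType == child.ELEMENT_NODE:
            strip_whitespace_nodes(child, preserve)

def strip_after_parsing(xml_text, preserve=frozenset()):
    """Parse into a DOM tree, strip it in place, and serialize the root element."""
    doc = minidom.parseString(xml_text)
    strip_whitespace_nodes(doc.documentElement, preserve)
    return doc.documentElement.toxml()
```

For example, strip_after_parsing("<a>\n <pre> </pre>\n</a>", {"pre"}) keeps the space inside pre but drops the indentation around it.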
Note that both implementations I mentioned above are done entirely in user code. You don't need to change cxml to implement yet another variety of whitespace stripping.
Just copy and paste the code and change it to suit your needs -- or rewrite it. STRIP-STYLESHEET is 23 lines of code in total, I think.
d.