a. Strip all text nodes, including those that have non-whitespace in
them?
b. Strip all text nodes that are made up of whitespace exclusively?
c. Take text nodes that have non-whitespace and whitespace, and remove
the whitespace from them while keeping the non-whitespace?
d. Same as c, but "compress" such whitespace rather than removing it
entirely?
e. Choose between c and d depending on what the parent element is?
f. Do b only depending on what the parent element is?
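To make the difference between b, c and d concrete, here is a small sketch of the three text-level policies. It is in Python rather than Lisp and has nothing to do with cxml's API; the function names are made up for illustration, and option c is read literally (all whitespace removed), which is an assumption about what was meant.

```python
import re

# Whitespace characters as defined by XML 1.0: space, tab, CR, LF.
XML_WS = " \t\r\n"
_WS_RUN = re.compile(r"[ \t\r\n]+")


def is_whitespace_only(text):
    """Option b test: True if the text node contains nothing but whitespace."""
    return text.strip(XML_WS) == ""


def remove_whitespace(text):
    """Option c, read literally: drop every whitespace character."""
    return _WS_RUN.sub("", text)


def compress_whitespace(text):
    """Option d: collapse each run of whitespace into a single space."""
    return _WS_RUN.sub(" ", text)
```

Options e and f then only add a lookup on the parent element's name to decide which of these functions to call, if any.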
Case study:
- XSLT basically does b, with a couple of customization features.
- HTML does e.
- The DTD-based approach is f.
I know that I could possibly remove the TEXT elements by hand, after
having built the internal structure; but it does not feel right.
There are two technical approaches to normalizing whitespace with cxml's APIs:
- Do it on the fly, either in a SAX handler or a KLACKS source
- Do it after the fact in the object model or application
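As a rough illustration of the first, on-the-fly approach, here is a sketch of a SAX filter that implements option b. It uses Python's xml.sax rather than cxml, so it only shows the shape of the idea; WhitespaceStripper, EventLog and demo_events are hypothetical names invented for this example. Text is buffered because a SAX parser may deliver one text node in several chunks.

```python
import xml.sax
from xml.sax.saxutils import XMLFilterBase

XML_WS = " \t\r\n"  # whitespace characters as defined by XML 1.0


class WhitespaceStripper(XMLFilterBase):
    """Forwards SAX events, dropping text runs that are whitespace only."""

    def __init__(self, parent=None):
        super().__init__(parent)
        self._pending = []  # chunks of the text node currently being read

    def _flush(self):
        text = "".join(self._pending)
        self._pending = []
        if text.strip(XML_WS):  # keep only text with some non-whitespace
            super().characters(text)

    def characters(self, content):
        self._pending.append(content)

    def startElement(self, name, attrs):
        self._flush()
        super().startElement(name, attrs)

    def endElement(self, name):
        self._flush()
        super().endElement(name)

    def processingInstruction(self, target, data):
        self._flush()
        super().processingInstruction(target, data)


class EventLog(xml.sax.ContentHandler):
    """Downstream handler that just records what it receives."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        self.events.append(("text", content))


def demo_events(xml_bytes):
    """Parse xml_bytes through the stripper and return the recorded events."""
    log = EventLog()
    stripper = WhitespaceStripper()
    stripper.setContentHandler(log)
    xml.sax.parseString(xml_bytes, stripper)
    return log.events
```

The downstream handler never sees the indentation between elements, so whatever builds the object model behind it does not have to clean up afterwards.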
The DTD-based normalization is implemented as a SAX handler (the first
approach); see cxml/xml/space-normalizer.lisp.
XSLT-style normalization is available in Xuriella XSLT, implemented
using STP; see the function STRIP-STYLESHEET in xuriella/space.lisp.
Note that both implementations listed above are done entirely in
user code. You don't need to change cxml to implement yet another
variety of whitespace stripping.
Just copy and paste the code and change it to suit your needs -- or rewrite
it. STRIP-STYLESHEET is only 23 lines of code in total, I think.
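For comparison, here is what the second, after-the-fact approach can look like. This is a sketch in Python over xml.dom.minidom, not a port of STRIP-STYLESHEET and not STP; strip_blank_text and its preserve parameter are hypothetical. It removes whitespace-only text nodes (option b) unless the direct parent is named in preserve, which is one simple way to get a parent-dependent variant.

```python
from xml.dom import minidom, Node

XML_WS = " \t\r\n"  # whitespace characters as defined by XML 1.0


def strip_blank_text(node, preserve=frozenset()):
    """Remove whitespace-only text children of node, recursively.

    Text directly under an element whose name is in `preserve` is kept
    untouched. Only the direct parent is checked; the setting is not
    inherited by descendants.
    """
    # Iterate over a copy, since children are removed during the loop.
    for child in list(node.childNodes):
        if child.nodeType == Node.TEXT_NODE:
            blank = child.data.strip(XML_WS) == ""
            if blank and node.nodeName not in preserve:
                node.removeChild(child)
        elif child.nodeType == Node.ELEMENT_NODE:
            strip_blank_text(child, preserve)
```

Usage would be along the lines of parsing with minidom.parseString and then calling strip_blank_text(doc.documentElement, preserve={"pre"}). Even with the parent check it stays short, which is the point: this kind of policy lives comfortably in application code.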