On Dec 1, 2009, at 15:55, David Lichteblau wrote:

Quoting Marco Antoniotti (marcoxa@cs.nyu.edu):
2 - since this is not CXML default behavior, is there a way to get
CXML to do the "obvious" thing?

There is no single obvious thing. You need to define which kind of
whitespace stripping you want.

AFAIU the DTD specifies how to deal with whitespace. The examples in the documentation seem to say so.

a. Strip all text nodes, including those that have non-whitespace in
    them?

b. Strip all text nodes that are made up of whitespace exclusively?

c. Take text nodes that have non-whitespace and whitespace, and remove
    the whitespace from them while keeping the non-whitespace?

d. Same as c, but "compress" such whitespace rather than removing it
    entirely?

e. Choose between c and d depending on what the parent element is?

f. Do b only depending on what the parent element is?
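
To make the differences concrete, here is a rough, language-neutral sketch of options b, c, d and f in Python. It is not cxml code. Text nodes are modelled as plain strings, the function names are made up for illustration, and "remove the whitespace" in option c is read literally (all whitespace characters are deleted), which is only one possible interpretation.

```python
import re

# The four characters XML treats as whitespace.
XML_WS = " \t\r\n"
XML_WS_RUN = re.compile(r"[ \t\r\n]+")


def is_blank(text):
    """True if the text node consists of XML whitespace only (or is empty)."""
    return text.strip(XML_WS) == ""


def drop_blank_nodes(nodes):
    """Option b: drop text nodes made up of whitespace exclusively."""
    return [t for t in nodes if not is_blank(t)]


def remove_whitespace(text):
    """Option c (literal reading): delete every whitespace character."""
    return XML_WS_RUN.sub("", text)


def compress_whitespace(text):
    """Option d: collapse each run of whitespace into a single space."""
    return XML_WS_RUN.sub(" ", text)


def drop_blank_nodes_in(pairs, element_only_parents):
    """Option f: apply b only below the given parent elements.

    `pairs` is a list of (parent_name, text) tuples, a hypothetical
    stand-in for text nodes that know their parent element.
    """
    return [
        (parent, text)
        for parent, text in pairs
        if not (parent in element_only_parents and is_blank(text))
    ]
```

For example, `drop_blank_nodes_in([("list", "\n "), ("item", "  ")], {"list"})` keeps only the node below `item`, because only `list` was declared to have element-only content.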

Case study:

- XSLT basically does b, with a couple of customization features.

- HTML does e.

- The DTD-based thing is f.

I know that I could possibly remove the TEXT elements by hand, after
having built the internal structure; but it does not feel right.

There are two technical approaches to normalize whitespace with cxml's APIs:
- Do it on the fly, either in a SAX handler or a KLACKS source
- Do it after the fact in the object model or application
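
As a loose analogy for these two approaches, the following sketch uses Python's standard library rather than cxml: a SAX content handler that drops whitespace-only text while parsing, and an ElementTree pass that strips it after the tree has been built. The class and function names are invented for this example, and both variants implement option b only.

```python
import xml.sax
import xml.etree.ElementTree as ET

XML_WS = " \t\r\n"  # the four characters XML treats as whitespace


class BlankTextFilter(xml.sax.ContentHandler):
    """On the fly: record events, skipping whitespace-only text runs."""

    def __init__(self):
        super().__init__()
        self.events = []
        self._buffer = []

    def _flush(self):
        # SAX may deliver one text run in several chunks, so the chunks
        # are joined before deciding whether the run is blank.
        text = "".join(self._buffer)
        self._buffer = []
        if text.strip(XML_WS):
            self.events.append(("text", text))

    def startElement(self, name, attrs):
        self._flush()
        self.events.append(("start", name))

    def endElement(self, name):
        self._flush()
        self.events.append(("end", name))

    def characters(self, content):
        self._buffer.append(content)


def strip_on_the_fly(xml_text):
    """Parse with the filtering handler and return the recorded events."""
    handler = BlankTextFilter()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.events


def strip_after_the_fact(xml_text):
    """Build the whole tree first, then clear whitespace-only text."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.text is not None and not elem.text.strip(XML_WS):
            elem.text = None
        if elem.tail is not None and not elem.tail.strip(XML_WS):
            elem.tail = None
    return root
```

The first variant never materializes the unwanted nodes, while the second is simpler to write but needs the complete tree in memory before it can start.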

The DTD-based thing is implemented as a SAX handler (first approach),
see cxml/xml/space-normalizer.lisp

XSLT-style normalization is available in Xuriella XSLT, implemented
using STP; see the function STRIP-STYLESHEET in xuriella/space.lisp.

Note that both implementation types I listed above are done entirely in
user code. You don't need to change cxml to implement yet another
variety of whitespace stripping.

Just copy and paste the code and change it to suit your needs -- or
rewrite it. STRIP-STYLESHEET is 23 lines of code in total, I think.