Greetings,
I noticed a post by David Lichteblau on c.l.l mentioning that an XPath implementation for CXML is on the to-do list.
This is something that I've had a run at, although my end point turned out to be far from the XPath standard. Rather than implementing XPath itself, I implemented a "lispy" element set addressing scheme. By lispy, I mean that rather than addressing nodesets using strings, a declarative DSL is used. Here's an example that yields the title element in an XHTML document:
(xmatch :root "html" "head" "title")
Here's one that yields all of the TD elements in an XHTML document that have a colspan attribute:
(xmatch :root (:desc "td" (:@ "colspan")))
And here's one that yields the TD children of each row of a table, but only those that have a colspan of '1':
(xmatch "tr" (:child "td" (:@value "colspan" "1")))
The first two examples are context-free, thanks to the :root directive; the third would need to be run on a table element (or list thereof).
If it isn't already clear, xmatch is a macro that builds a closure that, when provided with an element, document, or list as context, returns the node-set that matches the XPath-esque definition. This makes it very similar in style and usage to cl-ppcre:create-scanner.
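To make the closure-building idea concrete, here is a minimal sketch of how such a matcher could be put together. It is not the actual xmatch code: it assumes a toy node representation (plain lists rather than CXML's DOM), every name in it (toy-xmatch, compile-steps, and so on) is made up for illustration, and it handles only string steps, :child, :desc, :@, and :@value. :root is left out because the toy nodes have no parent pointers.

```lisp
;; Toy node representation (an assumption, not CXML's DOM):
;;   (name attributes . children)
;; where attributes is an alist of (string . string) and text children are strings.
(defun node-name (node) (first node))
(defun node-attrs (node) (second node))
(defun node-children (node) (remove-if-not #'consp (cddr node)))

(defun node-attr (node attr)
  "Value of attribute ATTR on NODE, or NIL if absent."
  (cdr (assoc attr (node-attrs node) :test #'string=)))

(defun descendants (node)
  "All element descendants of NODE, in document order."
  (loop for child in (node-children node)
        append (cons child (descendants child))))

(defun compile-predicate (form)
  "Turn (:@ attr) or (:@value attr value) into a function of one node."
  (destructuring-bind (op attr &optional value) form
    (ecase op
      (:@ (lambda (node) (not (null (node-attr node attr)))))
      (:@value (lambda (node) (equal (node-attr node attr) value))))))

(defun compile-step (step)
  "Turn one step into a function from a context node to a list of matches.
A bare string is shorthand for a :child step with no predicates."
  (destructuring-bind (axis name &rest predicates)
      (if (stringp step) (list :child step) step)
    (let ((tests (mapcar #'compile-predicate predicates))
          (candidates (ecase axis
                        (:child #'node-children)
                        (:desc #'descendants))))
      (lambda (node)
        (remove-if-not
         (lambda (candidate)
           (and (string= (node-name candidate) name)
                (every (lambda (test) (funcall test candidate)) tests)))
         (funcall candidates node))))))

(defun compile-steps (steps)
  "Chain the steps: each is applied to every node the previous one yielded."
  (let ((step-fns (mapcar #'compile-step steps)))
    (lambda (context)
      (let ((nodes (list context)))
        (dolist (step-fn step-fns nodes)
          (setf nodes (loop for node in nodes
                            append (funcall step-fn node))))))))

(defmacro toy-xmatch (&rest steps)
  "Build a matcher closure from literal steps, in the spirit of xmatch."
  `(compile-steps ',steps))

;; Sample data: a table with two rows.
(defparameter *table*
  '("table" ()
    ("tr" ()
     ("td" (("colspan" . "1")) "a")
     ("td" () "b"))
    ("tr" ()
     ("td" (("colspan" . "2")) "c"))))
```

As with a cl-ppcre scanner, the closure is built once and then applied with funcall. (funcall (toy-xmatch "tr" (:child "td" (:@value "colspan" "1"))) *table*) returns a one-element list holding the first TD, and (funcall (toy-xmatch (:desc "td" (:@ "colspan"))) *table*) returns the two TDs that carry a colspan attribute.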
It supports a subset of XPath predicates (such as :@, :@value, :index, and a few others) that I have needed in my application so far, but is by no means complete.
I'm passing this information on, not because I have code I'm ready to contribute at the moment (although that could be arranged given some time), but because I think this approach (while not standards-compliant) is superior to any potential "direct" XPath implementation for CXML. Perhaps CXML could grow something like xmatch; in addition to it being used directly, a "proper" XPath implementation could be built on top of an xmatch-like facility.
I don't want to belabor the point, but this approach is far more flexible, allows for a much richer set of predicates (and custom ones, at that), and doesn't confine the match definition to a flat string -- sexps are good here, for the same reasons why they are good elsewhere. For example, I have a couple of other macros that generate xmatch definitions themselves; XPath strings *can* be generated dynamically, but that is a Dark Path (at least by my standards).
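As a rough illustration of what generating a definition looks like, here is a sketch in which both cells-with-attributes and xmatch-cells are hypothetical names invented for this message, not my actual macros. Because a match definition is ordinary list data, ordinary list operations are enough to build one, and no string concatenation or escaping is involved.

```lisp
;; EVAL-WHEN makes the helper available at macroexpansion time
;; when this is compiled as a file.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (defun cells-with-attributes (&rest attribute-names)
    "Build an xmatch-style step selecting TD descendants that carry
every attribute named in ATTRIBUTE-NAMES."
    `(:desc "td" ,@(mapcar (lambda (name) `(:@ ,name)) attribute-names))))

(defmacro xmatch-cells (&rest attribute-names)
  "Expand into an xmatch form; xmatch itself is assumed to be defined elsewhere."
  `(xmatch :root ,(apply #'cells-with-attributes attribute-names)))

;; (cells-with-attributes "colspan" "rowspan")
;;   returns (:desc "td" (:@ "colspan") (:@ "rowspan"))
;; (macroexpand-1 '(xmatch-cells "colspan"))
;;   returns (xmatch :root (:desc "td" (:@ "colspan")))
```

The XPath equivalent would mean assembling something like "//td[@colspan][@rowspan]" by pasting strings together, which is exactly the kind of thing I would rather avoid.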
I hope I'm not suggesting anything patently obvious -- I'm functionally new to Lisp (again, this being my second tour of duty, after a long hiatus), so this may all be elementary to others.
Thanks for your time,
Chas Emerick
Founder, Snowtide Informatics Systems
Enterprise-class PDF content extraction
cemerick@snowtide.com
http://snowtide.com