Allrighty fine, I see how it is. ;-)
These patches deprecate characters.lisp in favor of xml-name-rune-p.lisp. I pulled over #'VALID-NAME-P and #'VALID-NMTOKEN-P, the new functions in characters.lisp, and they now use the predicates provided by xml-name-rune-p.lisp.
http://www.unwashedmeme.com/cxml/asd-remove-chars.diff (diff to cxml.asd)
http://www.unwashedmeme.com/cxml/characters-merge.diff (against xml/xml-name-rune-p.lisp)
And characters.lisp can be deleted now.
Before making any of the changes, I made sure that the #'RUNE-NAME-CHAR-P and #'NAME-RUNE-P functions returned #'EQ results for every character between 0 and +max+.
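For anyone who wants to repeat that kind of check, here is a rough sketch of the idea. The two predicates below are hypothetical stand-ins (ASCII alphanumerics only), not the real RUNE-NAME-CHAR-P and NAME-RUNE-P, and the sketch compares results as generalized booleans, which is slightly looser than a strict #'EQ comparison.

```lisp
;; Hypothetical stand-ins for the two predicates being compared; in cxml
;; these would be RUNE-NAME-CHAR-P (characters.lisp) and NAME-RUNE-P
;; (xml-name-rune-p.lisp), applied to rune codes.
(defun old-name-char-p (code)
  (or (<= 48 code 57) (<= 65 code 90) (<= 97 code 122)))

(defun new-name-char-p (code)
  (and (< code 128)
       (alphanumericp (code-char code))))

;; Assumed upper bound, mirroring +max+ = #xD800 from the patch.
(defparameter *limit* #xD800)

(defun first-disagreement (p q limit)
  "Return the first code in [0, LIMIT] where P and Q differ when both
results are treated as booleans, or NIL if they agree everywhere."
  (loop for code from 0 to limit
        unless (eq (not (funcall p code)) (not (funcall q code)))
          return code))
```

Calling (first-disagreement #'old-name-char-p #'new-name-char-p *limit*) returns NIL when the two agree over the whole range, and otherwise the first code worth looking at.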
I also changed the compile-time behavior of xml-name-rune-p.lisp, but it should be run-time equivalent (inlined bit-vector lookup).
At compile time the code now looks a little bit more like the code in characters.lisp. The special character ranges are now vectors that are separate from the code (the binary search) that examines them, rather than ORing everything together.
At compile time it will then evaluate the predicates over every possible value and save the result in a bitvector (this is how it was working before), and garbage collection should be able to reclaim the character-range vectors. This change resulted in the compilation on my machine dropping from ~8.9s to ~0.25s.
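To make the shape of that concrete, here is a minimal sketch, not the actual patch. All names and the tiny range table are hypothetical, and the table is built at load time here for simplicity, whereas the patch computes it at compile time and embeds it in the inlined function as a literal.

```lisp
;; Hypothetical small bound for the demo; the patch uses +max+ = #xD800.
(defparameter *demo-max* 255)

;; Sorted, non-overlapping inclusive ranges stored flat: lo0 hi0 lo1 hi1 ...
;; (digits, upper-case and lower-case ASCII letters, for illustration only).
(defparameter *demo-ranges* (vector 48 57 65 90 97 122))

(defun in-ranges-p (code ranges)
  "Binary search over the inclusive (lo, hi) pairs in RANGES."
  (let ((lo 0)
        (hi (1- (floor (length ranges) 2))))
    (loop while (<= lo hi)
          do (let* ((mid (floor (+ lo hi) 2))
                    (start (aref ranges (* 2 mid)))
                    (end (aref ranges (1+ (* 2 mid)))))
               (cond ((< code start) (setf hi (1- mid)))
                     ((> code end) (setf lo (1+ mid)))
                     (t (return t)))))))

(defun predicate-to-bit-vector (predicate max)
  "Evaluate PREDICATE on every code in [0, MAX] and record the results as bits."
  (let ((bits (make-array (1+ max) :element-type 'bit :initial-element 0)))
    (dotimes (code (1+ max) bits)
      (when (funcall predicate code)
        (setf (sbit bits code) 1)))))

;; The slow range search runs once per code here; afterwards only the
;; bit-vector is consulted.
(defparameter *demo-bits*
  (predicate-to-bit-vector (lambda (code) (in-ranges-p code *demo-ranges*))
                           *demo-max*))

(defun demo-name-code-p (code)
  "Range check first, then a single bit lookup."
  (and (<= 0 code *demo-max*)
       (= 1 (sbit *demo-bits* code))))
```

The point of the split is that the range vectors and the binary search are only needed while the table is being filled in; the run-time predicate never touches them.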
Changes that might be significant:
PREDICATE-TO-BV
-    (dotimes (i #x10000 r)
+    (dotimes (i +max+ r)
Where +max+ = #xD800. The vector only goes up to +max+, so there is no point trying any characters above that, right? The predicates that use the bit-vector fail for anything above +max+ too.
NAME-RUNE-P and NAME-START-RUNE-P (the public inlined functions)
- (DEFINLINE NAME-RUNE-P (RUNE)
-   (SETF RUNE (RUNE-CODE RUNE))
-   (AND (<= 0 RUNE ,+max+)
-        (LOCALLY (DECLARE (OPTIMIZE (SAFETY 0) (SPEED 3)))
-          (= 1 (SBIT ',(predicate-to-bv #'name-rune-p)
-                     (THE FIXNUM RUNE))))))
+ (DEFINLINE NAME-RUNE-P (RUNE)
+   (SETF RUNE (RUNE-CODE RUNE))
+   (LOCALLY (DECLARE (OPTIMIZE (SAFETY 0) (SPEED 3))
+                     (type fixnum rune))
+     (AND (<= 0 RUNE ,+max+)
+          (= 1 (SBIT ',(predicate-to-bv #'name-rune-p)
+                     RUNE)))))
I moved the LOCALLY declarations up by a line to include the <= check, and declared the type of the rune a little bit earlier (so that <= might be able to take advantage of it). Does anyone know of problems this will cause?
TEST RESULTS:
Xmlconf:run-all-tests
  0/1829 tests failed; 333 tests were skipped
Domtest:run-all-tests
  0/763 tests failed; 43 tests were skipped
Timing the xmlconf tests, there wasn't really any significant change in speed or memory. Profiling showed a bit of a drop in memory usage, but not by much. For the xmlconf tests, these functions are called a decent amount, but don't contribute all that much to the runtime. These weren't terribly exact tests, but part of the point here was to try to clean up and remove some of the duped code.
http://www.unwashedmeme.com/cxml/characters-merge-profile.results
Heh, at least compilation is faster. Whad'ya think?
Nathan
(My apologies if you got this twice, it looks like it didn't send the first time.)
-----Original Message-----
From: David Lichteblau [mailto:david@lichteblau.com]
Sent: Tuesday, June 13, 2006 2:02 PM
To: Nathan Bird
Cc: cxml-devel@common-lisp.net
Subject: Re: [cxml-devel] characters.lisp improvements
However, I have to admit that my characters.lisp duplicates work already done by Gilbert. Gilbert's functions are in xml-name-rune-p.lisp and are using inline functions accessing bitvectors. Embarrassingly, I didn't notice the latter file before writing characters.lisp and then just stuck a comment into the file instead of fixing it right away. My apologies for that -- and sorry for writing that comment in German. :-(
That's why we have Google Translate... to get bad translations of offhand comments :-)