Hello there.
thanks a lot for all the work. I have read through the messages and I agree with your analysis with a few comments.
I think all the changes you propose are worth including.
Let me comment on a few of these.
On Jan 16, 2010, at 06:43 , Pixel // pinterface wrote:
Greetings! I was looking into improving the efficiency of MATCH- CASE and while working on that I've ended up with a few questions regarding some implementation decisions, and drafted up some patches both to help you see my thoughts and for consideration of inclusion if you think they're in the right direction.
Patches are numbered in the order created, and generally depend on lower-numbered patches (not always intentionally--it's hard to avoid inter-patch dependencies when making multiple changes to a small area). I thought I made the patches in a reasonable order while I was making them, but I find myself referring to them in a different sequence than what I made them in, so apologies if that causes any confusion.
I've also put up a darcs mirror (in darcs2 format) of the CVS repository plus the attached patches, in case you find that more convenient. You can find it at http://repo.kepibu.org/cl-unification/
** Duplicate code
There's a fair bit of code duplication between MATCH, MATCHF, and MATCHING in the form of GENERATE-VAR-BINDINGS and the production of the let forms that surround FORMS in those macros. (Nevermind that MATCH and MATCHF are identical except in what they do with template.)
Yep. Most of the code in the MATCH-BLOCK file is cut'n'paste.
Patches 0 and 2 help somewhat.
- Extract template handling of MATCH[ING] into %TEMPLATE-FOR-MATCH
Should be fairly self-explanatory; not much of a win in lines.
- Extract the bits that wrap forms with bindings for template
variables This patch extracts the assorted flet GENERATE-VAR-BINDINGS into a single common function. It's slightly less straightforward than that, however, because it also extracts the (let ...) forms which actually used those bindings.
This results in a decent win for code size, but in some cases it swaps the order of execution of %TEMPLATE-FOR-MATCH and COLLECT-TEMPLATE-VARS. I'm pretty sure this doesn't have any noticable effect, but thorough testing is probably wise.
This is good. However, I would set up a bunch of regression tests before moving on and change things. I usually use the test suite from Franz, but I'd be willing to change to another test suite.
** (unify* ...) and (ignore-errors (unify ...))
The MATCHING macro produces the form (ignore-errors (unify ...)), presumably because it existed before the convenience function UNIFY*. But I'm curious about UNIFY* and/or the ignore-errors form in MATCHING. Is ignore- errors the proper form in either of these places, or was it just easier to type than (handler-case (unify ...) (unification-failure () nil)) ?
Most implementations expand IGNORE-ERRORS in something like that. But I agree that (as you point out later on) that UNIFY* should be written as
(defmethod unify* (a b &optional (env (make-empty-environment))) (handler-case (unify a b env) (unification-failure () nil))
In this way other conditions are passed on.
Looking at some of the assorted UNIFY methods, I see UNIFICATION- FAILUREs aren't the only possible condition which might be thrown during unification (e.g., LAMBDA-LIST-PARSING-ERROR is also a possibility). However, while MATCHING treats any error as a failure to unify, MATCH, MATCHF, MATCH-CASE, and MATCHF-CASE all limit themselves to only UNIFICATION-FAILUREs. This inconsistency leads me to wonder which is correct.
I assume only UNIFICATION-FAILUREs should be treated as such in patch 5 because all ERRORs seems a bit heavy-handed in a language with such an excellent condition system, but can see the argument that any error / is/ a failure to unify.
- Use (unify* ...) rather than (ignore-errors (unify ...))
Same thing, so might as well use the convenience function.
Not particularly exciting. And this bit is changed again in patch 5 to ignore only UNIFICATION-FAILUREs.
Yep. See above.
- Make MATCHING agree with MATCH[F][-CASE] about the conditions of
failure Rather than skipping to the next clause on any error, UNIFICATION-FAILUREs--and /only/ UNIFICATION-FAILUREs--skip to the next clause.
Under the assumption that UNIFY* is implemented as intended, this creates a new utility function UNIFY** (terrible name, I know!) which is like UNIFY* but ignores /only/ UNIFICATION-FAILUREs rather than all ERRORs.
This of course results in a library-user-visible change to how MATCHING operates, which may or may not be considered acceptable.
Yep.
** (matching (otherwise ...))
- Fix (matching (otherwise ...))
(matching (otherwise ...)) expands into (cond (otherwise ...)), which generates an unbound-variable error when executed, because COND does not special-case OTHERWISE as CASE does.
Pretty self-explanatory and not very interesting.
Good catch.
** &body vs. &rest
- Use &body instead of &rest for (arguably) prettier auto-indentation
I prefer the smaller indentation provided with using &body instead of &rest. Not a big deal either way, but I was in there so I figured I'd throw it in the mix. Fortunately, if you disagree, this patch shouldn't affect any of the others. :)
I agree. It is better self documentation as well.
** Multiple #:UNIFICATION-ENVs in MATCHING
- Only use one variable to store the unification environment in
MATCHING Because of the way MATCHING expands, and what UNIFY** returns, each (setf #:env (unify** ...)) call will do one of two things: it will set #:env to NIL or it will set #:env to an ENVIRONMENT structure.
If #:env is set to NIL--the same value it entered the (setf) with!--the COND will continue on to the next clause.
If #:env is set to an ENVIRONMENT structure, none of the remaining (setf) clauses will be evaluated.
Thus, because the variable will only ever be set to a non-nil value once, this is perfectly safe.
I think this is ok as well.
** MATCH-CASE
MATCH-CASE has the surprising behavior of (ignore-errors (match-case ("foo") ("foo" (error 'unification-failure)) (t :default))) => :default Hardly CASE-like behavior!
I think it sensible to redefine MATCH-CASE in terms of MATCHING, rather than MATCH. E.g., so the above would expand into something like (ignore-errors (let ((#:object-var "foo")) (matching (:errorp nil :default-substitution nil :matching- named nil) (("foo" #:object-var) (error 'unification-failure)) (t :default)))) which greatly simplifies the definition of MATCH-CASE and makes it much more CASE-like in behavior.
- Redefine MATCH-CASE in terms of MATCHING
This both greatly simplifies the MATCH-CASE macro as well as its expansion.
HOWEVER, this version is *NOT* 100% compatible with the previous version. Specifically, UNIFICATION-FAILUREs signalled from within clause-forms will /not/ cause the next unification clause to be attempted, but will instead propogate outward as the -case name suggests they should.
That is, (ignore-errors (match-case ("foo") ("foo" (error 'unification-failure ...)) (t :default))) => :default ;; before patch => nil, #<unification-failure> ;; after patch
Patch 7 suggests that MATCHING, MATCH-CASE, MATCHF-CASE, and the possible future MATCH-COND and MATCHF-COND are all fairly trivial variations on each other. I'll be happy to explore that possibility in future patches if it's of interest, but I'm already asking you to look at too many patches at once, so I'll refrain for now. :)
I agree with this as well. The behavior you suggest is the one that's needed.
** MATCH
Though documented, MATCH's behavior when its body forms signal UNIFICATION-FAILURE seems at odds with expectations, for pretty much the same reason the behavior seems nonsensical in MATCH-CASE. Was it a deliberate design decision, or a side-effect of how MATCH was implemented?
Never attribute to malice what can be attributed to incompetence. Of course it is a side-effect of the implementation :) I knew I needed a test suite :)
If a side effect of implementation, would it make sense to redefine MATCH to have an expansion more like (match ("foo" "foo" :errorp <e?> :error-value <ev>) <body>) => (m-v-b (#:env #:error) (unify** "foo" "foo") (cond ((and #:error <e?>) (error #:error)) (#:error <ev>) (t (let* (#| template vars |#) <body>)))) so UNIFICATION-FAILUREs inside <body> can be independent of errorp?
(No patch for this one yet.)
** :MATCH-NAMED, :MATCHING-NAMED, :MATCH-CASE-NAMED
It seems a bit redundant to have <macro-name>-named arguments for block names rather than simply having a :named argument as in, say, LOOP. Any particular reason it wasn't done that way?
Having the :NAMED option would be preferable, with NIL or the macro name as fall-back.
(No patch for this one, either.)
** Apologies
Sorry about the massive brain dump. I /may/ have gotten carried away. :)
Absolutely not. All of this is most welcome. I am just a little swamped (what an euphemism!) and maybe not all that reactive. But let's keep the discussion going.
all the best
Marco