On Thu, Jan 2, 2014 at 11:44 PM, Robert P. Goldman rpgoldman@sift.info wrote:
> While there are bug fixes waiting to reach our users, I'm quite concerned by the loss of backwards compatibility in systems that defined their own OPERATION subclasses.
This backward incompatibility already happened a year ago: it was consubstantial with the refactoring that begat ASDF3, and necessary to fix the broken dependency model of ASDF2. NO, most operations MUST NOT propagate downward, or even sideways; indeed some, like prepare-op, must propagate upward instead. The downward and sideways propagations were baked into TRAVERSE; that was one of the deep conceptual bugs in the ASDF model. Now they are configurable via the (ill-named) COMPONENT-DEPENDS-ON; and the previous behavior is a mere inheritance specification away, with the DOWNWARD-OPERATION and SIDEWAYS-OPERATION mixins.
(The other deep conceptual bugs in the ASDF model were the lack of transitive timestamp checking, and the mess of DEPENDS-ON vs DO-FIRST dependencies. These were loosely related deep bugs. Then there were shallower bugs like the IF-COMPONENT-FAILS horror, the inconsistency between system :depends-on and other :depends-on, and probably more small bugs I can't remember.)
> As far as I can tell, *all* such systems will break, since the old solution was to subclass OPERATION and the new solution is to subclass DOWNWARD-OPERATION to achieve the same results. *ALL* programmers' locally-defined operation subclasses are now broken by this change.
We broke those that depended on downward and/or sideways propagation (don't forget sideways-operation). But we *FIXED* all those who didn't want it, and had to deal with it anyway! And that includes the vast majority of the operations I audited from quicklisp (see below), but also asdf's very test-op, its many bundle operations, and of course the model-fixing prepare-op.
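For an extension that did rely on the old implicit propagation, the repair amounts to changing the superclass list. A minimal sketch, assuming ASDF 3 can be loaded with (require "asdf"); the class names MY-TREE-OP and MY-DOWNWARD-OP are hypothetical, made up for illustration:

```lisp
(require "asdf")

;; Hypothetical ASDF2-style definition, for comparison only:
;;   (defclass my-tree-op (asdf:operation) ())
;; Under ASDF2, TRAVERSE propagated such an operation downward and
;; sideways whether or not the author wanted that.

;; ASDF3: opt back in to both propagations explicitly via the mixins.
(defclass my-tree-op (asdf:downward-operation asdf:sideways-operation)
  ())

;; ASDF3: an operation that only propagates from a parent to its children.
(defclass my-downward-op (asdf:downward-operation)
  ())
```

Both classes still inherit from OPERATION through the mixins, so existing methods specialized on OPERATION keep applying to them.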
> I have personally seen multiple systems that are broken by this change, and I would like to see some hard thought put into repairing this.
Did you see the previous email where I audited the damage (and the fixes)? http://thread.gmane.org/gmane.lisp.asdf.devel/3581/focus=3582 There are only 4-5 victims (depending on how you count): I fixed one (dependency-op), two (clean-op and revert-op) were never really working, and one (parenscript-compile-op) indeed needs love, but the fix is trivial, and no one has complained so far.
Meanwhile 5-10 extensions were fixed by this change, and 4-5 were unaffected (or, for 1-2 of them, fixed already).
> A somewhat drastic solution would be to make the name OPERATION now denote DOWNWARD-OPERATION (which would remain as a canonical name), and rename the common superclass of DOWNWARD-OPERATION and UPWARD-OPERATION to something like ABSTRACT-OPERATION or COMMON-OPERATION.
No, no, no. That would be really bad. The real solution is to fix the handful of broken extensions.
> The current refactoring is quite problematic, since it moves some of the previously-existing characteristics of OPERATION out and into a sub-class that no one has ever heard of before.
That's what ASDF 3 has been doing for a year, and if no one has complained about that handful of broken systems, it's probably easier to fix them than to rebreak all the MANY MORE extensions that were either fixed or unaffected, and would now need to be fixed instead. The minimal change is to keep fixing things, not to revert to ASDF2 brokenness.
> Unfortunately, the above solution is not ready for prime-time, either, since if we add COMMON-OPERATION, all programmers' methods that dispatch on OPERATION will break if used with PREPARE-OP. On the one hand, that's probably not a big deal, since no one will have been customizing UPWARD-OPERATIONs, since they haven't existed. On the other hand, programmers who want to write extensions that really are generic to *all* types of operation (e.g., EXPLAIN type methods) would be broken by this proposed repair.
Yes, many extensions rely on OPERATION being the top of the hierarchy, and you don't want to break all of them. That includes POIU, ITA's now published QUUX through its qres-build system, and at least 6 quicklisp systems that I can easily find grepping through ~/quicklisp/dist/quicklisp/software/**/*asd*. Several of these defmethods are probably obsolete, since most extensions shouldn't specialize operation-done-p anymore.
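To illustrate why OPERATION needs to stay at the top of the hierarchy, here is a minimal sketch of an extension method that is meant to be generic to all operations. It assumes ASDF 3 can be loaded with (require "asdf"); OPERATION-LABEL is a hypothetical generic function invented for this example, not part of ASDF:

```lisp
(require "asdf")

;; OPERATION-LABEL is hypothetical; it stands in for any extension
;; function that should work on every kind of operation.
(defgeneric operation-label (operation)
  (:documentation "Return a lowercase name for OPERATION's class."))

;; Specializing on ASDF:OPERATION covers every operation class,
;; because OPERATION is the root of the hierarchy.
(defmethod operation-label ((o asdf:operation))
  (string-downcase (symbol-name (type-of o))))

;; The same method applies to the upward-propagating PREPARE-OP
;; and to the downward-propagating COMPILE-OP.
(operation-label (asdf:make-operation 'asdf:prepare-op))
(operation-label (asdf:make-operation 'asdf:compile-op))
```

If OPERATION were renamed to denote only the downward-propagating case, the first call would find no applicable method.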
Really, the current ASDF3 architecture is much improved over ASDF2, though indeed there are still active issues.
(This reminds me how deferred warnings broke 50-odd systems in quicklisp, out of which only about 25 were fixed, and 25 had unresponsive authors, even a year afterwards. In the end, I had the deferred warnings disabled by default. Good luck if you want to enable the feature at long last, either by getting everything fixed, or allowing out-of-band disabling of the warnings.)
> This problem also exposes a HUGE hole in our regression-testing methods: we have nothing that tests extensions to the ASDF protocol.
I disagree. ASDF is defined in an incremental way, and all the code in ASDF itself is "extensions" to the protocol as defined by previous pieces. Consider asdf-bundle and concatenate-source, if nothing else. It is a testament to the overall good design of the original ASDF and its CLOS-based architecture that the code is so clean and small, compared to other code that does equivalent things much worse (have you looked at mk-defsystem? Ugh! And let's not discuss some horrors in C or Java). I salute Dan Barlow, who did a lot of experimentation, and whose bigger success overshadows his smaller failures.
Now, we can always add more tests to the ASDF test suite. Hopefully, we can also add tests to the software in quicklisp, and get cl-test-grid to run them somehow. Then there's the yak shaving of establishing some standard for CL testing that would let us have test reports and detect regressions at the individual test level depending on various environment parameters, instead of having a coarse grain of erroring an entire system. Oh, the dream of Consolidating CL Test Libraries!
> I also find that OUTPUT-FILES may now be called more eagerly than before, perhaps because of the new PREPARE-OP. I have seen systems (perhaps not well-built systems....) that defined OUTPUT-FILES on an operation, O, in a way that assumed that earlier operations (COMPILE-OP and LOAD-OP) would be called before the OUTPUT-FILES method on O. That actually seems wrong, since OUTPUT-FILES is part of the plan-making part of the ASDF protocol, rather than the plan-execution part, but it *used to work* and given the extremely sketchy information available to users who wish to customize ASDF, such disruptions need to be very gently handled.
OUTPUT-FILES is called much more than before, a bit because of PREPARE-OP, but a lot because of COMPUTE-ACTION-STAMP. Moreover, particularly with BUNDLE-OP, OUTPUT-FILES can be expensive to compute. That's why, for performance reasons, we now cache the results of OUTPUT-FILES. (Many thanks to stassats for pushing me into caring for performance, and making ASDF3 not significantly slower than ASDF2, sometimes faster, even though it is now doing much more work, more correctly.)
> Finally, as the responsible party now, I'm not comfortable sending out another release until I have come to understand the new protocol better than I do now. Indeed, it was culpably negligent of me to release the last couple of versions, and I apologize for doing so.
I understand your concern. On the other hand, consider that
* This particular change has been here for a year with no bug report, despite indeed breaking a handful of extensions that no one uses. Only one known useful piece of code remains broken.
* Your standard should probably not be perfection, but improvement and non-regression, and I believe the current release candidate meets it. The failures that we are experiencing are either new tests, or new platforms that were previously untested or non-supported; meanwhile plenty of bugs have been fixed, with many tests added, and new functionality is at hand.
* Throughout all the history of ASDF1 and ASDF2, all authors and maintainers including danb and including me obviously didn't have a clear understanding of the old protocol, since it was so fundamentally buggy. ASDF3 fixed the protocol. Your lack of understanding of the new protocol is not worse than the previous maintainers' lack of understanding of the old protocol.
Of course, you're in charge now, and may validly decide that it's a blocking issue.
I believe the ASDF sources are much more readable than they used to be; if you need explanations on any part of it, I'll gladly explain, add new comments or revise existing ones, or give you an interactive walkthrough. That might be a good thing to do, anyway, to pass the torch; and we could even record that event for any interested party. Walking through ASDF/DEFSYSTEM should take a couple of hours.

    less $(make defsystem-files)
Yours in procrastination,
—♯ƒ • François-René ÐVB Rideau •Reflection&Cybernethics• http://fare.tunes.org You think you know when you can learn, are more sure when you can write, even more when you can teach, but certain when you can program. — Alan Perlis