Faré wrote:
On Thu, Jan 2, 2014 at 11:44 PM, Robert P. Goldman rpgoldman@sift.info wrote:
While there are bug fixes waiting to reach our users, I'm quite concerned by the loss of backwards compatibility in systems that defined their own OPERATION subclasses.
This backward incompatibility already happened a year ago: it was consubstantial with the refactoring that begat ASDF3, and necessary to fix the broken dependency model of ASDF2. NO, most operations MUST NOT propagate downward, or even sideways; indeed some, like prepare-op, must propagate upward instead. The downward and sideways propagations were baked into TRAVERSE; that was one of the deep conceptual bugs in the ASDF model. Now they are configurable via the (ill-named) COMPONENT-DEPENDS-ON; and the previous behavior is a mere inheritance specification away, with the DOWNWARD-OPERATION and SIDEWAYS-OPERATION mixins.
I get this, and the recoding is a HUGE improvement. Furthermore, you have convinced me that we can't unwind this change in any way that won't create yet more damage.
However:
a. This change is not backward-compatible
b. It's not announced to maintainers of ASDF systems in a way that gets it done and
c. The bugs it gives rise to are subtle and confusing. There's virtually no way a person whose system suddenly stops performing the expected operations is going to say "I bet I need to change the superclasses of my specialized OPERATION subclasses."
Unfortunately, because we reused the old name, OPERATION, it is *very* difficult to trap this in a helpful way. Ideally, I would suggest doing something like raising some sort of signal when the user makes a new OPERATION subclass to suggest that s/he might want DOWNWARD- or UPWARD-OPERATION instead. But I don't see that alley as open to us.
So what can we do for the poor programmer whose system suddenly starts to perform only a subset of the behaviors it did in the past? Expecting that programmer to grovel over the ASDF source and figure out what went wrong is too much.
I'm relatively familiar with ASDF's guts, and an update to Allegro (getting ASDF 3 in place of 2) still cost my company about four programmer days, and was only resolved because I thought to IM you about my problems.
(The other deep conceptual bugs in the ASDF model were the lack of transitive timestamp checking, and the mess of DEPENDS-ON vs DO-FIRST dependencies. These were loosely related deep bugs. Then there were shallower bugs like the IF-COMPONENT-FAILS horror, the inconsistency between system :depends-on and other :depends-on, and probably more small bugs I can't remember.)
As far as I can tell, *all* such systems will break, since the old solution was to subclass OPERATION and the new solution is to subclass DOWNWARD-OPERATION to achieve the same results. *ALL* programmers' locally-defined operation subclasses are now broken by this change.
We broke those that depended on downward and/or sideways propagation (don't forget sideways-operation). But we *FIXED* all those who didn't want it, and had to deal with it anyway!
I get it, but breaking bug-compatibility is still breaking compatibility. For people whose code created new OPERATION subclasses, and whose code *worked*, we have broken their code, and done so in a way that quietly does garbage-in-garbage-out, instead of indicating an error.
[...snip...]
Did you see the previous email where I audited the damage (and the fixes)? http://thread.gmane.org/gmane.lisp.asdf.devel/3581/focus=3582 Only 4-5 victims (depending on how you count), I fixed one (dependency-op), two (clean-op and revert-op) were never really working, and one (parenscript-compile-op) indeed needs love, but the fix is trivial, and no one complained so far.
You only audited publicly available libraries. Applications, like ours, which perform tasks of no interest to the general world, but critical to us, are invisible to you.
At any rate, an upgrade to the build system that requires the build system's author to review all the available community libraries -- not to mention libraries that are not shared with the community! -- is simply not an approach that will scale.
We need a patch to ASDF that will indicate to programmers in a clear way that the behavior of ASDF has changed, and give them a hint about how to fix this change.
Ideally, we would be able to fix this socially, rather than with code: we would simply shout off the rooftops that the OPERATION classes had changed, indicate the needed revisions, and all would be well.
Unfortunately, we do not have the capability to shout off the rooftops in this way: ASDF quietly slips out into the community mediated by the implementation suppliers.
I suppose one could introduce an ASDF banner that would print once, as a means of shouting off the rooftops.
I'm open to alternative suggestions; this one does not seem excellent.
One possibility might be adopting Allegro's style of incompatible update. When they changed the reader so that consecutive reader macros could make multiple s-expressions skippable (like #-allegro #-allegro :foo :bar), they introduced a new variable, something like
EXCL:*READER-MACRO-COMPATIBILITY*
If this had a value like :OLD, it would raise a continuable error when it encountered one of these consecutive reader macros, explaining the change in behavior. Once you knew about the behavior, you would set (or bind) this variable, and the warning would go away.
So people who made new OPERATION subclasses would get a warning until they introduced a form into their system definition files indicating that they knew about the change.
That would let us explain the situation to programmers, and shut up when the programmers understand the situation.
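A minimal sketch of how such an acknowledgement switch might look, in plain Common Lisp. Everything here is an assumption made for illustration: the package, the variable *OPERATION-COMPATIBILITY*, and the function CHECK-OPERATION-CLASS are hypothetical names, and the four classes are toy stand-ins for the ASDF classes discussed in this thread, not ASDF's own definitions. How ASDF would actually hook such a check into class definition is left open.

```lisp
(defpackage :operation-compat-sketch
  (:use :common-lisp))
(in-package :operation-compat-sketch)

;; Toy stand-ins for the ASDF classes named in this thread (NOT the real ones).
(defclass operation () ())
(defclass downward-operation (operation) ())
(defclass upward-operation (operation) ())
(defclass sideways-operation (operation) ())

;; Hypothetical switch, in the spirit of the Allegro variable described above.
(defvar *operation-compatibility* :warn
  ":WARN complains about bare OPERATION subclasses; :ACKNOWLEDGED stays quiet.")

(defun propagating-operation-p (class-name)
  "True if CLASS-NAME inherits from one of the propagation mixins."
  (some (lambda (mixin) (subtypep class-name mixin))
        '(downward-operation upward-operation sideways-operation)))

(defun check-operation-class (class-name)
  "Warn if CLASS-NAME inherits from OPERATION but from no propagation mixin.
Returns true when a warning was issued, NIL otherwise."
  (when (and (eq *operation-compatibility* :warn)
             (subtypep class-name 'operation)
             (not (propagating-operation-p class-name)))
    (warn "~S inherits from OPERATION without a propagation mixin. ~
           Since ASDF 3, OPERATION no longer propagates downward or sideways; ~
           consider DOWNWARD-OPERATION and/or SIDEWAYS-OPERATION."
          class-name)
    t))

;; Two hypothetical extension classes: one written ASDF2-style, one updated.
(defclass legacy-op (operation) ())
(defclass fixed-op (downward-operation) ())
```

With these definitions, (check-operation-class 'legacy-op) would warn, (check-operation-class 'fixed-op) would not, and a system definition file that sets or binds *OPERATION-COMPATIBILITY* to :ACKNOWLEDGED would silence the check altogether.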
Meanwhile 5-10 extensions were fixed by this change, and 4-5 were unaffected (or, for 1-2 of them, fixed already).
A somewhat drastic solution would be to make the name OPERATION now denote DOWNWARD-OPERATION (which would remain as a canonical name), and rename the common superclass of DOWNWARD-OPERATION and UPWARD-OPERATION to something like ABSTRACT-OPERATION or COMMON-OPERATION.
No, no, no. That would be really bad. The real solution is to fix the handful of broken extensions.
The current refactoring is quite problematic, since it moves some of the previously-existing characteristics of OPERATION out and into a sub-class that no one has ever heard of before.
That's what ASDF 3 has been doing for a year, and if no one has complained about that handful of broken systems, it's probably easier to fix them than to rebreak all the MANY MORE extensions that were either fixed or unaffected, and would now need to be fixed instead. The minimal change is to keep fixing things, not to revert to ASDF2 brokenness.
I agree that it's easier to fix such libraries -- after all, it's typically just replacing
OPERATION
with
#-asdf3 OPERATION #+asdf3 DOWNWARD-OPERATION
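
For concreteness, a sketch of what that replacement might look like in an extension's source. The class name LEGACY-DOC-OP is hypothetical, and the sketch assumes the :ASDF3 feature is present exactly when ASDF 3 is loaded. An operation that also relied on the old sideways propagation would presumably list ASDF:SIDEWAYS-OPERATION as a second superclass in the same way.

```lisp
;; Make sure ASDF is loaded before the reader sees the ASDF: symbols below.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (require "asdf"))

;; LEGACY-DOC-OP is a hypothetical user-defined operation.
;; Under ASDF 2 it keeps inheriting from OPERATION, as before;
;; under ASDF 3 it opts back in to downward propagation explicitly.
(defclass legacy-doc-op (#-asdf3 asdf:operation
                         #+asdf3 asdf:downward-operation)
  ())
```

Since the unselected branch is skipped by the reader, the same source file should load under either version of ASDF.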
To me the key issue (to reiterate) is how do we find the people who need to do this, and make it known to them that they need to make this change?
Unfortunately, the above solution is not ready for prime-time, either, since if we add COMMON-OPERATION, all programmers' methods that dispatch on OPERATION will break if used with PREPARE-OP. On the one hand, that's probably not a big deal, since no one will have been customizing UPWARD-OPERATIONs, since they haven't existed. On the other hand, programmers who want to write extensions that really are generic to *all* types of operation (e.g., EXPLAIN type methods) would be broken by this proposed repair.
Yes, many extensions rely on OPERATION being the top of the hierarchy, and you don't want to break all of them. That includes POIU, ITA's now published QUUX through its qres-build system, and at least 6 quicklisp systems that I can easily find grepping through ~/quicklisp/dist/quicklisp/software/**/*asd*. Several of these defmethods are probably obsolete, since most extensions shouldn't specialize operation-done-p anymore.
Really, the current ASDF3 architecture is much improved over ASDF2, though indeed there are still active issues.
(This reminds me how deferred warnings broke 50-odd systems in quicklisp, out of which only about 25 were fixed, and 25 had unresponsive authors, even a year afterwards. In the end, I had the deferred warnings disabled by default. Good luck if you want to enable the feature at long last, either by getting everything fixed, or allowing out-of-band disabling of the warnings.)
This problem also exposes a HUGE hole in our regression-testing methods: we have nothing that tests extensions to the ASDF protocol.
I disagree. ASDF is defined in an incremental way, and all the code in ASDF itself is "extensions" to the protocol as defined by previous pieces. Consider asdf-bundle and concatenate-source, if nothing else. It is a testament to the overall good design of the original ASDF and its CLOS based architecture that the code is so clean and small, compared to other code that does equivalent things much worse (have you looked at mk-defsystem? Ugh! And let's not discuss some horrors in C or Java). I salute Dan Barlow, who did a lot of experimentation, and whose bigger success overshadows his smaller failures.
[...snip...]
Now, we can always add more tests to the ASDF test suite.
Yes, this is one of the things I would most like to see happen.
[...snip...]
Finally, as the responsible party now, I'm not comfortable sending out another release until I have come to understand the new protocol better than I do now. Indeed, it was culpably negligent of me to release the last couple of versions, and I apologize for doing so.
I understand your concern. On the other hand, consider that
- This particular change has been here for a year with no bug report, despite indeed breaking a handful of extensions that no one uses. Only one known useful piece of code remains broken.
At least for us, the reason that there has been no bug report is that Allegro pushed an intermediate version of ASDF that had a broken EXCL:RUN-SHELL-COMMAND. When we figured this out, we ripped out that patch, so all of our production code has been running on ASDF2 until this past week.
I don't think silence in this case can be taken to indicate the absence of bugs. There can be a huge lag between ASDF releases and ASDF penetration into the user community.
- Your standard should probably not be perfection, but improvement and non-regression, and I believe the current release candidate meets it. The failures that we are experiencing are either new tests, or new platforms that were previously untested or non-supported; meanwhile plenty of bugs have been fixed, with many tests added, and new functionality is at hand.
I'm willing to see less than perfection, but a key desideratum for me is "fail loudly and obviously," instead of "quietly and confusingly." I want the quiet and confusing OPERATION failures to be moved to being understandable before the next release.
While I see the advantages of getting bug fixes out there, I don't believe that we get that many opportunities to achieve uptake through the implementations (with the possible exception of SBCL). So I want the most improvement/release possible, and if there's something that seems critical to me, I'm disinclined to let it slip past a release.
- Throughout all the history of ASDF1 and ASDF2, all authors and maintainers including danb and including me obviously didn't have a clear understanding of the old protocol, since it was so fundamentally buggy. ASDF3 fixed the protocol. Your lack of understanding of the new protocol is not worse than the previous maintainers' lack of understanding of the old protocol.
Of course, you're in charge now, and may validly decide that it's a blocking issue.
There was at least a time when I understood the *actual* (as opposed to the desirable) protocol. This has slipped away. I'd like it to come back before I put my name behind another release.
I definitely welcome your offer of a walkthrough. I think that will be hugely helpful.
Thanks for all of your work, R