[asdf-devel] An interesting ASDF conundrum


Robert Goldman

6 Jan 2013

8:26 p.m.

This one I think is relevant to the proposed reorganization of ASDF to have changes in upstream systems trigger recompilation in systems that depend on them.

I have made a crude component which is a MADE-FILE. The made-file is built by invoking make to produce some output file.

MADE-FILE's compile-op OPERATION-DONE-P method always returns NIL, because ASDF does not understand what files the made file depends on -- that information is buried in the makefile. We don't want to duplicate that logic in ASDF. If we always invoke make, it will do the Right Thing and only rebuild the file when necessary.

*Unfortunately*, any components in a system that have a :depends-on relationship to a MADE-FILE, will always be recompiled, whether necessary or not. If I remember correctly, the logic in traversal is that you do the child if its input files are fresher than the output file *or* if one of its dependencies was run.

Now, with current ASDF, I can work around this for MADE-FILEs with stable APIs: I just shove them out into systems of their own, so they don't trigger constant recompilations of systems that depend on them.

But this behavior will be changing with the new ASDF.

I don't *believe* the existing controls on :force will solve this problem, either. We can't control this by simply saying "recompile everything if any upstream system changes" or "don't recompile anything based on upstream system changes" (although the latter will restore the current buggy behavior).

I can think of two solutions:

1. KLUDGY -- allow individual system definitions (like my MADE-FILE wrapper systems) to specify that they do not propagate recompilation forcing onto their dependents.

2. BETTER, but bigger effect on the protocol: Separate the question of whether an operation is performed on a dependency from whether that dependency propagates change downstream. We could do this by adding a function that is run to test whether an <operation,component> pair propagates forcing onto its dependents. By default it would be the same as "did the operation get performed?" as now, but it could be overridden by component classes like MADE-FILE where, e.g., I could say "did the file-write-date of the OUTPUT-FILES change?"

Does this make sense?

cheers, r
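
As an illustration of solution 2, here is a minimal sketch in plain Common Lisp. It is a toy model and not ASDF code: the classes TOY-COMPONENT and TOY-MADE-FILE, the generic function PROPAGATES-CHANGE-P, and the recorded write dates are all hypothetical names invented for this example. It also simplifies the <operation,component> pair down to a component plus a flag saying whether the operation was performed.

```lisp
;; Toy model only: none of these names are part of ASDF.
(defclass toy-component ()
  ((name :initarg :name :reader name)))

(defclass toy-made-file (toy-component)
  ;; Write dates of the output file, recorded before and after
  ;; the (simulated) run of make.
  ((date-before :initarg :date-before :reader date-before)
   (date-after  :initarg :date-after  :reader date-after)))

(defgeneric propagates-change-p (component performed-p)
  (:documentation
   "Should dependents of COMPONENT be forced to rebuild?"))

;; Default: the same answer as "did the operation get performed?"
(defmethod propagates-change-p ((c toy-component) performed-p)
  performed-p)

;; Override: propagate only when the output's write date actually moved.
(defmethod propagates-change-p ((c toy-made-file) performed-p)
  (and performed-p
       (not (eql (date-before c) (date-after c)))))
```

With this split, a made-file whose make run left the output untouched would be "performed" without forcing its dependents, while ordinary components keep the current behavior.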


Faré

6 Jan 2013

8:44 p.m.


OK, so since it's already working, I assume it's respecting the constraints of ASDF, which is that planning happens before any Lisp computation.

I see two cases:

1- The computation that creates your MADE-FILE doesn't depend on any Lisp computation. Then you could either

1a- systematically perform this Make computation BEFORE the Lisp build, and have OPERATION-DONE-P return T, or

1b- (that's ugly) have this computation done during OPERATION-DONE-P itself, and return T or NIL depending on whether things have changed. Actually, I don't recommend that at all: since OPERATION-DONE-P is called *after* OUTPUT-FILES timestamps have been checked, this would be bad. But maybe that's a bug in ASDF, and we should run OPERATION-DONE-P first (unless we are JUST-DONE).

2- The computation that creates your MADE-FILE *does* depend on some previous Lisp computation. Then the only solution is to have this computation be a deterministic product of files, all of which you can declare as COMPONENTs of your system (as STATIC-FILEs if need be), and/or have its INPUT-FILES return (a superset of) the set of files from which it will be computed. Then OPERATION-DONE-P can just return T.

--♯ƒ • François-René ÐVB Rideau •Reflection&Cybernethics• http://fare.tunes.org
Any sufficiently advanced misquotation is indistinguishable from an original statement. -- John McCarthy, misquoted
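
Case 2 could look roughly like the following sketch. The real MADE-FILE class is not shown in this thread, so the class definition here and its DECLARED-INPUTS slot are assumptions made for illustration. Only INPUT-FILES, OPERATION-DONE-P, COMPILE-OP, SOURCE-FILE, COMPONENT-PARENT and COMPONENT-PATHNAME are existing ASDF names.

```lisp
(require :asdf)

;; Hypothetical stand-in for the MADE-FILE class discussed above.
(defclass made-file (asdf:source-file)
  ;; Hypothetical slot: relative pathnames of the files the makefile
  ;; reads. A superset of the true inputs is acceptable.
  ((declared-inputs :initarg :declared-inputs
                    :initform nil
                    :reader declared-inputs)))

;; Declare the inputs so that ASDF's own timestamp comparison can
;; decide whether the made file is out of date.
(defmethod asdf:input-files ((op asdf:compile-op) (c made-file))
  (let ((dir (asdf:component-pathname (asdf:component-parent c))))
    (mapcar (lambda (p) (merge-pathnames p dir))
            (declared-inputs c))))

;; With inputs and outputs declared, there is no extra reason to
;; invalidate the action, so this can simply return T.
(defmethod asdf:operation-done-p ((op asdf:compile-op) (c made-file))
  t)
```

The cost of this design is that the input list duplicates information already held in the makefile, which is the duplication the original message wanted to avoid.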

Robert Goldman

9:03 p.m.


I'm not entirely sure I understand your point 2. If the makefile is correct, it will have all of the dependencies, and these can be computed by examining the file system.

Assuming that we keep the current technique of always running make and allowing IT to determine whether the made-file needs rebuilding, the only thing we need to do is to ensure that make is not run too early. That is, we must ensure that all the necessary lisp outputs are written into the filesystem before "make" is invoked.

Hm. I suppose that does rule out your 1a as a technique, assuming we'd like to develop a MADE-FILE class that can be used correctly whether or not it depends on lisp computations.

WRT your 1b, why would it be bad to do this because it's called after OUTPUT-FILES timestamps have been checked? Since we don't have the INPUT-FILES for the MADE-FILE, we can't use the INPUT-FILES/OUTPUT-FILES relationship to determine whether or not the operation needs doing, right?

I was thinking that your 1b could be implemented as

    (defmethod operation-done-p ((op compile-op) (c made-file))
      (assert (= 1 (length (output-files op c))))
      (let* ((output-file (first (output-files op c)))
             ;; NIL if the output file does not exist yet
             (old-write-date (and (probe-file output-file)
                                  (file-write-date output-file))))
        ;; <invoke make>
        ;; done only if the file existed and make left it untouched
        (and old-write-date
             (eql (file-write-date output-file) old-write-date))))

This would be coupled (here's the icky part) with a PERFORM method that does nothing, because by the time it's invoked the make will already have run.

Cheers, r

Faré

9:19 p.m.

> I'm not entirely sure I understand your point 2. If the makefile is correct, it will have all of the dependencies, and these can be computed by examining the file system.

I'm not sure what you mean by "examining the file system" unless you put all files on a VESTA-like NFS server to intercept all accesses. But point 2 is just "use ASDF as designed", i.e. declare your inputs (or a superset thereof) and your outputs.

> Assuming that we keep the current technique of always running make and allowing IT to determine whether the made-file needs rebuilding, the only thing we need to do is to ensure that make is not run too early. That is, we must ensure that all the necessary lisp outputs are written into the filesystem before "make" is invoked.

I would recommend running anything that doesn't depend on any Lisp output separately, as a step before your build. As for things that use Lisp output, you can hopefully list a superset of their inputs. Yet another way would be to have your build happen in several steps, so there's no interleaving of Make and ASDF, but rather one then the other.

> WRT your 1b, why would it be bad to do this because it's called after OUTPUT-FILES timestamps have been checked? Since we don't have the INPUT-FILES for the MADE-FILE, we can't use the INPUT-FILES/OUTPUT-FILES relationship to determine whether or not the operation needs doing, right?

If the timestamps don't reflect reality, then the results of compute-action-stamp will be plain wrong. For more extensibility, I suppose I could allow OPERATION-DONE-P to return two values, a DONE-P flag and a STAMP. If that would help, I had better implement it before we release 2.27.

> This would be coupled (here's the icky part) with a PERFORM method that does nothing, because by the time it's invoked the make will already have run.

That's not the icky part. The icky part is OPERATION-DONE-P lying to ASDF.

--♯ƒ • François-René ÐVB Rideau •Reflection&Cybernethics• http://fare.tunes.org
Classical liberalism is not an economic doctrine. It is a theory of Law.

Robert Goldman

10:24 p.m.

>> This would be coupled (here's the icky part) with a PERFORM method that does nothing, because by the time it's invoked the make will already have run.
>
> That's not the icky part. The icky part is OPERATION-DONE-P lying to ASDF.

I have been mulling this over, and here's what is interesting to me about this case:

The ASDF protocol has OPERATION-DONE-P, which is used as NOT-OPERATION-NEEDS-DOING-P. I.e., if it returns a non-NIL value, when applied to operation O and component C, we may skip doing O on C.

For my MADE-FILE, I cannot[*] compute the answer to this question: all I can do is compute a conservative bound on OPERATION-NEEDS-DOING-P. In fact, all I can compute is a degenerate bound: I assume that OPERATION-NEEDS-DOING-P is always true.

Now, let's assume we have another operation, O', and component C', and we have to do O'(C') if O(C) did something.

The problem here arises because ASDF assumes DID-SOMETHING-P(O,C) == OPERATION-NEEDS-DOING-P(O,C). That is, ASDF assumes that if it BELIEVED it needed to do O to C, then something will have changed AFTER it does O to C.

In the case of a MADE-FILE (and some other possible ASDF components?), this is not the case. Since we are conservative about whether or not O(C) needs doing, sometimes we are wrong and after we apply O to C, nothing has changed.

Furthermore, for MADE-FILE, it is possible to examine the filesystem *AFTER* O(C) and determine a *better answer* to the DID-SOMETHING-P query than we can for OPERATION-NEEDS-DOING.

Unfortunately, there's no room in the ASDF protocol to do this. But it would not be impossible to modify the protocol to make it happen, and this could be done in a backwards-compatible way, by having OPERATION-DID-SOMETHING return the value of OPERATION-DONE-P before the operation by default...

I'm not sure that this is worth doing: it would depend on whether there are enough interesting cases where OPERATION-DONE-P (approximate!) and OPERATION-DID-SOMETHING diverge.

cheers, r

Faré

11:30 p.m.

Unhappily, ASDF as it stands follows a "plan then perform" model that does not allow for interleaving dependency detection and computation. The best it can provide is to have two stages, with a second-stage system that :defsystem-depends-on the first-stage system for each "step" that requires such interleaving, with the loading of the first stage causing all the effects needed for the second stage. Unwieldy, but that's all you can do with ASDF.

NB: in the new ASDF 2.27, OPERATION-DONE-P will be a boolean that complements the builtin stamp computations. Make it T normally, NIL if there's a reason *besides* timestamps to invalidate the previous computation.

--♯ƒ • François-René ÐVB Rideau •Reflection&Cybernethics• http://fare.tunes.org
It may be bad manners to talk with your mouth full, but it isn't too good either if you speak when your head is empty.
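
The two-stage arrangement Faré describes might be laid out as in the sketch below. The system names and file names are hypothetical, and the contents of the first-stage file (which would have to invoke make when loaded) are not shown. The only ASDF features relied on are DEFSYSTEM and its :DEFSYSTEM-DEPENDS-ON option.

```lisp
;;; stage-one.asd (hypothetical)
;;; Loading this system is what causes the external build to run.
(asdf:defsystem "stage-one"
  ;; Hypothetical file whose load-time effect is to invoke make.
  :components ((:file "run-make")))

;;; stage-two.asd (hypothetical)
;;; ASDF loads stage-one while processing this definition, so the
;;; made outputs already exist before stage-two is planned.
(asdf:defsystem "stage-two"
  :defsystem-depends-on ("stage-one")
  :components ((:file "uses-made-output")))
```

Each step that needs make to run before planning gets its own pair of systems, which is why the approach is described as unwieldy.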

Robert Goldman

7 Jan 2013

1:09 a.m.

Faré reminds me that I was confused about the protocol, and that the two checks I was referring to are BOTH done in the pre-planning phase. This strictly limits what we can do in an external "solver" like make. I will see if I can pull together a version of MADE-FILE that is tidy enough for sharing, in the hopes it will provide an ASDF-contrib and a test case. cheers, r
