Peter Hildebrandt wrote:
[A great article! <g>]
Ken,
thanks a lot for all the insight. First things first, using c?n for the kids works like a charm. setf on a c? cell still produces an error, suggesting to initialize the cell with c-in.
OK, I thought that was only in debug mode, but I did not actually check.
Anyway, that's settled now.
As to the not-to-be issue,...
Ohhhh, it's in the not-to-be. Understood.
Meanwhile over here working on OpenAIR I ended up trying to read dead instances. Have not figured out why, but I have a guess. Still, this comes up so often I think we might want to make this less painful for developers.
... everything can be reproduced using test-gtk. I can share that in the cells cvs, just let me know whether it would be ok to restructure the cvs to match the old cells-gtk directory structure (I'd rather keep it the way it is in the old cells-gtk, so that I can commit it in there one day).
Sure, go for it.
I did some narrowing it down and going through the back trace. So let us look at an example:
root node 1 ---- observer 1 node 1.1 ---- observer 1.1 node 2 ---- observer 2
The basic code for the observer is
(defmodel family-observer (family) ;; we'll use the "value" slot for the observed ((row :reader row :initarg :row) (:default-initargs :kids (kids-list? (bwhen (val (^value)) (mapcar #'(lambda (src) (mk-observer self src)) (kids val)))))))) :row (c? (when-bind* ((parent (upper self)) (pos (position self (kids parent)))) (let ((new-row (tree-row-create (row parent) (id parent)))) (when (tree-row-valid new-row) (tree-row-set-path new-row (row parent) pos) new-row)))))
Where mk-observer is a method specializing on both parameters, so that we can have different kinds of observers on the same type of targets.
May I just observe at this juncture that I wish you had chosen a name other than observer for this class given that Cells already has the observer name in play. :)
The row is some gui object to be kept in sync.
Now we remove node1:
(with-integrity (:change 'tv-del-node) (setf (kids (upper node1)) (remove node1 (kids (upper node1)))))
Then first node2 and observer2 die, ok. Then node1 dies, and so does observer1.
The interesting part:
not-to-be :before on observer 1.1 is called -- and at this point observer 1.1 itself is already :eternal-rest, in other words, not-to-be is called on a dead object. Now not-to-be of the observer wishes to do something, so it accesses a (ruled) slot of the passed object:
(defmethod not-to-be :before ((self cells-tree-node)) (tree-row-destroy (row self))
The call to the accessor (row self) with the dead self triggers a bunch of cells calls:
Backtrace: 0: (CELLS::ENSURE-VALUE-IS-CURRENT NIL #<unused argument> #<unused argument>) 1: ((LABELS CELLS::CHECK-REVERSED) (NIL)) 2: (CELLS::ENSURE-VALUE-IS-CURRENT (NIL . <vld>)=492/OPTIMIZED-AWAY/ROW/DEAD!NODE-TREE-NODE3440] #<unused argument> #<unused argument>) 3: ((LAMBDA (CELLS::OPCODE CELLS::DEFER-INFO)) #<unused argument> #<unused argument>) 4: (CELLS::CALL-WITH-INTEGRITY NIL NIL #<CLOSURE (LAMBDA (CELLS::OPCODE CELLS::DEFER-INFO)) {BBBD3DD}>) 5: (CELLS::CELL-READ (NIL . <vld>)=492/OPTIMIZED-AWAY/ROW/DEAD!NODE-TREE-NODE3440]) 6: (CELLS::MD-SLOT-VALUE DEAD!NODE-TREE-NODE3440 CELLS-GTK::ROW)
The last call is in
(defun ensure-value-is-current (c debug-id ensurer) (declare (ignorable debug-id ensurer)) (count-it :ensure-value-is-current) (when (and (not (symbolp (c-model c)))(eq :eternal-rest (md-state (c-model c)))) (break "model ~a of cell ~a is dead" (c-model c) c))
.... in particular the form:
(c-model c)
which breaks with:
The value NIL is not of type CELL. [Condition of type TYPE-ERROR]
To sum up, I believe the problem is that at a change of the kids list
- First the kids are declared dead
- Then not-to-be is called recursively
and thus not-to-be is passed a dead self.
Yeah, I ran into this once in an observer.
However, I wish to do some cleanup work when kids are kicked out, and for this I need to access a few slots.
Yep.
What I'd like to have is
- not-to-be being called before the object is declared dead
or
- another method (last-will?) to be executed right before the kids die
or
- an interims state (:zombie?) in which cell slots are still
accessible with their last cached value
Or is the solution to have an observer on the kids slot instead of a not-to-be-method, which does the cleanup work for the kids?
I used to do that with kids, but not-to-be is a good place for it.
I might have just made a mistake declaring instances dead before not-to-be. And I am not even sure why I am so unpleasant about allowing access to slot-values of dead stuff. More on this below (you guessed right).
I was thinking while doing the dishes. I can image the observer class being interesting in its own right, and slots over there ending up dependent on
the same original kids-list in some way, as well of course as the value of
the observer. Propagation would then try to update this slot and when it got to the value of the observer find a dead instance. Any rule that got to the
value of the observer by accessing the list of observers (say something
iterating over them) would not encounter such an observer/value, but rules lower down that get at the value directly will.
This is an interesting idea, but I doubt that this is my problem here. It really seems to be the relation between ruled kids slots, declaring cells dead, and not-to-be.
Yeah, I did not know it was a not-to-be.
Now normally this is not a problem because such lower down rules would tend
not to depend also on the original list, and indeed the reason I have left this unaddressed is that in most cases I have seen a simple way to rewrite
my rules that was even better and which did not end up with these widespread
dependencies (if you have followed me so far on that, and if I hasten to add all this guesswork is on the money). I like to hold out for Real Problems
before whacking away at the code, I think that is a slippery slope.
That is surely true, as this case proves. I feel what we're looking for here is something like family-finalizers which are specified to be called everytime a kid dies.
btw, all that stuff in their that worries about dead instances is preemptive safeguard stuff -- I think if you disable that most things will
just work. The rules that are failing now will run harmlessly and in a few
cycles everything gets cleaned up anyway. Cells ran for /years/ with this happening to no ill effect (until RoboCup, of all things).
Yep, which is why it broke after introducing cells3 :-)
OTOH, I see why it is good to have these safe guards.
When this all happened I was days away from having to demo RoboCells at an ILC and I just did what I had to to deal with the many attempts to get to dead instances under Cells2. I think some consequent paranoia carried over to this code.
In 2001, Hal when challenged on his determination of a fault in some device suggests putting it back in and allowing it to fail. I suggest we do the same, but not use the opportunity to send an astronaut drifting off into space to eventually suffocate.
I hacked a solution today to the fm-other tree searches which were all over cells-gtk -- now we have with-widget and with-widget-value which do the right thing without kicking off tree searches (I introduced an automatically maintained hashtable of active instances hashing by md-name, like I did in cells-ode).
haha, almost every day now I think about implementing namespace search that way. :) It is kind of cool to have this automatic location-relative search that Just Works when we have repeating structures and lots of widgets with the same name, but so often a simple hash lookup would suffice.
Part of the magic of the existing scheme, btw, is that it does not matter that some widget might get awakened before the other widget it seeks has even been created. The navigation works by hitting kids slots, which get run JIT to spawn the thing being looked for.
When I saw that working without having been planned for I had a feeling I had stumbled onto something.
If you want to send me your whole project I will look to see how I would rewrite the rules if that is even possible, and if not take a look at solving this formally.
As I said, I like to try things out in test-gtk first, so that I can isolate the error (and create a nice demo on the way). I will work through cvs and commit it tomorrow, I hope. I just don't want to force you to have to deal with my whole project -- and all the other issues it has.
Oh, no need for a reproducible. Now that I know it was in not-to-be it is an understood issue.
fyi, in the past I have done silly things like having Cells just return nil
on slot-value access to dead cells, but we may want to find something more
elegant. :)
I have such code in my project, too:
(defun deadp (cell) (eql (slot-value cell 'cells::.md-state) :eternal-rest))
However, since I need the slot-value in my case, this does not help ;-)
Funny you should mention slot-value. :) That's what I used to get at the slots of dead instances in my not-to-be situation.
Well... I am hesitating. There /is/ an evil variant of corpse access to be blocked as an aid to the developer, a form of access wholly unjustifiable. I think we need to yell if these happen. As for deferring the death certificate, well, what if your not-to-be wants access to a sibling or parent or child who is also being interred? Perhaps they get not-to-be'd, declared dead, and now it is your turn and you try to read them? We ran but could not hide?
ie, I am not sure then where to put the certification of death, tho it occurs to be that the unfinished-business queue (if you have read that far) is exactly where this should go, ie we queue instances up for death certification to run after full propagation (including not-to-be processing) after which point it will be safe to say slot access is an absolute bug.
The neat thing is that we do have a distinct "awaken" stage in ufb processing, why not a distinct "ex-parrot" stage?
Another very quick fix would be simply to do something like with-integrity, which explicitly says "I know this may not run right away". We could wrap the not-to-be call in a with-post-mortem macro dynamically binding *dead-is-cool* to t and then any access anywhere to dead things would be allowed. This would avoid the runtime cost of queueing things up for annihilation (but then I do not think that would be a big runtime burden at all).
Anyway, let me implement *dead-is-cool* and have that bound to t before calling not-to-be so there is no need to wrap not-to-be code at all and see what happens. I'll commit something soon, lemme know if it works. :)
kt