Phillip,
Nice write-up. Random notes:
What I discovered is quite cool. The Cells system *automatically
discovers* dynamic dependencies, without having to explicitly specify that
X depends on Y, as long as X and Y are both implemented using cell
objects.
<g> And that is part of why Cells is pretty much all-or-nothing for a
developer: I have not tried to figure out the threshold, but above what I
think is a very low one, all application semantics must be expressed
declaratively as Cell rules. Otherwise imperative code gets left out of the
action as the automatic dataflow engine does its thing. The corollary being
your "as long as" qualifier: my declarative rules are crippled if some
important datapoint is not a Cell.
For the first seven years of Cells development, when in doubt I started out
a new slot/attribute as a non-Cell, bending over backwards, if you will, not
to force the mechanism where it should not go. In each case, soon enough it
turned out I needed the slot to be a Cell after all. So the default for the
:cell meta-attribute, nil all those years, just recently became "true".
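
For readers who have not seen the trick, the automatic dependency discovery
discussed above can be sketched in a few lines of Python. This is only an
illustration under stated assumptions: the Cell class, its get/set methods and
the lazy "mark stale, recompute on read" strategy are all hypothetical, and
none of it is the Cells or PyCells API (the real engine propagates eagerly).

```python
class Cell:
    """Hypothetical minimal cell: holds a plain value or a zero-argument rule."""

    _stack = []  # rules currently being evaluated, innermost last

    def __init__(self, value=None, rule=None):
        self._value = value
        self._rule = rule
        self._dependents = set()          # cells whose rules have read this cell
        self._stale = rule is not None    # ruled cells are computed on first read

    def get(self):
        # Whoever is running a rule while reading us becomes a dependent.
        # Nobody had to declare "X depends on Y" anywhere.
        if Cell._stack:
            self._dependents.add(Cell._stack[-1])
        if self._stale:
            Cell._stack.append(self)
            try:
                self._value = self._rule()
            finally:
                Cell._stack.pop()
            self._stale = False
        return self._value

    def set(self, value):
        if self._rule is not None:
            raise TypeError("ruled cells cannot be assigned")
        if value != self._value:
            self._value = value
            self._invalidate_dependents()

    def _invalidate_dependents(self):
        # Dependents are forgotten here and rediscovered on the next read,
        # which is what keeps the dependency graph dynamic.
        dependents, self._dependents = self._dependents, set()
        for cell in dependents:
            cell._stale = True
            cell._invalidate_dependents()


width = Cell(3)
height = Cell(4)
area = Cell(rule=lambda: width.get() * height.get())

print(area.get())   # 12
width.set(10)
print(area.get())   # 40
```

Note that a plain Python variable read inside the rule would be invisible to
this bookkeeping, which is the "as long as" qualifier in miniature.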
Specifically, the cells system understands how to make event-based updates
orderly and deterministic, in a way that peak.events cannot.
It may be of interest that this orderliness is relatively new to Cells. For
the longest time and in the most intense applications I got away with
murder. Strangely, it was development of a RoboCup client that forced Cells
to "grow up".
One especially interesting bit is that
the Cells system can "optimize away" dependencies or subscriptions when
their subjects are known to be constant values.
I was quite surprised at how much faster this made Cells run.
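
A rough Python sketch of the idea, for illustration only. The Cell class, the
constant flag and the dependencies set are assumed names, not the Cells or
PyCells API, and the sketch shows only the subscription bookkeeping, not
change propagation.

```python
class Cell:
    """Hypothetical cell that skips subscriptions to values that cannot change."""

    _stack = []  # rules currently being evaluated, innermost last

    def __init__(self, value=None, rule=None, constant=False):
        self.value = value
        self.rule = rule
        self.constant = constant       # True: this value can never change
        self.dependencies = set()      # cells this cell's rule subscribed to
        self.computed = rule is None

    def get(self):
        if not self.computed:
            Cell._stack.append(self)
            try:
                self.value = self.rule()
            finally:
                Cell._stack.pop()
            self.computed = True
            # A rule that subscribed to nothing can never be re-triggered,
            # so the cell itself is as good as constant from now on.
            if not self.dependencies:
                self.constant = True
        # Record the subscription only if this cell might change later.
        if Cell._stack and not self.constant:
            Cell._stack[-1].dependencies.add(self)
        return self.value


pi = Cell(3.14159, constant=True)
two_pi = Cell(rule=lambda: 2 * pi.get())
radius = Cell(2.0)
circumference = Cell(rule=lambda: two_pi.get() * radius.get())

circumference.get()
print(two_pi.constant)                          # True
print(circumference.dependencies == {radius})   # True
```

The constancy is contagious: two_pi read only a constant, so it became one
too, and circumference ends up tracking radius alone.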
I'm also wondering if a Cells-like system couldn't also be used to
implement STM (Software Transactional Memory) to allow for atomic
operations even in the presence of threads. All reads and writes are
controlled by the cells system, so it can in principle abort and retry a
"transaction", by waiting until *something changes* that would affect the
transaction's ability to succeed.
We have a Google SoC project over on the Lisp side to implement STM, and
yes, I am excited about that making Cells viable in a multi-threaded
situation. Mind you, I had never heard of STM before this proposal landed on
our doorstep, nor do I even have much idea of what is available to
applications when it comes to dealing with threads, but looking at how Cells
manages data integrity I know it will need help to survive threads. STM
looks like a great fix.
However, seeing how the Cells paradigm works, it seems to me that it should
be pretty easy to establish the convention that side-effects should be
confined to non-rule "observer" code.
Right, it is just a convention, but I think one that gets easier to follow
because the engine provides a simple way to say "do this when the time is
right".
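
A minimal Python sketch of that convention, under loud assumptions: the Model
class and its rule/observe/set methods are invented for illustration and are
not the Cells or PyCells API. Rules are assumed to be pure and to be
registered in dependency order, since the sketch simply recomputes them in
registration order.

```python
class Model:
    """Hypothetical container: input values, pure rules, and observers."""

    def __init__(self, **inputs):
        self.values = dict(inputs)
        self.rules = {}       # name -> pure function of the values dict
        self.observers = {}   # name -> list of callbacks(new, old)

    def rule(self, name, fn):
        self.rules[name] = fn
        self.values[name] = fn(self.values)

    def observe(self, name, callback):
        self.observers.setdefault(name, []).append(callback)

    def set(self, name, value):
        before = dict(self.values)
        self.values[name] = value
        # Phase 1: recompute every rule. Rules only compute, they never act.
        for rule_name, fn in self.rules.items():
            self.values[rule_name] = fn(self.values)
        # Phase 2: state is consistent again, so now "the time is right"
        # and observers may perform their side effects.
        for key, new in self.values.items():
            old = before.get(key)
            if new != old:
                for callback in self.observers.get(key, []):
                    callback(new, old)


log = []
m = Model(celsius=20)
m.rule("fahrenheit", lambda v: v["celsius"] * 9 / 5 + 32)
m.observe("fahrenheit", lambda new, old: log.append((old, new)))
m.set("celsius", 100)
print(log)   # [(68.0, 212.0)]
```

The observer never runs against half-updated state, which is most of what
makes the convention easy to live with.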
Experience with, e.g., peak.binding attributes shows that it's rare to want to
put side-effects into pull-oriented rules.
"We could do it, but it would be wrong."
Really, the principal downside to Cells is wrapping your head around the
idea that *everything* should be treated as pull-oriented rules.
Yes, it really is a paradigm shift, one that takes a long time to internalize.
What I noticed was that, if I decided to add a significant new mechanism to
the system, after about two hours of coding I would be having increasing
difficulties and start to get a vague "bad feeling". Then I would realize
that I had, from long habit, fallen back into an imperative style. Hence the
"bad feeling". Because the code was all new, it did not grow naturally from
the Cell-based model. If it had, it would of course have been written in
the declarative style from the start.
I have encouraged Ryan, the PyCells author, not to allow backdoors to the
Cells engine, precisely because of this. The big win comes from the
declarative paradigm, and developers will not climb that learning curve if
they can avoid it. Simple human nature. Cells makes one think harder up
front in return for all sorts of good things later, and that is a tradeoff I
have always liked to make as a developer.
There are
some operations (such as receiving a command and responding to it) that
seem to be more naturally expressed as pushing operations, where you make
some decisions and then directly update things or send other commands out.
Exactly! A spreadsheet is a steady-state thing (here are the values, here is
the computed other state) and using Cells to express static reality is a
snap. Otoh, imperative code is all about change, so it is great for handling
events.
We use ephemeral Cells to model events (they take on a value, propagate,
then revert to null silently, without propagating), but one still can end up
thinking pretty hard when it comes to events. I think the most frightening
"rule" I have written was for a Timer class implemented by the Tcl "after"
command.
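
The ephemeral behavior described above can be sketched in Python as follows.
The EphemeralCell class and its subscribe/fire methods are hypothetical names
chosen for illustration, not the Cells or PyCells API.

```python
class EphemeralCell:
    """Hypothetical event cell: holds a value only while it propagates."""

    def __init__(self):
        self.value = None
        self.listeners = []

    def subscribe(self, listener):
        self.listeners.append(listener)

    def fire(self, value):
        self.value = value
        try:
            # Propagate the event to everything that depends on this cell.
            for listener in self.listeners:
                listener(value)
        finally:
            # Silent revert: the value goes back to None and nobody is told.
            self.value = None


clicks = EphemeralCell()
seen = []
clicks.subscribe(seen.append)

clicks.fire("button-1")
clicks.fire("button-1")   # same value again still counts as a new event
print(seen)               # ['button-1', 'button-1']
print(clicks.value)       # None
```

Because the cell reverts between events, two identical events in a row are
not mistaken for "no change".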
Actually, you can still do that, it's just that those updates or commands
force another "virtual moment" of time into being, where if you had made
them pull-driven they could've happened in the *same* "virtual
moment". So, it's more that pull-based rules are slightly more efficient
than push-based ones, which is nice because that means most developers will
consider it worth learning how to do it the pull way. ;)
That and the straitjacket I hope PyCells keeps from Cells.
Anyway, there is a *lot* of interesting food for thought, here. For
example, you could create object validation rules using cells, and the
results would be automatically recomputed when something they depended on
changed. Not only that, but it would be possible to do atomic updates,
such that the validation wouldn't occur until *after* all the changes were
made -- i.e., no false positives. Of course, you'd get the resulting
validation errors in the *next* "time quantum", so you'd need to make the
response to them event-driven as well.
It's definitely a slippery slope. :)
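
A small Python sketch of the deferred-validation idea, under stated
assumptions: the Record class, its atomic() context manager and the
start_before_end validator are all made up for illustration and belong to no
real library. Validators return an error message or None.

```python
from contextlib import contextmanager


class Record:
    """Hypothetical record whose validators rerun whenever fields change."""

    def __init__(self, **fields):
        self.fields = dict(fields)
        self.validators = []
        self.errors = []
        self._batch_depth = 0

    def add_validator(self, check):
        self.validators.append(check)
        self._validate()

    def set(self, name, value):
        self.fields[name] = value
        if self._batch_depth == 0:
            self._validate()

    @contextmanager
    def atomic(self):
        # Hold validation back until the outermost batch has finished.
        self._batch_depth += 1
        try:
            yield self
        finally:
            self._batch_depth -= 1
            if self._batch_depth == 0:
                self._validate()

    def _validate(self):
        results = (check(self.fields) for check in self.validators)
        self.errors = [msg for msg in results if msg is not None]


def start_before_end(fields):
    if fields["start"] <= fields["end"]:
        return None
    return "start must not exceed end"


eager = Record(start=1, end=5)
eager.add_validator(start_before_end)
eager.set("start", 10)      # checked against the half-finished state
print(eager.errors)         # ['start must not exceed end']

batched = Record(start=1, end=5)
batched.add_validator(start_before_end)
with batched.atomic():
    batched.set("start", 10)
    batched.set("end", 20)
print(batched.errors)       # []
```

The errors list is only available once the batch closes, which corresponds to
the "next time quantum" caveat above.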
For example, this deterministic model of computation seems to
resemble "object prevalence" (e.g. Prevayler) in that everything (even the
clock) is deterministic, changes are atomic, and I/O occurs between logical
moments of time. I haven't thought this particular link through very much
yet, it's just an intriguing similarity.
Nice call. I have heard the Cells data integrity model maps nicely onto the
transaction model of AllegroCache, a persistent Lisp object database.
The head-exploding part is figuring out how to get errors to propagate
backwards in time, so that validation rules (which run in the "next
moment") could appear to cause an error at the point where the values were
set.
Sounds like you want at least one Undo. What about a "fail now or forever
hold your peace" policy?
cheers, kenny