Update of /project/elephant/cvsroot/elephant/doc
In directory clnet:/tmp/cvs-serv25569/doc

Modified Files:
	scenarios.texinfo
Log Message:
final snapshot scenario and code changes

--- /project/elephant/cvsroot/elephant/doc/scenarios.texinfo	2007/04/19 05:24:37	1.5
+++ /project/elephant/cvsroot/elephant/doc/scenarios.texinfo	2007/04/19 22:25:51	1.6
@@ -7,7 +7,7 @@
 @menu
 * File System Replacement:: Deployment of Elephant as file replacement
-* Checkpointing Program State:: How to recover the application state as recorded in a set of interdependant standard classes for purposes of undo, crash recovery and session persistence.
+* Checkpointing Conventional Program State:: How to recover the application state as recorded in a set of interdependant standard classes for purposes of undo, crash recovery and session persistence.
 * Persistent System Objects:: Making persistent objects a natural part of your system
 * Elephant as Database:: Using Elephant as a database for records and user data instead of using a SQL relational Database
 * Multithreaded Web Applications:: Elephant is a natural match for web applications
@@ -99,9 +99,9 @@
 @footnote{Example provided by Ian Eslick, April 2007}
-@node Checkpointing Program State
+@node Checkpointing Conventional Program State
 @comment node-name, next, previous, up
-@section Checkpointing Program State
+@section Checkpointing Conventional Program State
 Another challenge for many programs is saving some subset of program
 state.  This could involve checkpointing an evolving computation,
@@ -168,12 +168,368 @@
@subsection Implementation: The Snapshot Set
-To generalize all this behavior, we will define a new class called a
-snapshot set.  The set itself is a persistent object that wraps a
-btree, but provides all the automation to store and recover sets of
-objects.
+In this section we walk through the implementation of the snapshot set
+in detail, as it provides:
+
+@itemize
+@item Insight into the constraints of serialization and Lisp object identity
+@item An example of how to leverage Elephant for more sophisticated
+      applications than persistent indices and class slots
+@item An understanding of a useful utility (that we may add to an extensions
+      release in the future)
+@end itemize
+
+To generalize the behavior discussed above, we will define a new
+persistent class called a snapshot set.  The set itself is a wrapper
+around the btree, but provides all the automation to store and recover
+sets of standard objects.
+
+@lisp
+(defpclass snapshot-set ()
+  ((index :accessor snapshot-set-index :initform (make-btree))
+   (next-id :accessor snapshot-set-next-id :initform 0)
+   (root :accessor snapshot-set-root :initform nil)
+   (cache :accessor snapshot-set-cache
+          :initform (make-hash-table :weak-keys t)
+          :transient t)
+   (touched :accessor snapshot-set-touched
+            :initform (make-array 20 :element-type 'fixnum
+                       :initial-element 0 :fill-pointer t :adjustable t)
+            :transient t))
+  (:documentation "Keeps track of a set of standard objects
+    allowing a single snapshot call to update the store
+    controller with the latest state of all objects registered with
+    this set"))
+@end lisp
+
+The set class keeps track of IDs, a set of cached objects in memory,
+the on-disk btree for storing instances by uid and the current uid
+variable value.  Notice the use of the transient keyword argument for
+the cache.
+
+There are two major operations supported by sets: @code{snapshot} and
+@code{restore}.  These save objects to disk and restore objects to
+memory, along with proper recovery of multiple references to the same
+object.
+
+Additional operations are:
+
+@itemize
+@item Registration: Adding and removing objects from a set
+@item Root operations: Easy access to a single root hash table or object
+@item Mapping: Walk over all objects in a set
+@end itemize
+
+To enable snapshots, we have to register a set of root objects with
+the set.  The registration function returns the existing ID for objects
+that are already cached; otherwise it allocates a new ID and caches the object.
+
+@lisp
+(defmethod register-object ((object standard-object) (set snapshot-set))
+  "Register a standard object.  Not recorded until snapshot is called on db"
+  (aif (lookup-cached-id object set)
+       (values object it)
+       (let ((id (incf (snapshot-set-next-id set))))
+         (cache-snapshot-object id object set)
+         (values object id))))
+
+(defun lookup-cached-id (obj set)
+  (gethash obj (snapshot-set-cache set)))
+
+(defun cache-snapshot-object (id obj set)
+  (setf (gethash obj (snapshot-set-cache set)) id))
+@end lisp
+
+A parallel function registers hash tables.  One very important
+invariant implied here is that the cache always contains objects that
+are eq and mapped back to a serialized object in the backing btree.
+There is no need, however, to immediately write objects to the store
+and this gives us some transactional properties: snapshots are atomic,
+consistent and durable.  Isolation is not enforced by snapshots.
+
+This means that the transient cache has to be valid immediately after
+the snapshot set is loaded from the data store.
+
+@lisp
+(defmethod initialize-instance :after ((set snapshot-set) &key lazy-load &allow-other-keys)
+  (unless lazy-load (restore set)))
+@end lisp
+
+This also has consequences for unregistration.  Removing a root object
+should also result in the removal of all objects that are unreachable
+from other roots.  However, since side effects are not permanent until
+a snapshot operation, we merely have to garbage collect ids that were
+not touched during a snapshot operation.  This makes unregistration
+simple.
+
+@lisp
+(defmethod unregister-object (object (set snapshot-set))
+  "Drops the object from the cache and backing store"
+  (let ((id (gethash object (snapshot-set-cache set))))
+    (when (null id)
+      (error "Object ~A not registered in ~A" object set))
+    (drop-cached-object object set)))
+@end lisp
+
+But snapshots are a little bit more work.
+
+@lisp
+(defmethod snapshot ((set snapshot-set))
+  "Saves all objects in the set (and any objects reachable from the
+   current set of objects) to the persistent store"
+  (with-transaction (:store-controller (get-con (snapshot-set-index set)))
+    (loop for (obj . id) in (get-cache-entries (snapshot-set-cache set)) do
+         (save-snapshot-object id obj set))
+    (collect-untouched set)))
+
+(defun save-snapshot-object (id obj set)
+  (unless (touched id set)
+    (setf (get-value id (snapshot-set-index set))
+          (cond ((standard-object-subclass-p obj)
+                 (save-proxy-object obj set))
+                ((hash-table-p obj)
+                 (save-proxy-hash obj set))
+                (t (error "Can only snapshot standard-objects and hash-tables"))))
+    (touch id set))
+  id)
+
+(defun collect-untouched (set)
+  (map-btree (lambda (k v)
+               (unless (touched k set)
+                 (remove-kv k (snapshot-set-index set))))
+             (snapshot-set-index set))
+  (clear-touched set))
+@end lisp
+
+We go through all objects in the cache, storing objects as we go via
+@code{save-snapshot-object}.  This function is responsible for storing
+objects and hash tables and recursing on any instances that are
+referenced.  Any object that is saved is added to a touch list so it
+is not stored again and we can mark stored instances for the
+@code{collect-untouched} call which ensures that newly unreachable
+objects are deleted from the persistent store.  Any newly found
+objects are added to the in-memory cache which, being a weak hash table,
+should eventually drop references to objects that are not referred to
+elsewhere.
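
The touch-and-collect step described in the paragraph above amounts to a mark-and-sweep over the id index. The following is a minimal sketch in plain Common Lisp, assuming an in-memory hash table stands in for the persistent btree. Every name in it (`*demo-index*`, `demo-save-entry`, `demo-sweep-untouched`) is hypothetical and not part of Elephant.

```lisp
;; Hypothetical stand-ins: a hash table plays the role of the btree index
;; and a list plays the role of the touched array. Not Elephant API.
(defvar *demo-index* (make-hash-table))   ; id -> stored value
(defvar *demo-touched* '())               ; ids written during this snapshot

(defun demo-touched-p (id)
  (member id *demo-touched*))

(defun demo-save-entry (id value)
  "Write VALUE under ID at most once per snapshot and mark ID as live."
  (unless (demo-touched-p id)
    (setf (gethash id *demo-index*) value)
    (push id *demo-touched*))
  id)

(defun demo-sweep-untouched ()
  "Delete every stored id that the current snapshot did not touch.
Returns the sorted list of deleted ids and resets the touched list."
  (let ((dead '()))
    ;; Collect first, then delete, so the table is not modified
    ;; while it is being traversed.
    (maphash (lambda (id value)
               (declare (ignore value))
               (unless (demo-touched-p id) (push id dead)))
             *demo-index*)
    (dolist (id dead) (remhash id *demo-index*))
    (setf *demo-touched* '())
    (sort dead #'<)))

(defun demo-sweep ()
  "Two snapshots: the second no longer reaches id 2, so it is swept."
  (clrhash *demo-index*)
  (setf *demo-touched* '())
  (demo-save-entry 1 :a) (demo-save-entry 2 :b) (demo-save-entry 3 :c)
  (demo-sweep-untouched)                  ; first snapshot: nothing to drop
  (demo-save-entry 1 :a) (demo-save-entry 3 :c2)
  (list (demo-sweep-untouched)            ; second snapshot drops id 2
        (hash-table-count *demo-index*)))
```

Calling `(demo-sweep)` returns the ids removed by the second snapshot together with the number of entries left in the index. Because nothing is deleted until the sweep, dropping a root needs no eager cleanup, which is the property the text relies on to keep unregistration simple.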
+
+Note that garbage objects not yet collected from the weak hash table
+cache may be stored to and restored from the
+persistent store.  However this is merely a storage overhead as they
+will eventually be dropped across sessions as there are no saved
+references to them.
+
+Now when we serialize a standard object, all the slot values are
+stored inline.  This means that by default, a slot that refers to a
+standard object would get an immediately serialized version rather
+than a reference.  This of course makes it impossible to restore
+multiple references to a single object.  The approach taken here is to
+instantiate a @emph{proxy} object which is a fresh instance of the original
+object's class and stores normal values directly in its slots.  Any
+references to hashes or standard classes are replaced with a reference
+object that records the unique id of the object so it can be properly
+restored.
+
+@lisp
+(defun save-proxy-object (obj set)
+  (let ((svs (subsets 2 (slots-and-values obj))))
+    (if (some #'reify-class-p (mapcar #'second svs))
+        (let ((proxy (make-instance (type-of obj))))
+          (loop for (slotname value) in svs do
+               (setf (slot-value proxy slotname)
+                     (if (reify-class-p value)
+                         (reify-value value set)
+                         value)))
+          proxy)
+        obj)))
+@end lisp
+
+The function checks whether any slot value can be reified (represented
+by a unique id) and if so, makes a new proxy instance and properly
+instantiates its slots, returning it to the main store function which
+writes the proxy object to the btree.
+
+On restore, we simply load all objects into memory.
+
+@lisp
+(defmethod restore ((set snapshot-set))
+  "Restores a snapshot by setting the snapshot-set state to the last snapshot.
+   If this is used during runtime, the user needs to drop all references
+   to objects and retrieve again from the snapshot set.  Also used to initialize
+   the set state when a set is created, for example pulled from the root of a
+   store-controller, unless :lazy-load is specified"
+  (clear-cache set)
+  (map-btree (lambda (id object)
+               (load-snapshot-object id object set))
+             (snapshot-set-index set)))
+
+(defun load-snapshot-object (id object set)
+  (let ((object (ifret object (get-value id (snapshot-set-index set)))))
+    (cond ((standard-object-subclass-p object)
+           (load-proxy-object id object set))
+          ((hash-table-p object)
+           (load-proxy-hash id object set))
+          (t (error "Unrecognized type ~A for id ~A in set ~A" (type-of object) id set)))))
+@end lisp
+
+If an object has a reference object in a slot, then we simply restore
+that object as well.  @code{load-snapshot-object} accepts null for an
+object so it can be used recursively when a reference object refers to
+an object (via the unique id) that is not yet cached.  The @code{load}
+functions return an object so that they can be used directly to create
+values for writing slots or hash entries.
+
+@lisp
+(defun load-proxy-object (id obj set)
+  (ifret (lookup-cached-object id set)
+         (progn
+           (cache-snapshot-object id obj set)
+           (let ((svs (subsets 2 (slots-and-values obj))))
+             (loop for (slotname value) in svs do
+                  (when (setrefp value)
+                    (setf (slot-value obj slotname)
+                          (load-snapshot-object (snapshot-set-reference-id value) nil set)))))
+           obj)))
+@end lisp
+
+A full set of source code for @code{snapshot-sets} can be found in the
+Elephant source tree under @code{src/contrib/eslick/snapshot-set.lisp}.
+
+@subsection Using Snapshot Sets
+
+A snapshot set is quite easy to use.  Load the complete code and play
+with this simple walkthrough.  First we need to create a set object,
+
+@lisp
+(setf my-set (make-instance 'snapshot-set))
+@end lisp
+
+and add it to the root so we don't lose track of it.
-@subsection Isolating snapshot sets
+@lisp
+(add-to-root 'my-set my-set)
+@end lisp
+
+Then we need some objects to play with.
+
+@lisp
+(defclass my-test-class ()
+  ((value :accessor test-value :initarg :value)
+   (reference :accessor test-reference :initarg :reference)))
+
+(setf obj1 (make-instance 'my-test-class :value 1 :reference nil))
+(setf obj2 (make-instance 'my-test-class :value 2 :reference obj1))
+(setf obj3 (make-instance 'my-test-class :value 3 :reference obj2))
+
+(register-object obj3 my-set)
+(snapshot my-set)
+@end lisp
+
+Now your set should have persistent versions of all three objects that
+are reachable from @code{obj3}.
+
+@lisp
+(map-set (lambda (x) (print (test-value x))) my-set)
+=>
+3
+2
+1
+@end lisp
+
+Of course such fully connected objects are not always common, so we'll
+demonstrate using hash tables to create root indexes into our objects
+and sidestep registration calls entirely.  We'll create a fresh set to
+work with.
+
+@lisp
+(setf my-set (make-instance 'snapshot-set))
+(add-to-root 'my-set my-set)
+
+(setf obj4 (make-instance 'my-test-class :value 4 :reference obj1))
+(setf obj5 (make-instance 'my-test-class :value 5 :reference nil))
+
+(setf hash (make-hash-table))
+(setf (snapshot-root my-set) hash)
+
+(setf (gethash 'obj3 hash) obj3)
+(setf (gethash 'obj4 hash) obj4)
+(setf (gethash 'obj5 hash) obj5)
+
+(snapshot my-set)
+@end lisp
+
+To properly simulate restoring objects, we need to drop our old hash
+table as well as clear the persistent object cache so the snapshot set
+transient object is reset.
+
+@lisp
+(setf my-set nil)
+(setf hash nil)
+(elephant::flush-instance-cache *store-controller*)
+@end lisp
+
+Now we'll pretend we're starting up a new session.
+
+@lisp
+(setf my-set (get-from-root 'my-set))
+(setf hash (snapshot-root my-set))
+@end lisp
+
+The cache is automatically populated by the implicit @code{restore}
+call during snapshot-set initialization, and our hash table should now
+have all the proper references.  We'll pull out a few.
+
+@lisp
+(setf o4 (gethash 'obj4 hash))
+(setf o3 (gethash 'obj3 hash))
+(setf o2 (test-reference o3))
+
+(not (or (eq o4 obj4)
+         (eq o3 obj3)
+         (eq o2 obj2)))
+=> t
+@end lisp
+
+The new objects should not be eq to the old ones as we have restored
+fresh copies from the disk.
+
+If you review the setup above, @code{obj3} references @code{obj2}
+which references @code{obj1} and @code{obj4} also references
+@code{obj1}.  So if the objects were properly restored, these
+references should be @code{eq}.
+
+@lisp
+(eq (test-reference o2) (test-reference o4))
+=> t
+@end lisp
+
+And finally we can demonstrate the restorative power of snapshot sets.
+
+@lisp
+(remhash 'obj5 hash)
+
+(gethash 'obj5 hash)
+=> nil nil
+
+(restore my-set)
+(setf hash (snapshot-root my-set))
+
+(gethash 'obj5 hash)
+=> #<MY-TEST-CLASS ..> t
+
+(test-value *)
+=> 5
+@end lisp
+
+This means that while our set object was not reset, the restore
+operation properly restored the old reference structure of our root
+hash object.  Unfortunately, in this implementation you have to reset
+your lisp pointers to get access to the restored objects.
+
+A future version could traverse the existing object cache, dropping
+new references and restoring old ones so that in-memory lisp pointers
+were still valid.
+
+@subsection Isolating multiple snapshot sets
 A brief note on how to separate out the objects you want to store from
 those you don't may be useful.  We want to snapshot groups of
@@ -281,10 +637,9 @@
Of course this doesn't work for multi-threaded environments, or for
[11 lines skipped]
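
The proxy-and-reference scheme walked through in the snapshot-set hunk above can be illustrated without Elephant. The following is a minimal sketch in plain Common Lisp, assuming slot values are replaced by id placeholders when saved and resolved back to a single instance per id when loaded. Every name (`demo-node`, `demo-ref`, `flatten-nodes`, `rebuild-nodes`) is hypothetical, and the sketch resolves references in two passes rather than with the recursive load used by `load-proxy-object`.

```lisp
;; Hypothetical illustration only; none of these names are Elephant API.
(defclass demo-node ()
  ((value :initarg :value :accessor demo-node-value)
   (link  :initarg :link  :accessor demo-node-link :initform nil)))

(defstruct demo-ref id)   ; placeholder stored in place of a direct pointer

(defun flatten-nodes (roots)
  "Return (id value link) records for every node reachable from ROOTS.
A link to another node is recorded as a DEMO-REF holding that node's id."
  (let ((ids (make-hash-table :test #'eq)) (next 0) (records '()))
    (labels ((id-of (node)
               (or (gethash node ids)
                   ;; Assign the id before following the link so that
                   ;; shared or cyclic references reuse the same id.
                   (let ((id (setf (gethash node ids) (incf next))))
                     (push (list id
                                 (demo-node-value node)
                                 (let ((next-node (demo-node-link node)))
                                   (when next-node
                                     (make-demo-ref :id (id-of next-node)))))
                           records)
                     id))))
      (mapc #'id-of roots))
    records))

(defun rebuild-nodes (records)
  "Recreate nodes from RECORDS; return a hash table mapping id to node."
  (let ((nodes (make-hash-table)))
    ;; Pass 1: exactly one fresh instance per id.
    (loop for (id value) in records
          do (setf (gethash id nodes) (make-instance 'demo-node :value value)))
    ;; Pass 2: resolve each placeholder to the single instance with that id.
    (loop for (id nil link) in records
          when (demo-ref-p link)
            do (setf (demo-node-link (gethash id nodes))
                     (gethash (demo-ref-id link) nodes)))
    nodes))

(defun demo-identity ()
  "Two nodes share a link to a third; check the sharing survives a round trip."
  (let* ((o1 (make-instance 'demo-node :value 1))
         (o2 (make-instance 'demo-node :value 2 :link o1))
         (o4 (make-instance 'demo-node :value 4 :link o1))
         (records (flatten-nodes (list o2 o4)))
         (by-value (make-hash-table)))
    (maphash (lambda (id node)
               (declare (ignore id))
               (setf (gethash (demo-node-value node) by-value) node))
             (rebuild-nodes records))
    (let ((n1 (gethash 1 by-value))
          (n2 (gethash 2 by-value))
          (n4 (gethash 4 by-value)))
      (list (eq (demo-node-link n2) (demo-node-link n4)) ; sharing preserved
            (eq (demo-node-link n2) n1)                  ; points at restored node
            (eq n1 o1)                                   ; fresh copy, not original
            (length records)))))
```

Serializing slot values inline would have produced two separate copies of the shared node. Recording an id instead is what lets the restored links be `eq` to each other while still being distinct from the original in-memory objects.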