Update of /project/elephant/cvsroot/elephant
In directory common-lisp.net:/tmp/cvs-serv7649

Modified Files:
	TUTORIAL
Log Message:
updates, persistent classes, collections, threading, performance

Date: Sun Aug 29 22:35:37 2004
Author: blee

Index: elephant/TUTORIAL
diff -u elephant/TUTORIAL:1.3 elephant/TUTORIAL:1.4
--- elephant/TUTORIAL:1.3 Sun Aug 29 09:51:02 2004
+++ elephant/TUTORIAL Sun Aug 29 22:35:37 2004
@@ -90,12 +90,12 @@
You can put something into the root object by
-* (add-to-root *store-controller* "my key" "my value") +* (add-to-root "my key" "my value") => NIL
and get things out via
-* (get-from-root *store-controller* "my key") +* (get-from-root "my key") => "my value" => T
@@ -115,14 +115,14 @@ things: numbers (except for complexes, which will be easy to support), symbols, strings, nil, characters, pathnames, conses, hash-tables, arrays, CLOS objects. Nested and -circular structures are allowed. Basically everything -except lambdas, closures, structures, packages and streams. -(These may eventually get supported too.) +circular things are allowed. You can serialize basically +anything except lambdas, closures, structures, packages and +streams. (These may eventually get supported too.)
Unfortunately Berkeley DB doesn't understand Lisp, so Lisp data needs to be serialized to enter the database, (e.g. converted to byte arrays), and deserialized on the way -out. This introduces some caveats: +out. This introduces some caveats (not unique to Elephant):
1) Lisp identity can't be preserved. Since this is a store which persists across invocations of Lisp, this probably @@ -130,12 +130,12 @@
* (setq foo (cons nil nil)) => (NIL) -* (add-to-root *store-controller* "my key" foo) +* (add-to-root "my key" foo) => NIL -* (add-to-root *store-controller* "my other key" foo) +* (add-to-root "my other key" foo) => NIL -* (eq (get-from-root *store-controller* "my key") - (get-from-root *store-controller* "my other key")) +* (eq (get-from-root "my key") + (get-from-root "my other key")) => NIL
As a consequence, btrees have a sort of mishmash eql / @@ -145,7 +145,7 @@
* (setf (car foo) T) => T -* (get-from-root *store-controller* "my key") +* (get-from-root "my key") => (NIL)
You can of course manually re-input objects. @@ -175,12 +175,12 @@ * (setq foo (make-instance 'my-persistent-class)) => #<MY-PERSISTENT-CLASS {492F4F85}>
-* (add-to-root *store-controller* "foo" foo) +* (add-to-root "foo" foo) => NIL -* (add-to-root *store-controller* "bar" foo) +* (add-to-root "bar" foo) => NIL -* (eq (get-from-root *store-controller* "foo") - (get-from-root *store-controller* "bar")) +* (eq (get-from-root "foo") + (get-from-root "bar")) => T
What's going on here? Persistent classes, that is, classes @@ -190,7 +190,7 @@ are stored in separate entries, keyed by OID and slot. Loading (deserializing) a persistent class
-* (get-from-root *store-controller* "foo") +* (get-from-root "foo") => #<MY-PERSISTENT-CLASS {492F4F85}>
instantiates the object or finds it from the cache, if it @@ -211,7 +211,7 @@
* (setf (slot1 foo) "three") => "three" -* (slot1 (get-from-root *store-controller* "bar")) +* (slot1 (get-from-root "bar")) => "three"
Although it is hard to see here, serialization / @@ -219,6 +219,37 @@ than ordinary CLOS objects. Finally, they do not suffer from merge-conflicts (more on this later.)
+------------------------------ +Rules about Persistent Classes +------------------------------ + +Using the persistent-metaclass metaclass declares all slots +to be persistent by default. To make a non-persistent slot +use the :transient t flag. Class slots are never persisted, +for either persistent or ordinary classes. (Is this the +right behavior?) + +Readers, writers, accessors, and slot-value-using-class are +instrumented. Because slot-value is not a generic function, +it is not guaranteed to work properly with persistent slots +-- don't use it! + +Persistent classes may inherit from other classes. Slots +inherited from persistent classes remain persistent. +Transient slots and slots inherited from ordinary classes +remain transient. + +Ordinary classes cannot inherit from persistent classes -- +slots need to get stored! Likewise, once a slot is declared +persistent, it cannot later be changed to a transient slot. + +Note that the database is read every time you access a slot. +In particular, if your slot value is not an immediate value, +this will cons the value. Gets are not an expensive +operation (I can do a million reads in 30 seconds), but if +you're concerned, cache values. (In the future we will +provide automatic value caching.) + ------------ Transactions ------------ @@ -258,12 +289,102 @@ If for some reason (like db error) you decide to abort, you can do so via (db-transaction-abort).
-All of this is packaged up in two macros: with-transaction -and with-transaction-retry. The first starts a new -transaction, executes the body, then tries to commit the +All of this is packaged up in with-transaction. It starts a +new transaction, executes the body, then tries to commit the transaction. If anywhere along the way there is a database -error, the transaction is aborted. +error, the transaction is aborted, and it attempts to retry +(a fixed number of times) by re-executing the whole body. + +----------- +Collections +----------- + +The btree class is to hash-tables as persistent-objects +are to ordinary objects. Btrees have a hash-table-like +interface, but store their keys and values directly in a +Sleepycat btree. Btrees may be persisted simply by their +OID. Hence they have all the nice properties of persistent +objects: identity, fast serialization / deserialization, no +merge conflicts. + +* (defvar friends-birthdays (make-instance 'btree)) +=> FRIENDS-BIRTHDAYS + +* (add-to-root "friends-birthdays" friends-birthdays) +=> #<BTREE {4951CF6D}> + +* (setf (get-value "Andrew" friends-birthdays) "12/22/1976") +=> "12/22/1976" + +* (get-value "Andrew" friends-birthdays) +=> "12/22/1976" +=> T
-with-transaction-retry does the same thing, except on a -failure, after aborting it attempts to automatically retry a -few times: it re-runs the body, and again tries to commit. +Because of serialization semantics, btrees hash on a value, +not identity. This is probably ok for strings, numbers, and +persistent things, but not for ordinary aggregates. + +In the future there will be support for automatically +generating secondary indices to search or index into btrees +with. + +--------- +Threading +--------- + +Sleepycat plays well with threads and processes. The store +controller is thread-safe by default, that is, can be shared +amongst threads. Transactions may not be shared amongst +threads except serially. One thing which is NOT thread and +process safe is recovery, which should be run when no one +else is talking to the database environment. Consult the +Sleepycat docs for more information. + +Elephant uses some specials to hold parameters and buffers. +If you're using a natively threaded lisp, you can initialize +these specials to thread-local storage by using the +"run-elephant-thread" function, assuming your lisp creates +thread-local storage for let-bound specials. + +Persisting ordinary aggregate types suffers from something +called "merge-conflicts." Since updating one value of an +aggregate object requires the entire object to be written to +the database, in heavily threaded situations you may +overwrite changes another thread or process has committed. +This is not protected by transactions. + +Consider two processes operating on the same cons: + +-----start--read--update-car--write--commit----------------- +-start------read--update-cdr-----------------write--commit-- + +Although the first process successfully committed its +transaction, its work (writing to the car) will be erased +by the second process's transaction (which writes both the +car and cdr.)
+ +Persistent classes and persistent collections do not suffer +from merge-conflicts, since each slot / entry is a separate +database entry. + +----------- +Performance +----------- + +Performance is usually measured in transactions per second. +Database reads are cheap. To get more transaction +throughput, consider setting + +* (db-env-set-flags (controller-environment *store-controller*) 1 + :txn-nosync t) + +or look at other flags in the Sleepycat docs. This will +greatly increase your throughput at the cost of some +durability; I get around a 100x improvement. Durability can be +recovered with judicious use of checkpointing and +replication, though this is currently not supported by +Elephant -- see the Sleepycat docs. + +The serializer is definitely fast on fixnums, strings, and +persistent things. It is fairly fast but consing with +floats and doubles. YMMV with other values.
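
The merge-conflict timeline in the Threading section can be replayed without a database. The sketch below is plain Common Lisp and does not use Elephant at all: *fake-store*, fake-put, fake-get and lost-update-demo are hypothetical names invented for this illustration, and copy-tree merely stands in for the serialize / deserialize round trip. It assumes only that every read yields a fresh copy and every write stores the whole aggregate.

```lisp
;; Hypothetical stand-in for a serializing store (NOT Elephant's API):
;; a hash table whose put and get both deep-copy the value, which is
;; roughly what serialization and deserialization do to a cons.
(defvar *fake-store* (make-hash-table :test #'equal))

(defun fake-put (key value)
  "Store a deep copy of VALUE under KEY (stands in for serialize + write)."
  (setf (gethash key *fake-store*) (copy-tree value))
  value)

(defun fake-get (key)
  "Return a fresh deep copy of the value under KEY (stands in for read + deserialize)."
  (copy-tree (gethash key *fake-store*)))

(defun lost-update-demo ()
  "Replay the two-process timeline on one stored cons; return the final stored value."
  (clrhash *fake-store*)
  (fake-put "shared" (cons nil nil))
  (let ((a (fake-get "shared"))   ; process A reads
        (b (fake-get "shared")))  ; process B reads
    (setf (car a) :from-a)        ; A updates the car of its copy
    (setf (cdr b) :from-b)        ; B updates the cdr of its copy
    (fake-put "shared" a)         ; A writes the whole cons and commits
    (fake-put "shared" b)         ; B writes the whole cons, overwriting A's car
    (fake-get "shared")))
```

Calling (lost-update-demo) returns (NIL . :FROM-B): the car written by A is gone because B wrote back a whole cons built from a stale read. Storing the car and cdr as separate entries, which is what the tutorial says persistent slots and btree entries do, would avoid this.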