Update of /project/elephant/cvsroot/elephant/doc In directory clnet:/tmp/cvs-serv8936
Modified Files: tutorial.texinfo user-guide.texinfo Log Message: Latest documentation edits - still very broken so don't look!
--- /project/elephant/cvsroot/elephant/doc/tutorial.texinfo 2007/03/24 12:16:02 1.7 +++ /project/elephant/cvsroot/elephant/doc/tutorial.texinfo 2007/03/25 11:04:38 1.8 @@ -6,15 +6,15 @@ @cindex Tutorial
@menu -* Overview:: Overview of elphant's features +* Overview:: Overview of elephant's features. * Getting Started:: Opening and accessing a store. * The Store Root:: Accessing persistent data. * Serialization:: Storage semantics for lisp values. * Persistent Classes:: Persistent semantics for objects. * Rules about Persistent Classes:: What you need to know. -* Persistent collections:: Keep track of collections of objects. -* Class Indices:: Simple way to keep track of instances. -* Using Transactions:: Enabling ACID database properties. +* Persistent collections:: Keep track of collections of values. +* Indexing Persistent Classes:: Simple way to keep track of persistent instances. +* Using Transactions:: Providing ACID database properties. @end menu
@node Overview @@ -377,82 +377,249 @@ the most recent commits, right? Note that this can be used as a weak form of IPC. But also note that in particular, if your slot value is not an immediate value, reading will cons or allocate the value. Gets -are not an expensive operation; you can perform tens of thousands of -primitive reads per second. However, if you're concerned, cache -large values. +are not an expensive operation; you can perform thousands to tens of +thousands of primitive reads per second. However, if you're +concerned, cache large values in memory.
@node Persistent collections @comment node-name, next, previous, up @section Persistent collections
+The remaining problem outlined in @ref{Serialization} is that +operations which mutate aggregate objects are not persistent. While +we solved this problem for objects, there is no collection type such +as arrays, hashes or lists which provide this ability. Elephant +provides two primary types of collections, a @code{btree} and a +@code{indexed-btree}. + +We will focus on the core concepts of BTrees in this section, for a +detailed review including the behavior of indexed BTrees, @pxref{Using +BTrees}, @ref{Secondary Indices} and @ref{Using Cursors} in the +@ref{User Guide}.
+Elephant provides a rich data structure called a BTree for storing +large sets of key-value pairs. Every key-value pair is stored +independantly in Elephant just like persistent object slots. +Therefore they inherit all the nice properties of persistent objects: +identity, fast serialization / deserialization, no merge conflicts, +etc. + +The primary interface to @code{btree} objects is through +@code{get-value}. You can also @code{setf} @code{get-value} to store +key-value pairs. + +@lisp +(defvar *friends-birthdays* (make-btree)) +=> *FRIENDS-BIRTHDAYS* + +(add-to-root "friends-birthdays" *friends-birthdays*) +=> #<BTREE @{4951CF6D@}> + +(setf (get-value "Ben" *friends-birthdays*) + (encode-universal-time 0 0 0 14 4 1973)) +=> 2312600400 + +(setf (get-value "Andrew" *friends-birthdays*) + (encode-universal-time 0 0 0 22 12 1976)) +=> 2429071200 + +(get-value "Andrew" *friends-birthdays*) +=> 2429071200 +=> T + +(decode-universal-time *) +=> 0 + 0 + 0 + 22 + 12 + 1976 + 2 + NIL + 6 +@end lisp + +In addition to the hash-table like interface, @code{btree} stores +pairs sorted by the lisp value of the key, lowest to highest. This is +works well for numbers, strings, symbols and persistent objects, but +due to serialization semantics may be strange for other values like +arrays, lists, standard-objects, etc. + +Because elements are sorted by value, we should be able to iterate +over all the elements of the BTree in order. We entered the data in +reverse alphabetic order, but will read it out in alphabetical order. + +@lisp +(map-btree (lambda (k v) + (format t "name: ~A utime: ~A~%" k + (subseq (multiple-value-list (decode-universal-time v)) 3 6))) + *friends-birthdays*) +"Andrew" +"Ben" +=> NIL +@end lisp
-@node Class Indices +But what if we want to read out our friends from oldest to youngest, +or youngest to oldest? In the @ref{User Guide}, specifically the +section on @ref{Secondary indices} you will discover ways to sort +according to the order defined by a lisp function of the key-value pair. + +@node Indexing Persistent Classes @comment node-name, next, previous, up -@section Class Indices +@section Indexing Persistent Classes + +Class indices simplify the recording and retrieving of persistent +objects. An indexed class stores every instance of the class that is +created, ensuring that every object is automatically persisted between +sessions. + +@lisp +(defpclass friend () + ((name :accessor name :initarg :name) + (birthday :initarg :birthday)) + (:index t)) +=> #<PERSISTENT-METACLASS FRIEND> + +(defmethod print-object ((f friend) stream) + (format stream "#<~A>" (name f))) + +(defun encode-birthday (dmy) + (apply #'encode-universal-time + (append '(0 0 0) dmy))) + +(defmethod (setf birthday) (dmy (f friend)) + (setf (slot-value f 'birthday) + (encode-birthday dmy)) + dmy) + +(defun decode-birthday (utime) + (subseq (multiple-value-list (decode-universal-time utime)) 3 6)) + +(defmethod birthday ((f friend)) + (decode-birthday (slot-value f 'birthday))) +@end lisp + +Notice the class argument ``:index t''. This tells Elephant to store +a reference to this class. Under the covers, there are a set of +btrees that keep track of classes, but we won't need to worry about +that as all the functionality has been nicely packaged for you. + +We also created our own birthday accessor for convenience so it +accepts and returns birthdays in a list consisting of month, day and +year such as @code{(27 3 1972)}. The index key will be the encoded +universal time, however.
-Class indices are a very convenient way of gaining the efficiency that -BTrees provide. If a given object is most often sought by the value -of one of its slots, which is of course quite common, it is convenient -to define a class index on that slot, although the same functionality -can be gained in a more complicated way through the use of secondary -indices. - -The file @file{tests/testindexing.lisp} provides many useful examples -of both declaring class indexes and using the API to seek objects using them. - -The following code from that file in the test ``indexing-range'' demonstrates -the convenience of a class indexes and the function ``get-instances-by-range''. -Note in the definition of the ``slot1'' the keyword ``:index'' is used to -specify that this slot should be indexed. - -@lisp - (defclass idx-four () - ((slot1 :initarg :slot1 :initform 1 :accessor slot1 :index t)) - (:metaclass persistent-metaclass)) - - - (defun make-idx-four (val) - (make-instance 'idx-four :slot1 val)) - - (with-transaction () - (mapc #'make-idx-four '(1 1 1 2 2 4 5 5 5 6 10))) - - (let ((x1 (get-instances-by-range 'idx-four 'slot1 2 6)) - (x2 (get-instances-by-range 'idx-four 'slot1 0 2)) - (x3 (get-instances-by-range 'idx-four 'slot1 6 15)) - ) - (format t " x1 = ~A~%" (mapcar #'slot1 x1)) - (format t " x2 = ~A~%" (mapcar #'slot1 x2)) - (format t " x3 = ~A~%" (mapcar #'slot1 x3)) -@end lisp - -Additionally, the test -@lisp -(do-test 'INDEXING-TIMING) -@end lisp -Can be used to judge the performance of indexing a medium sized dataset. - -The file @file{src/elephant/classindex.lisp} provides the source code and -some crisp documentation of the class indexing system. - -Note that for retrieving items, the API is provided by three functions: - -@lisp -(defgeneric get-instances-by-class (persistent-metaclass)) -(defgeneric get-instances-by-value (persistent-metaclass slot-name value)) -(defgeneric get-instances-by-range (persistent-metaclass slot-name start end)) -@end lisp - -By using these functions, any class that is a subclass of persistent-metaclass -can also be thought of as a container of all of its instances, which are -persistent in the database between lisp invocations. Moreover an individual -object can be looked up on O(log n) time via a value on which it is indexed. - -At the top of this same file, you will find the a description of the API -which can be used to dynamically add and remove indexes. (Adding and -removing indexes can also be performed by a re-execution of the ``defclass'' -macro with different values.) +Now we can easily manipulate all the instances of a class. + +@lisp +(defun print-friend (friend) + (format t " name: ~A birthdate: ~A~%" (name friend) (birthday friend))) + +(make-instance 'friend :name "Carlos" :birthday (encode-birthday '(1 1 1972))) +(make-instance 'friend :name "Adriana" :birthday (encode-birthday '(24 4 1980))) +(make-instance 'friend :name "Zaid" :birthday (encode-birthday '(14 8 1976))) + +(get-instances-by-class 'friends) +=> (#<Carlos> #<Adriana> #<Zaid>) + +(mapcar #'print-friend *) + name: Carlos birthdate: (1 1 1972) + name: Adriana birthdate: (24 4 1980) + name: Zaid birthdate: (14 8 1976) +=> (#<Carlos> #<Adriana> #<Zaid>) +@end lisp + +But what if we have thousands of friends? Aside from never getting +work done, our get-instances-by-class will be doing a great deal of +consing, eating up lots of memory and wasting our time. Fortunately +there is a more efficient way of dealing with all the instances of a +class. + +@lisp +(map-class #'print-friend 'friend) + name: Carlos birthdate: (1 1 1972) + name: Adriana birthdate: (24 4 1980) + name: Zaid birthdate: (14 8 1976) +=> NIL +@end lisp + +@code{map-class} has the advantage that it does not keep references to +objects after they are processed. The garbage collector can come +along, clear references from the weak instance cache so that your +working set is finite. The list version above conses all objects into +memory before you can do anything with them. The deserialization +costs are very low in both cases. + +Notice that the order in which the records are printed are not sorted +according to either name or birthdate. Elephant makes no guarantee +about the ordering of class elements, so you cannot depend on the +insertion ordering shown here. + +So what if we want ordered elements? How do we access our friends +according to name and birthdate? This is where slot indices come into +play. + +@lisp +(defpclass friend () + ((name :accessor name :initarg :name :index t) + (birthday :initarg :birthday :index t))) +@end lisp + +Notice the :index argument to the slots. Also notice that we dropped +the class :index argument. Specifying that a slot is indexed +automatically registers the class as indexed. While slot indices +increase the cost of writes and disk storage, each entry is only +slightly larger than the size of the slot value. Numbers, small +strings and symbols are good candidate types for indexed slots, but +any value may be used, even different types. + +Once we've indexed a slot, we can use another set of +@code{get-instances} and @code{map} functions to access objects +in-order and by their slot value. + +@lisp +(get-instances-by-value 'friends 'name "Carlos") +=> (#<Carlos>) + +(get-instances-by-range 'friends 'name "Adam" "Devin") +=> (#<Adriana> #<Carlos>) + +(get-instances-by-range 'friend 'birthday (encode-birthday '(1 1 1974)) (encode-birthday '(31 12 1984))) +=> (#<Zaid> #<Adriana>) + +(mapc #'print-friend *) + name: Zaid birthdate: (14 8 1976) + name: Adriana birthdate: (24 4 1980) +=> (#<Zaid> #<Adriana>) + +(map-class-index #'print-friend 'friend 'name "Carlos" "Carlos") + name: Carlos birthdate: (1 1 1972) +=> NIL + +(map-class-index #'print-friend 'friend 'name "Adam" "Devin") + name: Adriana birthdate: (24 4 1980) + name: Carlos birthdate: (1 1 1972) +=> NIL + +(map-class-index #'print-friend 'friend 'birthday + (encode-birthday '(1 1 1974)) + (encode-birthday '(31 12 1984))) + name: Zaid birthdate: (14 8 1976) + name: Adriana birthdate: (24 4 1980) +=> NIL + +(map-class-index #'print-friend 'friend 'birthday nil (encode-birthday '(10 10 1978))) + name: Carlos birthdate: (1 1 1972) + name: Zaid birthdate: (14 8 1976) +=> NIL + +(map-class-index #'print-friend 'friend 'birthday + (encode-birthday '(10 10 1975)) + nil) + name: Zaid birthdate: (14 8 1976) + name: Adriana birthdate: (24 4 1980) +=> NIL +@end lisp
You can enable/disable class indexing for an entire class. When you disable indexing all references to instances of that class are lost. If you re-enable @@ -606,16 +773,14 @@ to retry the transaction a fixed number of times by re-executing the whole body.
-You can see in the example that a single statement in lisp can include -several primitive Elephant operations as in the decf statement in -withdraw. It takes some careful thinking to properly implement -transactions, for a complete treatment @pxref{Transaction details}. - -The other thing transactions can give us is the ability to put off -synchronizing our data to disk. The expensive part of persistent -writes is flushing data to disk. Since a transaction caches values, -all the read and written values are kept in memory until the -transaction is complete, this can dramatically improve performance. +The other value transactions provide is the capability to delay +flushing dirty data to disk. The most time-intensive part of +persistent operations is flushing newly written data to disk. Using +the default auto-commit behavior requires a flush on every operation +which can become very expensive. Because a transaction caches values, +all the values read or written are cached in memory until the +transaction completes, dramatically decreasing the number of flushes +and the total time taken.
@lisp (defpclass test () @@ -625,8 +790,9 @@ (make-instance 'test :slot1 i))) @end lisp
-This can take a long time. Here each new objects that is created has -to independantly write its value to disk and accept a disk flush cost. +This can take a long time, well over a minute on the CLSQL data store. +Here each new objects that is created has to independantly write its +value to disk and accept a disk flush cost.
@lisp (time (with-transaction () @@ -634,7 +800,8 @@ (make-instance 'test :slot1 i)))) @end lisp
-Here, ....... +Wrapping this operation in a transaction dramatically increases the +time from 10's of seconds to a second or less.
@lisp (time (with-transaction () @@ -642,14 +809,64 @@ (make-instance 'test :slot1 i)))) @end lisp
-These are huge differences in performance! -Of course since we are caching all this data in memory, we cannot have -infinitely sized transactions. Large operations need to get split up -into subtransactions. When dealing with persistent objects a good -rule of thumb is to keep your transactions less than 1000 at a time. +When we increase the number of objects within the transaction, the +time cost does not go up linearly. This is because the total time to +write a hundred simple objects is still dominated by the final +synchronization step. +
[56 lines skipped] --- /project/elephant/cvsroot/elephant/doc/user-guide.texinfo 2007/03/24 13:55:15 1.1 +++ /project/elephant/cvsroot/elephant/doc/user-guide.texinfo 2007/03/25 11:04:38 1.2 @@ -76,49 +76,71 @@ @code{with-open-controller} macro. Opening and closing a controller is very expensive.
+ +@node{Class indices} +@comment node-name, next, previous, up +@section Class indicies + +You can enable/disable class indexing for an entire class. When you disable +indexing all references to instances of that class are lost. If you re-enable +class indexing only newly created classes will be stored in the class index. +You can manually restore them by using @code{find-class-index} to get the +clas index BTree if you have an alternate in-memory index. + +You can add/remove a secondary index for a slot. So long as the class index +remains, this can be done multiple times without losing any data. + +There is also a facility for defining 'derived slots'. These can be non-slot +parameters which are a function of the class's persistent slot values. For +example you can use an index to keep an alternate representation available +for fast indexing. If an object has an x,y coordinate, you could define a +derived index for r,theta which stored references in polar coordinates. +These would be ordered so you could iterate over a class-index to get objects +in order of increasing radius from the origin or over a range of theta. + +Beware, however, that derived indices have to compute their result every +time you update any persistent instance's slot. This is because there is +no way to know which persistent slots the derived index value(s) depends +on. Thus there is a fairly significant computational cost to objects +with frequent updates having derived indices. The storage cost, however, +may be less as all that is added is the index value and an OID reference +into the class index. To add a slot value you add a serialized +OID+class-ref+slotname to index value which can be much larger if you +use long slotnames and package names and unicode. + +Thus, the question of if and how a given class should be indexed is +very flexible and dynamic, and does not need to be determined at the +beginning of your development. This represents the ability to ``late bind'' +the decision of what to index. + +In general, there is always a tradeoff: an indexed slot increases storage +associated with that slot and slows down write operations. Reads however remain +as fast as for unindexed persistent slots. The Elephant system +makes it simple to choose where and when one wants to utilize this tradeoff. + +Finally, that file @file{src/elephant/classindex-utils.lisp} documents +tools for handling class redefinitions and the policy that should be +used for synchronizing the classes with the database. This process is +somewhat user customizable; documentation for this exists in the source +file referenced above. + @node{Using BTrees} @comment node-name, next, previous, up @section Using BTrees
-The btree class are to hash-tables as persistent-objects are to -ordinary objects. BTrees have a hash-table-like interface, but store -their keys and values directly as rows in a Sleepycat BTree. Btrees -may be persisted simply by their OID. Hence they have all the nice -properties of persistent objects: identity, fast serialization / -deserialization, no merge conflicts..... - - (defvar *friends-birthdays* (make-btree)) - => *FRIENDS-BIRTHDAYS* - - (add-to-root "friends-birthdays" *friends-birthdays*) - => #<BTREE {4951CF6D}> - - (setf (get-value "Andrew" *friends-birthdays*) - (encode-universal-time 0 0 0 22 12 1976)) - => 2429071200 - - (setf (get-value "Ben" *friends-birthdays*) - (encode-universal-time 0 0 0 14 4 1976)) - => 2407298400 - - (get-value "Andrew" *friends-birthdays*) - => 2429071200 - => T - - (decode-universal-time *) - => 0 - 0 - 0 - 22 - 12 - 1976 - 2 - NIL - 6 - -Because of serialization semantics, BTrees hash on a value, not -identity. This is probably ok for strings, numbers, and persistent -things, but may be strange for other values. +A BTree is a data structure designed for on-disk databases. +Traditional binary trees are a tree structure that stores a key and +value and two links in each node. To get to a value, you compare your +query key to the node key. If it is equal, return the value in this +node. If it is less, follow the first link and if it is greater, +follow the second link. The problem here is that every link requires +a disk seek. + +The BTree exploits the properties of memory/disk data heirarchies. +The key properties are that disk seeks are expensive, loading large +blocks of data is relatively inexpensive after a seek and comparisons +on objects that are stored in memory is cheap. +
@node Repository Migration @comment node-name, next, previous, up