Update of /project/elephant/cvsroot/elephant/doc In directory clnet:/tmp/cvs-serv22287
Modified Files: tutorial.texinfo user-guide.texinfo Log Message: Documentation changes, mostly to transaction section of tutorial
--- /project/elephant/cvsroot/elephant/doc/tutorial.texinfo 2007/04/01 14:33:29 1.11 +++ /project/elephant/cvsroot/elephant/doc/tutorial.texinfo 2007/04/01 20:22:24 1.12 @@ -732,6 +732,8 @@ transaction that performs all the updates atomically and thus enforcing consistency.
+@subsection Why do we need Transactions? + Most real applications will need to use explicit transactions rather than relying on the primitives alone because you will want multiple read-modify-update operations act as an atomic unit. A good example @@ -815,6 +817,8 @@ And presto, we have an ACID compliant, thread-safe, persistent banking system!
+@subsection Using @code{with-transaction} + What is @code{with-transaction} really doing for us? It first starts a new transaction, attempts to execute the body, and if successful commit the transaction. If anywhere along the way there is a deadlock @@ -823,14 +827,145 @@ to retry the transaction a fixed number of times by re-executing the whole body.
-The other value transactions provide is the capability to delay -flushing dirty data to disk. The most time-intensive part of -persistent operations is flushing newly written data to disk. Using -the default auto-commit behavior requires a flush on every operation -which can become very expensive. Because a transaction caches values, -all the values read or written are cached in memory until the -transaction completes, dramatically decreasing the number of flushes -and the total time taken. +And this brings us to two important caveats: nested transactions and +idempotent side-effects. + +@subsection Nesting Transactions + +In general, you want to avoid nesting @code{with-transaction} +statements. Nested transactions are valid for some data stores +(namely Berkeley DB), but typically only a single transaction can be +active at a time. The purpose of a nested transaction, in data stores +that provide it, is to break a long transaction into chunks. This way if +there is contention on a given subset of variables, only the inner +transaction is restarted while the larger transaction can continue. +When inner transactions commit their results, those results become part +of the outer transaction until it in turn commits. + +If you have transaction-protected primitive operations (such as +@code{deposit} and @code{withdraw}) and you want to perform a group of +such operations, for example a transfer between accounts, you can +use the macro @code{ensure-transaction} instead of @code{with-transaction}. 
+ +@lisp +(defun deposit (account amount) + "Wrap the balance read and the setf with the new balance" + (ensure-transaction () + (let ((balance (balance account))) + (setf (balance account) + (+ balance amount))))) + +(defun deposit (account amount) + "A more concise version with incf doing both read and write" + (ensure-transaction () + (incf (balance account) amount))) + +(defun withdraw (account amount) + (ensure-transaction () + (decf (balance account) amount))) + +(defun transfer (src dst amount) + "There are four primitive read/write operations + grouped together in this transaction" + (with-transaction () + (withdraw src amount) + (deposit dst amount))) +@end lisp + +@code{ensure-transaction} is exactly like @code{with-transaction} +except it will reuse an existing transaction, if there is one, or +create a new one. There is no harm, in fact, in using this macro all +the time. + +Notice the use of @code{decf} and @code{incf} above. The primary +reason to use Lisp is that it is good at hiding complexity using +shorthand constructs just like this. This also means it is going +to be good at hiding data dependencies that should be captured in a +transaction! + +@subsection Idempotent Side Effects + +Within the body of a @code{with-transaction}, any non-database operations +need to be @emph{idempotent}. That is, the side effects of the body +must be the same no matter how many times the body is executed. This +is done automatically for side effects on the database, but not for +side effects like pushing a value on a Lisp list, or creating a new +standard object. + +@lisp +(defparameter *transient-objects* nil) + +(defun load-transients (n) + "This is the wrong way!" + (with-transaction () + (loop for i from 0 upto n do + (push (get-from-root i) *transient-objects*)))) +@end lisp + +In this contrived example we are pulling a set of standard objects +from the database using an integer key and pushing them onto a list +for later use. 
However, if there is a conflict where some other +process writes a key-value pair to a matching key, the whole +transaction will abort and the loop will be run again. In a heavily +contended system you might see results like the following. + +@lisp +(defun test-list () + (setf *transient-objects* nil) + (load-transients 2) + (length *transient-objects*)) + +(test-list) +=> 3 + +(test-list) +=> 5 + +(test-list) +=> 4 +@end lisp + +The solution is to make sure that the side effect on the Lisp +variable happens exactly once, after the transaction completes. + +@lisp +(defun load-transients (n) + "This is a better way" + (setq *transient-objects* + (with-transaction () + (loop for i from 0 upto n collect + (get-from-root i))))) +@end lisp + +Note that @code{collect} returns the instances in key order, the +reverse of what the @code{push} version produced; use @code{nreverse} +if you need the original order in @code{*transient-objects*}. The best rule of +thumb is that transaction bodies should be purely functional as above, +except for side effects to the persistent store (such as persistent +slot writes, adding to btrees, etc). + +If you do need side effects to Lisp memory, such as writes to +transient slots, make sure they are idempotent and that other +processes will not be reading the written values until the transaction +completes. + +@subsection Transactions and Performance + +By now transactions almost look like more work than they are worth! +Well, there are still some significant benefits to be had. Part of how +transactions are implemented is that they gather together all the +writes that are supposed to be made to the database and store them until +the transaction commits, and then write them atomically. + +The most time-intensive part of persistent operations is flushing +newly written data to disk. Using the default auto-committing +behavior requires a flush for every primitive write operation. This +can become very expensive! 
Because all the values read or written are +cached in memory until the transaction completes, the number of +flushes can be dramatically reduced. + +But don't take my word for it: run the following statements and see +for yourself the visceral impact transactions can have on system +performance.
@lisp (defpclass test () @@ -872,52 +1007,42 @@ thumb is to keep the number of objects touched in a transaction well under 1000.
-And this brings us to the last caveat we'll introduce in this -introductory tutorial: nested transactions. - -In general, avoid nesting transactions. Nested transactions are valid -for some data stores (namely Berkeley DB), but typically only a single -transaction is valid at a time. The purpose of a nested transaction -is to allow a long transaction to be broken up into chunks. This way -if there is contention on a given subset of variables, only the -subtransaction is restarted while the larger transaction can continue. -Subtransactions commit their results and they become part of the -outer transaction until it in turn commits. - -If you have transaction protected primitive operations (such as -@code{deposit} and @code{withdraw}) and you want to perform a group of -such transactions, for example a transfer between accounts, you can -use the macro @code{ensure-transaction} instead of @code{with-transaction}. - -@lisp -(defun deposit (account amount) - (ensure-transaction () - (let ((balance (balance account))) - (setf (balance account) - (+ balance amount))))) - -(defun withdraw (account amount) - (ensure-transaction () - (decf (balance account) amount))) - -(defun transfer (src dst amount) - (with-transaction () - (withdraw src amount) - (deposit dst amount))) -@end lisp - -@code{ensure-transaction} is exactly like @code{with-transaction} -except it will reuse an existing transaction, if there is one, or -create a new one. There is no harm, in fact, in using this macro all -the time. +@subsection Transactions and Applications
Designing and tuning a transactional architecture can become quite -complicated. The best strategy at the beginning is a conservative -one, break things up into the smallest logical sets of primitive -operations and only wrap higher level functions in transactions when -they absolutely have to commit together. See @ref{Transaction Details} -for the full details and @pxref{Usage Scenarios} for more examples of -how systems can be designed and tuned using transactions. +complex. Moreover, bugs in your system can be very difficult to find +as they only show up when transactions are interleaved within a +larger, multi-threaded application. + +In many cases, however, you can ignore transactions, for example +when you don't have any other concurrent processes running. In this +case all operations are sequential and there is no chance of +conflicts. You would only want to use transactions for write +performance. + +You can also ignore transactions if your application can guarantee +that concurrency won't generate any conflicts. For example, a web app +that guarantees only one thread will write to objects in a particular +session can avoid transactions altogether. However, it is good to be +careful about making these assumptions. In the above example, a +reporting function that iterates over sessions, users or other objects +may still see partial updates (e.g. a user's id was written prior to +the query, but not the name). However, if you don't care about these +infrequent glitches, this case would still hold. + +If these cases don't apply to your application, or you aren't sure, +you will fare best by programming defensively. Break your system into +the smallest logical sets of primitive operations +(e.g. @code{withdraw} and @code{deposit}) using +@code{ensure-transaction} and then wrap the highest level calls made +to your system in @code{with-transaction} when the operations absolutely have +to commit together or you need the extra performance. 
Try not to have +more than two levels of transactional accesses with the top using +@code{with-transaction} and the bottom using @code{ensure-transaction}. + +@xref{Transaction Details}, for more details and @ref{Usage +Scenarios}, for examples of how systems can be designed and tuned using +transactions.
@node Advanced Topics @comment node-name, next, previous, up --- /project/elephant/cvsroot/elephant/doc/user-guide.texinfo 2007/04/01 14:33:29 1.5 +++ /project/elephant/cvsroot/elephant/doc/user-guide.texinfo 2007/04/01 20:22:24 1.6 @@ -23,26 +23,6 @@ * Performance Tuning:: How to get the most from Elephant. @end menu
-@node Persistent objects -@comment node-name, next, previous, up -@section Persistent Objects - -Finally, if you for some reason make an instance with a specified OID -which already exists in the database, @code{initargs} take precedence -over values in the database, which take precedences over -@code{initforms}. - -Also currently there is a bug where -@code{initforms} are always evaluated, so beware. -(What is the current model here?) - -Readers, writers, accessors, and @code{slot-value-using-class} are -employed in redirecting slot accesses to the database, so override -these with care. Because @code{slot-value, slot-boundp, -slot-makunbound} are not generic functions, they are not guaranteed by -the specification to work properly with persistent slots. However the -proper behavior has been verified on SBCL, Allegro and Lispworks. - @node The Store Controller @comment node-name, next, previous, up @section The Store Controller @@ -90,6 +70,26 @@
Empty.
+@node Persistent objects +@comment node-name, next, previous, up +@section Persistent Objects + +Finally, if you for some reason make an instance with a specified OID +which already exists in the database, @code{initargs} take precedence +over values in the database, which take precedences over +@code{initforms}. + +Also currently there is a bug where +@code{initforms} are always evaluated, so beware. +(What is the current model here?) + +Readers, writers, accessors, and @code{slot-value-using-class} are +employed in redirecting slot accesses to the database, so override +these with care. Because @code{slot-value, slot-boundp, +slot-makunbound} are not generic functions, they are not guaranteed by +the specification to work properly with persistent slots. However the +proper behavior has been verified on SBCL, Allegro and Lispworks. + @node Class Indices @comment node-name, next, previous, up @section Class Indices @@ -141,6 +141,111 @@ @comment node-name, next, previous, up @section Querying persistent instances
+ + +A SQL select-like interface is in the works, but for now queries are +limited to manual mapping over class instances or doing small queries +with @code{get-instances-*} functions. One advantage of this is that +it is easy to estimate the performance costs of your queries and to +choose standard and derived indices that give you the ordering and +performance you want. + +There is, however, a quick and dirty query API example that is not +officially supported in the release but is intended to invite comment. +It is a first sketch of a full query system that would automatically +perform joins, use the appropriate indices and perhaps even adaptively +suggest or add indices to facilitate better performance on common +queries. + +There are two functions, @ref{Function elephant:get-query-instances} +and @ref{Function elephant:map-class-query}, which accept a set of +constraints instead of the familiar value or range arguments. + +We'll use the classes @code{person} and @code{department} to +illustrate how to perform queries over a set of objects that may be +constrained by their relationships to other objects. + +@lisp +(defpclass person () + ((name :initarg :name :index t) + (salary :initarg :salary :index t) + (department :initarg :department))) + +(defmethod print-object ((p person) stream) + (format stream "#<PERS: ~A>" (slot-value p 'name))) + +(defun print-name (inst) + (format t "Name: ~A~%" (slot-value inst 'name))) + +(defpclass department () + ((name :initarg :name) + (manager :initarg :manager))) + +(defmethod print-object ((d department) stream) + (format stream "#<DEPT ~A, mgr = ~A>" + (slot-value d 'name) + (when (slot-boundp d 'manager) + (slot-value (slot-value d 'manager) 'name)))) +@end lisp + +Here we have a simple employee database with managers (also of type +person) and departments. This simple system will provide fodder for +some reasonably complex constraints. Let's create a few departments. 
+ +@lisp +(setf marketing (make-instance 'department :name "Marketing")) +(setf engineering (make-instance 'department :name "Engineering")) +(setf sales (make-instance 'department :name "Sales")) +@end lisp + +And @code{person} instances to manage the departments. + +@lisp +(make-instance 'person :name "George" :salary 140000 :department marketing) +(setf (slot-value marketing 'manager) *) + +(make-instance 'person :name "Sally" :salary 140000 :department engineering) +(setf (slot-value engineering 'manager) *) + +(make-instance 'person :name "Freddy" :salary 180000 :department sales) +(setf (slot-value sales 'manager) *) +@end lisp + +And of course we need some folks to manage. + +@lisp +(defparameter *names* + '("Jacob" "Emily" "Michael" "Joshua" "Andrew" "Olivia" "Hannah" "Christopher")) + +(defun random-element (list) + "Choose a random element from the list and return it" + (nth (random (length list)) list)) + +(with-transaction () + (loop for i from 0 upto 40 do + (make-instance 'person + :name (format nil "~A~A" (random-element *names*) i) + :salary (floor (+ (* (random 1000) 100) 30000)) + :department (case (random 3) + (0 marketing) + (1 engineering) + (2 sales))))) +@end lisp + +In the following examples, your results will differ because employee +names, salaries and departments are allocated randomly. However, these examples are +illustrative of what you should see if you run the same code. + + + +For those familiar with SQL, if an instance of @code{person} has a +pointer to an instance of @code{department} then that relation can be +used to perform a join. Of course joins in the object world won't +return a table; instead they will return conjunctions of objects that +satisfy a mutual set of constraints. + + @node Using BTrees @comment node-name, next, previous, up @section Using BTrees @@ -174,6 +279,14 @@ @comment node-name, next, previous, up @section Transaction Details
+You can trace @code{elephant::execute-transaction} to see the sequence +of calls to @code{execute-transaction} that occur dynamically and +detect where transactions are and are not happening. We may add some +transaction diagnosis and tracing tools in the future, such as +throwing a condition when @code{with-transaction} forms are nested +dynamically. + + ;; Transaction architecture: ;; ;; User and designer considerations: