Here is something similar to what I suggested, although still very unfinished. It's probably not the proper way to write a macro, but I guess other people on the list will know how to do it correctly. Efficient linearization can be added in the same manner; also, the macro should be modified s.t. it accepts several array specifications at once, i.e. with-foreign-array ((array-1 :double) (array-2 :int) (array-3 :double) ...), etc. Of course, if you can figure out a way to incorporate the optimizations into gref without such a clumsy workaround, I'd be all for it. best regards, Sebastian
(defun mapcons (fn x) (if (atom x) x (funcall fn (let ((a (mapcons fn (car x))) (d (mapcons fn (cdr x)))) (if (and (eql a (car x)) (eql d (cdr x))) x (cons a d))))))
(defmacro with-fast-access-to-single-foreign-array ((array element-type) &body body) (alexandria:with-unique-names (array-fptr) `(let ((,array-fptr (grid::foreign-pointer ,array))) ,@(mapcons (lambda (expr) (if (and (consp expr) (eq (first expr) 'grid:gref*) (eq (second expr) array)) (list 'cffi:mem-aref array-fptr element-type (elt expr 2)) expr)) body))))
(defun macro-force-function (dim) "Given an integer dim, this constructs a function that, when supplied with a N-dimensional vector Z and some output vector (-> pointer?), yields the corresponding forces" (declare (fixnum dim)) (let ((temp-values (make-array 2 :element-type 'double-float :initial-element 0.0d0))) (lambda (zvector output) (with-fast-access-to-single-foreign-array (output :double) (with-fast-access-to-single-foreign-array (zvector :double) (do ((i 0 (1+ i))) ((= i dim)) (declare (fixnum i)) (setf (aref temp-values 0) 0.0d0) (do ((m 0 (1+ m))) ((> m i)) (declare (fixnum m)) (do ((n i (1+ n))) ((= n dim)) (declare (fixnum n)) (setf (aref temp-values 1) 0.0d0) (do ((k m (1+ k))) ((> k n)) (declare (fixnum k)) (incf (aref temp-values 1) (grid:gref* zvector k))) (incf (aref temp-values 0) (expt (aref temp-values 1) -2)))) (setf (grid:gref* output i) (- (grid:gref* zvector i) (aref temp-values 0)))))))))
Sebastian,
Trying to walk code correctly is a nasty business, and generally non-portable, so I'd stay away from it. We could do this with a macro/macrolet pair but I'd like to optimize via the CLOS route if we can.
Liam
On Thu, Oct 28, 2010 at 1:01 PM, Sebastian Sturm Sebastian.Sturm@itp.uni-leipzig.de wrote:
Here is something similar to what I suggested, although still very unfinished. It's probably not the proper way to write a macro, but I guess other people on the list will know how to do it correctly. Efficient linearization can be added in the same manner; also, the macro should be modified s.t. it accepts several array specifications at once, i.e. with-foreign-array ((array-1 :double) (array-2 :int) (array-3 :double) ...), etc. Of course, if you can figure out a way to incorporate the optimizations into gref without such a clumsy workaround, I'd be all for it. best regards, Sebastian
(defun mapcons (fn x) (if (atom x) x (funcall fn (let ((a (mapcons fn (car x))) (d (mapcons fn (cdr x)))) (if (and (eql a (car x)) (eql d (cdr x))) x (cons a d))))))
(defmacro with-fast-access-to-single-foreign-array ((array element-type) &body body) (alexandria:with-unique-names (array-fptr) `(let ((,array-fptr (grid::foreign-pointer ,array))) ,@(mapcons (lambda (expr) (if (and (consp expr) (eq (first expr) 'grid:gref*) (eq (second expr) array)) (list 'cffi:mem-aref array-fptr element-type (elt expr 2)) expr)) body))))
(defun macro-force-function (dim) "Given an integer dim, this constructs a function that, when supplied with a N-dimensional vector Z and some output vector (-> pointer?), yields the corresponding forces" (declare (fixnum dim)) (let ((temp-values (make-array 2 :element-type 'double-float :initial-element 0.0d0))) (lambda (zvector output) (with-fast-access-to-single-foreign-array (output :double) (with-fast-access-to-single-foreign-array (zvector :double) (do ((i 0 (1+ i))) ((= i dim)) (declare (fixnum i)) (setf (aref temp-values 0) 0.0d0) (do ((m 0 (1+ m))) ((> m i)) (declare (fixnum m)) (do ((n i (1+ n))) ((= n dim)) (declare (fixnum n)) (setf (aref temp-values 1) 0.0d0) (do ((k m (1+ k))) ((> k n)) (declare (fixnum k)) (incf (aref temp-values 1) (grid:gref* zvector k))) (incf (aref temp-values 0) (expt (aref temp-values 1) -2)))) (setf (grid:gref* output i) (- (grid:gref* zvector i) (aref temp-values 0)))))))))
GSLL-devel mailing list GSLL-devel@common-lisp.net http://common-lisp.net/cgi-bin/mailman/listinfo/gsll-devel
Liam,
I'm sorry, I now see a difference between the original gref* and your modified version of it. I guess I forgot to reload the Lisp image last time, so both timings had been obtained using the new gref*. There is still consing, but it's already reduced by a factor of three. Can be reduced by a further ~60% by using (the double-float (grid:gref* ...)) instead of just gref*; I assume this is due to to the float->pointer coercion done to gref*'s <return-value>.(?) Here's the new timing data (dim = 15):
"gref" Evaluation took: 0.063 seconds of real time 0.056171 seconds of total run time (0.044926 user, 0.011245 system) [ Run times consist of 0.006 seconds GC time, and 0.051 seconds non-GC time. ] 88.89% CPU 28 lambdas converted 137,624,278 processor cycles 4,032,960 bytes consed
"gref*" Evaluation took: 0.012 seconds of real time 0.011932 seconds of total run time (0.011929 user, 0.000003 system) 100.00% CPU 26,178,823 processor cycles 785,360 bytes consed
"modified gref*" Evaluation took: 0.001 seconds of real time 0.001202 seconds of total run time (0.000884 user, 0.000318 system) 100.00% CPU 3,126,090 processor cycles 278,512 bytes consed
"hardwired cffi:mem-aref" Evaluation took: 0.000 seconds of real time 0.000033 seconds of total run time (0.000032 user, 0.000001 system) 100.00% CPU 66,462 processor cycles 0 bytes consed
I have attached the lisp file used to obtain these timings.
thanks, Sebastian
OK, this turned out to be a lot harder than I thought. I have done three things: 1) I have defined gref* and (setf gref*) methods specific to each of the foreign-array types that call cffi:mem-aref. This gives about 20x speedup because I am now passing the literal type to cffi:mem-aref, so it can work that fact in at compile time. 2) There is now a compiler macro to turn a grid:gref into a grid:gref* if there's only one index. This gives about 2x speed up when gref is used. 3) There is another compiler macro that turns a grid:gref* into a cffi:mem-aref directly if the foreign array is declared. This gives about a 400x speedup overall, similar to your "hardwired" result. It is a bit slower because I'm not able to precompute the pointer, it has to be recomputed each time on the gref* call.
On the last point, there is a caveat. I tried to make it work when the foreign array has been declared with a standard (declare ...) form. This has a chance of working on SBCL because of its support for the CLtL2 function variable-information, which was removed from CL before it was sent to ANSI standardization. However, it did not work for me; I will continue to try to get this working. In the meantime, the only way to do a declaration to take advantage of 3 is with a 'the form, e.g. (grid:gref (the vector-double-float zvector) i). This is kind of annoying, but it is portable, and allows you to avoid going to lower level functions (i.e., cffi:mem-aref).
So for example see my rewrite of your function (in foreign-array/tests/fast-array-access.lisp) (defun gref-access (dim) "Given an integer dim, this constructs a function that, when supplied with a N-dimensional vector Z and some output vector (-> pointer?), yields the corresponding forces" (let ((temp-values (make-array 2 :element-type 'double-float :initial-element 0.0d0))) (lambda (zvector output) (declare (fixnum dim) (optimize (speed 3) (safety 0) (debug 0)) (type vector-double-float zvector)) ;;; <--- this is useless, but ought not to be! (do ((i 0 (1+ i))) ((= i dim)) (declare (fixnum i)) (setf (aref temp-values 0) 0.0d0) (do ((m 0 (1+ m))) ((> m i)) (declare (fixnum m)) (do ((n i (1+ n))) ((= n dim)) (declare (fixnum n)) (setf (aref temp-values 1) 0.0d0) (do ((k m (1+ k))) ((> k n)) (declare (fixnum k)) (incf (aref temp-values 1) (grid:gref (the vector-double-float zvector) k))) ; This declaration does the work! (incf (aref temp-values 0) (expt (aref temp-values 1) -2)))) (setf (grid:gref output i) (- (grid:gref (the vector-double-float zvector) i) ; This one does too! (aref temp-values 0))))))) is now (almost) as fast as your cffi-access.
There is still a bunch of stuff to be done --- the optimizations only work for vectors, not higher dimensional arrays, and I haven't defined a compiler macro for setf yet on 3). But it's a start; try it in your problem and let me know how it performs.
Liam
My apologies for taking so long to answer. I have verified that the new gref* can now be used without consing and comes relatively close to the unconvenient cffi:mem-aref solution in terms of computation time. Many thanks for that! I also tried the functions on matrices, which still results in some consing, but this doesn't seem to be time-critical here. In my experiments, most of the consing could be eliminated by storing the linearized index in an auxiliary variable, but this even resulted in a slight runtime increase. For my original application, the GSLL solution is now comparable in speed to Mathematica for small dimensionality (dim < about 50); for a fair comparison of both solutions for larger dim, I'll have to rework my very naive computation of the Hessian matrix.
Judging from the git commits, you have also changed something related to complex-valued arrays. Did you implement similar optimizations as in the real-valued case? I'll try that once I have repaired my Hessian.
Best regards, Sebastian
On Sun, Nov 14, 2010 at 10:24 AM, Sebastian Sturm Sebastian.Sturm@itp.uni-leipzig.de wrote:
My apologies for taking so long to answer. I have verified that the new gref* can now be used without consing and comes relatively close to the unconvenient cffi:mem-aref solution in terms of computation time. Many thanks for that! I also tried the functions on matrices, which still results in some consing, but this doesn't seem to be time-critical here. In my experiments, most of the consing could be eliminated by storing the linearized index in an auxiliary variable, but this even resulted in a slight runtime increase. For my original application, the GSLL solution is now comparable in speed to Mathematica for small dimensionality (dim < about 50); for a fair comparison of both solutions for larger dim, I'll have to rework my very naive computation of the Hessian matrix.
Judging from the git commits, you have also changed something related to complex-valued arrays. Did you implement similar optimizations as in the real-valued case? I'll try that once I have repaired my Hessian.
Best regards, Sebastian
The work is still ongoing. I don't know when you last pulled, but I've made a few commits in the last day or so, and I'm working on some more improvements. My previous email was incorrect in stating that the 'declare form doesn't work; it does at least in newer versions of SBCL. Also, I've expanded use to CCL because that supports variable-information as well (this hasn't been committed yet). The other recent change is that inputs and outputs of gref* and (setf gref*) are now wrapped with the appropriate 'the forms to help the compiler optimize. So far these mostly apply only to vectors (1 dimensional arrays) but I am working on compile-time linearization of indices for higher dimensional arrays, so that at compile time the forms will also expand directly into mem-aref calls; that should help with your matrices if they are declared with their dimensions. I'm working on this now. Everything should apply to all foreign array types, including complex.
Liam
OK, I've now committed what I think is my last attempt at this. This latest version includes optimization of matrices and setting values in foreign arrays. This latter comes with a caveat however. The compiler will macroexpand setf, and for SBCL at least, that means making temporary variables to bind the actual values. Those variables of course don't have any declarations, so no compiler macro expansion is done. They way around this is to funcall #'(setf gref*) instead of using the setf macro, something that I bet most people won't want to do. If I get some more energy to pursue this, I could define a setf macro to shadow cl:setf, but I think I'll let this be for now.
I've written a "timing test" function loosely based on yours. It is in foreign-array/tests/timing.lisp. I hope in the next few days to run a few of these tests to see the benefit of the optimizations that have been put in place. You can see I have commented out the funcall #'(setf grid:gref*) form at the end; in this example there are so few sets that the speed difference is undetectable.
Liam