Sebastian,
Can you temporarily define this and find the timing/consing for your test case:
    (defmethod gref* ((object vector-double-float) linearized-index)
      (cffi:mem-aref (foreign-pointer object) :double linearized-index))
(I think you don't use any matrices but if you do, define an analogous function for matrix-double-float.)
As you can see, it has the literal type declaration, and I'm hopeful that CFFI will pick that up and make this competitive in speed with the best that you saw. If that's so, it should be fairly easy for me to make this generic and incorporate it into GSD. I'm still interested in making the linearization more efficient if that's still significant, but let's try this for now to see how much speed we can squeeze out of gref*.
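For reference, here is a toy sketch in plain Common Lisp of the mechanism I am counting on: a compiler macro that specializes a call when the type argument is a literal keyword, and leaves the call alone otherwise. The name toy-ref and its two supported types are made up for illustration; this is not CFFI's actual code, only the general idea.

```lisp
;; Hypothetical stand-in for a typed accessor; NOT CFFI's implementation.
(defun toy-ref (vec type index)
  "Generic path: TYPE is only known at run time, so dispatch on every call."
  (ecase type
    (:double (the double-float (aref vec index)))
    (:fixnum (the fixnum (aref vec index)))))

(define-compiler-macro toy-ref (&whole form vec type index)
  ;; TYPE here is the unevaluated argument form. If it is a literal
  ;; keyword we can pick the specialized code at compile time.
  (case type
    (:double `(aref (the (simple-array double-float (*)) ,vec) ,index))
    (:fixnum `(aref (the (simple-array fixnum (*)) ,vec) ,index))
    ;; Anything else (a variable, a function call): keep the generic call.
    (t form)))

(defun show-expansion (form)
  "Return what the compiler macro turns FORM into."
  (funcall (compiler-macro-function 'toy-ref) form nil))

;; Literal type: expands to (aref (the (simple-array double-float (*)) v) i)
(show-expansion '(toy-ref v :double i))
;; Type computed at run time: the form comes back unchanged
(show-expansion '(toy-ref v (element-type-of v) i))
```

If CFFI behaves along these lines, the method above with its literal :double should get the fast expansion, while a call whose type is computed from the object at run time falls back to the generic path.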
Thanks,
Liam
On Tue, Oct 26, 2010 at 10:25 AM, Sebastian Sturm <Sebastian.Sturm@itp.uni-leipzig.de> wrote:
It seems that CFFI includes some compiler macros that use type information supplied at compile time to generate more efficient code (got that from the cffi mailing list, http://www.mail-archive.com/cffi-devel@common-lisp.net/msg01154.html). In my case, I'm using this optimization by supplying :double to cffi:mem-aref. If I replace this by (cl-cffi (element-type zvector)), as is done internally by gref, then (again with dim = 50), better-force-function uses around 1.8 GCycles and conses 80 MB in the process, whereas the :double version needs ~8.6 MCycles, not consing anything.

The slow-but-flexible version of better-force-function reads as follows:

    (defun better-force-function (dim)
      "Given an integer dim, this constructs a function that, when supplied
    with a N-dimensional vector Z and some output vector (-> pointer?),
    yields the corresponding forces"
      (declare (fixnum dim))
      (let ((temp-values (make-array 2 :element-type 'double-float
                                       :initial-element 0.0d0)))
        (lambda (zvector output)
          (let ((zvector-fptr (grid::foreign-pointer zvector))
                (output-fptr (grid::foreign-pointer output))
                ;; this makes it worse
                (elt-type (grid:cl-cffi (grid:element-type zvector))))
            (macrolet ((quick-ref (the-vector n)
                         `(cffi:mem-aref ,(case the-vector
                                            (zvector 'zvector-fptr)
                                            (output 'output-fptr))
                                         ;; :double
                                         elt-type ;; replace this by :double
                                         ,n)))
              (do ((i 0 (1+ i)))
                  ((= i dim))
                (declare (fixnum i))
                (setf (aref temp-values 0) 0.0d0)
                (do ((m 0 (1+ m)))
                    ((> m i))
                  (declare (fixnum m))
                  (do ((n i (1+ n)))
                      ((= n dim))
                    (declare (fixnum n))
                    (setf (aref temp-values 1) 0.0d0)
                    (do ((k m (1+ k)))
                        ((> k n))
                      (declare (fixnum k))
                      ;; generates efficiency warnings when using elt-type
                      (incf (aref temp-values 1) (quick-ref zvector k)))
                    (incf (aref temp-values 0) (expt (aref temp-values 1) -2))))
                (setf (quick-ref output i)
                      (- (quick-ref zvector i) (aref temp-values 0)))))))))

Also, with the variable type left unspecified at compile time, the innermost loop generates efficiency warnings telling me that generic-+ needs to be used. Writing (the double-float (quick-ref zvector k)) removes these and slightly reduces the consing amount of the slow variant to ~63 MB. I still have to try the SLIME profiler though.

thanks,
Sebastian
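The effect of (the double-float ...) on the accumulation can also be tried in isolation, without any foreign memory involved. The following is a minimal sketch in plain Common Lisp; sum-generic and sum-declared are hypothetical names used only here, and an ordinary Lisp array stands in for the foreign vector.

```lisp
(defun sum-generic (vec)
  "Sum VEC with no type information about its elements.
The compiler has to assume generic arithmetic for the additions."
  (let ((acc 0.0d0))
    (dotimes (k (length vec) acc)
      (incf acc (aref vec k)))))

(defun sum-declared (vec)
  "Same sum, but the array, the accumulator and each element are
declared to be double-float, so float addition can be open-coded."
  (declare (type (simple-array double-float (*)) vec)
           (optimize (speed 3)))
  (let ((acc 0.0d0))
    (declare (double-float acc))
    (dotimes (k (length vec) acc)
      (incf acc (the double-float (aref vec k))))))

(defparameter *test-vec*
  (make-array 4 :element-type 'double-float
                :initial-contents '(1d0 2d0 3d0 4d0)))
```

Wrapping the two calls in the standard time macro, e.g. (time (sum-generic *test-vec*)) against (time (sum-declared *test-vec*)), on a much larger vector, should show whether the declarations matter on a given implementation; SBCL's time output includes bytes consed alongside the run time.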