Sebastian,
Can you temporarily define this and find the timing/consing for your
test case:
(defmethod gref* ((object vector-double-float) linearized-index)
  (cffi:mem-aref
   (foreign-pointer object)
   :double
   linearized-index))
(I think you don't use any matrices but if you do, define an analogous
function for matrix-double-float.)
As you can see, it passes the element type as a literal (:double), and I'm
hopeful that CFFI will pick that up and make this competitive in speed with
the best that you saw. If that's so, it should be fairly easy for me
to make this generic and incorporate it into GSD. I'm still
interested in making the linearization more efficient if that's still
significant, but let's try this for now to see how much speed we can
squeeze out of gref*.
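One way "making this generic" might look is a macro that stamps out one method per class, so that every method body still contains a literal element type. This is only a sketch under assumptions, not GSD's implementation: all names (demo-vector, demo-gref*, define-demo-gref*) are hypothetical, and plain Lisp arrays stand in for foreign memory so that it runs without GSD or CFFI.

```lisp
;; Hypothetical stand-in classes; GSD's real classes wrap foreign memory.
(defclass demo-vector ()
  ((data :initarg :data :reader demo-data)))
(defclass demo-vector-double-float (demo-vector) ())
(defclass demo-vector-single-float (demo-vector) ())

(defgeneric demo-gref* (object linearized-index)
  (:documentation "Return the element at LINEARIZED-INDEX (demo only)."))

(defmacro define-demo-gref* (class element-type)
  "Define a DEMO-GREF* method on CLASS whose body names ELEMENT-TYPE literally."
  `(defmethod demo-gref* ((object ,class) linearized-index)
     ;; ELEMENT-TYPE is spliced in at macroexpansion time, so the compiler
     ;; sees a constant type here, much as it sees :double in the method above.
     (aref (the (simple-array ,element-type (*)) (demo-data object))
           linearized-index)))

;; One line per supported element type.
(define-demo-gref* demo-vector-double-float double-float)
(define-demo-gref* demo-vector-single-float single-float)

;; Usage: build a demo vector and read element 1.
(defparameter *demo*
  (make-instance 'demo-vector-double-float
                 :data (make-array 3 :element-type 'double-float
                                     :initial-contents '(1d0 2d0 3d0))))
(demo-gref* *demo* 1)
```

Generic-function dispatch selects the method at run time, but inside each method the type is a compile-time constant, which is the property the :double method above relies on.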
Thanks,
Liam
On Tue, Oct 26, 2010 at 10:25 AM, Sebastian Sturm
<Sebastian.Sturm@itp.uni-leipzig.de> wrote:
It seems that CFFI includes some compiler macros that use type information
supplied at compile time to generate more efficient code (got that from the
cffi mailing list,
http://www.mail-archive.com/cffi-devel@common-lisp.net/msg01154.html).
In my case, I'm using this optimization by supplying :double to
cffi:mem-aref. If I replace this by (cl-cffi (element-type zvector)), as is
done internally by gref, then (again with dim = 50), better-force-function
uses around 1.8 GCycles and conses 80 MB in the process, whereas the :double
version needs ~ 8.6 MCycles, not consing anything. The slow-but-flexible
version of better-force-function reads as follows:
(defun better-force-function (dim)
  "Given an integer dim, this constructs a function that, when supplied with
an N-dimensional vector Z and some output vector (-> pointer?), yields the
corresponding forces"
  (declare (fixnum dim))
  (let ((temp-values (make-array 2 :element-type 'double-float
                                   :initial-element 0.0d0)))
    (lambda (zvector output)
      (let ((zvector-fptr (grid::foreign-pointer zvector))
            (output-fptr (grid::foreign-pointer output))
            ;; this makes it worse
            (elt-type (grid:cl-cffi (grid:element-type zvector))))
        (macrolet ((quick-ref (the-vector n)
                     `(cffi:mem-aref
                       ,(case the-vector
                          (zvector 'zvector-fptr)
                          (output 'output-fptr))
                       ;; :double
                       elt-type ;; replace this by :double
                       ,n)))
          (do ((i 0 (1+ i))) ((= i dim))
            (declare (fixnum i))
            (setf (aref temp-values 0) 0.0d0)
            (do ((m 0 (1+ m))) ((> m i))
              (declare (fixnum m))
              (do ((n i (1+ n))) ((= n dim))
                (declare (fixnum n))
                (setf (aref temp-values 1) 0.0d0)
                (do ((k m (1+ k))) ((> k n))
                  (declare (fixnum k))
                  ;; generates efficiency warnings when using elt-type
                  (incf (aref temp-values 1) (quick-ref zvector k)))
                (incf (aref temp-values 0) (expt (aref temp-values 1) -2))))
            (setf (quick-ref output i)
                  (- (quick-ref zvector i)
                     (aref temp-values 0)))))))))
Also, with the element type left unspecified at compile time, the innermost
loop generates efficiency warnings telling me that generic-+ needs to be
used. Writing (the double-float (quick-ref zvector k)) removes these and
slightly reduces the consing of the slow variant, to ~ 63 MB. I still
have to try the SLIME profiler though.
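The compile-time dispatch described above can be illustrated in plain Common Lisp. The following is only a sketch and not CFFI's actual code: typed-ref is a hypothetical stand-in for cffi:mem-aref, and a Lisp array stands in for foreign memory so that it runs without CFFI.

```lisp
;; Hypothetical stand-in for a typed memory read; NOT CFFI's implementation.
(defun typed-ref (storage type index)
  "Slow path: TYPE is only known at run time, so dispatch on it at every call."
  (ecase type
    (:double (aref (the (simple-array double-float (*)) storage) index))
    (:float  (aref (the (simple-array single-float (*)) storage) index))))

(define-compiler-macro typed-ref (&whole form storage type index)
  ;; TYPE here is the unevaluated argument form. If it is a literal keyword,
  ;; choose the accessor now; otherwise return FORM unchanged, which tells
  ;; the compiler to emit an ordinary call to the function above.
  (case type
    (:double `(aref (the (simple-array double-float (*)) ,storage) ,index))
    (:float  `(aref (the (simple-array single-float (*)) ,storage) ,index))
    (t form)))

(defparameter *storage*
  (make-array 3 :element-type 'double-float
                :initial-contents '(1d0 2d0 3d0)))

;; Literal type: eligible for rewriting into a specialized AREF.
(defun read-literal (i)
  (typed-ref *storage* :double i))

;; Type held in a variable: the compiler macro declines, ECASE runs per call.
(defun read-variable (elt-type i)
  (typed-ref *storage* elt-type i))
```

Both functions return the same values; only the amount of work left for run time differs, which would be analogous to the difference between the :double and elt-type variants above.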
thanks,
Sebastian