---------- Forwarded message ---------- From: Sebastian Sturm Sebastian.Sturm@itp.uni-leipzig.de Date: Fri, Oct 22, 2010 at 2:13 PM Subject: Re: [Gsll-devel] Efficient access to externally generated double-float arrays? To: Liam Healy lhealy@common-lisp.net, gsll-devel@common-lisp.net
I'm surprised about the grid:gref result. Please run a profiler and confirm that the time consumed is within grid:gref and its callees,
I tried using sb-sprof, but as a complete CL newbie I had a hard time making sense of its output and thus ended up ignoring the profiler and using trial-and-error again. It seems that a fair amount of consing is due to the index linearization. Simply replacing grid:gref by grid:gref* for one-dimensional arrays reduces consing by a factor of 3 in my experiments. It seems that the rest can be eliminated by specifying the data type (here it's :double) at compile-time; incorporating this as an option to grid:gref via some macro-trickery would probably uglify the GSLL framework somewhat (I guess?), but I'd be willing to pay that price in return for an order-of-magnitude speedup. Also, I haven't tried if similar problems arise (and if similar workarounds are available) for the case of complex-valued arrays. That being said, I'm currently using the following force-function (where caching the foreign-ptrs is irrelevant with respect to consing, but reduces the amount of cycles needed to access array elements by a factor of ~ 5):
(defun better-force-function (dim) "Given an integer dim, this constructs a function that, when supplied with a N-dimensional vector Z and some output vector (-> pointer?), yields the corresponding forces" (declare (fixnum dim)) (let ((temp-values (make-array 2 :element-type 'double-float :initial-element 0.0d0))) (lambda (zvector output) (let ((zvector-fptr (grid::foreign-pointer zvector)) (output-fptr (grid::foreign-pointer output))) (macrolet ((quick-ref (the-vector n) `(cffi:mem-aref ,(case the-vector (zvector 'zvector-fptr) (output 'output-fptr)) :double ,n))) (do ((i 0 (1+ i))) ((= i dim)) (declare (fixnum i)) (setf (aref temp-values 0) 0.0d0) (do ((m 0 (1+ m))) ((> m i)) (declare (fixnum m)) (do ((n i (1+ n))) ((= n dim)) (declare (fixnum n)) (setf (aref temp-values 1) 0.0d0) (do ((k m (1+ k))) ((> k n)) (declare (fixnum k)) (incf (aref temp-values 1) (quick-ref zvector k))) (incf (aref temp-values 0) (expt (aref temp-values 1) -2)))) (setf (quick-ref output i) (- (quick-ref zvector i) (aref temp-values 0)))))))))
Using the timing code from my first post and dim set to 50, I get the following results:
- using plain-vanilla grid:gref: Evaluation took: 2.692 seconds of real time 2.668223 seconds of total run time (2.624302 user, 0.043921 system) [ Run times consist of 0.082 seconds GC time, and 2.587 seconds non-GC time. ] 99.11% CPU 5,906,443,268 processor cycles 189,391,360 bytes consed
- using the unpleasant cffi:mem-aref workaround: Evaluation took: 0.004 seconds of real time 0.003107 seconds of total run time (0.003049 user, 0.000058 system) 75.00% CPU 6,997,397 processor cycles 0 bytes consed
As a side point, it seems like your use of an array is a case of premature optimization: you've concluded it will run faster by allocating an array of length two, without studying the results both ways. I think there would be no measurable difference in speed; perhaps even faster with scalars. And I disagree with you about elegance: scalars are much more understandable to me and therefore elegant than array cruft. But it's your choice and a side point anyway.
Concerning the last point, I fully agree with you; probably my previous post was somewhat unclear. Accessing certain unnamed elements of some arbitrary temp-array is probably the most ugly solution one could think of, but it seems to be a simple way to ensure low overhead (I took the idea from a paper by Fateman). Since it doesn't really relate to the problem at hand, I figured that I could just leave it at that until the more pressing issue is resolved. Still, I appreciate very much any advice on how to further improve speed and/or clarity.
Thanks alot for your help, Sebastian