On 03/09/07, Mikael Lax mikael.lax@bredband.net wrote:
I was working on a cl-opengl app recently and I noticed that my textures were taking much longer to load than I could remember.
(defun test-1 ()
[...]
(setf (cffi:mem-aref data 'cl-opengl-bindings:ubyte i) (aref array i))))))
(defun test-2 ()
[...]
(setf (cffi:mem-aref data type i) (aref array i))))))
So the problem is that in TEST-2 the type argument to MEM-AREF is not known at compile-time. In TEST-1 that call to mem-aref is compiled down to a fewer assembler instructions.
In TEST-2 however, on each iteration you have at least 3 generic function calls, 4 regular function calls (that probably call more of their own) including one to PARSE-TYPE that needs to access an hash-table to retreive the parser closure. UBYTE is defined through DEFCTYPE, which memoizes the type instance, at least we're not instantiating a new type object every time.
- (time (test-1)) 0.027 seconds of real time
[...]
3,145,736 bytes consed.
[..]
- (time (test-2))
Evaluation took: 9.565 seconds of real time
[...]
688,918,576 bytes consed.
I didn't do any profiling but I guess that extra overhead I described explains the time difference. I'm not sure what's consing 700M though.
So, if you can make your type constant, do it. If not, you can still move the majority of that overhead out of the inner loop:
(defun test-3 () (let* ((array (make-array (* 512 512 3) :initial-element 0)) (count (length array)) (type :unsigned-char)) (with-foreign-object (data type count) (let ((ctype (cffi::canonicalize (cffi::parse-type type))) (size (foreign-type-size type))) (loop for i below count do (cffi::%mem-set (aref array i) data ctype (* i size)))))))
CFFI> (time (test-3)) Evaluation took: 0.356 seconds of real time 0.121568 seconds of user run time 0.065652 seconds of system run time 0 calls to %EVAL 0 page faults and 9,437,320 bytes consed.
For comparision, these are the other timings in my laptop:
CFFI> (time (test-1)) Evaluation took: 0.162 seconds of real time 0.068727 seconds of user run time 0.008378 seconds of system run time 0 calls to %EVAL 0 page faults and 3,145,736 bytes consed.
CFFI> (time (test-2)) Evaluation took: 34.538 seconds of real time 12.495962 seconds of user run time 3.882277 seconds of system run time [Run times include 1.007 seconds GC run time.] 0 calls to %EVAL 0 page faults and 701,490,664 bytes consed.
So, if there's a need for this, we might want to export things like PARSE-TYPE and CANONICALIZE (with better names) and have the various operators accept parsed types.
HTH,