Hello,
I was working on a cl-opengl app recently and I noticed that my textures were taking much longer to load than I could remember. After some investigation I could distill the following test functions:
(defun test-1 () (let* ((array (make-sequence 'vector (* 512 512 3) :initial-element 0)) (count (length array))) (cffi:with-foreign-object (data 'cl-opengl-bindings:ubyte count) (loop for i below count do (setf (cffi:mem-aref data 'cl-opengl-bindings:ubyte i) (aref array i))))))
(defun test-2 () (let* ((array (make-sequence 'vector (* 512 512 3) :initial-element 0)) (count (length array)) (type (cl-opengl::symbolic-type->real-type :unsigned-byte))) (cffi:with-foreign-object (data type count) (loop for i below count do (setf (cffi:mem-aref data type i) (aref array i))))))
Now observe the timings (in SBCL) for them:
* (time (test-1)) Evaluation took: 0.027 seconds of real time 0.019999 seconds of user run time 0.006666 seconds of system run time 0 calls to %EVAL 0 page faults and 3,145,736 bytes consed. NIL
* (time (test-2)) Evaluation took: 9.565 seconds of real time 8.629437 seconds of user run time 0.236652 seconds of system run time [Run times include 0.74 seconds GC run time.] 0 calls to %EVAL 0 page faults and 688,918,576 bytes consed.
A substantial difference is apparent. For completeness I also tried CLISP:
[14]> (time (test-1)) Real time: 10.952836 sec. Run time: 10.169337 sec. Space: 173015440 Bytes GC: 100, GC time: 0.816611 sec. NIL [15]> (time (test-2)) Real time: 18.55567 sec. Run time: 17.215544 sec. Space: 342884976 Bytes GC: 199, GC time: 1.526577 sec. NIL
While not as prominent, a difference is still clearly visible. Now I'm curious as to why there's such a performance difference here, am I missing something obvious? Does it happen to anyone else?
Sincerely, Mikael Lax
PS: I don't know if this or cl-opengl is the appropriate mailing-list for this, but since there seems to be some cross-pollination of cl-opengl/cffi developers I'm sending it here on a gamble.
On 03/09/07, Mikael Lax mikael.lax@bredband.net wrote:
I was working on a cl-opengl app recently and I noticed that my textures were taking much longer to load than I could remember.
(defun test-1 ()
[...]
(setf (cffi:mem-aref data 'cl-opengl-bindings:ubyte i) (aref array i))))))
(defun test-2 ()
[...]
(setf (cffi:mem-aref data type i) (aref array i))))))
So the problem is that in TEST-2 the type argument to MEM-AREF is not known at compile-time. In TEST-1 that call to mem-aref is compiled down to a fewer assembler instructions.
In TEST-2 however, on each iteration you have at least 3 generic function calls, 4 regular function calls (that probably call more of their own) including one to PARSE-TYPE that needs to access an hash-table to retreive the parser closure. UBYTE is defined through DEFCTYPE, which memoizes the type instance, at least we're not instantiating a new type object every time.
- (time (test-1)) 0.027 seconds of real time
[...]
3,145,736 bytes consed.
[..]
- (time (test-2))
Evaluation took: 9.565 seconds of real time
[...]
688,918,576 bytes consed.
I didn't do any profiling but I guess that extra overhead I described explains the time difference. I'm not sure what's consing 700M though.
So, if you can make your type constant, do it. If not, you can still move the majority of that overhead out of the inner loop:
(defun test-3 () (let* ((array (make-array (* 512 512 3) :initial-element 0)) (count (length array)) (type :unsigned-char)) (with-foreign-object (data type count) (let ((ctype (cffi::canonicalize (cffi::parse-type type))) (size (foreign-type-size type))) (loop for i below count do (cffi::%mem-set (aref array i) data ctype (* i size)))))))
CFFI> (time (test-3)) Evaluation took: 0.356 seconds of real time 0.121568 seconds of user run time 0.065652 seconds of system run time 0 calls to %EVAL 0 page faults and 9,437,320 bytes consed.
For comparision, these are the other timings in my laptop:
CFFI> (time (test-1)) Evaluation took: 0.162 seconds of real time 0.068727 seconds of user run time 0.008378 seconds of system run time 0 calls to %EVAL 0 page faults and 3,145,736 bytes consed.
CFFI> (time (test-2)) Evaluation took: 34.538 seconds of real time 12.495962 seconds of user run time 3.882277 seconds of system run time [Run times include 1.007 seconds GC run time.] 0 calls to %EVAL 0 page faults and 701,490,664 bytes consed.
So, if there's a need for this, we might want to export things like PARSE-TYPE and CANONICALIZE (with better names) and have the various operators accept parsed types.
HTH,
"Luís Oliveira" luismbo@gmail.com writes:
I didn't do any profiling but I guess that extra overhead I described explains the time difference. I'm not sure what's consing 700M though.
A quick profiling session reveals CHECK-FOR-TYPEDEF-CYCLES is the culprit, because it conses a fresh hashtable each time. This is called by PARSE-TYPE, which in turn is called by MEM-[A]REF in that inner loop in TEST-2. That's 786432 new hashtables.
TEST-2 with PARSE-TYPE modified not to call CHECK-FOR-TYPEDEF-CYCLES:
CFFI> (time (test-2)) Evaluation took: 11.782 seconds of real time 5.285146 seconds of user run time 0.307772 seconds of system run time [Run times include 0.145 seconds GC run time.] 0 calls to %EVAL 0 page faults and 22,016,008 bytes consed.
I suppose it might be worthwhile to look into some way of optimizing C-F-T-C. This check can probably be done when the type is defined, at least in some cases.
On Mon, 03 Sep 2007 18:21:43 +0100 Luis Oliveira luismbo@gmail.com wrote:
"Luís Oliveira" luismbo@gmail.com writes:
I didn't do any profiling but I guess that extra overhead I described explains the time difference. I'm not sure what's consing 700M though.
A quick profiling session reveals CHECK-FOR-TYPEDEF-CYCLES is the culprit, because it conses a fresh hashtable each time. This is called by PARSE-TYPE, which in turn is called by MEM-[A]REF in that inner loop in TEST-2. That's 786432 new hashtables.
TEST-2 with PARSE-TYPE modified not to call CHECK-FOR-TYPEDEF-CYCLES:
CFFI> (time (test-2)) Evaluation took: 11.782 seconds of real time 5.285146 seconds of user run time 0.307772 seconds of system run time [Run times include 0.145 seconds GC run time.] 0 calls to %EVAL 0 page faults and 22,016,008 bytes consed.
I suppose it might be worthwhile to look into some way of optimizing C-F-T-C. This check can probably be done when the type is defined, at least in some cases.
Thank you very much for your response Luís, it was quite useful to me. One thing I realize I didn't point out very well in my initial mail was that the code in cl-opengl that loads textures from lispy arrays currently expands into the horribly slow test-2 case. Just a heads-up to any cl-opengl users.
Sincerely, Mikael Lax