Hi,
James Bielman wrote:
> Thu Feb 2 18:41:28 CST 2006 James Bielman jamesjb@jamesjb.com
> - Add an optimization for defining non-translatable types.
Is that something akin to sealing? Maybe that's a good step towards optimization. I've noticed the following behaviour (with CFFI-CVS from last week or so), which led me to write this note:
Since the addition of the new type translators, I'd got some bad feelings in my stomach (and wrote about it), without the time to look closer. Now I looked.
Somehow I feel there's a missing API for compile-time transformations in the new branch, similar to the old duality provided by translate-to-c <-> to-c-FORM.
To make a long story short, here's the output for test struct.5:

  (defctype my-int :int)
  (defcstruct s5 (a my-int))
  (deftest struct.5
      (with-foreign-object (s 's5)
        (setf (foreign-slot-value s 's5 'a) 42)
        (foreign-slot-value s 's5 'a))
    42)
The old CFFI produces (just skim over the rightmost column):

  12 Bytecode-Instruktionen:
  0     (LOAD&PUSH 1)
  1     (CALL1&PUSH 0)       ; FFI:FOREIGN-ADDRESS
  3     (CONST&PUSH 1)       ; 42
  4     (LOAD&PUSH 1)
  5     (CONST&PUSH 2)       ; FFI:INT
  6     (CONST&PUSH 3)       ; 0
  7     (CALL 4 4)           ; FFI::WRITE-MEMORY-AS
  10    (LOAD&PUSH 0)
  11    (CONST&PUSH 2)       ; FFI:INT
  12    (CONST&PUSH 3)       ; 0
  13    (CALL 3 5)           ; FFI:MEMORY-AS
  16    (SKIP&RET 3)

Neat. Perfect.
The new branch of CFFI produces:

  18 Bytecode-Instruktionen:
  0     (LOAD&PUSH 1)
  1     (CALL1&PUSH 0)       ; FFI:FOREIGN-ADDRESS
  3     (CONST&PUSH 1)       ; 42
  4     (CONST&PUSH 2)       ; #<CFFI::FOREIGN-TYPEDEF MY-INT>
  5     (CALL2&PUSH 3)       ; CFFI::TRANSLATE-TYPE-TO-FOREIGN
  7     (LOAD&PUSH 0)
  8     (LOAD&PUSH 2)
  9     (CONST&PUSH 4)       ; FFI:INT
  10    (CONST&PUSH 5)       ; 0
  11    (CALL 4 6)           ; FFI::WRITE-MEMORY-AS
  14    (SKIP 1)
  16    (LOAD&PUSH 0)
  17    (CONST&PUSH 4)       ; FFI:INT
  18    (CONST&PUSH 5)       ; 0
  19    (CALL&PUSH 3 7)      ; FFI:MEMORY-AS
  22    (CONST&PUSH 2)       ; #<CFFI::FOREIGN-TYPEDEF MY-INT>
  23    (CALL2 8)            ; CFFI::TRANSLATE-TYPE-FROM-FOREIGN
  25    (SKIP&RET 3)
Ouch. Two generic function calls, which each call another generic function (e.g. translate-to-foreign), some of which cons, e.g.

  (defmethod translate-type-to-foreign (value (type foreign-typedef))
    ;; We build a list out of the second value returned...
  ;; IMHO bug: 1. undocumented and 2. most don't provide a second value
So we compare
 o 4 GF calls in total, and some consing
 o to nothing like this previously
 o with something a C compiler compiles to 2-4 native instructions.
IMHO this hurts. It hurts even more since CFFI originally started out of dissatisfaction with UFFI on CMUCL, where James Bielman observed boxing of float values IIRC, and he initially reported performance improvements over UFFI using his approach. Right now, I'd expect CFFI to lag far behind UFFI in performance (at least with typedefs and structs; other tests are still compiled fine).
The type translators may have gained a lot of flexibility. However, such a gain should not come at the cost of compile-time optimization. People often enough turn to an FFI when they need some fast library that they don't want to port...
I believe there's a need for some translate-*-FORM protocol, which can generate code for known types and eliminate run-time GF calls when present. By that I mean that user-defined custom fancy converters may still call GFs at run-time; they are welcome. But a GF call should not be the typical case, and the typical case should be optimized as tightly as can be.
Regards,
Jörg Höhle
"Hoehle, Joerg-Cyril" Joerg-Cyril.Hoehle@t-systems.com writes:
> > Thu Feb 2 18:41:28 CST 2006 James Bielman jamesjb@jamesjb.com
> > - Add an optimization for defining non-translatable types.
>
> Is that something akin to sealing?
>
> Maybe that's a good step towards optimization. I've noticed the following behaviour (with CFFI-CVS from last week or so), which led me to write this note:
>
> Since the addition of the new type translators, I'd got some bad feelings in my stomach (and wrote about it), without the time to look closer. Now I looked.
>
> Somehow I feel there's a missing API for compile-time transformations in the new branch, similar to the old duality provided by translate-to-c <-> to-c-FORM.
>
> To make a long story short, here's the output for test struct.5:
>
>   (defctype my-int :int)
>   (defcstruct s5 (a my-int))
>   (deftest struct.5
>       (with-foreign-object (s 's5)
>         (setf (foreign-slot-value s 's5 'a) 42)
>         (foreign-slot-value s 's5 'a))
>     42)
Today's patch allows you to optimize the case where you define a typedef that you can guarantee does not need translation (as in this example). So you can do:
(defctype my-int :int :translate-p nil)
And you should get the same disassembly as before. Any TRANSLATE-* methods defined on MY-INT will simply be ignored.
> Ouch. Two generic function calls, which each call another generic function (e.g. translate-to-foreign), some of which cons, e.g.
>
>   (defmethod translate-type-to-foreign (value (type foreign-typedef))
>     ;; We build a list out of the second value returned...
>   ;; IMHO bug: 1. undocumented and 2. most don't provide a second value
This doesn't need to be documented---it builds a list of the optional second return values from each (possibly nested) type translator for use internally so that it can recurse down the list when freeing the object, passing the correct parameters to each free method. If user-defined translators don't return a second value, it will just pass nil to the free method.
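The bookkeeping described above can be modelled in a few lines. This is a toy sketch, not CFFI's code: TOY-TO-FOREIGN, TOY-FREE, CONVERT-THROUGH, FREE-THROUGH and the :PLAIN and :BOXED types are all names invented here for illustration. It only shows how the optional second values could be collected into a list and later handed back, one per type, to the free methods.

```lisp
;;; Toy model only: none of these names belong to CFFI.
(defgeneric toy-to-foreign (value type)
  (:documentation "Return the converted value and, optionally, a free parameter."))

(defgeneric toy-free (foreign type param)
  (:documentation "Release whatever TOY-TO-FOREIGN allocated for TYPE."))

(defvar *freed* '()
  "Log of (type param) pairs seen by TOY-FREE, newest first.")

;; A translator that allocates something and reports it as a second value.
(defmethod toy-to-foreign (value (type (eql :boxed)))
  (values (list :box value) :box-handle))

;; A translator that returns no second value, so its parameter is NIL.
(defmethod toy-to-foreign (value (type (eql :plain)))
  value)

;; Default free method: just record what it was called with.
(defmethod toy-free (foreign type param)
  (declare (ignore foreign))
  (push (list type param) *freed*))

(defun convert-through (value types)
  "Apply the translator for each of TYPES in order; collect the second values."
  (let ((params '()))
    (dolist (type types)
      (multiple-value-bind (converted param) (toy-to-foreign value type)
        (setf value converted)
        (push param params)))
    (values value (nreverse params))))

(defun free-through (foreign types params)
  "Walk TYPES and PARAMS in step, giving each free method its own parameter."
  (loop for type in types
        for param in params
        do (toy-free foreign type param)))
```

A translator that returns only one value, like the :PLAIN method here, simply contributes NIL to the list, which matches the behaviour described for user-defined translators.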
> So we compare
>  o 4 GF calls in total, and some consing
>  o to nothing like this previously
>  o with something a C compiler compiles to 2-4 native instructions.
>
> IMHO this hurts. It hurts even more since CFFI originally started out of dissatisfaction with UFFI on CMUCL, where James Bielman observed boxing of float values IIRC, and he initially reported performance improvements over UFFI using his approach. Right now, I'd expect CFFI to lag far behind UFFI in performance (at least with typedefs and structs; other tests are still compiled fine).
Well, the idea here was to get it right first, then get it fast. The :TRANSLATE-P argument is a start at optimizing types that won't need translation.
Ideally, it would be possible to tell at compile-time whether a given type would need translation based on the set of applicable methods on TRANSLATE-TO-FOREIGN, but since this method can also be specialized on the Lisp value to convert, not all of the specializers are known at that point.
> I believe there's a need for some translate-*-FORM protocol, which can generate code for known types and eliminate run-time GF calls when present. By that I mean that user-defined custom fancy converters may still call GFs at run-time; they are welcome. But a GF call should not be the typical case, and the typical case should be optimized as tightly as can be.
I've thought a little bit about this, and I think an interface based loosely on Common Lisp compiler macros (but always expanded) might be a good fit here---define the translator using the method when there is no way around doing the conversion at runtime, but override using the "compiler macro" (which can punt when necessary and fall back to the runtime method) when performing conversions on known types.
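One way such an interface might look is sketched below. It is a hypothetical illustration in plain Common Lisp, not a CFFI API: TO-FOREIGN, RUNTIME-TO-FOREIGN, DEFINE-TO-FOREIGN-EXPANDER and the MY-FLAG type are names made up for this sketch. The macro is always expanded, looks up an expander when the type is a literal, and falls back to the generic function when there is no expander or the expander punts by returning NIL.

```lisp
;;; Hypothetical sketch; every name here is invented for illustration.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (defvar *to-foreign-expanders* (make-hash-table :test #'eq)
    "Maps a type name to a function of the value form that returns a
replacement form, or NIL to punt to the runtime generic function.")

  (defun constant-type-name (form)
    "Return the type name if FORM is a keyword or a quoted symbol, else NIL."
    (cond ((keywordp form) form)
          ((and (consp form) (eq (first form) 'quote)) (second form))
          (t nil)))

  (defun expand-to-foreign (value-form type-form)
    "Return an optimized form when the type is known and has an expander."
    (let* ((name (constant-type-name type-form))
           (expander (and name (gethash name *to-foreign-expanders*))))
      (and expander (funcall expander value-form)))))

;; Runtime path: the default method performs no translation.
(defgeneric runtime-to-foreign (value type)
  (:method (value type)
    (declare (ignore type))
    value))

(defmacro define-to-foreign-expander (type (value-var) &body body)
  "Register a compile-time expander for the type named TYPE."
  `(eval-when (:compile-toplevel :load-toplevel :execute)
     (setf (gethash ',type *to-foreign-expanders*)
           (lambda (,value-var) ,@body))
     ',type))

;; Always expanded, unlike a real compiler macro.
(defmacro to-foreign (value type)
  (or (expand-to-foreign value type)
      `(runtime-to-foreign ,value ,type)))

;; MY-INT needs no translation: the value form is used as is.
(define-to-foreign-expander my-int (form) form)

;; MY-FLAG is only optimized for literal T or NIL; anything else punts.
(define-to-foreign-expander my-flag (form)
  (cond ((eq form t) 1)
        ((eq form nil) 0)
        (t nil)))

(defmethod runtime-to-foreign (value (type (eql 'my-flag)))
  (if value 1 0))
```

With these definitions, (to-foreign x 'my-int) expands to just x, while (to-foreign (foo) 'my-flag) expands to a call to RUNTIME-TO-FOREIGN because the expander cannot decide anything about (foo) at compile time.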
Obviously, what you lose with such an interface is the ability to cleanly translate different Lisp types in different ways---for example passing pointers as :STRING values without translation. This is the primary reason the new interface is defined using GFs.
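The dispatch-on-value case can be pictured with a toy model. FAKE-POINTER and STRING-TO-FOREIGN below are stand-ins invented for this sketch, not CFFI names; the point is only that which method applies depends on the class of the Lisp value, which is generally not known when the call is compiled.

```lisp
;;; Toy stand-ins: FAKE-POINTER and STRING-TO-FOREIGN are not CFFI names.
(defstruct fake-pointer
  address)

(defgeneric string-to-foreign (value)
  (:documentation "Convert VALUE for a parameter declared with a string type."))

;; A Lisp string has to be copied into foreign memory. That is modelled
;; here by wrapping a fresh copy of the string in a FAKE-POINTER.
(defmethod string-to-foreign ((value string))
  (make-fake-pointer :address (copy-seq value)))

;; Something that is already a pointer passes through untranslated.
(defmethod string-to-foreign ((value fake-pointer))
  value)
```

For a call such as (string-to-foreign x), where x is an arbitrary variable, a purely compile-time expander has no way to choose between these two methods, so it would have to punt to the generic function.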
James
"James" == James Bielman jamesjb@jamesjb.com writes:
[...]
James> Today's patch allows you to optimize the case where you define a
James> typedef that you can guarantee does not need translation (as in
James> this example). So you can do:
James> (defctype my-int :int :translate-p nil)
Am I the only one using docstrings with defctype? :)
(defctype evas :pointer "Evas")
; caught ERROR:
;   (during macroexpansion of (DEFCTYPE EVAS ...))
;   error while parsing arguments to DEFMACRO DEFCTYPE:
;     odd number of elements in keyword/value list: ("Evas")
--J.
Jan Rychter jan@rychter.com writes:
> Am I the only one using docstrings with defctype? :)
>
> (defctype evas :pointer "Evas")
(defctype evas :pointer :documentation "Evas")
is the way to do it now.
Snipped from Joerg-Cyril Hoehle's message:
> IMHO this hurts. It hurts even more since CFFI originally started out of dissatisfaction with UFFI on CMUCL, where James Bielman observed boxing of float values IIRC, and he initially reported performance improvements over UFFI using his approach. Right now, I'd expect CFFI to lag far behind UFFI in performance (at least with typedefs and structs; other tests are still compiled fine).
I actually agree with most of the content of your message, but wanted to point out that on SBCL, CFFI is orders of magnitude faster at dealing with structs than UFFI. Basically, UFFI generates SBCL code with no type declarations, which means that every access to a slot value in a foreign struct generates (I kid you not) 250,000 bytes of consing as it calls naturalize and compiles some closures. CFFI, because its struct representation is directly in bits (as I understand it), avoids the SBCL alien functions for accessing the structure, and is way, way faster. (It is possible to write efficient SBCL alien code directly (NOT UFFI), but I have not yet found a way to do it without declarations that are quite painful.)
So I always want things faster, but on SBCL, CFFI is making me VERY VERY happy compared to UFFI.
Cheers,
rif