Dear CFFI developers!
We recently migrated one of our largest projects from our home-grown foreign-function interface implementation to CFFI. The project is almost 1M LOC with almost 1K foreign functions.
Previously, we inlined all of our foreign-function stubs. We also use FTYPEs throughout the code for type safety and optimization. Having CFFI functions inlined and FTYPEs generated would spare us from writing those declarations ourselves.
The attached patch contains a proposal that we would like to share. Please let us know of any modifications needed to make this patch useful to a larger community.
FYI: We have tested the patch against about a hundred open-source Lisp projects.
Kindly,
just a trivial comment:
maybe its docstring should read:
"True if DEFCFUN is marked to be inlined in the lexical environment ENV."
also, in my own projects i'd call it something like
%inline-defcfun?
but in CFFI i guess it should be something like
%inline-defcfun-p
unless you have plans to use it in a more general way. then maybe
%inline-cffi-definitions-p
Hi Andrzej,
On Wed, Nov 1, 2017 at 10:43 PM, Andrzej Walczak andrzejwalczak@google.com wrote:
We recently migrated one of our largest projects from our home-grown foreign-function interface implementation to CFFI. The project is almost 1M LOC with almost 1K foreign functions.
Cool! That's interesting to know!
The attached patch contains a proposal that we would like to share. Please let us know of any modifications needed to make this patch useful to a larger community.
The idea of automatically inlining DEFCFUNs when SPACE > 2 > DEBUG is interesting. If we go down that route, we'd better be consistent across Lisps, but I wonder if such a thing is possible when the optimisations vary so much across Lisps. Are CL optimisation declarations the best way to control the optimisation? I'm skeptical. I'm also a bit worried about using SBCL internals.
I'm curious to know the purpose of the declarations. Are you mostly taking advantage of SBCL's declarations-as-assertions behaviour or are you looking for some sort of optimisation? Do you have any examples showing how they help?
Cheers,
Hi Luis,
I should have thanked you and your colleagues in my initial email for implementing and maintaining CFFI. It is a great library with a super design - personally I like it very much.
After some thought, I agree that the inlining control is not a fully baked idea yet. I'll split the patch into INLINE and FTYPE parts.
You may ask why we decided to inline (almost) every one of the DEFCFUNs. CFFI interfaces to C - obviously - and a lot of C code deals with double-floats and 64-bit integers. Unless the stubs are inlined, every call to C and back produces boxed values, which hurts performance.
In the end, one of the reasons why developers interface to C is performance, so IMHO the CFFI library, which is already great by default, should allow developers to access every bit of performance that is there.
Regarding the FTYPE declarations - we build our software stacks with different compiler policy settings. In developer/test builds the declarations act as run-time assertions. In production builds they produce much better optimized code. At compile time, they let us detect program errors through the type system.
The lisp-parameter/value-type generics introduced here allow us to declare custom CFFI types that, for example, are passed as int64 but hold only fixnum values. Or we can declare a string type that is processed as a simple-base-string, which gives nice optimizations in SBCL.
Also, always declaring FTYPEs on DEFCFUNs is a good thing, since the C code that we interface to is rather static compared to Lisp code. Thus static function declarations of C stubs will always be a win.
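For illustration, the kind of hand-written declarations discussed above might look roughly like the following for a single stub. This is only a sketch, not the patch itself: it assumes CFFI can be loaded through ASDF and that fabs() is resolvable in the running image, and the names C-FABS and DISTANCE are made up for the example.

```lisp
;; Assumption: CFFI is installed where ASDF can find it.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (require :asdf))
(eval-when (:compile-toplevel :load-toplevel :execute)
  (asdf:load-system :cffi))

;; Hand-written declarations for one stub. C-FABS is an illustrative name.
(declaim (inline c-fabs)
         (ftype (function (double-float) (values double-float &optional))
                c-fabs))

;; Binds the Lisp function C-FABS to the C library function fabs().
(cffi:defcfun ("fabs" c-fabs) :double
  (x :double))

;; A caller compiled after the declarations can see both the inline
;; proclamation and the argument/return types of the stub.
(defun distance (a b)
  (declare (double-float a b))
  (c-fabs (- a b)))
```

With roughly a thousand stubs, writing the DECLAIM form by hand for each one is the repetition that generated declarations would remove.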
Cheers,
On Mon, Nov 6, 2017 at 5:10 PM, Andrzej Walczak andrzejwalczak@google.com wrote:
After some thought, I agree that the inlining control is not a fully baked idea yet.
Incidentally, this recent pro@c-l.net message makes me a whole lot less skeptical about your suggested approach: https://mailman.common-lisp.net/pipermail/pro/2017-November/001464.html. The introspect-environment library seems to provide a nice portable API for inspecting optimization policies. Alas, there doesn't seem to be a way to define your own declarations; I was hoping we might be able to do something like (declaim (inline-cffi-functions <boolean>)) or something along those lines.
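As a point of comparison only, a cruder portable switch can be built from a special variable that a defining macro consults at macroexpansion time. The sketch below is neither the patch's mechanism nor the declaration-based one wished for here; *INLINE-STUBS-P*, DEFINE-STUB and ADD-ONE are hypothetical names, and the switch applies per definition rather than per lexical environment.

```lisp
;; Hypothetical switch; not part of CFFI or of the proposed patch.
;; EVAL-WHEN makes the variable available when the macro is expanded
;; during COMPILE-FILE as well as during LOAD.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (defvar *inline-stubs-p* t
    "When true at macroexpansion time, DEFINE-STUB proclaims the stub INLINE."))

(defmacro define-stub (name lambda-list &body body)
  "Define NAME as an ordinary function, optionally proclaimed INLINE."
  `(progn
     ;; The DECLAIM form is emitted only when the switch is on.
     ,@(when *inline-stubs-p*
         `((declaim (inline ,name))))
     (defun ,name ,lambda-list ,@body)))

;; Stand-in for a foreign-function stub.
(define-stub add-one (x)
  (1+ x))
```

Binding *INLINE-STUBS-P* to NIL around compilation of a file would then yield plain out-of-line definitions.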
I'll split the patch into INLINE and FTYPE parts.
Sounds like a good idea.
You may ask why we decided to inline (almost) every one of the DEFCFUNs. CFFI interfaces to C - obviously - and a lot of C code deals with double-floats and 64-bit integers. Unless the stubs are inlined, every call to C and back produces boxed values, which hurts performance.
Right. Float boxing; that's a classic. But, I (perhaps naively) didn't expect boxing to happen even without the declarations. Take the following example:
(in-package :cffi)
(declaim (optimize (speed 3) (space 0) (safety 0) (debug 0)))
(declaim (inline fabs-1))
(defun fabs-1 (x)
  (foreign-funcall "fabs" :double x :double))
(declaim (inline fabs-2))
(defun fabs-2 (x)
  (declare (double-float x))
  (foreign-funcall "fabs" :double x :double))
(defun foo-1 (x) (floatp (fabs-1 x)))
(defun foo-2 (x) (floatp (fabs-2 x)))
I expected FABS-1 and FABS-2 to be identical. (FOREIGN-FUNCALL eventually expands to ALIEN-FUNCALL and I was expecting that would provide all the type information SBCL's compiler might need.) The disassembly shows that FABS-2 is one instruction shorter. But disassembling FOO-1 and FOO-2 shows that they're identical.
Perhaps this example is too contrived. Do you have a better one?
Cheers,
Hello Luis,
Indeed, I am guilty as well of writing a compiler-macro based Lisp-form compiler - as hinted in the pro@c-l.net post :) It was one of the motivations to add FTYPEs everywhere. Thank you for the interesting pointer.
As stated in the quoted comment, double-float boxing is one of the reasons why we inline CFFI stubs. It's not as if SBCL could choose a different calling convention depending on known type declarations. Once inlined, there will be no difference between FABS-1 and FABS-2 - with or without type or ftype declarations. Thank you for the excellent example illustrating this classic issue.
The FTYPEs help SBCL produce better code if the double-float function stubs are not inlined (for some reason). Also, going back to the post you mentioned, having FTYPEs on CFFI functions could help compiler-macros make informed decisions about code transformations.
E.g. (quickly typed - might not compile)
(declaim (ftype (function (double-float double-float)
                          (values double-float &optional))
                pow)
         (inline pow))
(defcfun pow :double
  (base :double)
  (exp :double))
(defun test (&rest nums) (sum (bind #'pow 2) nums))
Could expand to:
(defun test (&rest nums)
  (flet ((#:f1 (#:e1)
           (declare (double-float #:e1))
           (pow 2 #:e1)))
    (declare (ftype (function (double-float) (values double-float &optional)) #:f1)
             (dynamic-extent #'#:f1))
    (let ((#:a1 (first nums)))
      (declare (double-float #:a1))
      (dolist (#:n1 (rest nums) #:a1)
        (setf #:a1 (+ #:a1 (funcall #'#:f1 #:n1)))))))
Which should allow the compiler to choose an unboxed representation for passing arguments to the internal #'#:f1. BTW: I am sure SBCL would do just fine with just 1/3rd of the declarations above - but unsure which 1/3rd ... this time of year.
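A hand-written, runnable variant of the same idea might look like the sketch below. It is not the output of any actual compiler-macro: it assumes CFFI is loadable through ASDF and that pow() is resolvable in the running image, and C-POW and SUM-OF-POWERS are illustrative names. Unlike the quick expansion above, it starts the accumulator at zero and passes a double-float base.

```lisp
;; Assumption: CFFI is installed where ASDF can find it.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (require :asdf))
(eval-when (:compile-toplevel :load-toplevel :execute)
  (asdf:load-system :cffi))

(declaim (inline c-pow)
         (ftype (function (double-float double-float)
                          (values double-float &optional))
                c-pow))

;; Binds the Lisp function C-POW to the C library function pow().
(cffi:defcfun ("pow" c-pow) :double
  (base :double)
  (exponent :double))

(defun sum-of-powers (base exponents)
  "Return the sum of BASE raised to each element of EXPONENTS."
  (declare (double-float base) (list exponents))
  ;; The local function closes over BASE; the declarations tell the
  ;; compiler that its argument and the accumulator are double-floats.
  (flet ((power (e)
           (declare (double-float e))
           (c-pow base e)))
    (declare (dynamic-extent #'power))
    (let ((acc 0d0))
      (declare (double-float acc))
      (dolist (e exponents acc)
        (incf acc (power e))))))
```

For example, (sum-of-powers 2d0 '(1d0 2d0 3d0)) adds 2, 4 and 8. Whether the compiler actually keeps the intermediate values unboxed depends on the implementation and policy, so checking with DISASSEMBLE is still advisable.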
Cheers,
Andrzej,
I'm convinced this is a good idea. If you have the time/budget, I suggest you open a pull request. Sooner or later we'll need documentation and tests.
Cheers, Luís
-- Andrzej Walczak (Google/ITA Software Engineer)