[cmucl-imp] Unboxed float arguments
Here is an idea for a calling convention that supports unboxed floats: 1. When DEFUN compiles a function that has a ftype declaration involving floats, it creates a special entry-point that accepts unboxed arguments. 2. Named calls to such functions are compiled so that the arguments are represented as indicated by the ftype declaration. 3. We also keep a linker table to connect those special entry-points to call-sites. When the function gets redefined, we go through the existing call-sites and patch them so that the new definition is called. If the type of the new definition is different than the one expected by the call-site, we create an "adapter function" that converts the representation to the new type or if the conversion is not possible signals an error. Attached is a prototype implementation to illustrate how this could be done. See the comments in the file for the main points. A more polished implementation would probably change more places then MAKE-XEP-LAMBDA. What do you think: is such a calling convention worth having? Helmut ;; Toy implementation for a calling convention with unboxed arguments. (in-package cl-user) ;; Conceptionally we'd like to create functions that have two entry ;; points: one that accepts only arguments of a specific ;; type/representation and a general entry point that accept all ;; possible types as a fallback. ;; ;; A "typed entry point" is an entry point that only accepts arguments ;; of that specfic type and makes no typechecks. There is a direct ;; mapping from the function's type to the representation of the ;; arguments and return values. So if we know the type, e.g. from a ;; ftype declaration, we can predict the representation. ;; ;; This implementation actually uses two functions instead of one ;; function with two entry points. ;; We use function names like (:typed-entry-point foo) for functions ;; which have a typed entry point. (ext:define-function-name-syntax :typed-entry-point (name) (ext:valid-function-name-p (cadr name))) ;; A callsite describes what we need to know to link/re-link a call to ;; a typed entry point. The TYPE slot is the type of the callsite. We ;; use FDEFN to call the entry. In theory, we could patch the call ;; instruction with the correct address of the entry point, but code ;; patching is always tricky, so for now we use an indirection through ;; fdefn objects and only patch the address in the fdefn. ;; ;; When a function gets redefined we patch existing callsites so that ;; the new function gets called. We record the type of the callsite ;; so that we can verify that the type of the new definition matches ;; existing callsites. If the types match, we simply patch the fdefn ;; and are done. If the types don't match we generate an "adapter ;; function". The job of an adapter is it to take arguments of some ;; particular type and box/unbox them so that it becomes possible to ;; call the new definition. The current adapter implementation would ;; sometimes create endless recursions, so we have and ADAPT flag to ;; disable the creation of adapters for some callsites. ;; (defstruct callsite (type (ext:required-argument) :type kernel:function-type :read-only t) (fdefn (ext:required-argument) :type kernel:fdefn :read-only t) (adapt (ext:required-argument) :type boolean :read-only t)) (defstruct linkage (callsites nil :type (or callsite list)) (adapters nil :type (or function list))) ;; For functions with typed entry points we maintiain a list of ;; callsites and record that in the info db. ;; ;; If we use typed entry points only for functions with ftype ;; declarations then this table is probably relatively small. (ext:define-info-type function linkage linkage) (defun listify (x) (if (listp x) x (list x))) (defmacro push-unlistified (new-value (reader object)) `(let ((new-value ,new-value) (object ,object)) (let ((old-value (,reader ,object))) (setf (,reader object) (typecase old-value (null new-value) (cons (cons new-value old-value)) (t (list new-value old-value))))))) ;; find-typed-entry-point is called at load-time and returns the ;; fdefn that should be called. ;; ;; 1. We go through the list of existing callsites to see if we ;; already have one with the same type and reuse it if possible. ;; ;; 2. We look at the current definition. If the types match, we ;; create a callsite object, store it in the info db, and return the ;; fdefn. ;; ;; 3. Now we know that the types don't match we need to use adapters. ;; First again, we look at existing adapters and reuse them if possible. ;; ;; 4. An adapter is created that boxes the arguments and forwards the ;; call to the "normal" entry point. ;; ;; 5. If we are not allowed to create adapters, we look again at the ;; current definition to handle the case where no current definition ;; exists. If so, we return an empty fdefn object that will call the ;; undefined-tramp assembly routine. ;; ;; 6. If all else fails we link the callsite to our error handler. ;; (declaim (ftype (function (t t boolean) kernel:fdefn) find-typed-entry-point)) (defun find-typed-entry-point (name callsite-typespec adapt) (let* ((cs-type (kernel:specifier-type callsite-typespec)) (linkage (multiple-value-bind (info foundp) (ext:info function linkage name) (cond (foundp info) (t (setf (ext:info function linkage name) (make-linkage))))))) (cond ((dolist (cs (listify (linkage-callsites linkage))) (let* ((ep-type (callsite-type cs))) (when (and (function-types-compatible-p cs-type ep-type) (eq (callsite-adapt cs) adapt)) (return (callsite-fdefn cs)))))) ((let ((fdefn (lisp::fdefinition-object name nil))) (when fdefn (let ((fun (kernel:fdefn-function fdefn))) (when fun (let ((ep-type (kernel:extract-function-type fun))) (when (function-types-compatible-p cs-type ep-type) (let* ((aname (kernel:%function-name fun)) (fdefn (kernel:make-fdefn aname)) (cs (make-callsite :type cs-type :fdefn fdefn :adapt adapt))) (setf (kernel:fdefn-function fdefn) fun) (push-unlistified cs (linkage-callsites linkage)) fdefn)))))))) ((and adapt (dolist (fun (listify (linkage-adapters linkage))) (let ((ep-type (kernel:extract-function-type fun))) (when (function-types-compatible-p cs-type ep-type) (let* ((aname (kernel:%function-name fun)) (fdefn (kernel:make-fdefn aname)) (cs (make-callsite :type cs-type :fdefn fdefn :adapt adapt))) (setf (kernel:fdefn-function fdefn) fun) (push-unlistified cs (linkage-callsites linkage)) (return fdefn))))))) ((and adapt (let* ((fun (generate-adapter-function cs-type (second name))) (fdefn (kernel:make-fdefn (kernel:%function-name fun))) (cs (make-callsite :type cs-type :fdefn fdefn :adapt adapt))) (setf (kernel:fdefn-function fdefn) fun) (push-unlistified fun (linkage-adapters linkage)) (push-unlistified cs (linkage-callsites linkage)) fdefn))) ((and (not adapt) (or (not (lisp::fdefinition-object name nil)) (not (kernel:fdefn-function (lisp::fdefinition-object name nil))))) (let* ((aname `(:typed-entry-point #:undefined)) (fdefn (kernel:make-fdefn aname)) (cs (make-callsite :type cs-type :fdefn fdefn :adapt adapt))) (push-unlistified cs (linkage-callsites linkage)) fdefn)) (t (let* ((fun (generate-adapter-function cs-type 'linkage-error)) (fdefn (kernel:make-fdefn (kernel:%function-name fun))) (cs (make-callsite :type cs-type :fdefn fdefn :adapt adapt))) (setf (kernel:fdefn-function fdefn) fun) (push-unlistified cs (linkage-callsites linkage)) fdefn))))) (defun linkage-error (&rest args) (declare (ignore args)) (error "Linking callsite to typed-entry-point failed")) ;; Generate an adapter function that changes the representation of the ;; arguments (specified with FTYPE) and forwards the call to NAME. ;; The adapter has also a typed entry point. It should also check ;; that the values returned by NAME match FTYPE. ;; ;; In practice, the compiler infered type may not match exactly FTYPE, ;; even if we add lotso declarations. This is annyoingly brittle. (defun generate-adapter-function (ftype name) (let* ((atypes (kernel:function-type-required ftype)) (tmps (loop for nil in atypes collect (gensym))) (fname `(:typed-entry-point :boxing-adapter ,(make-symbol (string name)))) (ftypespec (kernel:type-specifier ftype))) (proclaim `(ftype ,ftypespec ,fname)) (compile fname `(lambda ,tmps (declare ,@(loop for tmp in tmps for type in atypes collect `(type ,(kernel:type-specifier type) ,tmp))) (the ,(kernel:type-specifier (kernel:function-type-returns ftype)) (funcall (function ,name) . ,tmps)))) (let ((fun (fdefinition fname))) (unless (eq name 'linkage-error) (fix-ftype fun ftype)) fun))) (defun fix-ftype (fun ftype) (let ((etype (kernel:extract-function-type fun))) (unless (function-types-compatible-p ftype etype t) (break))) fun) ;; This is our rule to decide when a type at a callsite matches the ;; type of the entry point. ;; ;; 1. The arguments at the callsite should be subtypes of the ;; arguments at the entry point. ;; ;; 2. The return value at the callsite should be supertypes of the ;; return values at the entry point. ;; ;; 3. The representations must agree. Representations should probably ;; decided in the backend, but for now we assume only double-floats ;; are unboxed. (defun function-types-compatible-p (callsite-type entrypoint-type &optional ignore-representation) (flet ((return-types (ftype) (let ((type (kernel:function-type-returns ftype))) (cond ((kernel:values-type-p type) (assert (and (not (kernel:values-type-rest type)) (not (kernel:values-type-keyp type)))) (kernel:values-type-required type)) (t (list type))))) (ptype= (type1 type2) (let ((double-float (kernel:specifier-type 'double-float))) (cond (ignore-representation t) ((kernel:type= type1 double-float) (kernel:type= type2 double-float)) ((kernel:type= type2 double-float) nil) (t t))))) (and (every #'kernel:csubtypep (kernel:function-type-required callsite-type) (kernel:function-type-required entrypoint-type)) (every #'ptype= (kernel:function-type-required callsite-type) (kernel:function-type-required entrypoint-type)) (or (and (every #'kernel:csubtypep (return-types entrypoint-type) (return-types callsite-type)) (every #'ptype= (return-types entrypoint-type) (return-types callsite-type))) (kernel:type= (kernel:function-type-returns entrypoint-type) (kernel:specifier-type 'nil)))))) ;; check-function-redefinition is used as setf-fdefinition-hook. ;; We go through all existing callsites and ;; ;; 1. If the new type matches, we patch the callsite with the new function. ;; ;; 2. If the types don't match and if allowed, we redirect the ;; callsite to and adapter. ;; ;; 3. If the callsites doesn't want adapters we link the callsite to ;; an error handler. (defun check-function-redefinition (name new-fun) (when (and (consp name) (eq (car name) :typed-entry-point) (not (eq (cadr name) :boxing-adapter))) (multiple-value-bind (linkage foundp) (ext:info function linkage name) (when foundp (let ((new-type (kernel:extract-function-type new-fun))) (dolist (cs (listify (linkage-callsites linkage))) (let ((cs-type (callsite-type cs)) (fdefn (callsite-fdefn cs))) (cond ((function-types-compatible-p cs-type new-type) (patch-fdefn fdefn new-fun)) ((and (callsite-adapt cs) (dolist (fun (listify (linkage-adapters linkage))) (let ((ep-type (kernel:extract-function-type fun))) (when (function-types-compatible-p cs-type ep-type) (patch-fdefn fdefn fun) (return t)))))) ((callsite-adapt cs) (let ((fun (generate-adapter-function cs-type (second name)))) (push-unlistified fun (linkage-adapters linkage)) (patch-fdefn fdefn fun))) (t (format nil "~3t") (warn "New type of ~s incompatible with callsite:~%~ ~2t new-type: ~s~%~ ~2t cs-type: ~s" name (kernel:type-specifier new-type) (kernel:type-specifier cs-type)) (let ((fun (generate-adapter-function cs-type 'linkage-error))) (patch-fdefn fdefn fun))))))))))) ;; This lets us set the name in fdefn objects. We use that for ;; debugging. (eval-when (:compile-toplevel) (c:defknown set-fdefn-name (kernel:fdefn t) t) (c:def-setter set-fdefn-name vm:fdefn-name-slot vm:other-pointer-type)) (defun patch-fdefn (fdefn new-fun) (setf (kernel:fdefn-function fdefn) new-fun) (let ((name (kernel:%function-name new-fun))) (set-fdefn-name fdefn name)) fdefn) (pushnew 'check-function-redefinition ext:*setf-fdefinition-hook*) (in-package x86) ;; make-typed-call-tns chooses the representation for a function type. ;; This is similar to c::make-call-out-tns and should probably also be ;; a vm-support-routine. ;; ;; The current convention passes double-floats unboxed and all other ;; types remain boxed. Registers XMM4-XMM7 are used for the first 4 ;; double arguments. Boxed values are passed in standard locations. ;; ;; Returning values on the stack is currenlty not implemented, so all ;; return values must fit in registers. (defun make-typed-call-tns (ftype) (declare (type function-type ftype)) (labels ((ptype (name) (primitive-type-or-lose name *backend*)) (double-float-arg (state) (cond ((<= (getf state :xmms-reg) xmm7-offset) (make-wired-tn (ptype 'double-float) double-reg-sc-number (prog1 (getf state :xmms-reg) (incf (getf state :xmms-reg))))) (t (make-wired-tn (ptype 'double-float) double-stack-sc-number (prog1 (getf state :frame-size) (incf (getf state :frame-size) 2)))))) (boxed-arg (state) (cond ((<= (getf state :reg-args) register-arg-count) (let ((n (getf state :reg-args))) (incf (getf state :reg-args)) (x86-standard-argument-location n))) (t (make-wired-tn (ptype 't) control-stack-sc-number (prog1 (getf state :frame-size) (incf (getf state :frame-size) 1)))))) (double-float-type-p (type) (and (numeric-type-p type) (eq (numeric-type-class type) 'float) (eq (numeric-type-format type) 'double-float))) (arg-tn (type state) (cond ((double-float-type-p type) (double-float-arg state)) (t (boxed-arg state)))) (ret-tn (type state) (let ((tn (arg-tn type state))) (assert (member (sc-name (tn-sc tn)) '(double-reg descriptor-reg))) tn))) (let* ((arg-state (list :frame-size 2 :xmms-reg xmm4-offset :reg-args 0)) (ret-state (list :frame-size 2 :xmms-reg xmm4-offset :reg-args 0)) (returns (function-type-returns ftype)) (rtypes (typecase returns (values-type (values-type-required returns)) (t (list returns))))) (values (loop for type in (function-type-required ftype) collect (arg-tn type arg-state)) (loop for type in rtypes collect (ret-tn type ret-state)) (x86-make-stack-pointer-tn) (max (getf arg-state :frame-size) (getf ret-state :frame-size)) (x86-make-number-stack-pointer-tn) 0)))) ;; This VOP performs the call. Note the (:move-args :local-call) ;; which, hopefully, coerces and moves arguments to the correct ;; representation. (define-vop (cl-user::call-typed-named) (:args (new-fp) (new-nfp) (fdefn :scs (descriptor-reg control-stack) :target eax) (args :more t :scs (descriptor-reg))) (:results (results :more t)) (:save-p t) (:move-args :local-call) (:vop-var vop) (:info arg-locs real-frame-size) (:ignore new-nfp args arg-locs results) (:temporary (:sc descriptor-reg :offset eax-offset) eax) (:generator 30 ;; FIXME: allocate the real frame size here. We had to emit ;; ALLOCATE-FRAME before this vop so that we can use the ;; (:move-args :local-call) option here. Without the ;; ALLOCATE-FRAME vop we get a failed assertion. (inst lea esp-tn (make-ea :dword :base new-fp :disp (- (* real-frame-size word-bytes)))) ;; Move fdefn to eax before switching frames. (move eax fdefn) ;; Write old frame pointer (epb) into new frame. (storew ebp-tn new-fp (- (1+ ocfp-save-offset))) ;; Switch to new frame. (move ebp-tn new-fp) (note-this-location vop :call-site) ;; Load address out of fdefn and call it. (inst call (make-ea :dword :base eax :disp (- (* fdefn-raw-addr-slot word-bytes) other-pointer-type))) )) (in-package c) ;; To generate typed entry points we modify MAKE-XEP-LAMBDA. ;; Currently we use the function name as marker: if the name looks ;; like (:typed-entry-point <foo>) then we know we need to our thing. ;; ;; We lookup the currently declared type and create wrap code around ;; the main function. Our code receives and returns the values with ;; typed calling convention. The n-supplied argument is not used, but ;; the rest of the compiler want it to be there (generates no code ;; thogh). ;; ;; FIXME: instead of using the function name as marker it might be ;; better to have a declaration to request typed entry points. ;; ;; FIXME: %return-results-with-type is currently not inlined ;; (let-converted) proberly and instead generates local-call. Getting ;; the optimisations correctly dones also seems rather brittle. (fwrappers:define-fwrapper make-xep-lambda-wrapper (fun) (declare (type functional fun)) (cond ((and (consp (functional-name fun)) (eq (car (functional-name fun)) :typed-entry-point)) (let* ((ftype (the function-type (info function type (functional-name fun)))) (fspec (type-specifier ftype)) (returns (function-type-returns ftype)) (rtypes (typecase returns (values-type (values-type-required returns)) (t (list returns)))) (n-supplied (gensym)) (temps (loop for nil in (lambda-vars fun) collect (gensym))) (results (loop for nil in rtypes collect (gensym)))) `(lambda (,n-supplied) (declare (ignore ,n-supplied)) (multiple-value-bind ,temps (cl-user::%receive-arguments-with-type ',fspec) (multiple-value-bind ,results (the ,(type-specifier returns) (%funcall ,fun . ,temps)) (cl-user::%return-results-with-type ',fspec . ,results)))))) (t (fwrappers:call-next-function)))) (fwrappers:fwrap 'make-xep-lambda 'make-xep-lambda-wrapper) ;; %receive-arguments-with-type knows how to access the arguments from ;; the passing locations. (defknown cl-user::%receive-arguments-with-type (cons) *) (defoptimizer (cl-user::%receive-arguments-with-type derive-type) ((ftypespec)) (unless (constant-continuation-p ftypespec) (error "Function-type must be constant.")) (let ((ftype (the function-type (specifier-type (continuation-value ftypespec))))) (values-specifier-type `(values . ,(mapcar #'type-specifier (function-type-required ftype)))))) ;; Here we generate the IR2 to receive arguments. make-typed-call-tns ;; chooses the representation and move-continuation-result does the ;; rest. (defoptimizer (cl-user::%receive-arguments-with-type ir2-convert) ((ftypespec) node block) (let* ((ftype (the function-type (specifier-type (continuation-value ftypespec)))) (cont (node-cont node)) (arg-tns (x86::make-typed-call-tns ftype))) (move-continuation-result node block arg-tns cont))) (defknown cl-user::%return-results-with-type (cons &rest *) nil) (defoptimizer (cl-user::%return-results-with-type ltn-annotate) ((type &rest args) node policy) (dolist (arg args) (annotate-ordinary-continuation arg policy))) ;; %return-results-with-type is similar to a known-return just that we ;; let make-typed-call-tns choose the representation. (defoptimizer (cl-user::%return-results-with-type ir2-convert) ((type &rest args) node block) (let* ((ftype (the function-type (specifier-type (continuation-value type)))) (home (lambda-home (lexenv-lambda (node-lexenv node)))) (env (environment-info (lambda-environment home))) (old-fp (ir2-environment-old-fp env)) (return-pc (ir2-environment-return-pc env))) (multiple-value-bind (arg-tns result-tns) (x86::make-typed-call-tns ftype) (declare (ignore arg-tns)) (let ((val-tns (loop for arg in args collect (continuation-tn node block arg)))) (vop* known-return node block (old-fp return-pc (reference-tn-list val-tns nil)) (nil) result-tns))))) (defknown cl-user::%typed-call (cons &rest *) *) ;; %typed-call gets transformed to %%typed-call. We create the the ;; load-time-value-form and some type declarations. ;; ;; FIXME: The :adapt option should probably be a separate argument, ;; but for now it's encoded in the function name. ;; ;; FIXME: why is the truly-the still needed? Shouldn't ;; load-time-value be able use the declared return type of ;; find-typed-entry-point? (deftransform cl-user::%typed-call ((name &rest args) * * :important t) (unless (constant-continuation-p name) (error "Function name must be constant.")) (destructuring-bind (&key typed-entry-point adapt) (continuation-value name) (let* ((name typed-entry-point) (ftype (the function-type (info function type `(:typed-entry-point ,name)))) (vars (loop for nil in (function-type-required ftype) collect (gensym)))) `(lambda (name ,@vars) (declare (ignore name) ,@(loop for type in (function-type-required ftype) for var in vars collect `(type ,(type-specifier type) ,var))) (cl-user::%%typed-call (locally (declare (optimize speed)) (truly-the fdefn (load-time-value (cl-user::find-typed-entry-point '(:typed-entry-point ,name) ',(type-specifier ftype) ',adapt) t))) ',ftype ,@vars))))) (defknown cl-user::%%typed-call (fdefn function-type &rest *) *) (defoptimizer (cl-user::%%typed-call derive-type) ((fdefn type &rest args)) (let ((ftype (continuation-value type))) (function-type-returns ftype))) (defoptimizer (cl-user::%%typed-call ltn-annotate) ((fdefn type &rest args) node policy) (setf (basic-combination-info node) :funny) (setf (node-tail-p node) nil) (annotate-ordinary-continuation fdefn policy) (dolist (arg args) (annotate-ordinary-continuation arg policy))) ;; A typed-call we is similar to a known-local-call. ;; make-typed-call-tns chooses the argument representation. ;; ;; 1. allocate a frame ;; 2. moving/coercing the arguments and done the CALL-TYPED-NAMED vop ;; 3. move-continuation-result moves the results in the right place ;; ;; FIXME: all return values are in registers move-continuation-result ;; can probably not handle stack arguments. ;; ;; FIXME: the ALLOCATE-FRAME vop isn't quite right for us, see the ;; comment in call-typed-named. ;; ;; FIXME: currently can't make tail calls. We would need to guarantee ;; that the types match. (defoptimizer (cl-user::%%typed-call ir2-convert) ((fdefn type &rest args) node block) (let ((ftype (the function-type (continuation-value type))) (cont (node-cont node))) (multiple-value-bind (arg-tns result-tns fp stack-frame-size nfp number-stack-frame-size) (x86::make-typed-call-tns ftype) (declare (ignore number-stack-frame-size)) (let ((fdefn-tn (continuation-tn node block fdefn)) (cont-tns (loop for arg in args collect (continuation-tn node block arg)))) (vop allocate-frame node block nil fp nfp) (vop* cl-user::call-typed-named node block (fp nfp fdefn-tn (reference-tn-list cont-tns nil)) ((reference-tn-list result-tns t)) arg-tns stack-frame-size) (move-continuation-result node block result-tns cont))))) ;; Tests for typed calling convention. (in-package :cl-user) (eval-when (:compile-toplevel) (fmakunbound '(:typed-entry-point fid)) (fmakunbound '(:typed-entry-point sum-prod)) (fmakunbound '(:typed-entry-point cons-sum)) (fmakunbound '(:typed-entry-point id))) ;; First we create a declaration (declaim (ftype (function (double-float) double-float) (:typed-entry-point fid))) ;; then the typed entry point (defun (:typed-entry-point fid) (f) f) ;; then the "normal" entry point, which calls the typed version. (defun fid (f) (%typed-call '(:typed-entry-point fid :adapt nil) f)) ;; and also a compiler macro so that named calls automtomatically call ;; the typed entry point. (define-compiler-macro fid (f) `(%typed-call '(:typed-entry-point fid :adapt t) ,f)) (defun test-fid-1 () (assert (= (fid 1d0) 1d0))) ;; (fid 1d0) ;; (disassemble '(:typed-entry-point fid)) ;; (disassemble 'fid) ;; Let's wrap that up as macro. In a more polished implementation ;; DEFUN could do all this. (defmacro defun-typed (name (&rest args) ((&rest arg-types) return-type) &body body) `(progn (declaim (ftype (function ,arg-types ,return-type) (:typed-entry-point ,name))) (defun (:typed-entry-point ,name) ,args (the ,return-type . ,body)) (defun ,name ,args (%typed-call '(:typed-entry-point ,name :adapt nil) . ,args)) (define-compiler-macro ,name ,args `(%typed-call '(:typed-entry-point ,',name :adapt t) . ,(list . ,args))))) (defun-typed f+ (x y) ((double-float double-float) double-float) (+ x y)) ;; (disassemble '(:typed-entry-point f+)) (defun-typed sum-prod (x y z u v w) ((double-float double-float double-float double-float double-float double-float) (values double-float double-float)) (values (+ x y z u v w) (* x y z u v w))) ;; (disassemble '(:typed-entry-point sum-prod)) ;; (ext:info function linkage '(:typed-entry-point sum-prod)) (defun test-sum-prod-1 () (multiple-value-bind (sum prod) (sum-prod 2d0 3d0 4d0 5d0 6d0 7d0) (assert (= sum (+ 2d0 3d0 4d0 5d0 6d0 7d0))) (assert (= prod (* 2d0 3d0 4d0 5d0 6d0 7d0))))) (defun test-sum-prod-2 () (multiple-value-bind (sum) (sum-prod 2d0 3d0 4d0 5d0 6d0 7d0) (assert (= sum (+ 2d0 3d0 4d0 5d0 6d0 7d0))))) (defun test-sum-prod-3-aux (x y z u v w) (sum-prod x y z u v w)) (defun test-sum-prod-3 () (multiple-value-bind (sum prod) (test-sum-prod-3-aux 2d0 3d0 4d0 5d0 6d0 7d0) (assert (= sum (+ 2d0 3d0 4d0 5d0 6d0 7d0))) (assert (= prod (* 2d0 3d0 4d0 5d0 6d0 7d0))))) (defun-typed id (x) ((t) t) x) ;; (id 1) (defun test-id-1 () (assert (eql (id 1) 1))) (defun test-id-2 () (assert (eql (id 1d0) 1d0))) ;; This one has both boxed and unboxed arguments. (defun-typed cons-sum (o1 f1 o2 f2) ((t double-float t double-float) (values cons double-float)) (values (cons o1 o2) (+ f1 f2))) (defun test-cons-sum-1 () (multiple-value-bind (cons sum) (cons-sum 1 2d0 3 4d0) (assert (equal cons '(1 . 3))) (assert (= sum (+ 2d0 4d0))))) ;; SUM will be redefined with different types to exercise the linker a ;; bit. (defun-typed sum (f1 f2) ((double-float double-float) double-float) (+ f1 f2)) (defun test-sum-1 () (assert (= (sum 2d0 3d0) 5d0))) (defun-typed sum (f1 f2) ((t t) t) (+ f1 f2)) (defun test-sum-2 () (assert (= (sum 2d0 3d0) 5d0))) (defun test-sum-3 () (handler-case (progn (sum 2 3) (assert nil)) (type-error (c) (assert (equal (type-error-datum c) 3)) (assert (eq (type-error-expected-type c) 'double-float))))) (defun-typed sum (f1 f2) ((t double-float) double-float) (+ f1 f2)) (defun test-sum-4 () (assert (= (sum 2d0 3d0) 5d0))) (defun test-sum-5 () (assert (= (sum 2 3d0) 5d0))) (defun test-sum-6 () (handler-case (progn (sum #c(0 1) 3d0) (assert nil)) (type-error (c) (assert (equal (type-error-datum c) #c(3d0 1d0))) (assert (eq (type-error-expected-type c) 'double-float))))) #+(or) (defun foo () (labels ((sum (x y) (+ x y))) (declare (ftype (function (double-float double-float) double-float) sum)) (list (sum 2d0 4d0) (sum 2 4)))) (defun tests () (test-fid-1) (test-sum-prod-1) (test-sum-prod-2) (test-sum-prod-3) (test-id-1) (test-id-2) (test-cons-sum-1) (test-sum-1) (test-sum-2) (test-sum-3) (test-sum-4) (test-sum-5) (test-sum-6) ) ;; (tests)
Cool. I have not really looked at the code, but how generalizable would this approach be? I.e. why stop at FTYPEd floats? I am asking because of Pascal Costanza's recent post on the PRO mailing list about stack-allocation of values (a DYNAMIC-EXTENT on steroids) Cheers -- Marco On Jun 9, 2012, at 09:46 , Helmut Eller wrote:
Here is an idea for a calling convention that supports unboxed floats:
1. When DEFUN compiles a function that has a ftype declaration involving floats, it creates a special entry-point that accepts unboxed arguments.
2. Named calls to such functions are compiled so that the arguments are represented as indicated by the ftype declaration.
3. We also keep a linker table to connect those special entry-points to call-sites. When the function gets redefined, we go through the existing call-sites and patch them so that the new definition is called. If the type of the new definition is different than the one expected by the call-site, we create an "adapter function" that converts the representation to the new type or if the conversion is not possible signals an error.
Attached is a prototype implementation to illustrate how this could be done. See the comments in the file for the main points. A more polished implementation would probably change more places then MAKE-XEP-LAMBDA.
What do you think: is such a calling convention worth having?
Helmut
;; Toy implementation for a calling convention with unboxed arguments.
(in-package cl-user)
;; Conceptionally we'd like to create functions that have two entry ;; points: one that accepts only arguments of a specific ;; type/representation and a general entry point that accept all ;; possible types as a fallback. ;; ;; A "typed entry point" is an entry point that only accepts arguments ;; of that specfic type and makes no typechecks. There is a direct ;; mapping from the function's type to the representation of the ;; arguments and return values. So if we know the type, e.g. from a ;; ftype declaration, we can predict the representation. ;; ;; This implementation actually uses two functions instead of one ;; function with two entry points.
;; We use function names like (:typed-entry-point foo) for functions ;; which have a typed entry point. (ext:define-function-name-syntax :typed-entry-point (name) (ext:valid-function-name-p (cadr name)))
;; A callsite describes what we need to know to link/re-link a call to ;; a typed entry point. The TYPE slot is the type of the callsite. We ;; use FDEFN to call the entry. In theory, we could patch the call ;; instruction with the correct address of the entry point, but code ;; patching is always tricky, so for now we use an indirection through ;; fdefn objects and only patch the address in the fdefn. ;; ;; When a function gets redefined we patch existing callsites so that ;; the new function gets called. We record the type of the callsite ;; so that we can verify that the type of the new definition matches ;; existing callsites. If the types match, we simply patch the fdefn ;; and are done. If the types don't match we generate an "adapter ;; function". The job of an adapter is it to take arguments of some ;; particular type and box/unbox them so that it becomes possible to ;; call the new definition. The current adapter implementation would ;; sometimes create endless recursions, so we have and ADAPT flag to ;; disable the creation of adapters for some callsites. ;; (defstruct callsite (type (ext:required-argument) :type kernel:function-type :read-only t) (fdefn (ext:required-argument) :type kernel:fdefn :read-only t) (adapt (ext:required-argument) :type boolean :read-only t))
(defstruct linkage (callsites nil :type (or callsite list)) (adapters nil :type (or function list)))
;; For functions with typed entry points we maintiain a list of ;; callsites and record that in the info db. ;; ;; If we use typed entry points only for functions with ftype ;; declarations then this table is probably relatively small. (ext:define-info-type function linkage linkage)
(defun listify (x) (if (listp x) x (list x)))
(defmacro push-unlistified (new-value (reader object)) `(let ((new-value ,new-value) (object ,object)) (let ((old-value (,reader ,object))) (setf (,reader object) (typecase old-value (null new-value) (cons (cons new-value old-value)) (t (list new-value old-value)))))))
;; find-typed-entry-point is called at load-time and returns the ;; fdefn that should be called. ;; ;; 1. We go through the list of existing callsites to see if we ;; already have one with the same type and reuse it if possible. ;; ;; 2. We look at the current definition. If the types match, we ;; create a callsite object, store it in the info db, and return the ;; fdefn. ;; ;; 3. Now we know that the types don't match we need to use adapters. ;; First again, we look at existing adapters and reuse them if possible. ;; ;; 4. An adapter is created that boxes the arguments and forwards the ;; call to the "normal" entry point. ;; ;; 5. If we are not allowed to create adapters, we look again at the ;; current definition to handle the case where no current definition ;; exists. If so, we return an empty fdefn object that will call the ;; undefined-tramp assembly routine. ;; ;; 6. If all else fails we link the callsite to our error handler. ;; (declaim (ftype (function (t t boolean) kernel:fdefn) find-typed-entry-point)) (defun find-typed-entry-point (name callsite-typespec adapt) (let* ((cs-type (kernel:specifier-type callsite-typespec)) (linkage (multiple-value-bind (info foundp) (ext:info function linkage name) (cond (foundp info) (t (setf (ext:info function linkage name) (make-linkage))))))) (cond ((dolist (cs (listify (linkage-callsites linkage))) (let* ((ep-type (callsite-type cs))) (when (and (function-types-compatible-p cs-type ep-type) (eq (callsite-adapt cs) adapt)) (return (callsite-fdefn cs)))))) ((let ((fdefn (lisp::fdefinition-object name nil))) (when fdefn (let ((fun (kernel:fdefn-function fdefn))) (when fun (let ((ep-type (kernel:extract-function-type fun))) (when (function-types-compatible-p cs-type ep-type) (let* ((aname (kernel:%function-name fun)) (fdefn (kernel:make-fdefn aname)) (cs (make-callsite :type cs-type :fdefn fdefn :adapt adapt))) (setf (kernel:fdefn-function fdefn) fun) (push-unlistified cs (linkage-callsites linkage)) fdefn)))))))) ((and adapt (dolist (fun (listify (linkage-adapters linkage))) (let ((ep-type (kernel:extract-function-type fun))) (when (function-types-compatible-p cs-type ep-type) (let* ((aname (kernel:%function-name fun)) (fdefn (kernel:make-fdefn aname)) (cs (make-callsite :type cs-type :fdefn fdefn :adapt adapt))) (setf (kernel:fdefn-function fdefn) fun) (push-unlistified cs (linkage-callsites linkage)) (return fdefn))))))) ((and adapt (let* ((fun (generate-adapter-function cs-type (second name))) (fdefn (kernel:make-fdefn (kernel:%function-name fun))) (cs (make-callsite :type cs-type :fdefn fdefn :adapt adapt))) (setf (kernel:fdefn-function fdefn) fun) (push-unlistified fun (linkage-adapters linkage)) (push-unlistified cs (linkage-callsites linkage)) fdefn))) ((and (not adapt) (or (not (lisp::fdefinition-object name nil)) (not (kernel:fdefn-function (lisp::fdefinition-object name nil))))) (let* ((aname `(:typed-entry-point #:undefined)) (fdefn (kernel:make-fdefn aname)) (cs (make-callsite :type cs-type :fdefn fdefn :adapt adapt))) (push-unlistified cs (linkage-callsites linkage)) fdefn)) (t (let* ((fun (generate-adapter-function cs-type 'linkage-error)) (fdefn (kernel:make-fdefn (kernel:%function-name fun))) (cs (make-callsite :type cs-type :fdefn fdefn :adapt adapt))) (setf (kernel:fdefn-function fdefn) fun) (push-unlistified cs (linkage-callsites linkage)) fdefn)))))
(defun linkage-error (&rest args) (declare (ignore args)) (error "Linking callsite to typed-entry-point failed"))
;; Generate an adapter function that changes the representation of the ;; arguments (specified with FTYPE) and forwards the call to NAME. ;; The adapter has also a typed entry point. It should also check ;; that the values returned by NAME match FTYPE. ;; ;; In practice, the compiler infered type may not match exactly FTYPE, ;; even if we add lotso declarations. This is annyoingly brittle. (defun generate-adapter-function (ftype name) (let* ((atypes (kernel:function-type-required ftype)) (tmps (loop for nil in atypes collect (gensym))) (fname `(:typed-entry-point :boxing-adapter ,(make-symbol (string name)))) (ftypespec (kernel:type-specifier ftype))) (proclaim `(ftype ,ftypespec ,fname)) (compile fname `(lambda ,tmps (declare ,@(loop for tmp in tmps for type in atypes collect `(type ,(kernel:type-specifier type) ,tmp))) (the ,(kernel:type-specifier (kernel:function-type-returns ftype)) (funcall (function ,name) . ,tmps)))) (let ((fun (fdefinition fname))) (unless (eq name 'linkage-error) (fix-ftype fun ftype)) fun)))
(defun fix-ftype (fun ftype) (let ((etype (kernel:extract-function-type fun))) (unless (function-types-compatible-p ftype etype t) (break))) fun)
;; This is our rule to decide when a type at a callsite matches the ;; type of the entry point. ;; ;; 1. The arguments at the callsite should be subtypes of the ;; arguments at the entry point. ;; ;; 2. The return value at the callsite should be supertypes of the ;; return values at the entry point. ;; ;; 3. The representations must agree. Representations should probably ;; decided in the backend, but for now we assume only double-floats ;; are unboxed. (defun function-types-compatible-p (callsite-type entrypoint-type &optional ignore-representation) (flet ((return-types (ftype) (let ((type (kernel:function-type-returns ftype))) (cond ((kernel:values-type-p type) (assert (and (not (kernel:values-type-rest type)) (not (kernel:values-type-keyp type)))) (kernel:values-type-required type)) (t (list type))))) (ptype= (type1 type2) (let ((double-float (kernel:specifier-type 'double-float))) (cond (ignore-representation t) ((kernel:type= type1 double-float) (kernel:type= type2 double-float)) ((kernel:type= type2 double-float) nil) (t t))))) (and (every #'kernel:csubtypep (kernel:function-type-required callsite-type) (kernel:function-type-required entrypoint-type)) (every #'ptype= (kernel:function-type-required callsite-type) (kernel:function-type-required entrypoint-type)) (or (and (every #'kernel:csubtypep (return-types entrypoint-type) (return-types callsite-type)) (every #'ptype= (return-types entrypoint-type) (return-types callsite-type))) (kernel:type= (kernel:function-type-returns entrypoint-type) (kernel:specifier-type 'nil))))))
;; check-function-redefinition is used as setf-fdefinition-hook. ;; We go through all existing callsites and ;; ;; 1. If the new type matches, we patch the callsite with the new function. ;; ;; 2. If the types don't match and if allowed, we redirect the ;; callsite to and adapter. ;; ;; 3. If the callsites doesn't want adapters we link the callsite to ;; an error handler. (defun check-function-redefinition (name new-fun) (when (and (consp name) (eq (car name) :typed-entry-point) (not (eq (cadr name) :boxing-adapter))) (multiple-value-bind (linkage foundp) (ext:info function linkage name) (when foundp (let ((new-type (kernel:extract-function-type new-fun))) (dolist (cs (listify (linkage-callsites linkage))) (let ((cs-type (callsite-type cs)) (fdefn (callsite-fdefn cs))) (cond ((function-types-compatible-p cs-type new-type) (patch-fdefn fdefn new-fun)) ((and (callsite-adapt cs) (dolist (fun (listify (linkage-adapters linkage))) (let ((ep-type (kernel:extract-function-type fun))) (when (function-types-compatible-p cs-type ep-type) (patch-fdefn fdefn fun) (return t)))))) ((callsite-adapt cs) (let ((fun (generate-adapter-function cs-type (second name)))) (push-unlistified fun (linkage-adapters linkage)) (patch-fdefn fdefn fun))) (t (format nil "~3t") (warn "New type of ~s incompatible with callsite:~%~ ~2t new-type: ~s~%~ ~2t cs-type: ~s" name (kernel:type-specifier new-type) (kernel:type-specifier cs-type)) (let ((fun (generate-adapter-function cs-type 'linkage-error))) (patch-fdefn fdefn fun)))))))))))
;; This lets us set the name in fdefn objects. We use that for ;; debugging. (eval-when (:compile-toplevel) (c:defknown set-fdefn-name (kernel:fdefn t) t) (c:def-setter set-fdefn-name vm:fdefn-name-slot vm:other-pointer-type))
(defun patch-fdefn (fdefn new-fun) (setf (kernel:fdefn-function fdefn) new-fun) (let ((name (kernel:%function-name new-fun))) (set-fdefn-name fdefn name)) fdefn)
(pushnew 'check-function-redefinition ext:*setf-fdefinition-hook*)
(in-package x86)
;; make-typed-call-tns chooses the representation for a function type. ;; This is similar to c::make-call-out-tns and should probably also be ;; a vm-support-routine. ;; ;; The current convention passes double-floats unboxed and all other ;; types remain boxed. Registers XMM4-XMM7 are used for the first 4 ;; double arguments. Boxed values are passed in standard locations. ;; ;; Returning values on the stack is currenlty not implemented, so all ;; return values must fit in registers. (defun make-typed-call-tns (ftype) (declare (type function-type ftype)) (labels ((ptype (name) (primitive-type-or-lose name *backend*)) (double-float-arg (state) (cond ((<= (getf state :xmms-reg) xmm7-offset) (make-wired-tn (ptype 'double-float) double-reg-sc-number (prog1 (getf state :xmms-reg) (incf (getf state :xmms-reg))))) (t (make-wired-tn (ptype 'double-float) double-stack-sc-number (prog1 (getf state :frame-size) (incf (getf state :frame-size) 2)))))) (boxed-arg (state) (cond ((<= (getf state :reg-args) register-arg-count) (let ((n (getf state :reg-args))) (incf (getf state :reg-args)) (x86-standard-argument-location n))) (t (make-wired-tn (ptype 't) control-stack-sc-number (prog1 (getf state :frame-size) (incf (getf state :frame-size) 1)))))) (double-float-type-p (type) (and (numeric-type-p type) (eq (numeric-type-class type) 'float) (eq (numeric-type-format type) 'double-float))) (arg-tn (type state) (cond ((double-float-type-p type) (double-float-arg state)) (t (boxed-arg state)))) (ret-tn (type state) (let ((tn (arg-tn type state))) (assert (member (sc-name (tn-sc tn)) '(double-reg descriptor-reg))) tn))) (let* ((arg-state (list :frame-size 2 :xmms-reg xmm4-offset :reg-args 0)) (ret-state (list :frame-size 2 :xmms-reg xmm4-offset :reg-args 0)) (returns (function-type-returns ftype)) (rtypes (typecase returns (values-type (values-type-required returns)) (t (list returns))))) (values (loop for type in (function-type-required ftype) collect (arg-tn type arg-state)) (loop for type in rtypes collect (ret-tn type ret-state)) (x86-make-stack-pointer-tn) (max (getf arg-state :frame-size) (getf ret-state :frame-size)) (x86-make-number-stack-pointer-tn) 0))))
;; This VOP performs the call. Note the (:move-args :local-call) ;; which, hopefully, coerces and moves arguments to the correct ;; representation. (define-vop (cl-user::call-typed-named) (:args (new-fp) (new-nfp) (fdefn :scs (descriptor-reg control-stack) :target eax) (args :more t :scs (descriptor-reg))) (:results (results :more t)) (:save-p t) (:move-args :local-call) (:vop-var vop) (:info arg-locs real-frame-size) (:ignore new-nfp args arg-locs results) (:temporary (:sc descriptor-reg :offset eax-offset) eax) (:generator 30 ;; FIXME: allocate the real frame size here. We had to emit ;; ALLOCATE-FRAME before this vop so that we can use the ;; (:move-args :local-call) option here. Without the ;; ALLOCATE-FRAME vop we get a failed assertion. (inst lea esp-tn (make-ea :dword :base new-fp :disp (- (* real-frame-size word-bytes))))
;; Move fdefn to eax before switching frames. (move eax fdefn)
;; Write old frame pointer (epb) into new frame. (storew ebp-tn new-fp (- (1+ ocfp-save-offset)))
;; Switch to new frame. (move ebp-tn new-fp)
(note-this-location vop :call-site)
;; Load address out of fdefn and call it. (inst call (make-ea :dword :base eax :disp (- (* fdefn-raw-addr-slot word-bytes) other-pointer-type)))
))
(in-package c)
;; To generate typed entry points we modify MAKE-XEP-LAMBDA. ;; Currently we use the function name as marker: if the name looks ;; like (:typed-entry-point <foo>) then we know we need to our thing. ;; ;; We lookup the currently declared type and create wrap code around ;; the main function. Our code receives and returns the values with ;; typed calling convention. The n-supplied argument is not used, but ;; the rest of the compiler want it to be there (generates no code ;; thogh). ;; ;; FIXME: instead of using the function name as marker it might be ;; better to have a declaration to request typed entry points. ;; ;; FIXME: %return-results-with-type is currently not inlined ;; (let-converted) proberly and instead generates local-call. Getting ;; the optimisations correctly dones also seems rather brittle. (fwrappers:define-fwrapper make-xep-lambda-wrapper (fun) (declare (type functional fun)) (cond ((and (consp (functional-name fun)) (eq (car (functional-name fun)) :typed-entry-point)) (let* ((ftype (the function-type (info function type (functional-name fun)))) (fspec (type-specifier ftype)) (returns (function-type-returns ftype)) (rtypes (typecase returns (values-type (values-type-required returns)) (t (list returns)))) (n-supplied (gensym)) (temps (loop for nil in (lambda-vars fun) collect (gensym))) (results (loop for nil in rtypes collect (gensym)))) `(lambda (,n-supplied) (declare (ignore ,n-supplied)) (multiple-value-bind ,temps (cl-user::%receive-arguments-with-type ',fspec) (multiple-value-bind ,results (the ,(type-specifier returns) (%funcall ,fun . ,temps)) (cl-user::%return-results-with-type ',fspec . ,results)))))) (t (fwrappers:call-next-function))))
(fwrappers:fwrap 'make-xep-lambda 'make-xep-lambda-wrapper)
;; %receive-arguments-with-type knows how to access the arguments from ;; the passing locations. (defknown cl-user::%receive-arguments-with-type (cons) *)
(defoptimizer (cl-user::%receive-arguments-with-type derive-type) ((ftypespec)) (unless (constant-continuation-p ftypespec) (error "Function-type must be constant.")) (let ((ftype (the function-type (specifier-type (continuation-value ftypespec))))) (values-specifier-type `(values . ,(mapcar #'type-specifier (function-type-required ftype))))))
;; Here we generate the IR2 to receive arguments. make-typed-call-tns ;; chooses the representation and move-continuation-result does the ;; rest. (defoptimizer (cl-user::%receive-arguments-with-type ir2-convert) ((ftypespec) node block) (let* ((ftype (the function-type (specifier-type (continuation-value ftypespec)))) (cont (node-cont node)) (arg-tns (x86::make-typed-call-tns ftype))) (move-continuation-result node block arg-tns cont)))
(defknown cl-user::%return-results-with-type (cons &rest *) nil)
(defoptimizer (cl-user::%return-results-with-type ltn-annotate) ((type &rest args) node policy) (dolist (arg args) (annotate-ordinary-continuation arg policy)))
;; %return-results-with-type is similar to a known-return just that we ;; let make-typed-call-tns choose the representation. (defoptimizer (cl-user::%return-results-with-type ir2-convert) ((type &rest args) node block) (let* ((ftype (the function-type (specifier-type (continuation-value type)))) (home (lambda-home (lexenv-lambda (node-lexenv node)))) (env (environment-info (lambda-environment home))) (old-fp (ir2-environment-old-fp env)) (return-pc (ir2-environment-return-pc env))) (multiple-value-bind (arg-tns result-tns) (x86::make-typed-call-tns ftype) (declare (ignore arg-tns)) (let ((val-tns (loop for arg in args collect (continuation-tn node block arg)))) (vop* known-return node block (old-fp return-pc (reference-tn-list val-tns nil)) (nil) result-tns)))))
(defknown cl-user::%typed-call (cons &rest *) *)
;; %typed-call gets transformed to %%typed-call. We create the the ;; load-time-value-form and some type declarations. ;; ;; FIXME: The :adapt option should probably be a separate argument, ;; but for now it's encoded in the function name. ;; ;; FIXME: why is the truly-the still needed? Shouldn't ;; load-time-value be able use the declared return type of ;; find-typed-entry-point? (deftransform cl-user::%typed-call ((name &rest args) * * :important t) (unless (constant-continuation-p name) (error "Function name must be constant.")) (destructuring-bind (&key typed-entry-point adapt) (continuation-value name) (let* ((name typed-entry-point) (ftype (the function-type (info function type `(:typed-entry-point ,name)))) (vars (loop for nil in (function-type-required ftype) collect (gensym)))) `(lambda (name ,@vars) (declare (ignore name) ,@(loop for type in (function-type-required ftype) for var in vars collect `(type ,(type-specifier type) ,var))) (cl-user::%%typed-call (locally (declare (optimize speed)) (truly-the fdefn (load-time-value (cl-user::find-typed-entry-point '(:typed-entry-point ,name) ',(type-specifier ftype) ',adapt) t))) ',ftype ,@vars)))))
(defknown cl-user::%%typed-call (fdefn function-type &rest *) *)
(defoptimizer (cl-user::%%typed-call derive-type) ((fdefn type &rest args)) (let ((ftype (continuation-value type))) (function-type-returns ftype)))
(defoptimizer (cl-user::%%typed-call ltn-annotate) ((fdefn type &rest args) node policy) (setf (basic-combination-info node) :funny) (setf (node-tail-p node) nil) (annotate-ordinary-continuation fdefn policy) (dolist (arg args) (annotate-ordinary-continuation arg policy)))
;; A typed-call we is similar to a known-local-call. ;; make-typed-call-tns chooses the argument representation. ;; ;; 1. allocate a frame ;; 2. moving/coercing the arguments and done the CALL-TYPED-NAMED vop ;; 3. move-continuation-result moves the results in the right place ;; ;; FIXME: all return values are in registers move-continuation-result ;; can probably not handle stack arguments. ;; ;; FIXME: the ALLOCATE-FRAME vop isn't quite right for us, see the ;; comment in call-typed-named. ;; ;; FIXME: currently can't make tail calls. We would need to guarantee ;; that the types match. (defoptimizer (cl-user::%%typed-call ir2-convert) ((fdefn type &rest args) node block) (let ((ftype (the function-type (continuation-value type))) (cont (node-cont node))) (multiple-value-bind (arg-tns result-tns fp stack-frame-size nfp number-stack-frame-size) (x86::make-typed-call-tns ftype) (declare (ignore number-stack-frame-size)) (let ((fdefn-tn (continuation-tn node block fdefn)) (cont-tns (loop for arg in args collect (continuation-tn node block arg)))) (vop allocate-frame node block nil fp nfp) (vop* cl-user::call-typed-named node block (fp nfp fdefn-tn (reference-tn-list cont-tns nil)) ((reference-tn-list result-tns t)) arg-tns stack-frame-size) (move-continuation-result node block result-tns cont))))) ;; Tests for typed calling convention.
(in-package :cl-user)
(eval-when (:compile-toplevel) (fmakunbound '(:typed-entry-point fid)) (fmakunbound '(:typed-entry-point sum-prod)) (fmakunbound '(:typed-entry-point cons-sum)) (fmakunbound '(:typed-entry-point id)))
;; First we create a declaration (declaim (ftype (function (double-float) double-float) (:typed-entry-point fid)))
;; then the typed entry point (defun (:typed-entry-point fid) (f) f)
;; then the "normal" entry point, which calls the typed version. (defun fid (f) (%typed-call '(:typed-entry-point fid :adapt nil) f))
;; and also a compiler macro so that named calls automtomatically call ;; the typed entry point. (define-compiler-macro fid (f) `(%typed-call '(:typed-entry-point fid :adapt t) ,f))
(defun test-fid-1 () (assert (= (fid 1d0) 1d0)))
;; (fid 1d0) ;; (disassemble '(:typed-entry-point fid)) ;; (disassemble 'fid)
;; Let's wrap that up as macro. In a more polished implementation ;; DEFUN could do all this. (defmacro defun-typed (name (&rest args) ((&rest arg-types) return-type) &body body) `(progn (declaim (ftype (function ,arg-types ,return-type) (:typed-entry-point ,name))) (defun (:typed-entry-point ,name) ,args (the ,return-type . ,body)) (defun ,name ,args (%typed-call '(:typed-entry-point ,name :adapt nil) . ,args)) (define-compiler-macro ,name ,args `(%typed-call '(:typed-entry-point ,',name :adapt t) . ,(list . ,args)))))
(defun-typed f+ (x y) ((double-float double-float) double-float) (+ x y))
;; (disassemble '(:typed-entry-point f+))
(defun-typed sum-prod (x y z u v w) ((double-float double-float double-float double-float double-float double-float) (values double-float double-float)) (values (+ x y z u v w) (* x y z u v w)))
;; (disassemble '(:typed-entry-point sum-prod)) ;; (ext:info function linkage '(:typed-entry-point sum-prod))
(defun test-sum-prod-1 () (multiple-value-bind (sum prod) (sum-prod 2d0 3d0 4d0 5d0 6d0 7d0) (assert (= sum (+ 2d0 3d0 4d0 5d0 6d0 7d0))) (assert (= prod (* 2d0 3d0 4d0 5d0 6d0 7d0)))))
(defun test-sum-prod-2 () (multiple-value-bind (sum) (sum-prod 2d0 3d0 4d0 5d0 6d0 7d0) (assert (= sum (+ 2d0 3d0 4d0 5d0 6d0 7d0)))))
(defun test-sum-prod-3-aux (x y z u v w) (sum-prod x y z u v w))
(defun test-sum-prod-3 () (multiple-value-bind (sum prod) (test-sum-prod-3-aux 2d0 3d0 4d0 5d0 6d0 7d0) (assert (= sum (+ 2d0 3d0 4d0 5d0 6d0 7d0))) (assert (= prod (* 2d0 3d0 4d0 5d0 6d0 7d0)))))
(defun-typed id (x) ((t) t) x)
;; (id 1)
(defun test-id-1 () (assert (eql (id 1) 1)))
(defun test-id-2 () (assert (eql (id 1d0) 1d0)))
;; This one has both boxed and unboxed arguments. (defun-typed cons-sum (o1 f1 o2 f2) ((t double-float t double-float) (values cons double-float)) (values (cons o1 o2) (+ f1 f2)))
(defun test-cons-sum-1 () (multiple-value-bind (cons sum) (cons-sum 1 2d0 3 4d0) (assert (equal cons '(1 . 3))) (assert (= sum (+ 2d0 4d0)))))
;; SUM will be redefined with different types to exercise the linker a ;; bit. (defun-typed sum (f1 f2) ((double-float double-float) double-float) (+ f1 f2))
(defun test-sum-1 () (assert (= (sum 2d0 3d0) 5d0)))
(defun-typed sum (f1 f2) ((t t) t) (+ f1 f2))
(defun test-sum-2 () (assert (= (sum 2d0 3d0) 5d0)))
(defun test-sum-3 () (handler-case (progn (sum 2 3) (assert nil)) (type-error (c) (assert (equal (type-error-datum c) 3)) (assert (eq (type-error-expected-type c) 'double-float)))))
(defun-typed sum (f1 f2) ((t double-float) double-float) (+ f1 f2))
(defun test-sum-4 () (assert (= (sum 2d0 3d0) 5d0)))
(defun test-sum-5 () (assert (= (sum 2 3d0) 5d0)))
(defun test-sum-6 () (handler-case (progn (sum #c(0 1) 3d0) (assert nil)) (type-error (c) (assert (equal (type-error-datum c) #c(3d0 1d0))) (assert (eq (type-error-expected-type c) 'double-float)))))
#+(or) (defun foo () (labels ((sum (x y) (+ x y))) (declare (ftype (function (double-float double-float) double-float) sum)) (list (sum 2d0 4d0) (sum 2 4))))
(defun tests () (test-fid-1) (test-sum-prod-1) (test-sum-prod-2) (test-sum-prod-3) (test-id-1) (test-id-2) (test-cons-sum-1) (test-sum-1) (test-sum-2) (test-sum-3) (test-sum-4) (test-sum-5) (test-sum-6) )
;; (tests) _______________________________________________ cmucl-imp mailing list cmucl-imp@cmucl.cons.org http://lists.zs64.net/mailman/listinfo/cmucl-imp
-- Marco Antoniotti
* Marco Antoniotti [2012-06-10 06:39] writes:
Cool.
I have not really looked at the code, but how generalizable would this approach be? I.e. why stop at FTYPEd floats?
For named calls we could delay some decisions to link-time. At link time we know more about the entry-point/callee and don't need to be quite so conservative. E.g. we could perform the argument count check only at link time and then skip it at runtime. Also optional/keyword parsing could be avoided or at least be more efficient. If we can patch all callsites we could also use direct calls instead of indirect calls (though that would probably be problematic GC-wise). It might also be useful for generic functions, e.g. if the static type of a call-site determines a unique method we could link the two directly. We can't easily change the representations that are used immediately before the callsite, though. E.g. situations like this are problematic: (declaim (ftype (function (double-float) double-float) f)) (defun g () (f 1)) (declaim (ftype (function (number) number) f)) (defun h () (f 1)) Although G and H look the same, in G execution will never reach the callsite because at compile-time we have committed to unboxed double-float representation. Since there is no way to convert an integer to an unboxed double-float we will generate code that signals an error before the call. In H we choose boxed representation and the call works just fine. I think the link-time tricks are quite neat, but what really matters are representation choices and those have to be made at compile-time. Therefore I focus on (trustworthy) ftype declarations.
I am asking because of Pascal Costanza's recent post on the PRO mailing list about stack-allocation of values (a DYNAMIC-EXTENT on steroids)
I don't know the details of that message, but stack-allocation is also a kind of representation choice. To make it work efficiently we'd also need to make assumptions and choices at compile time. If those assumptions turn out to be wrong, linker tricks will probably not suffice. I guess a JIT compiler with deoptimization support is needed for the hard cases. Helmut
On 6/9/12 12:46 AM, Helmut Eller wrote:
Here is an idea for a calling convention that supports unboxed floats:
This is pretty cool. I've often wanted to be able to do that. I haven't looked at your code yet, but I was wondering how this compares with block compilation. I'm pretty sure block compilation allows functions to jump directly to the no-arg-parsing entry point, so unboxed objects can be passed directly from the caller to the callee. Functions not in the block-compilation block can still call those functions, but those callers use the regular entry point. Ray
* Raymond Toy [2012-06-10 17:13] writes:
On 6/9/12 12:46 AM, Helmut Eller wrote:
Here is an idea for a calling convention that supports unboxed floats:
This is pretty cool. I've often wanted to be able to do that.
I haven't looked at your code yet, but I was wondering how this compares with block compilation. I'm pretty sure block compilation allows functions to jump directly to the no-arg-parsing entry point, so unboxed objects can be passed directly from the caller to the callee. Functions not in the block-compilation block can still call those functions, but those callers use the regular entry point.
Yes, calls inside a block-compilation unit and calls to labels/flets can work with unboxed objects. The new calling convention would us allow to reach the "no-arg-parsing entry point" from outside without going through the regular entry point. The disassembler seems to know about the no-arg-parsing entry point from the debug info. Maybe the debug info is good enough for the calling convention, but maybe not. I guess the compiler is free to pass the arguments to local functions where and in whatever format seems appropriate, but for a calling convention we probably want some more explicit rules. There is also a bit of a problem a the return point. The regular entry point usually tail-calls the no-arg-parsing entry point. That implies that the two functions have the same representation for the return values. With the new convention we don't want that, or at least not all the time. The regular entry point should box the return values after the the no-arg-parsing entry point returns, i.e. tail-calls can't be used. Helmut
* Helmut Eller [2012-06-09 07:46] writes:
Attached is a prototype implementation to illustrate how this could be done. See the comments in the file for the main points. A more polished implementation would probably change more places then MAKE-XEP-LAMBDA.
I've done some more work on this and put it on github: https://github.com/ellerh/cmucl/tree/tcall-convention The current version recognizes a new declaration calling-convention: (defun f+ (x y) (declare (double-float x y) (c::calling-convention :typed)) (+ x y)) tells the compiler that f+ should have the special entry point for unboxed floats. The function will also have a regular XEP that will in turn call the typed entry point (with a local call). Calling f+ should now transparently choose the appropriate entry point (i.e. named calls use the typed entry point others the XEP). Adapter functions are now created like so (lambda (x y) (declare (double-float x y) (c::calling-convention :typed-no-xep)) (the double-float (f+ x y))) the :typed-no-xep version doesn't create regular XEPs as those would never be used for adapters. I could use a bit of feedback regarding the names. Currently I use "typed calling convention" resp. "typed entry point" and for the regular entry point I use "external entry point" or XEP. For the :typed-no-xep case the naming is a bit misleading as the typed entry point is the only entry point and in the IR has the lambda-kind set to :external, i.e. the XEP is a typed entry. Is there a better word for it? Allegro CL seems to have a similar feature, which they call "immediate arg call": ftp://ftp.franz.com/pub/duane/ilc07/immediate-args.html Helmut
On 6/17/12 1:16 PM, Helmut Eller wrote:
* Helmut Eller [2012-06-09 07:46] writes:
Attached is a prototype implementation to illustrate how this could be done. See the comments in the file for the main points. A more polished implementation would probably change more places then MAKE-XEP-LAMBDA.
[snip]
I could use a bit of feedback regarding the names. Currently I use "typed calling convention" resp. "typed entry point" and for the regular entry point I use "external entry point" or XEP. For the :typed-no-xep case the naming is a bit misleading as the typed entry point is the only entry point and in the IR has the lambda-kind set to :external, i.e. the XEP is a typed entry. Is there a better word for it?
Sorry that I still haven't had a chance to look at your code or even play with it. I find the :typed and :typed-no-xep hard to remember, though. Could it be :unboxed instead? That makes it pretty clear that you're using calling convention that supports unboxed args. Also, how complex is it to keep track of this information so that the compiler can make the appropriate call? Oh, and is there a use case you're interested in where block compilation doesn't do what you want? (In my particular, hypothetical use case, I can see doing this with f2cl'ed code where the preferred technique is putting one Fortran function per file. But this seems to rather uncommon but would be very useful.) Ray
* Raymond Toy [2012-06-22 04:36] writes:
I could use a bit of feedback regarding the names. Currently I use "typed calling convention" resp. "typed entry point" and for the regular entry point I use "external entry point" or XEP. For the :typed-no-xep case the naming is a bit misleading as the typed entry point is the only entry point and in the IR has the lambda-kind set to :external, i.e. the XEP is a typed entry. Is there a better word for it?
Sorry that I still haven't had a chance to look at your code or even play with it.
I find the :typed and :typed-no-xep hard to remember, though. Could it be :unboxed instead? That makes it pretty clear that you're using calling convention that supports unboxed args.
The only issue I have with "unboxed" is that the calling convention also supports boxed values, e.g. arrays, for which the type check was already done in the caller.
Also, how complex is it to keep track of this information so that the compiler can make the appropriate call?
I added a new info type in the globaldb that keeps track of the calling convention. Essentially one bit that says that this name should be called with the unboxed convention. Currently I do this in a way that is similar to how struct accessors/setters are inlined: when the compiler sees a call it recognizes that that name has the bit set and puts the function-info for %typed-call in the combination-kind slot. The ir2-converter for %typed-call then looks at the type for the global-var name. We keep track of the type anyway, i.e. this isn't new. The type determines the representation for the call. The other new information is the linker table. For callsites we also record the type. The callsites that call the same function and the same type share one entry. I expect that without redefinitions that all callsite of the same function will share a single entry. We also need to inspect the type of function objects during linking. Functions carry the type, expect if compiled with (optimize space). The only new bit here is that we disable the space optimization for the new calling convention. In the IR I need to mark some lambdas as unboxed entry points. Currently I put something in the lambda-plist for that. Right now, I'm thinking about making the unboxed entry a new slot of the optional-dispatch struct which might be a bit cleaner. There are a couple of new VOPs which must still be written for the non-x86 ports.
Oh, and is there a use case you're interested in where block compilation doesn't do what you want? (In my particular, hypothetical use case, I can see doing this with f2cl'ed code where the preferred technique is putting one Fortran function per file. But this seems to rather uncommon but would be very useful.)
Say we have a math library with dozens of functions that operate on floats. To get the benefits of block compilation, we would need to compile the library together with the caller of the library in one block. Also, if we change something we need to recompile the entire block. Yes, that's possible, but not very convenient. Helmut
On 6/22/12 12:32 AM, Helmut Eller wrote:
* Raymond Toy [2012-06-22 04:36] writes:
I could use a bit of feedback regarding the names. Currently I use "typed calling convention" resp. "typed entry point" and for the regular entry point I use "external entry point" or XEP. For the :typed-no-xep case the naming is a bit misleading as the typed entry point is the only entry point and in the IR has the lambda-kind set to :external, i.e. the XEP is a typed entry. Is there a better word for it? Sorry that I still haven't had a chance to look at your code or even play with it.
I find the :typed and :typed-no-xep hard to remember, though. Could it be :unboxed instead? That makes it pretty clear that you're using calling convention that supports unboxed args. The only issue I have with "unboxed" is that the calling convention also supports boxed values, e.g. arrays, for which the type check was already done in the caller.
That's true, but I don't think anyone expects unboxed arrays. I find "typed" confusing because everything has a type. Maybe acl's immediate-arg is better? But cmucl already has a concept of immediates, so this can be confusing too. So, how do I build your changes? I assume can just do a normal cross-compile using tccxboot.lisp? Ray
* Raymond Toy [2012-06-23 15:30] writes:
On 6/22/12 12:32 AM, Helmut Eller wrote:
The only issue I have with "unboxed" is that the calling convention also supports boxed values, e.g. arrays, for which the type check was already done in the caller.
That's true, but I don't think anyone expects unboxed arrays. I find "typed" confusing because everything has a type.
Well, yes but not everything has a non-trivial static type that doesn't need to be verified at runtime.
Maybe acl's immediate-arg is better? But cmucl already has a concept of immediates, so this can be confusing too.
Then I guess "unboxed" is the least confusing.
So, how do I build your changes? I assume can just do a normal cross-compile using tccxboot.lisp?
Yes, that's the idea. I use the snapshot from June. During the build process you need to answer two questions: 1. for the constant IR1-ATTRIBUTE-TRANSLATIONS choose "Go ahead" 2. for the struct VM-SUPPORT-ROUTINES choose "CLOBBER-IT" I also added a file src/tests/unboxed-convention.lisp with some examples that do work. Tho, I'm sure that there are cases left that don't work. Helmut
On 6/23/12 11:05 AM, Helmut Eller wrote:
* Raymond Toy [2012-06-23 15:30] writes:
So, how do I build your changes? I assume can just do a normal cross-compile using tccxboot.lisp?
Yes, that's the idea. I use the snapshot from June. During the build process you need to answer two questions:
1. for the constant IR1-ATTRIBUTE-TRANSLATIONS choose "Go ahead" 2. for the struct VM-SUPPORT-ROUTINES choose "CLOBBER-IT"
Yep, I just guessed that those are the answers, so I have recompiled cmucl with the new feature. We'll have to fix the issue with the questions if possible. I also noticed that when compiling the target, I get an error: ; File: /Volumes/share2/src/clnet/cmucl/cmucl/src/compiler/dump.lisp ; In: DEFUN DUMP-CODE-OBJECT ; (DUMP-FOP 'LISP::FOP-TYPED-ENTRY-POINT FILE) ; Error: (during macroexpansion) ; ; Error in function LISP::ASSERT-ERROR: ; Compiler bug: LISP::FOP-TYPED-ENTRY-POINT not a legal fasload operator. Seems to be harmless, because the result can rebuild cmucl just fine. But we should fix this too.
I also added a file src/tests/unboxed-convention.lisp with some examples that do work. Tho, I'm sure that there are cases left that don't work.
I'll do some more tests with this a bit later. I'd be interested in some bigger tests with maxima, once I hack f2cl to generate the ftype declarations for the translated functions. This might tease out a few corner cases that you missed. And it might give a nice speed up for those functions. (Although, I think most functions are fairly long, so boxing and unboxing is probably in the noise, but it might save on some GC time.) Ray
* Raymond Toy [2012-06-23 20:41] writes:
On 6/23/12 11:05 AM, Helmut Eller wrote:
* Raymond Toy [2012-06-23 15:30] writes:
So, how do I build your changes? I assume can just do a normal cross-compile using tccxboot.lisp?
Yes, that's the idea. I use the snapshot from June. During the build process you need to answer two questions:
1. for the constant IR1-ATTRIBUTE-TRANSLATIONS choose "Go ahead" 2. for the struct VM-SUPPORT-ROUTINES choose "CLOBBER-IT"
Yep, I just guessed that those are the answers, so I have recompiled cmucl with the new feature. We'll have to fix the issue with the questions if possible.
I also noticed that when compiling the target, I get an error:
; File: /Volumes/share2/src/clnet/cmucl/cmucl/src/compiler/dump.lisp
; In: DEFUN DUMP-CODE-OBJECT
; (DUMP-FOP 'LISP::FOP-TYPED-ENTRY-POINT FILE) ; Error: (during macroexpansion) ; ; Error in function LISP::ASSERT-ERROR: ; Compiler bug: LISP::FOP-TYPED-ENTRY-POINT not a legal fasload operator.
Seems to be harmless, because the result can rebuild cmucl just fine. But we should fix this too.
Hmm, I don't see this one.
I also added a file src/tests/unboxed-convention.lisp with some examples that do work. Tho, I'm sure that there are cases left that don't work.
I'll do some more tests with this a bit later. I'd be interested in some bigger tests with maxima, once I hack f2cl to generate the ftype declarations for the translated functions. This might tease out a few corner cases that you missed. And it might give a nice speed up for those functions. (Although, I think most functions are fairly long, so boxing and unboxing is probably in the noise, but it might save on some GC time.)
I worked a bit more on this and enabled the unboxed convention during built time for all the easy cases (not for &keyword/&rest and that kind of thing). After fixing some bugs it seems to work good enough to compile and load PCL. Unsurprisingly, Genesis doesn't work yet. I guess that we should do some linking during cold-load but it's a bit complicated: 1. that would need some checks/verification that the types of callsites and unboxed entry points agree, but which typesystem should be used? The best would probably be some way to model the types that are being loaded but that seems complicated. Can we just use the types of the compiler's target backend? 2. if the types don't match we usually generate adapters but that's also not so easy during Genesis. I guess creating a function with some fake name, cross compiling it, and cold loading the fasl file would work. But it also seems messy. OTHO, I don't even know whether there are any type mismatches. Maybe this is just the price to pay for a meta-circular implementation, but I'm wondering if I overlook some shortcuts. Helmut
On 6/30/12 2:47 AM, Helmut Eller wrote:
* Raymond Toy [2012-06-23 20:41] writes:
; File: /Volumes/share2/src/clnet/cmucl/cmucl/src/compiler/dump.lisp
; In: DEFUN DUMP-CODE-OBJECT
; (DUMP-FOP 'LISP::FOP-TYPED-ENTRY-POINT FILE) ; Error: (during macroexpansion) ; ; Error in function LISP::ASSERT-ERROR: ; Compiler bug: LISP::FOP-TYPED-ENTRY-POINT not a legal fasload operator.
Seems to be harmless, because the result can rebuild cmucl just fine. But we should fix this too. Hmm, I don't see this one.
Maybe I messed up. I'll take a closer look later.
I also added a file src/tests/unboxed-convention.lisp with some examples that do work. Tho, I'm sure that there are cases left that don't work. I'll do some more tests with this a bit later. I'd be interested in some bigger tests with maxima, once I hack f2cl to generate the ftype declarations for the translated functions. This might tease out a few corner cases that you missed. And it might give a nice speed up for those functions. (Although, I think most functions are fairly long, so boxing and unboxing is probably in the noise, but it might save on some GC time.) I worked a bit more on this and enabled the unboxed convention during built time for all the easy cases (not for &keyword/&rest and that kind of thing). After fixing some bugs it seems to work good enough to compile and load PCL.
You mean that you can compile cmucl with the unboxed convention enabled? I don't have any numbers to back this up, but I'm guessing that's not really necessary and doesn't gain anything. Why? I think for the parts that matter, they're all wrapped in a start-block/end-block to block-compile them. Plus, if user code calls into cmucl, because the compiler couldn't optimize it away, there's not much to be gained because the routines are probably pretty complicated that not having to do arg-parsing is in the noise. I'd rather just leave cmucl code as is, and make user code go faster. BTW, is it possible to add some global variable to disable the unboxed arg support? That would make testing easier. Or if a problem comes up, it can be turned off easily. Ray
participants (3)
-
Helmut Eller -
Marco Antoniotti -
Raymond Toy