After struggling mentally with this for a few weeks, I would like to have some consultation before I introduce some changes in ECL -- not that I expect many users here, but at least some implementor-fellows and power users of other implementations.
My concerns right now relate to how declarations should be used by a compiler, and in particular how declarations interact with SAFETY levels. Please correct me if I am wrong, but I have seen more or less the following approaches:
[a]- Most implementations blindly believe declarations below a certain safety level. Above it, they seem more or less useless.
[b]- SBCL takes declarations (and THE) as type assertions. For instance, in (LET ((Y (FOO X))) (DECLARE (FIXNUM Y)) ...) the assignment to Y would be checked to be a FIXNUM. This means the type declaration is actually enforced and believed, and the checks are dropped only at SAFETY 0 (*)
In both cases one ends up with a model in which in order to truly believe a declaration and have no extra burden (assertions), one has to drop to SAFETY 0 in all code that is involved with it, which is a mess, because it might inadvertently affect other parts of the code. It is for this reason that I am considering an alternative model for ECL which would grade safety as follows
- Type declarations are always believed
- SAFETY >= 1 adds type checks to enforce them.
- SAFETY = 0, no checks.
- SAFETY = 1, the special form THE or additional proclamations on the functions can be used to deactivate the check. As in (LET ((Y (THE FIXNUM (FOO X)))) ...)
This would allow one to keep most code safe, while deactivating some checks when they are really known to be true (**). Do you think this is useful/useless? The problem I see with this approach is that all code around is written for model [a] or [b], but I could not come up with something more sensible so far.
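A minimal sketch of the distinction this proposal draws, assuming a hypothetical helper DOUBLE-IT that really does return a fixnum for small arguments. The comments describe what the proposed ECL model would do at SAFETY 1; other implementations may treat both forms identically, and the function names are made up for illustration.

```lisp
;; DOUBLE-IT is a hypothetical helper used only for illustration.
(defun double-it (x)
  (* 2 x))

;; Variable declaration only: under the proposal, SAFETY >= 1 would
;; check that the value assigned to Y is a FIXNUM.
(defun checked-binding (x)
  (declare (optimize (safety 1)))
  (let ((y (double-it x)))
    (declare (fixnum y))
    (1+ y)))

;; THE around the value: under the proposal, at SAFETY 1 this would be
;; taken as a promise, so the check on the assignment could be omitted.
(defun trusted-binding (x)
  (declare (optimize (safety 1)))
  (let ((y (the fixnum (double-it x))))
    (declare (fixnum y))
    (1+ y)))
```

Both functions return the same value for valid input; the only intended difference is whether a runtime check is emitted.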
Juanjo
(*) Actually the checks are also deactivated when SBCL can infer the type of the value that is assigned to Y. This is somewhat contradictory, because (SETF Y (THE FIXNUM (FOO X))) would still generate a check, but proclaiming FOO to return a FIXNUM would completely bypass the check.
(**) Yes, indeed I know that LOCALLY exists for a reason, but it does more than THE. For instance, if I write (LOCALLY (DECLARE (OPTIMIZE (SAFETY 0))) (THE FIXNUM (FOO (SLOT-ACCESSOR X)))), this also lowers the safety of the code that accesses the structure, which is not good.
P.S.: Thanks to Paul Khuong for pointing out that SBCL behaves differently w.r.t. declarations.
On 29 December 2011 12:24, Juan Jose Garcia-Ripoll juanjose.garciaripoll@googlemail.com wrote:
[b]- SBCL takes declarations (and THE) as type assertions. For instance, in (LET ((Y (FOO X))) (DECLARE (FIXNUM Y)) ...) the assignment to Y would be checked to be a FIXNUM. This means the type declaration is actually enforced and believed and only at SAFETY 0 the checks are dropped (*)
This is correct, but incomplete. At SAFETY 1 SBCL will weaken complex assertions, e.g.
(OR (INTEGER 0 2) (INTEGER 7 10))
will be simplified to a range check for (INTEGER 0 10). At SAFETY 2 no types are weakened. At SAFETY 3 all the extra bells and whistles required by ANSI come into play.
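To make the effect of that weakening concrete, here is a small portable sketch using TYPEP. It does not reproduce what the SBCL compiler emits; it only compares the declared type with the weakened range mentioned above. The variable and function names are assumptions made for this example.

```lisp
;; The declared type and the weakened range check from the text.
(defparameter *declared-type* '(or (integer 0 2) (integer 7 10)))
(defparameter *weakened-type* '(integer 0 10))

(defun slips-through-p (value)
  "True when VALUE passes the weakened check but violates the declared type."
  (and (typep value *weakened-type*)
       (not (typep value *declared-type*))))

(slips-through-p 5)   ; in the gap between the two ranges
(slips-through-p 8)   ; inside the second range, satisfies both types
(slips-through-p 11)  ; outside both, rejected by the weakened check too
```

Only values in the gap (3 to 6) can get past a weakened check while violating the declaration.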
CMUCL's approach is very similar to SBCL's, but IIRC the policy on weakening assertions is a bit different.
In both cases one ends up with a model in which in order to truly believe a declaration and have no extra burden (assertions), one has to drop to SAFETY 0 in all code that is involved with it, which is a mess, because it might inadvertently affect other parts of the code. It is for this reason that I am considering an alternative model for ECL which would grade safety as follows
The actual cost of assertions (for SBCL-generated code at least) is fairly small. For the most part they should be branches that the static branch prediction model gets right every time.
- Type declarations are always believed
- SAFETY >= 1 adds type checks to enforce them.
- SAFETY = 0, no checks.
- SAFETY = 1, the special form THE or additional proclamations on the functions can be used to deactivate the check. As in (LET ((Y (THE FIXNUM (FOO X)))) ...)
This would allow one to keep most code safe, while deactivating some checks when they are really known to be true (**). Do you think this is useful/useless? The problem I see with this approach is that all code around is written for model [a] or [b], but I could not come up with something more sensible so far.
I somewhat dislike making THE a loophole -- IMO it complicates the mental model necessary to understand how things work, especially if it works differently at a specific SAFETY level. SBCL has SB-EXT:TRULY-THE for just this purpose, which is roughly equivalent to:
(defmacro truly-the (type values)
  `(flet ((the-values () ,values))
     (declare (optimize (safety 0)))
     (the ,type (the-values))))
CMUCL has the equivalent as EXT:TRULY-THE. You may want to consider something like that as well.
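A hedged sketch of how portable code might use such an operator. UNCHECKED-THE and SUM-LENGTHS are hypothetical names invented here; the macro dispatches to SB-EXT:TRULY-THE on SBCL and EXT:TRULY-THE on CMUCL (assuming CMUCL's usual :CMU feature) and falls back to plain THE elsewhere, where its checking behaviour depends on the implementation.

```lisp
;; UNCHECKED-THE is a hypothetical portability wrapper, not a standard operator.
(defmacro unchecked-the (type form)
  #+sbcl `(sb-ext:truly-the ,type ,form)
  #+cmu `(ext:truly-the ,type ,form)
  #-(or sbcl cmu) `(the ,type ,form))

;; LENGTH of a proper sequence is always a non-negative fixnum in practice,
;; so promising FIXNUM here cannot be violated by valid input.
(defun sum-lengths (a b)
  (+ (unchecked-the fixnum (length a))
     (unchecked-the fixnum (length b))))
```

The surrounding code keeps its normal safety level; only the wrapped value is trusted.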
I know that when I write (UNSAFE-FUN-THAT-CHECKS-NOTHING (THE FIXNUM X)) I intend the THE as an assertion. Granted, most of the code I write is intended for SBCL-only consumption, so this is probably a moot point. Still, loading a bunch of stuff from Quicklisp and instrumenting the compiler to see how often THEs like that occur might be instructive.
At the end, as long as SAFETY 0 = trust everything blindly and SAFETY 3 = check everything, I think you're well within the bounds of custom and sanity if you choose to make SAFETY 1 a bit magical.
(*) Actually the checks are also deactivated when SBCL can infer the type of the value that is assigned to Y.
This is actually a major point for us. Because SBCL open-codes / partial-evaluates things rather aggressively, and has fairly extensive derivation machinery, in idiomatic Lisp code with type declarations type checks mostly occur only for function arguments, return values, and iteration variables -- and the cost of those type checks is trivial for the most part. What sometimes makes them look more expensive than they actually are is the suboptimal representation selection they cause.
This is somewhat contradictory, because (SETF Y (THE FIXNUM (FOO X))) would still generate a check, but proclaiming FOO to return a FIXNUM would completely bypass the check.
Yes, but if you proclaim FOO to return a fixnum before compiling it, then FOO will take care of that assertion. (Trusting proclamations made after a function has been compiled is considered a long-standing bug, not a feature.)
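A minimal sketch of the ordering described here, with hypothetical function names. The FTYPE proclamation comes before the definition, so a compiler that treats declarations as assertions can check the return value inside the function itself, and callers may then rely on the derived type.

```lisp
;; Proclaim the return type BEFORE the function is compiled.
(declaim (ftype (function (integer) fixnum) last-two-digits))

;; LAST-TWO-DIGITS is a hypothetical function; MOD by 100 always
;; yields a small integer, so the proclamation holds.
(defun last-two-digits (n)
  (mod n 100))

;; The assignment to Y needs no separate check if the compiler
;; trusts the (already enforced) return type of LAST-TWO-DIGITS.
(defun store-digits (n)
  (let ((y 0))
    (declare (fixnum y))
    (setf y (last-two-digits n))
    y))
```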
Cheers,
-- Nikodemus
On Thu, Dec 29, 2011 at 1:33 PM, Nikodemus Siivola < nikodemus@random-state.net> wrote:
On 29 December 2011 12:24, Juan Jose Garcia-Ripoll juanjose.garciaripoll@googlemail.com wrote:
- Type declarations are always believed
- SAFETY >= 1 adds type checks to enforce them.
- SAFETY = 0, no checks.
- SAFETY = 1, the special form THE or additional proclamations on the functions can be used to deactivate the check. As in (LET ((Y (THE FIXNUM (FOO X)))) ...)
I somewhat dislike making THE a loophole [...] CMUCL has the equivalent as EXT:TRULY-THE. You may want to consider something like that as well. [...] At the end, as long as SAFETY 0 = trust everything blindly and SAFETY 3 = check everything, I think you're well within the bounds of custom and sanity if you choose to make SAFETY 1 a bit magical.
I believe there can be a compromise between safety and speed. There are many macros and user code that can perform assertions about the code that the compiler will never be able to, and it is in my opinion unfortunate that all the safety checks have to be removed to take full advantage of those.
I also understand that some of the type checks are cheap, especially if the compiler is allowed to "simplify" them, as SBCL does for SAFETY=1, but the result is code bloat: lots of avoidable checks, branching and error messages that we could do without, without actually sacrificing safety. That does not seem like a bad case for something in between both extremes.
Conceptually, in the model above, I do not see the THE as a loophole, but rather as two different things: variable declarations = type checked assignments, value declarations = compiler hints. For instance, if I invoke a function with a THE argument, SBCL will not generate a check: (FOO (THE FIXNUM X)) is just (FOO X), am I wrong? (I just checked in Ubuntu's SBCL) Then in that sense THE does not really make much sense at all, because the type checks are introduced by assignments to variables, not by this special form.
Juanjo
On 29 December 2011 16:04, Juan Jose Garcia-Ripoll juanjose.garciaripoll@googlemail.com wrote:
Conceptually, in the model above, I do not see the THE as a loophole, but rather as two different things: variable declarations = type checked assignments, value declarations = compiler hints.
Fair enough. SBCL disagrees, but it and CMUCL stand apart from most implementations when it comes to handling of types.
For instance, if I invoke a function with a THE argument, SBCL will not generate a check: (FOO (THE FIXNUM X)) is just (FOO X), am I wrong? (I just checked in Ubuntu's SBCL)
Actually, SBCL /should/ generate the check, unless you are using the interpreter. If it didn't then I'm guessing the Ubuntu version is an old one:
CL-USER> (defun foo (x) x)
CL-USER> (foo (the fixnum t))
; in: FOO (THE FIXNUM T)
;     (THE FIXNUM T)
;
; caught WARNING:
;   Constant T conflicts with its asserted type FIXNUM.
;   See also:
;     The SBCL Manual, Node "Handling of Types"
;
; compilation unit finished
;   caught 1 WARNING condition
; Evaluation aborted on #<SIMPLE-TYPE-ERROR expected-type: FIXNUM datum: T>.
...plus entry to debugger is the expected behaviour.
Both THE and an assignment to a variable whose type has been declared generate an identical cast node in SBCL's IR.
Cheers,
-- Nikodemus
Using declarations vs using THE is often a stylistic consideration, and while you may be able to get ECL-only users to accept your additional semantics, you might have trouble getting maintainers of portable libraries to observe this arbitrary distinction.
Why not let SPEED into the mix? E.g. if SPEED > SAFETY then don't compile typechecks.
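The suggested rule can be written down as a tiny predicate. This is only a sketch of the idea; EMIT-TYPE-CHECKS-P is a hypothetical name and is not taken from any implementation's policy machinery.

```lisp
;; Hypothetical policy predicate illustrating "if SPEED > SAFETY then
;; don't compile typechecks". Both arguments are optimize qualities 0-3.
(defun emit-type-checks-p (speed safety)
  "Return true when type checks would be compiled under the suggested rule."
  (<= speed safety))
```

Under this rule a form compiled with (OPTIMIZE (SPEED 3) (SAFETY 1)) would lose all of its type checks, while (OPTIMIZE (SPEED 1) (SAFETY 1)) would keep them.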
On Thu, Dec 29, 2011 at 5:24 AM, Juan Jose Garcia-Ripoll < juanjose.garciaripoll@googlemail.com> wrote:
-- Instituto de Física Fundamental, CSIC c/ Serrano, 113b, Madrid 28006 (Spain) http://juanjose.garciaripoll.googlepages.com
pro mailing list pro@common-lisp.net http://lists.common-lisp.net/cgi-bin/mailman/listinfo/pro
On Thu, Dec 29, 2011 at 5:46 PM, Gail Zacharias gz@clozure.com wrote:
Using declarations vs using THE is often a stylistic consideration, and while you may be able to get ECL-only users to accept your additional semantics, you might have trouble getting maintainers of portable libraries to observe this arbitrary distinction.
Precisely what I mean is that the current semantics is really inconvenient for library writers. I also believe that this change can be introduced at no cost for library maintainers because it effectively does not change the semantics at the safety levels at which code is typically compiled (0 or the default). Let me try to explain it further below.
Why not let SPEED into the mix? E.g. if SPEED > SAFETY then don't compile typechecks.
The issue is not SPEED, it is safety. Safety need not be sacrificed to gain speed. Moreover, the problem with this SAFETY vs SPEED thing is that it has no granularity at all. It is a simplistic view of the world which assumes that all code is the same.
Let me explain the situation with an ordinary library, say a regular expression parser. Somebody who writes the library has to understand that there are various types of routines or sections of code that she is going to write:
2- Code that handles user input (strings, lists which might be malformed, etc)
1- Code that handles internal data (structures that will not change, sealed classes, lists of known lengths)
0- Small sections of code that handle internal data and need speed
I would expect that only 0 should be compiled with SAFETY = 0, and explicitly marked so. However, 1 and 2 typically coexist and sometimes appear intermixed in the same function. Here one must either resort to high safety levels for everything, or end up wrapping different sections of code in (LOCALLY (... UNSAFE ...) ...) declarations. This is not good in my opinion.
The problem is that we are implicitly advocating that SAFETY = 0 is good for everything once the code is mature enough and you need speed, but such a level implies much more than believing type declarations: it typically implies that the arguments to functions are not checked at all. Take (CAR (THE CONS X)). There are multiple ways in which this CAR call can be inlined. To get the optimal case in this situation where I am telling the compiler "believe me, this is a CONS", I may be opening a can of worms by lifting all type checks in other uses of CAR.
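A sketch of the wrapping style being criticised, using a hypothetical function FIRST-OF-PAIR. The explicit CHECK-TYPE stands in for the "user input" part, and the LOCALLY form for the speed-critical part; the point is that the SAFETY 0 region affects every operation inside it, not just the trusted THE.

```lisp
;; FIRST-OF-PAIR is a hypothetical example function.
(defun first-of-pair (x)
  (declare (optimize (safety 1)))
  ;; Code handling possibly malformed input: keep an explicit check.
  (check-type x cons)
  ;; Speed-critical section: SAFETY 0 lets the compiler believe the THE,
  ;; but it also removes checks from any other call placed in this block.
  (locally (declare (optimize (safety 0)))
    (car (the cons x))))
```

Here the promise is sound because CHECK-TYPE has already validated X, but nothing stops later edits from adding unchecked calls inside the LOCALLY body.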
Why do I believe this does not really change the semantics in a significant way? First of all because, apart from SBCL's declaration policy, there is no explicitly written commitment in any of the free (natively compiling) Common Lisps out there about the meaning of optimization settings. In such a landscape, I would guess that library maintainers currently more or less follow the approach of lowering safety levels to 0 in speed-critical code and leaving them at some default value that works with their favorite implementation elsewhere. See for instance CL-PPCRE
(defvar *standard-optimize-settings* '(optimize speed (safety 0)(space 0) (debug 1) (compilation-speed 0) #+:lispworks (hcl:fixnum-safety 0))...
From the user's point of view, the approach seems to be: if the safety level is zero, the compiler will make fast code; with default settings, I will get type checking. The PCL also suggests this, and it seems to be a common entry point for many new users. Moreover, users also cannot rely on CMUCL's or SBCL's or ECL's type checking behavior for function arguments, because it is not really standard, and manual type checking is required in most libraries.
OTOH, if one comes up with a set of sensible settings that users may choose from and which may be applicable throughout the library, without disrupting the current behavior at SAFETY 0 or default, then the cost of adoption is zero.
I am just trying to figure out a non-disruptive way of choosing those settings, documenting them (http://ecls.sourceforge.net/new-manual/ch02.html#ansi.declarations.optimize), and perhaps even sparking a debate about it, so that there may be some more uniformity throughout implementations.
Cheers,
Juanjo
On Thu, 29 Dec 2011 11:24:37 +0100, Juan Jose Garcia-Ripoll said:
- Type declarations are always believed
- SAFETY >= 1 adds type checks to enforce them.
- SAFETY = 0, no checks.
- SAFETY = 1, the special form THE or additional proclamations on the functions can be used to deactivate the check. As in (LET ((Y (THE FIXNUM (FOO X)))) ...)
This would allow one to keep most code safe, while deactivating some checks when they are really known to be true (**). Do you think this is useful/useless? The problem I see with this approach is that all code around is written for model [a] or [b], but I could not come up with something more sensible so far.
I don't like this because it contradicts the CL spec:
"The meaning of a type declaration is equivalent to changing each reference to a variable (var) within the scope of the declaration to (the typespec var), changing each expression assigned to the variable (new-value) within the scope of the declaration to (the typespec new-value), and executing (the typespec var) at the moment the scope of the declaration is entered."
(from http://www.lispworks.com/documentation/HyperSpec/Body/d_type.htm).
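The equivalence in the quoted passage can be illustrated by writing the same function twice: once with a type declaration, and once with the references and assignments rewritten by hand using THE. This is a sketch with hypothetical function names; it shows the rewriting, not how any particular compiler implements it.

```lisp
;; Written with a type declaration on ACC.
(defun sum-declared (n)
  (let ((acc 0))
    (declare (fixnum acc))
    (dotimes (i n acc)
      (setq acc (+ acc i)))))

;; The same function after the rewriting described in the passage:
;; each reference becomes (THE FIXNUM ACC), each assigned value is
;; wrapped in (THE FIXNUM ...), and (THE FIXNUM ACC) is executed when
;; the scope is entered.
(defun sum-rewritten (n)
  (let ((acc 0))
    (the fixnum acc)
    (dotimes (i n (the fixnum acc))
      (setq acc (the fixnum (+ (the fixnum acc) i))))))
```

If declarations and THE share one meaning, as the passage states, a compiler has no basis for checking one form and trusting the other.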
In LispWorks, type declarations and THE forms have the same semantics and they are checked when safety = 3 and debug = 3. The reason for involving debug is that the checking code can be large and relatively slow.
__Martin