Just curious for other opinions... but wouldn't this (Heartbleed) sort of buffer excess read-back failure have been prevented by utilizing a "safe" language like Lisp or SML?
I used to be an "unsafe" language bigot -- having mastered C/C++ for many years, and actually producing C compilers for a living at one time. I felt there should be no barriers to me as master of my machine, and not the other way around.
But today's software systems are so complex that it boggles the mind to keep track of everything needed. I found during my transition years that I could maintain code bases no larger than an absolute max of 500 KLOC, and that I actually started losing track of details around 100 KLOC. Making the transition to a higher level language like SML or Lisp enabled greater productivity within those limits for me.
Dr. David McClain dbm@refined-audiometrics.com
I do software security professionally these days.
While it is possible to cause memory corruption, buffer overruns, or stack smashing in almost any language, it is certainly far easier to do so in C and C++. Many languages these days link to C libraries, which increases the exposure.
However, much of my work these days is done against .NET applications, which run on a managed, garbage-collected platform. The number and frequency of errors in that code is no smaller than with C. It is still possible to get remote code execution through the IIS/.NET web stack.
Application security is very difficult, and not very many of us write error-free code.
To me the issue with OpenSSL (and some issues still remain, although the ones that I know about are not as severe) is that the code is very unclear and hard to reason about. In fact, the best static code analyzers had to be tweaked to see the issue.
Having many years' experience in both C and C++, I find that in Lisp it is much easier to make assertions about a program's fine-grained behavior, which pretty much agrees with your experience.
I would like to rephrase the question: which language makes it easier to reason about a large code base? My vote is for the Lisp family. However, keep in mind one of the best-written programs out there, Qmail, is written in C. There is a lot to be said for who the author/authors are as well as the language.
wglb
On Sat, Apr 12, 2014 at 4:52 PM, David McClain <dbm@refined-audiometrics.com> wrote:
Just curious for other opinions... but wouldn't this (Heartbleed) sort of buffer excess read-back failure have been prevented by utilizing a "safe" language like Lisp or SML?
I used to be an "unsafe" language bigot -- having mastered C/C++ for many years, and actually producing C compilers for a living at one time. I felt there should be no barriers to me as master of my machine, and not the other way around.
But today's software systems are so complex that it boggles the mind to keep track of everything needed. I found during my transition years that I could maintain code bases no larger than an absolute max of 500 KLOC, and that I actually started losing track of details around 100 KLOC. Making the transition to a higher level language like SML or Lisp enabled greater productivity within those limits for me.
Dr. David McClain dbm@refined-audiometrics.com
pro mailing list pro@common-lisp.net http://common-lisp.net/cgi-bin/mailman/listinfo/pro
David McClain dbm@refined-audiometrics.com writes:
Just curious for other opinions... but wouldn't this (Heartbleed) sort of buffer excess read-back failure have been prevented by utilizing a "safe" language like Lisp or SML?
I used to be an "unsafe" language bigot -- having mastered C/C++ for many years, and actually producing C compilers for a living at one time. I felt there should be no barriers to me as master of my machine, and not the other way around.
Oh, so you are directly (if partially) responsible for the C mess!
The C standards say that:
{ char a[10]; return a[12]; }
is _undefined_.
Why, as a compiler writer, didn't you define it to raise an exception? Yes, the C standard doesn't define exceptions, why, as a compiler writer, didn't you add this obvious extension?
Notice how CLHS aref specifies:
subscripts---a list of valid array indices for the array.
Exceptional Situations: None.
and how:
1.4.4.3 The ``Arguments and Values'' Section of a Dictionary Entry
An English language description of what arguments the operator accepts and what values it returns, including information about defaults for parameters corresponding to omittable arguments (such as optional parameters and keyword parameters). For special operators and macros, their arguments are not evaluated unless it is explicitly stated in their descriptions that they are evaluated.
Except as explicitly specified otherwise, the consequences are undefined if these type restrictions are violated.
Which means that (let ((a (make-array 10))) (aref a 12)) is as undefined in CL as in C!
However, you don't see Lisp implementers allowing it; instead, they all signal an error:
[pjb@kuiper :0.0 tmp]$ clall -r '(let ((a (make-array 10))) (aref a 12))'
Armed Bear Common Lisp  Invalid array index 12 for #(NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL) (should be >= 0 and < 10).
Clozure Common Lisp     Array index 12 out of bounds for #(0 0 0 0 0 0 0 0 0 0).
CLISP                   AREF: index 12 for #(NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL) is out of range
CMU Common Lisp         Error in function LISP::%ARRAY-ROW-MAJOR-INDEX: Invalid index 12 in #(0 0 0 0 0 0 0 0 0 0)
ECL                     In function AREF, the index into the object #(NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL) takes a value 12 out of the range (INTEGER 0 9).
SBCL                    Value of SB-INT:INDEX in (THE (INTEGER 0 (10)) SB-INT:INDEX) is 12, not a (MOD 10).
And even with safety 0, which should never be used (except perhaps on a specific function that you've proven needs to be 2 cycles faster, for which you can't find a better algorithm, and for which you have proven and tested that bounds violations and other adverse conditions cannot occur; that is, so many conditions that it never occurs in real life), non-toy implementations still check bounds:
[pjb@kuiper :0.0 tmp]$ clall -r '(declaim (optimize (safety 0) (speed 3) (debug 0) (space 3)))' '(let ((a (make-array 10))) (aref a 12))'
Armed Bear Common Lisp  --> NIL
Armed Bear Common Lisp  Invalid array index 12 for #(NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL) (should be >= 0 and < 10).
CCL                     /home/pjb/bin/clall: line 284: 16162 Segmentation fault "$implementation" "$@" "${user_args[@]}" > "$error" 2>&1
CLISP                   --> NIL
CLISP                   AREF: index 12 for #(NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL) is out of range
CMU Common Lisp         --> NIL
CMU Common Lisp         Error in function LISP::%ARRAY-ROW-MAJOR-INDEX: Invalid index 12 in #(0 0 0 0 0 0 0 0 0 0)
ECL                     --> NIL
ECL                     In function AREF, the index into the object #(NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL) takes a value 12 out of the range (INTEGER 0 9).
SBCL                    --> No value.
SBCL                    --> 0
Here is how a lisp programmer implements a C compiler:
cl-user> (ql:quickload :vacietis)
;; …
cl-user> (defun read-c-expression-from-string (source)
           (let ((*readtable* vacietis:c-readtable)
                 (vacietis:*compiler-state* (vacietis:make-compiler-state)))
             (read-from-string source)))
read-c-expression-from-string
cl-user> (read-c-expression-from-string "char f(){ char a[10]; return a[12]; }")
(vacietis::defun/1 f nil
  (prog* ((a (vacietis:allocate-memory 10)))
    (return (vacietis.c:[] a 12))))
cl-user> (eval (read-c-expression-from-string "char f(){ char a[10]; return a[12]; }"))
f
cl-user> (f)
Debug: Array index 12 out of bounds for #<vector 10, adjustable> .
While executing: (:internal swank::invoke-default-debugger), in process repl-thread(871).
Type :POP to abort, :R for a list of available restarts.
Type :? for other options.
1 > :q
; Evaluation aborted on #<simple-error #x302003010B7D>.
cl-user>
A final word: Don't use FFI! Implement the libraries you need in Lisp!
Hi Pascal
That was very funny! Heh!
But in case you weren't trying to be funny, I'd guess that you were born sometime later than 1970 or so.
Everything happens in an historical context. And if the C language had raised exceptions on "invalid" memory accesses, then I can assure you that neither I nor anyone else at the time would have used such a language. It would have been too constraining. If you wanted such confining behavior then you might have considered the new language Ada.
I'd love to discuss at greater length but right now I'm attending a Chamber Concert and not near my computers.
From what I understand about the bug (I have not seen the code) it sounds like data length information arrived both directly and indirectly in the client message and that a conflict between them was not scrutinized.
More later...
Dr. David McClain
Sent from a mobile device
On Apr 13, 2014, at 3:01, "Pascal J. Bourguignon" pjb@informatimago.com wrote:
David McClain dbm@refined-audiometrics.com writes:
Just curious for other opinions... but wouldn't this (Heartbleed) sort of buffer excess read-back failure have been prevented by utilizing a "safe" language like Lisp or SML?
I used to be an "unsafe" language bigot -- having mastered C/C++ for many years, and actually producing C compilers for a living at one time. I felt there should be no barriers to me as master of my machine, and not the other way around.
Oh, so you are directly (if partially) responsible for the C mess!
The C standards say that:
{ char a[10]; return a[12]; }
is _undefined_.
Why, as a compiler writer, didn't you define it to raise an exception? Yes, the C standard doesn't define exceptions, why, as a compiler writer, didn't you add this obvious extension?
Notice how CLHS aref specifies:
subscripts---a list of valid array indices for the array.
Exceptional Situations: None.
and how:
1.4.4.3 The ``Arguments and Values'' Section of a Dictionary Entry
An English language description of what arguments the operator accepts and what values it returns, including information about defaults for parameters corresponding to omittable arguments (such as optional parameters and keyword parameters). For special operators and macros, their arguments are not evaluated unless it is explicitly stated in their descriptions that they are evaluated.
Except as explicitly specified otherwise, the consequences are undefined if these type restrictions are violated.
Which means that (let ((a (make-array 10))) (aref a 12)) is as undefined in CL as in C!
However, you don't see Lisp implementers allowing it; instead, they all signal an error:
[pjb@kuiper :0.0 tmp]$ clall -r '(let ((a (make-array 10))) (aref a 12))'
Armed Bear Common Lisp  Invalid array index 12 for #(NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL) (should be >= 0 and < 10).
Clozure Common Lisp     Array index 12 out of bounds for #(0 0 0 0 0 0 0 0 0 0).
CLISP                   AREF: index 12 for #(NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL) is out of range
CMU Common Lisp         Error in function LISP::%ARRAY-ROW-MAJOR-INDEX: Invalid index 12 in #(0 0 0 0 0 0 0 0 0 0)
From what I understand about the bug (I have not seen the code) it sounds like data length information arrived both directly and indirectly in the client message and that a conflict between them was not scrutinized.
No. The bug was that the keep alive protocol in SSL mandates the server to echo arbitrary data back to the client. The bounds checks were wrong too, but at that stage it really doesn't matter. The design is just plain wrong.
The design is just plain wrong.
Is that statement the benefit of hindsight knowledge, or do you have a more intelligent thought process behind it? (I can imagine the all-knowing smirk in the background, but I'd really like to know :-)
- DM
On Apr 23, 2014, at 01:06 AM, Max Rottenkolber max@mr.gy wrote:
From what I understand about the bug (I have not seen the code) it sounds like data length information arrived both directly and indirectly in the client message and that a conflict between them was not scrutinized.
No. The bug was that the keep alive protocol in SSL mandates the server to echo arbitrary data back to the client. The bounds checks were wrong too, but at that stage it really doesn't matter. The design is just plain wrong.
Dr. David McClain dbm@refined-audiometrics.com
On Wed, 23 Apr 2014 06:13:03 -0700, David McClain wrote:
The design is just plain wrong.
Is that statement the benefit of hindsight knowledge, or do you have a more intelligent thought process behind it? (I can imagine the all-knowing smirk in the background, but I'd really like to know :-)
The exact opposite of all-knowing ;). In my opinion the TLS standard is too complex. Parts of it, like the keep-alive, are also a path MTU checking *framework*, as I criticized (and as discussed further down with Pascal).
Many security professionals have criticized the TLS committee for their standards. As a side note: OpenSSL has roughly 500k lines of code; I don't think it's feasible to assure security in a code base of this magnitude.
If I imagine implementing a security protocol, under the constraint "this code should be kept short and really, really safe", and then being confronted with something like the Heartbeat extension, I imagine despair.
So my conclusion is: a widely used security standard should be engineered well enough that it can be implemented correctly, even in a four-digit-LOC ANSI C code base.
On Wed, Apr 23, 2014 at 1:06 AM, Max Rottenkolber max@mr.gy wrote:
From what I understand about the bug (I have not seen the code) it sounds like data length information arrived both directly and indirectly in the client message and that a conflict between them was not scrutinized.
No. The bug was that the keep alive protocol in SSL mandates the server to echo arbitrary data back to the client. The bounds checks were wrong too, but at that stage it really doesn't matter. The design is just plain wrong.
It is a bit curious that the protocol mandates this echoing, and one could certainly debate whether this is good protocol design, but as far as the actual vulnerability goes, David's characterization is accurate. The heartbeat request arrives with some number of bytes of data attached to it, and also with a length field that tells the server how many bytes the client would like echoed back. There was no check that the client didn't request more bytes be echoed than it had actually sent.
-- Scott
"Scott L. Burson" Scott@sympoiesis.com writes:
On Wed, Apr 23, 2014 at 1:06 AM, Max Rottenkolber max@mr.gy wrote:
From what I understand about the bug (I have not seen the code) it sounds like data length information arrived both directly and indirectly in the client message and that a conflict between them was not scrutinized.
No. The bug was that the keep alive protocol in SSL mandates the server to echo arbitrary data back to the client. The bounds checks were wrong too, but at that stage it really doesn't matter. The design is just plain wrong.
It is a bit curious that the protocol mandates this echoing, and one could certainly debate whether this is good protocol design, but as far as the actual vulnerability goes, David's characterization is accurate. The heartbeat request arrives with some number of bytes of data attached to it, and also with a length field that tells the server how many bytes the client would like echoed back. There was no check that the client didn't request more bytes be echoed than it had actually sent.
This is not what the protocol specifies.
The protocol specifies that if the length mentioned is wrong ("too big"), then nothing should be answered, and otherwise that the exact payload data received be sent back.
Nowhere does it say that the length field in the packet has any validity of its own, or that the server must use it blindly.
Nowhere is it specified that the server should return data beyond the payload data.
It is obvious that the message length should be taken into account to determine the padding_length, since it's not in the message. This is explicitly described in the protocol specification:
padding: The padding is random content that MUST be ignored by the receiver. The length of a HeartbeatMessage is TLSPlaintext.length for TLS and DTLSPlaintext.length for DTLS. Furthermore, the length of the type field is 1 byte, and the length of the payload_length is 2. Therefore, the padding_length is TLSPlaintext.length - payload_length - 3 for TLS and DTLSPlaintext.length - payload_length - 3 for DTLS. The padding_length MUST be at least 16.
The sender of a HeartbeatMessage MUST use a random padding of at least 16 bytes. The padding of a received HeartbeatMessage message MUST be ignored.
Max Rottenkolber max@mr.gy writes:
From what I understand about the bug (I have not seen the code) it sounds like data length information arrived both directly and indirectly in the client message and that a conflict between them was not scrutinized.
No. The bug was that the keep alive protocol in SSL mandates the server to echo arbitrary data back to the client. The bounds checks were wrong too, but at that stage it really doesn't matter. The design is just plain wrong.
I don't think you can say that it's _just_ the design that is plain wrong.
If I give you as specification:
- client sends a string S of length s.
- client sends an offset o and a length l.
- server sends back l bytes taken from the address of the string plus o.
and ask you to implement it only using the CL package, you won't be able to implement it in any CL implementation using non-zero safety, and you won't be able to implement it in most CL implementations using (safety 0).
But those weren't the specifications, they are obviously bogus.
But assuming they were the following (still bogus, but rather reasonable specifications):
- client sends a string S of length s.
- client sends an offset o and a length l.
- server sends back the substring of S starting at offset o, containing l characters.
This you could easily implement in CL (as easily as in C), but again: while in C this is a Heartbleed bug, in CL it poses absolutely no security problem (unless you're using certain implementations with (safety 0), which you should not have done anyway; you're really asking for problems).
(defun heartbeat-data (S o l) (subseq S o (+ o l)))
(heartbeat-data "Hello" 0 64000)
> Debug: Bad interval for sequence operation on "Hello" : start = 0, end = 64000
So while the protocol didn't specify apparently what to do when (> (+ o l) (length S)), this would have been handled as any other generic protocol or server error, and no private data would be bled away.
http://cacm.acm.org/blogs/blog-cacm/173827-those-who-say-code-does-not-matte... http://jameso.be/2012/02/11/language-matters.html
So it's not just the specifications: it's the language implementations that are at fault here (not the ANSI C language itself, which clearly says that it's undefined to read an uninitialized array or outside of allocated memory; since the behavior is undefined, implementations could define implementation-specific exception mechanisms, and therefore you could expect, as with any CL implementation, to have exceptions signaled in such occurrences).
But the actual protocol specifications didn't even say that! They are actually quite reasonable, and this is clearly an implementation bug:
https://tools.ietf.org/html/rfc6520
The specification of the protocol explicitly says:
If the payload_length of a received HeartbeatMessage is too large, the received HeartbeatMessage MUST be discarded silently.
and:
When a HeartbeatRequest message is received and sending a HeartbeatResponse is not prohibited as described elsewhere in this document, the receiver MUST send a corresponding HeartbeatResponse message carrying AN EXACT COPY OF THE PAYLOAD of the received HeartbeatRequest.
On Wed, 23 Apr 2014 20:39:48 +0200, Pascal J. Bourguignon wrote:
When a HeartbeatRequest message is received and sending a HeartbeatResponse is not prohibited as described elsewhere in this document, the receiver MUST send a corresponding HeartbeatResponse message carrying AN EXACT COPY OF THE PAYLOAD of the received HeartbeatRequest.
I didn't mean to dispute that CL is a safer language. My point is that, as an implementer, the above paragraph in an SSL protocol extension should raise red lights.
What is the function of the described behavior? Why would I want to echo back data in context of a keep alive? A: None. You don't want to do that.
My position on this is to refuse to implement it. If that means my implementation is useless in the context of other implementations, I need to implement a better standard. I'd go as far as saying this is a moral issue. When implementing a standard means building a weapon pointed at half the Internet, the implementer is responsible for the resulting threat.
I have had mixed experiences with this. So far I have implemented a few standards where this approach worked just fine (an email client, a web server). I could just omit the behavior I deemed unacceptable and refuse to handle those messages, or send a "501 Not Implemented", respectively. And while both the email and HTTP standards bear tons of legacy baggage and can be tedious to implement, I refer to them as _good_ standards, because they let you safely omit their questionable components.
A security guy reading the TLS standard, on the other hand, WILL think that it was written by a malicious party, optimized for being impossible to implement in a safe way. And while it is easier to implement the TLS standard correctly in Lisp, I believe it should be simple and well-written enough to be implementable safely even in C.
Max Rottenkolber max@mr.gy writes:
On Wed, 23 Apr 2014 20:39:48 +0200, Pascal J. Bourguignon wrote:
When a HeartbeatRequest message is received and sending a HeartbeatResponse is not prohibited as described elsewhere in this document, the receiver MUST send a corresponding HeartbeatResponse message carrying AN EXACT COPY OF THE PAYLOAD of the received HeartbeatRequest.
I didn't mean to dispute that CL is a safer language. My point is that, as an implementer, the above paragraph in an SSL protocol extension should raise red lights.
What is the function of the described behavior? Why would I want to echo back data in context of a keep alive? A: None. You don't want to do that.
You want to make sure that the answer you get corresponds to the request you sent.
You could use a counter, but it would be too easy to simulate it on the other end.
If you send random data and compare the returned data, you make sure that there's something alive on the other end that can receive your messages and respond to them, not a dead process sending fixed or predictable packets.
On Thu, 24 Apr 2014 18:13:35 +0200, Pascal J. Bourguignon wrote:
a dead process sending fixed or predictable packets
I didn't think of that. So basically you ensure the responding connection isn't compromised by exercising the encryption, which is the hardest to fake for a malicious attacker. Makes sense... Shame on me! :)
What about a fixed length input though (and maybe answering with a digest)? It still seems to me that the specified behavior is overly arbitrary/error prone.
Max Rottenkolber max@mr.gy writes:
On Thu, 24 Apr 2014 18:13:35 +0200, Pascal J. Bourguignon wrote:
a dead process sending fixed or predictable packets
I didn't think of that. So basically you ensure the responding connection isn't compromised by exercising the encryption, which is the hardest to fake for a malicious attacker. Makes sense... Shame on me! :)
What about a fixed length input though (and maybe answering with a digest)? It still seems to me that the specified behavior is overly arbitrary/error prone.
The introduction of the protocol says:
The Heartbeat Extension provides a new protocol for TLS/DTLS allowing the usage of keep-alive functionality without performing a renegotiation and a basis for path MTU (PMTU) discovery for DTLS.
So the variable size of the packet is used for this latter feature, discovery of the path MTU (PMTU).
There has been a lot of incorrect information and assumption on this thread. I'm not picking on Pascal here (because I know he knows better).
On Wed, Apr 23, 2014 at 11:39 AM, Pascal J. Bourguignon < pjb@informatimago.com> wrote:
and ask you to implement it only using the CL package, you won't be able to implement it in any CL implementation using non-zero safety, and you won't be able to implement it in most CL implementations using (safety 0).
In any case, you won't be able to implement an HTTP server in ANSI CL because we of X3J13 didn't get around to defining any socket interface. It was a known need, but too difficult to achieve. That's probably a good thing, because any standard socket binding circa 1990 would likely have been seriously incorrect and/or inadequate. (Compare the lack of Unicode binding.)
This you could easily implement in CL, (as easily as in C), but again, while in C this is a heartbleed bug, in CL, it poses absolutely no security problem (unless you're using some certain implementations with (safety 0), which you should not have done anyways, you're really asking for problems, aren't you).
You're making unsupported assumptions about safety 0. The ANS only makes a distinction between safety _3_ and safety anything else: safety 3 is safe code, and in safe code certain user-code violations are required to be signalled (usually where the ANS uses the word "should"). And there are damn few of those places. Take for example aref, which might be used to extract octets or characters or whatever from a buffer. aref makes no guarantees, even in safe code, that it will signal bad array bounds.
Of course, it is unlikely that an HTTP server would use aref in this context; more likely it would engage implementation-dependent socket and/or stream extensions. Do those extensions guarantee that kind of paranoid safe checking? Probably not, but even if they claim to do so, how does one verify? Real socket protocols use large buffers and simply pass memory pointers and lengths AT the OS. You might think that is bad practice, but you might dislike the performance of a _real_ performance web server that made too many guarantees.
(But I certainly agree that the Heartbleed bug results from a poor implementation of an obscure specification. It isn't the language.)
So it's not just the specifications, it's the language implementations that are at fault here (not the ANSI C language, which clearly says that it's undefined to read an uninitialized array or outside of allocated memory, and therefore you could expect as with any CL implementation to have exceptions signaled in such occurences (since it's undefined, implementation could define implementation specific exception mechanisms)).
"Consequences are undefined" is a term of art in the ANS. Behavior might range from DWIM to destruction of the Universe. You cannot expect a CL implementation to check situations that are not specified by the ANS to be checked. I just checked the following form in SBCL and ACL -- both did undefined things and did not signal errors.
(funcall (compile nil '(lambda (x) (declare (optimize (speed 3) (safety 0))) (svref x 10))) (make-array 3))
Just like C, but at least the Universe didn't disappear. This time.
CL is not intrinsically more safe than C. How any favorite implementation behaves is irrelevant to this argument. It is the programmer that must code safely.
On Thu, Apr 24, 2014 at 7:29 PM, Steve Haflich shaflich@gmail.com wrote:
Take for example aref, which might be used to extract octets of characters or whatever from a buffer. aref makes no guarantees even in safe code that it will signal bad array bounds.
I've long thought that was an oversight, though now that you point it out, I realize I must have been mistaken.
Still, it surprises me. I don't know of any implementation that doesn't bounds-check aref under normal speed/safety settings, and clearly, users expect them to do so. It seems a little pedantic to insist that the _language_ isn't safe in this respect even when all known implementations are. (Am I wrong about that?)
And for the record I disagree with the committee's decision. Bounds checking aref etc. _should_ be required at safety 3 (and along with that, there should be a standardized bounds-error condition type). The reasoning behind the committee's choice here eludes me.
-- Scott
"Scott L. Burson" Scott@sympoiesis.com writes:
On Thu, Apr 24, 2014 at 7:29 PM, Steve Haflich shaflich@gmail.com wrote:
Take for example aref, which might be used to extract octets of characters or whatever from a buffer. aref makes no guarantees even in safe code that it will signal bad array bounds.
I've long thought that was an oversight, though now that you point it out, I realize I must have been mistaken.
Still, it surprises me. I don't know of any implementation that doesn't bounds-check aref under normal speed/safety settings, and clearly, users expect them to do so. It seems a little pedantic to insist that the _language_ isn't safe in this respect even when all known implementations are. (Am I wrong about that?)
The point is that ANSI Common Lisp compiler writers will have their compilers generate run-time bounds-checking code, while ANSI C compiler writers won't.
The point is that ANSI Common Lisp compiler writers will add extensions to the language or "standard library" to deal with sockets and network communications, while ANSI C compiler writers won't (relying on library and OS API writers to do so).
The point is that ANSI Common Lisp compiler writers don't need to add exception handling as an extension because it's already specified in the language, while ANSI C compiler writers would have to do so, to deal non-trivially with run-time errors.
And for the record I disagree with the committee's decision. Bounds checking aref etc. _should_ be required at safety 3 (and along with that, there should be a standardized bounds-error condition type). The reasoning behind the committee's choice here eludes me.
Agreed, a programming language standard should not rely on the good sense of implementers to ensure the semantics of its programs, the more so in a dynamic language where code can be executed without being previously globally validated.
But again, AIUI, Common Lisp was specified as much by documenting the commonality of existing implementations as by fresh design, which may explain why there are so many parts that are unspecified or implementation-dependent.
On Fri, Apr 25, 2014 at 12:31 AM, Scott L. Burson Scott@sympoiesis.comwrote:
On Thu, Apr 24, 2014 at 7:29 PM, Steve Haflich shaflich@gmail.com wrote:
Take for example aref, which might be used to extract octets of characters or whatever from a buffer. aref makes no guarantees even in safe code that it will signal bad array bounds.
I've long thought that was an oversight, though now that you point it out, I realize I must have been mistaken.
Still, it surprises me. I don't know of any implementation that doesn't bounds-check aref under normal speed/safety settings, and clearly, users expect them to do so.
I am surprised too. I always understood it like you, Scott, but now that I re-read the page on aref I see that it is exactly as Steve says: no mention of any exception, and a statement that "subscripts" must be a list of valid array indices right from the start of the call to aref. Yet that leaves me even more curious to know which implementation has read the spec as strictly as Steve says it can be read, even under (safety 3)? Does anyone know of any?
On Thu, Apr 24, 2014 at 10:35 PM, Jean-Claude Beaudoin < jean.claude.beaudoin@gmail.com> wrote:
On Fri, Apr 25, 2014 at 12:31 AM, Scott L. Burson Scott@sympoiesis.comwrote:
I've long thought that was an oversight, though now that you point it out, I realize I must have been mistaken.
"Oversight" might be the wrong way of thinking about this. X3J13 started with the language defined by CLtL1 (the work of the infamous Gang of Five) with the purpose of turning it into a powerful, useful, real-world industrial-strength programming language. Except for new subsystems grabbed more-or-less intact from other sources (Waters' pretty printer, the condition system (Pitman and others), and CLOS (a different gang inside X3J13)), the specification started with the CLtL1 definitions, to which cleanups and accretions were made. There were a lot of inconsistencies to remove, a lot of language cleanups, and a lot of incompatibilities as modern features were added. But a lot of culture from early-era Lisps (primarily MACSYMA) remains. We changed what _needed_ to be changed and cleaned up a lot of other inelegances, but there was not time or energy to attempt a thorough job. The subgroups for things like graphics and I18N and networking realized the time was not yet ripe -- the world was changing out from under them -- and those efforts were set aside. The process took almost 6 years, and near the end time and funding were running out. Eventually the committee standardized _only_ the programming language, more or less, and we were lucky to get it done.
Still, it surprises me. I don't know of any implementation that
doesn't bounds-check aref under normal speed/safety settings, and clearly, users expect them to do so.
I am surprised too. I always understood it the way you do, Scott, but now that I re-read the page on aref I see that it is exactly as Steve says: no mention of any exception, and a statement that "subscripts" must be a list of valid array indices right from the start of the call to aref. Yet that leaves me even more curious to know which implementation has read the spec as strictly as Steve says it can be, even under (safety 3). Does anyone know of any?
I don't know of any and there might not be any, at least among main-line implementations. I don't remember X3J13 considering aref (except for the non-interaction with a fill pointer), but I also can't remember what I had for breakfast this morning, so investigation of X3J13 records might reveal differently. The lack of exhaustive subtyping of cl:error was recognized as something missing, but the condition system itself wasn't in the original language, and no one had the time or energy to go through the entire specification. The sense was that a quality implementation could do so itself, and maybe agree on details in the future.
But in your paragraph above I'm bothered by its hidden assumption: it suggests that after the ANS was available, sneaky implementors studied it kabbalistically to find places where annoying error checks could be removed. It was exactly the opposite! Tired implementors slogged through the ANS to find places where error checking was _required_ and found missing. (Or customers did it for them.)
I agree it would be a good thing if the ANS required aref bounds checking in safe code.
To return to my important point: the language of the ANS won't let you read or write from a socket. At some point user application code will have to call some non-ANS functions, and in the real world those functions (just like C) will take a pointer into some overlarge buffer array along with a length, and that memory location will be passed further down to some system code (likely written in C) that has access to the entire user-space memory. Now, had the length check whose absence allowed the Heartbleed bug been in place, such an error wouldn't have allowed buffer overruns in either input or output; but my point is that user C code and user CL code are little different in this regard.
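[A minimal C sketch of the boundary Steve describes; the names raw_send and checked_send are invented for illustration, standing in for a real system call and a real binding layer. The point is that below the boundary only the caller-supplied length exists, so the capacity check has to happen above it, whatever language that layer is written in.]

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for the system layer: like write(2), it trusts the caller's
 * length completely; nothing down here knows how big 'buf' really is. */
size_t raw_send(unsigned char *wire, const unsigned char *buf, size_t len) {
    memcpy(wire, buf, len);
    return len;
}

/* The check the binding layer must supply, in whatever language it is
 * written: compare the requested length against the buffer's true
 * capacity before crossing the boundary. */
size_t checked_send(unsigned char *wire, const unsigned char *buf,
                    size_t cap, size_t len) {
    if (len > cap)
        return 0;   /* refuse rather than let the layer below overread */
    return raw_send(wire, buf, len);
}
```

[A CL implementation's socket stream and a careful C wrapper are in exactly the same position here: whoever writes checked_send makes or misses the check.]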
On Thu, Apr 24, 2014 at 11:44 PM, Steve Haflich shaflich@gmail.com wrote:
On Thu, Apr 24, 2014 at 10:35 PM, Jean-Claude Beaudoin jean.claude.beaudoin@gmail.com wrote:
On Fri, Apr 25, 2014 at 12:31 AM, Scott L. Burson Scott@sympoiesis.com wrote:
I've long thought that was an oversight, though now that you point it out, I realize I must have been mistaken.
"Oversight" might be the wrong way of thinking about this. [...] We changed what _needed_ to be changed, cleaned up a lot of other inelegances, but there was not time or energy to attempt a thorough job. [...]
All I mean by "oversight" is that it was not the product of a deliberate decision. From the tone of your previous message I thought that it must have been deliberate, but now it sounds like I was probably right the first time, though we don't know for sure.
Still, it surprises me. I don't know of any implementation that doesn't bounds-check aref under normal speed/safety settings, and clearly, users expect them to do so.
I am surprised too. I always understood it like you Scott but now that re-read the page on aref I see that it is exactly like Steve says, no mention of any exception and a statement that "subscripts" must be a list of valid array indices right from the start of the call to aref. Yet that leaves me even more curious to know which implementation has read the spec as strictly as Steve says it can be even under (safety 3)? Does anyone know any?
I don't know of any and there might not be any, at least among main-line implementations. [...]
But in your paragraph above I'm bothered by its hidden assumption: It suggests that after the ANS was available sneaky implementors studied it kabalistically to find places where annoying error checks could be removed.
I don't read Jean-Claude this way. I think he was expressing surprise at the thought that an implementor might have done that.
To return to my important point, the language of the ANS won't let you read or write from a socket. [...] my point is that user C code and user CL code are little different in this regard.
It certainly is _possible_ to write an unsafe socket-write function (*) in a CL library. But I still think the _probability_ of someone doing so is substantially smaller in CL than in C. Writing in C is like putting
(declaim (optimize (speed 3) (safety 0))) ; damn the torpedoes!!
at the top of every source file.
When writing a safety-0 function in CL, the unsafe region is much more restricted, and one is more likely to be careful to add explicit bounds checks where appropriate. (I recall only one occasion in my career where I forgot to do this. Koff koff... but the point is, it's not an error one has the opportunity to make very often.)
-- Scott
(* Actually the missing bounds check was on a 'memcpy' call that was being used to prepare the heartbeat reply message, but the effect is the same as if it had been on the socket write.)
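[The shape of that missing check, reduced to a hedged C sketch; the function names are invented and the real OpenSSL code is more involved, but the structure is the same: the attacker-supplied length field drives the memcpy and is never compared against what actually arrived.]

```c
#include <stddef.h>
#include <string.h>

/* 'claimed' is the length field read from the attacker's packet;
 * 'actual' is how many payload bytes really arrived. */
size_t build_reply_buggy(unsigned char *reply, const unsigned char *payload,
                         size_t claimed, size_t actual) {
    (void)actual;                     /* the bug: actual is never consulted */
    memcpy(reply, payload, claimed);  /* overreads payload when claimed > actual */
    return claimed;
}

/* The fix: discard any heartbeat whose claimed length exceeds what was
 * actually received (RFC 6520's silent-discard rule). */
size_t build_reply_fixed(unsigned char *reply, const unsigned char *payload,
                         size_t claimed, size_t actual) {
    if (claimed > actual)
        return 0;                     /* drop the malformed request */
    memcpy(reply, payload, claimed);
    return claimed;
}
```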
Team:
I would like to weigh in here as a security professional who uses Lisp in daily practice. I do application security assessments and advise companies on secure coding practices. I do penetration tests and have discovered a zero-day in OpenSSL (nowhere near the severity of Heartbleed).
I agree with the general sentiment that Lisp is a much safer language to build anything in. While several in this thread are pointing to bounds checking as one of the advantages that Lisp has over C and other languages, there is something else I find just as strong: it is easier to write programs whose correctness a reader can reason about. In Lisp, programs tend to be closer to provable, and errors are more evident. As in "obviously no deficiencies" vs "no obvious deficiencies".
But in my experience, vulnerabilities result from
- Buffer overflows/lack of bounds checking (Heartbleed and friends)
- Configuration errors
- Logic flaws
- Dangerous use of user input (leading to SQLi, XSS, XSRF)
- Improper use of cryptography
- Unclear protocol specification (leading to OpenSSL)
So while I would recommend Lisp as the base of their application to anyone who will listen (and likely to many who won't), I would also caution them not to take their eye off the other likely sources of catastrophic application failure.
Finally, one of the most famous positive security stories is Qmail, which handles a significant fraction of all internet mail. It is written in C and has been in use for a very long time.
Thus, I feel Lisp is better, but not a panacea. For example, has the Ironclad library been examined by a cryptographer? Does it, for example, do constant-time comparisons to avoid timing leaks?
wglb
On Fri, Apr 25, 2014 at 2:56 PM, Scott L. Burson Scott@sympoiesis.com wrote:
[...]
pro mailing list pro@common-lisp.net http://common-lisp.net/cgi-bin/mailman/listinfo/pro
On Fri, Apr 25, 2014 at 4:20 PM, William Lederer william.lederer@gmail.com wrote:
Thus, I feel Lisp is better but not a total panacea. For example, has the Ironclad library been examined by a cryptographer? Does it, for example, do constant-time comparisons to avoid timing leaks?
The answer to these (and many other questions of cryptographic sophistication) is no. Ironclad has many deficiencies that make it unsuitable for serious cryptographic software.
I'm not sure that some of these constant-time checks can even be implemented in Common Lisp without some serious assistance from and/or knowledge of the implementation.
-Nathan
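[For readers unfamiliar with the term, here is what a constant-time comparison looks like; this is a standard textbook sketch in C, not Ironclad's code. The early-exit behavior of memcmp leaks, through timing, how many leading bytes matched, so the safe version accumulates all differences and branches only once at the end.]

```c
#include <stddef.h>

/* Constant-time equality check for fixed-length byte strings.  Unlike
 * memcmp, there is no early return: every byte is examined on every
 * call, so running time does not depend on where a mismatch occurs. */
int ct_equal(const unsigned char *a, const unsigned char *b, size_t n) {
    unsigned char diff = 0;
    for (size_t i = 0; i < n; i++)
        diff |= (unsigned char)(a[i] ^ b[i]);  /* OR in any difference */
    return diff == 0;                          /* single data-independent branch */
}
```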
On Fri, Apr 25, 2014 at 7:24 PM, Antoni Grzymała antoni@grzymala.info wrote:
Thus spake Nathan Froyd (2014-04-25, 16:42):
Ironclad has many deficiencies that make it unsuitable for serious cryptographic software.
I'm curious what they would be – would you be able to outline that in more detail?
Sure. In no particular order, and with no claim of exhaustiveness:
- Many ciphers are not safe against timing attacks due to the use of lookup tables.
- There's nothing like Go's crypto/subtle (http://golang.org/pkg/crypto/subtle/) package for ensuring that various checks are safe against timing attacks.
- The public key algorithms are definitely not production-ready: they will give you the correct answers, but the implementations are not cryptographically robust. Part of this is potentially intractable, given that they rely on bignums, and the bignum implementations in Common Lisp implementations are probably not implemented with the needs of public key algorithms in mind.
- The DSA signature algorithm doesn't use high-quality random numbers, which makes it unsafe.
- I know there are a whole host of issues with implementing RSA safely; Ironclad has not paid attention to any of these.
- There's no implementation of padding, and all the subtleties that come with it, for block cipher algorithms or public key algorithms.
The hash algorithm implementations are pretty solid (assuming that you choose cryptographically secure ones, of course); everything else isn't suitable for security-conscious software.
I would like to fix some of these deficiencies, of course, but I haven't sat down and taken the time to do so. Patches welcome.
-Nathan
I agree with essentially everything in wglb's message, but (once again) I'll grumpily jump in to emphasize a point which I think many on this list have missed.
On Fri, Apr 25, 2014 at 1:20 PM, William Lederer william.lederer@gmail.com wrote:
I agree with the general sentiment that Lisp is a much safer language to build anything in. While several in this thread are pointing to bounds checking as one of the advantages that Lisp has over C and other languages, there is something else I find that is also very strong: It is easier to write programs about which a reader can reason about correctness. In Lisp, the programs tend to be closer to provable and errors are more evident. As in "obviously no deficiencies" vs "no obvious deficiencies".
But in my experience, vulnerabilities result from
- Buffer Overflows/lack of bounds checking (Heartbleed and friends)
- Configuration errors
- Logic Flaws
- Dangerous use of user input (leading to SQLi, XSS, XSRF)
- Improper use of cryptography
- Unclear protocol specification (leading to OpenSSL)
This (IMO entirely worthy and correct) summary can easily be misunderstood! Lisp may be superior because it has bounds checking. (We've previously agreed that isn't guaranteed, since it isn't in the ANS, and on any platform it likely depends on optimization qualities, including the optimization qualities under which internal called routines were compiled.) But bugs based on buffer overflow don't, on normal operating systems, generally involve bounds checking. At some point on any modern OS, reading or writing to a socket stream will involve passing to the OS (generally via a thin user-mode C API layer like *nix read() and write(), or some socket analogue). Neither Lisp nor C will provide any automatic bounds checking on such a call. The OS treats the application's address space as a mostly-contiguous undifferentiated sea of bytes(*). It doesn't matter that at the app level C also has this model of a sea of bytes, while in Lisp the ocean is run-time tagged into small plots. That distinction disappears once one calls write(fd,buf,len).
The Lisp Machine in its several manifestations might be the only counterexample, since there was no C boundary over which to cross, and because type and bounds checking was performed for free in the microcode. But Lisp machines aren't around any more largely because of the economy of scale. The number of x86 and x64 processors on the planet must be nearly on the order of 10^9, while the number of Lisp machine processors never got out of the 10^5 range, so Intel and AMD etc. could justify huge investments making those processors 3 orders of magnitude faster in raw speed. Lisp processors could not have kept up at bearable per-item cost. Alas!
It is certainly true that the Heartbleed bug resulted from an insufficiently-cautious implementation of an (overly?)complex specification. The author of the bug has essentially agreed with this analysis. But the "bounds checking" of most Lisp implementations would provide no protection against this failure (about which the original posting agrees) unless the succinctness and superior clarity of CL vs C code might help it be seen. That's a thin thread on which to hang an entire language argument.
(*) I originally saw this beautiful metaphor, that C treats memory as an undifferentiated sea of bytes, on some discussion list but can't remember the originator. Google shows current use scattered over many programming subjects, but doesn't identify the original. Anyway, it is the reason that a small hyper-efficient C-struct-in-Lisp defining macro I wrote for a certain huge CL software product is named "define-sea-struct" and (I used to be a sailor) the operator for computing offsets possibly through multiple levels of nested structs is called "following-sea". Paradoxically, http://www.ibiblio.org/hyperwar/NHC/fairwinds.htm says "following seas" means "SAFE journey, good fortune" [emphasis added].
On Sun, Apr 27, 2014 at 4:27 AM, Steve Haflich shaflich@gmail.com wrote:
I agree with essentially everything in wglb's message, but (once again) I'll grumpily jump in to emphasize a point which I think many on this list have missed.
On Fri, Apr 25, 2014 at 1:20 PM, William Lederer william.lederer@gmail.com wrote:
[...]
This (IMO entirely worthy and correct) summary can easily be misunderstood! [...] Neither Lisp nor C will provide any automatic bounds checking on such a call. The OS treats the application's address space as a mostly-contiguous undifferentiated sea of bytes(*). [...] That distinction disappears once one calls write(fd,buf,len).
This is essentially the point I made in my email on April 13; an application program these days (even one written in Lisp) necessarily depends on a large set of libraries and support software that the application programmer has little to no control over. Naive pronouncements that we should simply write all our code in Lisp (or another "safer" language) are almost guaranteed to have limited effect because many security problems are manifest in code we depend on that is simply out of our control. Rebuilding the entire ecosystem that our applications sit on is economically infeasible and still leaves us open to the possibility of security problems in the underlying hardware (which have been shown to be real and to have been recently exploited). This in no way implies that we should not STRIVE to do better, but illustrates that the issue is more complicated than language A vs language B.
Further, the assertion that compiler writers for language A tend to write compilers (or, in this case, standard libraries) that aren't safe in some way, while writers of compilers for language B write systems that are, is, frankly, self-congratulatory navel-gazing.
The Lisp Machine in its several manifestations might be the only counterexample,
This, however, I disagree with. There are operating systems that deal solely with managed-code objects. If one considers, e.g., IL to be the "hardware" that sits on top of the underlying native instruction set acting as microcode, then Microsoft's Singularity system could be described as approximately equivalent to a Lisp machine in this regard.
since there was no C boundary over which to cross, and because type and bounds checking was performed for free in the microcode. But Lisp machines aren't around any more largely because of the economy of scale. The number of x86 and x64 processors on the planet must be nearly on the order of 10^9, while the number of Lisp machine processors never got out of the 10^5 range, so Intel and AMD etc. could justify huge investments making those processors 3 orders of magnitude faster in raw speed. Lisp processors could not have kept up at bearable per-item cost. Alas!
It is certainly true that the Heartbleed bug resulted from an insufficiently-cautious implementation of an (overly?)complex specification. The author of the bug has essentially agreed with this analysis. But the "bounds checking" of most Lisp implementations would provide no protection against this failure (about which the original posting agrees) unless the succinctness and superior clarity of CL vs C code might help it be seen. That's a thin thread on which to hang an entire language argument.
Actually, I'm not sure about that; in this case, the boundary violation was real and due to not taking into account the length of the input (e.g., one memcpy'd more than had been provided, reading off the end of the source buffer). But it was a rookie C programmer mistake, and I agree that this is indeed scant ammunition in a language beef.
(*) I originally saw this beautiful metaphor, that C treats memory as an undifferentiated sea of bytes, on some discussion list but can't remember the originator. [...] (I used to be a sailor) [...] "following seas" means "SAFE journey, good fortune" [emphasis added].
Semper Fi.
- Dan C.
Dan:
I mostly agree with what you are saying. However, there is one point that much of this discussion may not have covered.
Further, the assertion that compiler writers for language A tend to write compilers (or, in this case, standard libraries) that aren't safe in some way, while writers of compilers for language B write systems that are, is, frankly, self-congratulatory navel-gazing.
There is a fundamental practical difference between C and Lisp that is relevant in the security world: the vast number of explicitly undefined behaviors in the specification of C. This is pretty much unmatched in Lisp or C# or Java. John Regehr, at http://blog.regehr.org/, has done some fascinating work not only on the undefined behaviors of C, but also on the substantial number of bugs in compilers.
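[A hedged example of what "explicitly undefined" buys an optimizer, reconstructed from memory of the sort of case Regehr writes about (function names invented): signed overflow is undefined in C, so a compiler is entitled to assume it never happens and delete a guard that "obviously" tests for it.]

```c
#include <limits.h>

/* Looks like an overflow guard, but signed overflow is undefined
 * behavior, so a compiler may reason "x + 1 < x is impossible in a
 * valid program" and fold this function to 'return 0', silently
 * deleting the check at higher optimization levels. */
int next_would_overflow_ub(int x) {
    return x + 1 < x;             /* UB when x == INT_MAX */
}

/* The well-defined version compares against the limit before doing
 * any arithmetic, so the compiler must preserve it. */
int next_would_overflow_ok(int x) {
    return x == INT_MAX;
}
```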
And while what you say about dependencies on other libraries is true (this is always a major item we check for when doing assessments) and is a risk for all systems (except for qmail), a significant fraction of all breaches result from logic errors or configuration errors. Those errors compromise all systems equally, regardless of the language underneath.
wglb
On Sun, Apr 27, 2014 at 10:20 AM, Dan Cross crossd@gmail.com wrote:
[...]
On Sun, Apr 27, 2014 at 1:27 AM, Steve Haflich shaflich@gmail.com wrote:
At some point on any modern OS, reading or writing to a socket stream will involve passing to the OS (generally via a thin user-mode C API layer like *nix read() and write(), or some socket analogue). Neither Lisp nor C will provide any automatic bounds checking on such a call. [...] That distinction disappears once one calls write(fd,buf,len).
I think we've all understood that.
But here's the thing. If you're writing at the application level in Lisp (or Java, or Python, or Ruby, or ...) you're probably not going to code the foreign call to 'write' yourself. You're probably going to invoke some stream operation that was written either by the Lisp implementor or by the author of a portable library. This means the person who writes the foreign call (a) is probably more experienced and in more of a mindset to think about things like bounds checking; (b) is therefore likelier to insert a check that the number of bytes you want written is no greater than the length of the array you've provided; and (c) has the information available at that point in the code to make that check, because the array is not represented as just a raw byte pointer. The last point is the most important: the library writer _can_ make the check, which is not true the way things are usually done in C.
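[A hedged C sketch of point (c), with invented type and function names: a Lisp array carries its length with it, which in C terms means a pointer paired with a capacity. A binding layer that receives the pair can make exactly the check Scott describes; a bare pointer cannot support it at all.]

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* What a Lisp (or Java, Python, ...) runtime hands its stream layer:
 * not a bare pointer, but the pointer plus the array's real capacity. */
struct buffer {
    unsigned char *data;
    size_t cap;
};

/* Invented stream op: the one place the bounds check needs to live.
 * Because the capacity travels with the data, the library writer can
 * actually make the check before any bytes move. */
long buffer_copy_out(unsigned char *dst, size_t dst_cap,
                     const struct buffer *src, size_t nbytes) {
    if (nbytes > src->cap || nbytes > dst_cap) {
        errno = EINVAL;        /* caller asked for more than exists */
        return -1;
    }
    memcpy(dst, src->data, nbytes);
    return (long)nbytes;
}
```

[In C as usually written, the capacity never travels with the pointer, so there is nothing for a library routine to check against; that is the asymmetry Scott's points (a)-(c) rest on.]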
C, in contrast, has people writing dangerous low-level calls _all the time_, in _application_ code. The odds of it being done correctly every time are far poorer -- in practice, approximately zero, in any substantial program.
It certainly is possible to write better C libraries that relieve the application programmer of some of this burden; both Microsoft and Apple have done some of this. But the use of these libraries is not yet routine in portable POSIX code, and I don't know that they would have caught the Heartbleed bug anyway.
I'm not suggesting that bounds errors are the only source of security vulnerabilities -- William's list is a good one -- nor that use of a "safe" language is an absolute guarantee that one won't have them. But in practice, an attacker's time is not well spent looking for bounds errors in applications written in Lisp/Java/etc. It _is_ well spent looking for them in C code.
-- Scott
On Sun, Apr 27, 2014 at 4:27 AM, Steve Haflich shaflich@gmail.com wrote:
I agree with essentially everything in wglb's message, but (once again) I'll grumpily jump in to emphasize a point which I think many on this list have missed.
On Fri, Apr 25, 2014 at 1:20 PM, William Lederer < william.lederer@gmail.com> wrote:
I agree with the general sentiment that Lisp is a much safer language to build anything in. While several in this thread are pointing to bounds checking as one of the advantages that Lisp has over C and other languages, there is something else I find that is also very strong: it is easier to write programs whose correctness a reader can reason about. In Lisp, programs tend to be closer to provable, and errors are more evident. As in "obviously no deficiencies" vs. "no obvious deficiencies".
But in my experience, vulnerabilities result from
- Buffer Overflows/lack of bounds checking (Heartbleed and friends)
- Configuration errors
- Logic Flaws
- Dangerous use of user input (leading to SQLi, XSS, XSRF)
- Improper use of cryptography
- Unclear protocol specification (leading to OpenSSL)
This (IMO entirely worthy and correct) summary can easily be
misunderstood! Lisp may be superior because it has bounds checking. (We've previously agreed that isn't guaranteed, since it isn't in the ANS, and on any platform it likely depends on optimization qualities, including the optimization qualities under which internal called routines were compiled.) But on normal operating systems, buffer-overflow bugs don't in general involve bounds checking. At some point on any modern OS, reading or writing a socket stream will involve passing to the OS (generally via a thin user-mode C API layer like *nix read() and write(), or some socket analogue). Neither Lisp nor C will provide any automatic bounds checking on such a call. The OS treats the application's address space as a mostly-contiguous, undifferentiated sea of bytes(*). It doesn't matter that at the app level C also has this model of a sea of bytes, while in Lisp the ocean is run-time tagged into small plots. That distinction disappears once one calls write(fd,buf,len).
The Lisp Machine in its several manifestations might be the only counterexample, since there was no C boundary to cross, and type and bounds checking was performed for free in the microcode. But Lisp machines aren't around any more, largely because of economies of scale. The number of x86 and x64 processors on the planet must be nearly on the order of 10^9, while the number of Lisp machine processors never got out of the 10^5 range, so Intel and AMD etc. could justify huge investments making those processors 3 orders of magnitude faster in raw speed. Lisp processors could not have kept up at bearable per-item cost. Alas!
I think it is not only a question of level of investment, nor simply a question of whether it is Lisp or another higher-level language in hardware, for that matter. There seems to be some physico-technical optimality point in question here at the hardware/software interface. From my (fading?) memories of a past era, I can somewhat recall that the last (I think) major CPU architecture that took security support seriously in hardware was the Intel iAPX 432, with multiple nested security rings in hardware and descriptor-supported gates and instructions. (BTW, the 432 was meant to support Ada of all languages, not C or Lisp, but it was general-purpose enough.) And history has recorded how well this iAPX 432 architecture flew.
And while I am using the word "fly", I have the urge to ask you that question: what would you personally fly, software written in C or software written in Common Lisp? And I mean it quite literally, with you sitting in the plane. I think that the fact that one can seriously ask that question is one of the most significant evolutions in the demands the general context places on any programming language standard. In 1994 fly-by-wire was cutting edge and still quite experimental; now in 2014 it is the everyday reality of routine commercial flight.
I see (a somewhat revised?) Common Lisp as a very good starting point to address this new reality. Better than C, that is for sure. (How one can even hope to make C safer and more secure and yet still have it be C is beyond my understanding, FWIW.)
Regarding the question
What would you personally fly, software written in C or software written in Common Lisp?
In the reality of today's fly-by-wire, the modern planes you fly in are likely to have C in some critical component. Ada is likely there as well.
But let's just examine a few software related disasters to see if they are attributable to programming language:
- Ariane 5 rocket explosion: from the official report, "This loss of information was due to specification and design errors in the software of the inertial reference system."
- Mars Climate Orbiter: one system used metric units, another used English units
- Therac-25: improper understanding of multi-tasking code
- Heartbleed: overly complex protocol combined with being able to read beyond allocated memory
Of these, only Heartbleed can count the language as a contributing factor.
And I again point out a software non-disaster, qmail, whose author offered a bug bounty. Secure programs can be written in C.
And if the flight safety of an aircraft depended upon the impenetrability of the current Lisp version of Ironclad, we would be in trouble.
I do prefer Lisp, and as I have said before, I think it is easier to write correct and thus secure and safe programs in Lisp, but that is only a small part of the story. Other critical parts to the story are:
- How well is the software specified?
- Who is the team writing the software? Are they CMM level 5?
- Is the software tuned to the user situation at hand? When the engine exploded on Qantas flight 32, the pilots had to deal with an almost overwhelming number of alerts.
You do ask a good question, but in my opinion, choice of language is not at the top of the list.
wglb (P.S. I am not a Lisp expert, but I have been programming for 48 years, including real-time medical software, compilers, and financial feed software. For the medical system we used assembly language; C had not yet been invented, but it turns out that doing coroutines in assembler was better than the threads that showed up later in C.)
Thus spake William Lederer (2014-04-28, 09:09):
And I again point out a software non-disaster qmail, whose author offered a bug bounty. Secure programs can be written in C.
I think you should stop glorifying qmail; it has known bugs, violates some RFCs, and the author (who turns out to be rather arrogant here) wouldn't pay out the bounty:
http://www.dt.e-technik.uni-dortmund.de/~ma/qmail-bugs.html
Sorry, I am familiar with the controversy regarding his personality, his argument about the denial-of-service issues, and the claimed security bug that occurs if the memory allocated to qmail exceeds the number of bytes countable in 32 bits. Yes, he is arrogant, but he does work of the first order.
I stand by my recommendation, and stand by the assertion that secure coding can and has been done in C.
What is lost in this controversy is the sheer magnitude of vulnerabilities in sendmail historically.
wglb
Also, DJB wrote a replacement for the bug-infested BIND called djbdns. That too had a security guarantee. Someone found a bug, and DJB paid out $1000.
wglb
William Lederer william.lederer@gmail.com writes:
Regarding the question
What would you personally fly, software written in C or software written in Common Lisp?
In the reality of today's fly-by-wire, the modern planes you fly in are likely to have C in some critical component. Ada is likely there as well.
But let's just examine a few software related disasters to see if they are attributable to programming language:
- Ariane 5 rocket explosion: from the official report, "This loss of information was due to specification and design errors in the software of the inertial reference system."
- Mars Climate Orbiter: one system used metric units, another used English units
- Therac-25: improper understanding of multi-tasking code
- Heartbleed: overly complex protocol combined with being able to read beyond allocated memory
Of these, only Heartbleed can count the language as a contributing factor.
Not at all.
* Programmed in Common Lisp, either the fixnum in the Ariane 5 would have been converted into a bignum, or a condition would have been signaled, which could have been handled. This would have taken time, which could perhaps have "exploded" the real-time constraints, but it is better to control your rocket sluggishly than not to control it at all.
* Programmed in Common Lisp, instead of using raw numbers of physical magnitudes, you'd use objects such as:
(+ #<kilometer/hour 5.42> #<foot/fortnight 12857953.0> ) --> #<meter/second 4.7455556>
and Mars Climate Orbiter wouldn't have crashed.
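The same discipline can be approximated even in C, at some syntactic cost. A minimal sketch (the `speed` type and constructor names are invented for illustration, not a real library): every quantity is tagged with a unit at construction and normalized to SI, so metric and imperial values cannot be mixed by accident.

```c
/* Illustrative only: a quantity type that is always stored in SI units,
 * with unit-specific constructors doing the conversion up front. */
typedef struct { double m_per_s; } speed;

speed speed_km_per_h(double v) { speed s = { v / 3.6 }; return s; }

speed speed_ft_per_fortnight(double v)
{
    /* 1 ft = 0.3048 m; 1 fortnight = 14 days * 86400 s */
    speed s = { v * 0.3048 / (14.0 * 86400.0) };
    return s;
}

speed speed_add(speed a, speed b)
{
    speed s = { a.m_per_s + b.m_per_s };   /* both already in m/s */
    return s;
}
```

With these definitions, speed_add(speed_km_per_h(5.42), speed_ft_per_fortnight(12857953.0)) comes out to about 4.7456 m/s, matching the Lisp printout above.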
* Programmed in Common Lisp, the Therac-25 bug wouldn't have occurred:
"The defect was as follows: a one-byte counter in a testing routine frequently overflowed; if an operator provided manual input to the machine at the precise moment that this counter overflowed, the interlock would fail."
since again, incrementing a counter doesn't fucking overflow in lisp!
* Programmed in Common Lisp, Heartbleed wouldn't have occurred, because Lisp implementors provide array bounds checks, and Lisp programmers are conscientious enough to always run with (safety 3), as previously discussed in this thread.
What I'm saying is that there's a mindset out there of blindly using modular arithmetic to approximate arithmetic. Until you can pay $1.29 for 3 kg of apples @ $2.99, people should not program with modular arithmetic!
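The one-byte wraparound behind the Therac failure is easy to demonstrate. A small C sketch (hypothetical, not the actual Therac code) of what an 8-bit counter does under repeated increment:

```c
#include <stdint.h>

/* Increment an 8-bit counter n times. Unsigned arithmetic in C is
 * modular, so the 256th increment silently wraps the counter to 0 --
 * exactly the state a "counter != 0" interlock test would misread. */
uint8_t counter_after(unsigned increments)
{
    uint8_t c = 0;
    while (increments--)
        c++;            /* no error, no condition signaled: just modulo 256 */
    return c;
}
```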
And I again point out a software non-disaster qmail, whose author offered a bug bounty. Secure programs can be written in C.
Postfix, too, is architected with security in mind.
You can also write secure software on a Turing Machine.
And if the flight safety of an aircraft depended upon the current Lisp version of Ironclad's impenetrability, we would be in trouble.
This is another question: that of the resources invested in a software ecosystem, and that of programming-language mind share. Why don't the cryptographers write their libraries in Common Lisp, choosing instead to produce piles of C?
The reason programmers don't write such libraries in Common Lisp (ignoring what seems to be a screaming terror of the parenthetical, functional world) is that cryptography must be fast, and it must not leak timing information.
A final word here: I spend my days auditing and pen-testing programs written in managed languages, C# and Java. None of the errors that bring down systems and lead to breaches in these languages result from bounds-checking or buffer-overflow issues. None of them are subject to the same kinds of flaws C exposes, as evidenced by Heartbleed. Nonetheless, there are vulnerabilities.
And I am sure that all remember the vulnerability exposed in Y Combinator's site, which is written in Lisp. Simply writing your stuff in Lisp is not enough.
wglb
On Mon, Apr 28, 2014 at 09:23:03PM -0500, William Lederer wrote:
Regarding why programmers don't write libraries in common lisp (ignoring what seems to be a screaming terror of the parenthetical, functional world) is that cryptography must be fast, and it must not leak timing information.
A final word here--I spend my days auditing and pen testing programs written in managed languages: C# and Java. None of the errors that bring down systems and lead to breaches in these languages result from bounds checking or buffer overflow issues. None of them are subject to the same kinds of flaws C exposes as evidenced by heartbleed. Nonetheless, there are vulnerabilities.
And I am sure that all remember the vulnerability exposed in Ycombinator which is written in Lisp. Simply writing your stuff in Lisp is not enough.
And that is a point that bears repeating: whatever programming language you end up using, it will not magically protect you from all errors or mistakes. Depending on its design and other details, it might protect you from _some_ classes of errors (such as shooting yourself in the foot with pointers), but no matter what the language, there _will_ still be plenty of bear traps patiently waiting for the unwary. Heck, even something as heavily discipline-and-bondage as SPARK Ada leaves opportunities to screw up big time - just get your design assumptions wrong and you can be toast.
Kind regards, Alex.
On Tue, Apr 29, 2014 at 12:40:09AM +0200, Pascal J. Bourguignon wrote:
William Lederer william.lederer@gmail.com writes:
Regarding the question
What would you personally fly, software written in C or software written in Common Lisp?
In the reality of today's fly-by-wire, the modern planes you fly in are likely to have C in some critical component. Ada is likely there as well.
But let's just examine a few software related disasters to see if they are attributable to programming language:
Ariane 5 rocket explosion: from the official report: This loss of information was due to specification and design errors in the software of the inertial reference system. Mars Climate Orbiter: one system used metric units, another used English Therac 5: improper understanding of multi-tasking code Heartbleed: Overly complex protocol combined with being able to read beyond allocated memory
Of these, only heartbleed can credit language as a contributing factor.
Not at all.
Any programming language will have a hard time protecting you from design/specification errors. And providing bandaids that paper over design problems doesn't really help.
- Programmed in Common Lisp, either the fixnum in the Ariane 5 would have been converted into a bignum, or a condition would have been signaled, which could have been handled. This would have taken time, which could perhaps have "exploded" the real-time constraints, but it is better to control your rocket sluggishly than not to control it at all.
That was not the real problem. The root cause was the design assumption that the overflowing value was _physically_ limited, i.e. during normal operation it would have been impossible to overflow, and an overflow would in fact have signaled problems serious enough to abort. While this held true for Ariane 4, it no longer was true for the more powerful Ariane 5.
Your "solution" would have papered over the flawed design assumptions, which is _not_ the same as fixing them.
Programmed in Common Lisp, instead of using raw numbers of physical magnitudes, you'd use objects such as:
(+ #<kilometer/hour 5.42> #<foot/fortnight 12857953.0> ) --> #<meter/second 4.7455556>
and Mars Climate Orbiter wouldn't have crashed.
This is ridiculous. If you end up mixing measurement systems (such as metric and imperial) in the same project, you are _already_ doing it horribly wrong. The design fault was mixing measurement systems, which one should _never_ do, on pain of embarrassing failure. Papering over this design screwup with a language environment that _supports_ it (instead of screaming bloody murder at such nonsense) doesn't really help here.
Programmed in Common Lisp, the Therac-25 bug wouldn't have occurred:
"The defect was as follows: a one-byte counter in a testing routine frequently overflowed; if an operator provided manual input to the machine at the precise moment that this counter overflowed, the interlock would fail."
But why did the counter overflow in the first place? Was it simply programmer oversight that too small a datatype was used, or was this actually an error that just didn't have noticeable consequences most of the time? If the latter, then again, papering over it with a never-overflowing counter is not a fix.
since again, incrementing a counter doesn't fucking overflow in lisp!
- Programmed in Common Lisp, Heartbleed wouldn't have occurred, because Lisp implementors provide array bounds checks, and Lisp programmers are conscientious enough to always run with (safety 3), as previously discussed in this thread.
Hehe, "conscientious enough to always run with (safety 3)". Riiiiight. And nobody was ever tempted to trade a little runtime safety for speed, correct?
As for heartbleed: arguably, the RFC that the broken code implemented shouldn't have existed in the first place.
What I'm saying is that there's a mindset out there of blindly using modular arithmetic to approximate arithmetic. Until you can pay $1.29 for 3 kg of apples @ $2.99, people should not program with modular arithmetic!
Well, modular arithmetic doesn't go away because one wishes it so. As a developer doing non-time-critical, high-level work one might be able to cheerfully ignore it, but the moment one writes sufficiently time-critical or low-level code, one will have to deal with it. Because modular arithmetic is what your CPU is doing - unless you happen to have a CPU at hand that does bignums natively at the register level? No? Funny that.
And I again point out a software non-disaster qmail, whose author offered a bug bounty. Secure programs can be written in C.
postfix too is architectured to deal with security.
You can also write secure software on a Turing Machine.
Software running on _actual_ Turing machines tends to be of mostly limited use, though.
And if the flight safety of an aircraft depended upon the current Lisp version of Ironclad's impenetrability, we would be in trouble.
This is another question, that of the resources invested in a software ecosystem, and that of programming language mind share. Why the cryptographists don't write their libraries in Common Lisp and choose to produce piles of C instead?
Usefulness. If I write a library in C, pretty much everything that runs on Unix can link to it (if need be, via FFI and friends) and use it. If I write a library in Common Lisp, then only code written in Common Lisp can use it, unless people are willing to do some interesting contortions (such as wrapping it in an RPC server).
Exercise for the interested: write a library in Common Lisp that does, say, some random data frobnication, and try to use it from C, Python, Perl, and C++ _without_ writing new interface infrastructure.
Kind regards, Alex.
On 29 Apr 2014, at 09:12, Alexander Schreiber als@thangorodrim.de wrote:
On Tue, Apr 29, 2014 at 12:40:09AM +0200, Pascal J. Bourguignon wrote:
- Programmed in Common Lisp, either the fixnum in the Ariane 5 would have been converted into a bignum, or a condition would have been signaled, which could have been handled. This would have taken time, which could perhaps have "exploded" the real-time constraints, but it is better to control your rocket sluggishly than not to control it at all.
That was not the real problem. The root cause was the design assumption that the overflowing value was _physically_ limited, i.e. during normal operation it would have been impossible to overflow, and an overflow would in fact have signaled problems serious enough to abort. While this held true for Ariane 4, it no longer was true for the more powerful Ariane 5.
Your "solution" would have papered over the flawed design assumptions, which is _not_ the same as fixing them.
You’re forgetting we’re talking about embedded programs with real-time processes. You don’t have the time to stop everything and “debug” the design. You have to control a rocket and avoid it crashing!
That’s the reason I’ve not mentioned RAX yet: the situation was quite different, since they had the time to perform remote debugging, over several days.
- Programmed in Common Lisp, instead of using raw numbers of physical
magnitudes, you'd use objects such as:
(+ #<kilometer/hour 5.42> #<foot/fortnight 12857953.0> ) --> #<meter/second 4.7455556>
and Mars Climate Orbiter wouldn't have crashed.
This is ridiculous. If you end up mixing measurement systems (such as metric and imperial) in the same project, you are _already_ doing it horribly wrong.
It wasn’t in the same project. The data was actually sent from a remote Earth station. So this is even worse than not using magnitude with units inside the process, it was a serialization/deserialization error. But notice how Lisp prints out the speeds above! It writes the units along with the values!
Now, of course, it’s not a programming-language question. We already determined that when noting that neither the ANSI Common Lisp nor the ANSI C standard imposes bounds checking, but that C programmers don’t code bounds checks, and C implementers, being C programmers, implement compilers that don’t do bounds checking, while the inverse is true of Common Lisp programmers.
This is always the same thing: “statically typed” proponents want to separate the checks from the code, performing (or not) the checks during design/proof/compilation, while “dynamically typed” proponents keep the checks inside the code, making the compiler and system generate and perform all the typing, bounds, etc. checks at run time. So when a C guy (any statically-typed guy) sends data, he expects that the type and bounds of the data are known beforehand by both parties. But when a Lisp guy (any dynamically-typed guy) sends data, he sends it in a syntactic form that explicitly types it, and the data is parsed, validated, bounds-checked, and typed according to the transmitted syntax on the receiving end.
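The receiving-end discipline described above can be sketched even in C. Assuming a hypothetical "VALUE UNIT" wire format (this format and the name `parse_speed` are invented for illustration), the receiver parses and validates the unit instead of trusting an untagged number:

```c
#include <stdio.h>
#include <string.h>

/* Parse a message like "5.42 km/h". Returns 1 and stores the value,
 * converted to m/s, on success; returns 0 if the unit is missing or
 * unrecognized, rather than guessing what the sender meant. */
int parse_speed(const char *msg, double *m_per_s)
{
    double v;
    char unit[16];
    if (sscanf(msg, "%lf %15s", &v, unit) != 2)
        return 0;                              /* no unit transmitted: reject */
    if (strcmp(unit, "m/s") == 0)  { *m_per_s = v;       return 1; }
    if (strcmp(unit, "km/h") == 0) { *m_per_s = v / 3.6; return 1; }
    return 0;                                  /* unknown unit: reject */
}
```

The point is that an untagged "42.0" is rejected outright instead of being silently interpreted in whatever unit the receiver happens to assume.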
Of course, generating C code doesn’t mean that you can’t design your system in a "dynamically typed” spirit. But this is not the natural noosphere of the C ecosystem.
The design fault was mixing measurement systems, which one should _never_ do, on pain of embarrassing failure. Papering over this design screwup with a language environment that _supports_ it (instead of screaming bloody murder at such nonsense) doesn't really help here.
Again, we are talking about an embedded program, in a real-time system, where you have only seconds of burn stage on re-entry, and where you DON’T HAVE THE TIME to detect, debug, go back to the design board, compile, and upload a new version!
The software that uploaded the untagged bit-field *data*, without units, instead of some meaningful *information*, hadn’t even been completed before the orbiter was in space! It wasn’t developed by the same team, and wasn’t compiled into the same executable.
Nonetheless, here a lisper would have sent *information* in a sexp, and dynamic checks and conversions would have been done.
If you will, the design would have been different in the first place!
Programmed in Common Lisp, the Therac-5 bug wouldn't have occured:
"The defect was as follows: a one-byte counter in a testing routine frequently overflowed; if an operator provided manual input to the machine at the precise moment that this counter overflowed, the interlock would fail."
But why did the counter overflow in the first place? Was it simply programmer oversight that too small a datatype was used or was this actually an error that just didn't have noticeable consequences most of the times? If the later, then again, papering over it with a never overflowing counter is not a fix.
But it if was a problem, it *would* eventually reach a bound check, and signal a condition, thus stopping the process of irradiating and killing people.
Remember: a Lisp program (any "dynamically typed” program) is FULL of checks!
since again, incrementing a counter doesn't fucking overflow in lisp!
- Programmed in Common Lisp, heartbleed wouldn't have occured, because lisp implementors provide array bound checks, and lisp programmers are conscious enough to run always with (safety 3), as previously discussed in this thread.
Hehe, "conscious enough to run always with (safety 3)". Riiiiight. And nobody was ever tempted to trade a little runtime safety for speed, correct?
Those are C programmers. You won’t find any other safety that 3 in my code. You should not find any other safety than 3 in mission critical code, much less in life threatening code.
As for heartbleed: arguably, the RFC that the broken code implemented shouldn't have existed in the first place.
What I'm saying is that there's a mind set out-there, of using modular arithmetic to approximate arithmetic blindly. Until you will be able to pay $1.29 for 3 kg of apples @ $2.99, people should not program with modular arithmetic!
Well, modular arithmetic doesn't go away because one wishes it so. As a developer doing non time critical high level work one might be able to cheerfully ignore it, but the moment one writes sufficiently time critical or low level code one will have to deal with it. Because modular arithmetic is what your CPU is doing - unless you happen to have a CPU at hand that does bignums natively at the register level? No? Funny that.
This might have been true in 1968, when adding a bit of memory added 50 gr. of payload!
Nowadays, there’s no excuse.
And if the flight safety of an aircraft depended upon the current Lisp version of Ironclad's impenetrability, we would be in trouble.
This is another question, that of the resources invested in a software ecosystem, and that of programming language mind share. Why the cryptographists don't write their libraries in Common Lisp and choose to produce piles of C instead?
Usefulness. If I write a library in C, pretty much everything that runs on Unix can link to it (if need be, via FFI and friends) and use it. If I write a library i Common Lisp, then code written in Common Lisp can use it unless people are willing to do some interesting contortions (such wrapping it in an RPC server).
Anything running on unix can link to libecl.so (which is ironically a CL implementation using gcc, but we can assume it’s a temporary solution).
Exercise for the interested: write a library in Common Lisp that does, say, some random data frobnication and try to use it from: C, Python, Perl, C++ _without_ writing new interface infrastructure.
But the point is to eliminate code written in C, Perl, C++! So your exercise is academic.
— __Pascal Bourguignon__
On Tue, Apr 29, 2014 at 10:31:05AM +0200, Hans Hübner wrote:
For your amusement:
https://github.com/search?q=exec($_GET&ref=cmdform&type=Code
Execing straight from the network? What could possibly go wrong ...
Kind regards, Alex.
On Tue, Apr 29, 2014 at 09:45:55AM +0200, Pascal J. Bourguignon wrote:
On 29 Apr 2014, at 09:12, Alexander Schreiber als@thangorodrim.de wrote:
On Tue, Apr 29, 2014 at 12:40:09AM +0200, Pascal J. Bourguignon wrote:
- Programmed in Common Lisp, either the fixnum in the Ariane 5 would have been converted into a bignum, or a condition would have been signaled, which could have been handled. This would have taken time, which could perhaps have "exploded" the real-time constraints, but it is better to control your rocket sluggishly than not to control it at all.
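The contrast being claimed here (silent fixed-width wraparound vs. bignum promotion) can be simulated in Python, whose integers, like Lisp bignums, never wrap. The 16-bit conversion below merely stands in for the flight code's fixed-width store, and the value 40000.0 is illustrative, not the actual Ariane telemetry:

```python
import ctypes

def to_int16(x):
    """Simulate storing a value into a signed 16-bit cell: silently wraps."""
    return ctypes.c_int16(int(x)).value

horizontal_bias = 40000.0        # hypothetical out-of-range sensor value

# Fixed-width arithmetic corrupts the value without any signal...
assert to_int16(horizontal_bias) == -25536
# ...while arbitrary-precision ("bignum") arithmetic keeps it exact, leaving
# any range violation to be caught by an explicit check or condition handler.
assert int(horizontal_bias) == 40000
```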
That was not the real problem. The root cause was the design assumption that the overflowing value was _physically_ limited, i.e. during normal operation it would have been impossible to overflow, and an overflow would in fact have signaled some serious problems bad enough to abort. While this held true in Ariane 4, it no longer was true in the more powerful Ariane 5.
Your "solution" would have papered over the flawed design assumptions, which is _not_ the same as fixing them.
You’re forgetting we’re talking about embedded programs with real-time processes. You don’t have the time to stop everything and “debug” the design. You have to control a rocket and avoid it crashing!
Who spoke about debugging a live rocket?
- Programmed in Common Lisp, instead of using raw numbers for physical magnitudes, you'd use objects such as:
(+ #<kilometer/hour 5.42> #<foot/fortnight 12857953.0>) --> #<meter/second 4.7455556>
and Mars Climate Orbiter wouldn't have crashed.
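A minimal sketch of such unit-tagged arithmetic, in Python (the class and table names are invented; the conversion factors are the standard ones). Every quantity carries its unit, and addition normalizes both operands to SI first, so mixing measurement systems cannot silently corrupt a value:

```python
SI_FACTORS = {                     # conversion factor to meters/second
    "meter/second": 1.0,
    "kilometer/hour": 1000.0 / 3600.0,
    "foot/fortnight": 0.3048 / (14 * 86400),
}

class Speed:
    def __init__(self, value, unit):
        self.value = value
        self.unit = unit

    def to_si(self):
        return self.value * SI_FACTORS[self.unit]

    def __add__(self, other):
        # Mixed units are safe: both operands are normalized before adding.
        return Speed(self.to_si() + other.to_si(), "meter/second")

    def __repr__(self):
        # Like the Lisp printer above: the unit travels with the value.
        return "#<%s %s>" % (self.unit, round(self.value, 7))

print(Speed(5.42, "kilometer/hour") + Speed(12857953.0, "foot/fortnight"))
```

Running this reproduces the sum in the post: roughly 4.7455556 meters/second.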
This is ridiculous. If you end up mixing measurement systems (such as metric and imperial) in the same project, you are _already_ doing it horribly wrong.
It wasn’t in the same project. The data was actually sent from a remote Earth station. So this is even worse than not using magnitudes with units inside the process: it was a serialization/deserialization error. But notice how Lisp prints out the speeds above! It writes the units along with the values!
Now, of course it’s not a programming language question. We already determined that, when noting that neither the ANSI Common Lisp nor the ANSI C standard imposes bounds checking, but that C programmers don’t write bounds checks, and C implementers, being C programmers, implement compilers that don’t do bounds checking, while the opposite is true of Common Lisp programmers.
This is always the same thing: “statically typed” proponents want to separate the checks from the code, performing (or not) the checks during design/proof/compilation, while “dynamically typed” proponents keep the checks inside the code, making the compiler and system generate and perform all the typing, bounds, and other checks at run-time. So when a C guy (any statically-typed guy) sends data, he expects that the type and bounds of the data are known beforehand by both parties. But when a Lisp guy (any dynamically-typed guy) sends data, he sends it in a syntactic form that explicitly types it, and the data is parsed, validated, bounds-checked and typed according to the transmitted syntax on the receiving end.
Of course, generating C code doesn’t mean that you can’t design your system in a “dynamically typed” spirit. But this is not the natural noosphere of the C ecosystem.
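That wire discipline can be sketched in Python, with JSON standing in for sexps (all the field names and the bounds here are invented for illustration). The receiver re-derives type, unit, and bounds from the transmitted syntax instead of trusting a pre-agreed binary layout:

```python
import json

def send(speed_ms):
    # The sender tags the value with its type and unit.
    return json.dumps({"type": "speed", "unit": "meter/second",
                       "value": speed_ms})

def receive(wire):
    msg = json.loads(wire)                    # parse the transmitted syntax
    if msg.get("type") != "speed":
        raise ValueError("unexpected type tag")
    if msg.get("unit") != "meter/second":
        raise ValueError("unexpected unit: %r" % msg.get("unit"))
    value = float(msg["value"])
    if not (0.0 <= value < 3.0e8):            # bounds check on arrival
        raise ValueError("speed out of bounds")
    return value

assert receive(send(4.7455556)) == 4.7455556  # round-trips and validates
```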
The design fault was mixing measurement systems, which one should _never_ do on pain of embarrassing failure. Papering over this design screwup with a language environment that _supports_ this (instead of screaming bloody murder at such nonsense) doesn't really help here.
Again, we are talking about an embedded program, in a real time system, where you have only seconds of burn stage on re-entry, and where you DON’T HAVE THE TIME to detect, debug, come back to the design board, compile and upload a new version!
Again, what is this about live debugging a flying rocket? If you propose writing your realtime control code and deploying it straight to your production environment (in this case, the rocket about to lift off), you have no business writing this kind of code.
You design your system, review the design, implement, test and only deploy it live if you are confident that it will work correctly (and the tests agree).
The above problems are things that - at the latest - should have been caught by the test setups, preferably in the design stage. Actually, IIRC, for the Ariane issue there was a test that would have revealed the problem, but it was cancelled as being too costly. Which in retrospect was of course penny-wise, pound-foolish.
The software that uploaded the untagged, unit-less bit-field *data*, instead of some meaningful *information*, hadn’t even been completed before the orbiter was in space! It wasn’t developed by the same team, and wasn’t compiled into the same executable.
Nonetheless, here a lisper would have sent *information* in a sexp, and dynamic checks and conversions would have been done.
If you will, the design would have been different in the first place!
Still, supporting multiple concurrent measurement systems means adding complexity, which is rarely a good idea. So again, the better approach would have been to make sure to only use _one_ measurement system (imperial _or_ metric (preferably metric)), which means you don't need the measurement-system awareness and conversion code in the first place.
To borrow a saying from the car industry: "The cheapest and most reliable part is the one that isn't there in the first place."
- Programmed in Common Lisp, the Therac-25 bug wouldn't have occurred:
"The defect was as follows: a one-byte counter in a testing routine frequently overflowed; if an operator provided manual input to the machine at the precise moment that this counter overflowed, the interlock would fail."
But why did the counter overflow in the first place? Was it simply programmer oversight that too small a datatype was used, or was this actually an error that just didn't have noticeable consequences most of the time? If the latter, then again, papering over it with a never-overflowing counter is not a fix.
But if it was a problem, it *would* eventually reach a bounds check, and signal a condition, thus stopping the process of irradiating and killing people.
Remember: a Lisp program (any "dynamically typed” program) is FULL of checks!
Since, again, incrementing a counter doesn't fucking overflow in Lisp!
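The failure mode under discussion is easy to reproduce in Python, which, like Lisp, has arbitrary-precision integers; the one-byte cell is simulated with a modulus (the function names are invented):

```python
def bump_8bit(counter):
    return (counter + 1) % 256   # one-byte cell: the increment wraps silently

def bump_bignum(counter):
    return counter + 1           # Lisp/Python integer: never wraps

c = 255
assert bump_8bit(c) == 0         # overflow: a "counter == 0" interlock test
                                 # now misreads the machine's state
assert bump_bignum(c) == 256     # no overflow; an explicit range check could
                                 # still signal an error instead of corrupting state
```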
- Programmed in Common Lisp, heartbleed wouldn't have occurred, because lisp implementors provide array bounds checks, and lisp programmers are conscious enough to always run with (safety 3), as previously discussed in this thread.
Hehe, "conscious enough to run always with (safety 3)". Riiiiight. And nobody was ever tempted to trade a little runtime safety for speed, correct?
Those are C programmers. You won’t find any safety other than 3 in my code. You should not find any safety other than 3 in mission-critical code, much less in life-threatening code.
There is a _mountain_ of mission-critical and/or life-threatening code where "safety 3" is meaningless because it is not written in Lisp.
As for heartbleed: arguably, the RFC that the broken code implemented shouldn't have existed in the first place.
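For reference, the bug class itself fits in a few lines. This Python sketch (hypothetical names and data, not OpenSSL's actual API) contrasts the unchecked read with the bounds-checked one:

```python
# Simulated heap: a 7-byte payload followed by adjacent secret memory.
HEAP = b"PAYLOAD" + b"secret-key-material"

def heartbeat_c_style(claimed_len):
    # No bounds check, like the memcpy in the broken OpenSSL code: the
    # peer-supplied length alone decides how much memory is echoed back.
    return HEAP[:claimed_len]

def heartbeat_checked(payload, claimed_len):
    # The fix (and what bounds-checked array access gives you for free):
    # validate the claimed length against the actual payload size.
    if claimed_len > len(payload):
        raise ValueError("heartbeat length exceeds payload")
    return payload[:claimed_len]

# Claiming 26 bytes against a 7-byte payload leaks the adjacent secrets:
assert b"secret-key-material" in heartbeat_c_style(26)
```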
What I'm saying is that there's a mindset out there of blindly using modular arithmetic to approximate real arithmetic. Until you are willing to pay $1.29 for 3 kg of apples @ $2.99, people should not program with modular arithmetic!
Well, modular arithmetic doesn't go away because one wishes it so. As a developer doing non time critical high level work one might be able to cheerfully ignore it, but the moment one writes sufficiently time critical or low level code one will have to deal with it. Because modular arithmetic is what your CPU is doing - unless you happen to have a CPU at hand that does bignums natively at the register level? No? Funny that.
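As it happens, the apples quip is exact if you assume the till keeps its total in one unsigned byte of cents:

```python
# 3 kg of apples at $2.99/kg is 897 cents; squeezed into one unsigned byte,
# modular arithmetic rings it up as 897 mod 256 = 129 cents, i.e. $1.29.

total_cents = 3 * 299
wrapped = total_cents % 256   # what an 8-bit register would hold

assert total_cents == 897
assert wrapped == 129         # the $1.29 bill from the post above
```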
This might have been true in 1968, when adding a bit of memory added 50 g of payload!
Nowadays, there’s no excuse.
Wrong.
If your code is sufficiently time critical, stuff like that begins to matter. At the leisurely end we have an ECU: your engine runs at, say, 6000 rpm, so you have an ignition coming up every 20 ms. Your code _has_ to fire the sparkplug at the correct time, with sub-millisecond precision, or you'll eventually wreck the engine. All while processing a realtime sensor data stream (engine intake (air, fuel), combustion (temperature, pressure), exhaust (temperature, pressure, gas mix) and others). This is routinely done using CPUs that aren't all that powerful; in fact, using the cheapest (and that usually means slowest) CPUs (or rather: microcontrollers) that are still just fast enough. For example, the Freescale S12XS engine control chip (injector and ignition) has 8/12 KB RAM and 128/256 KB flash. You are not going to muck around with bignums in a constrained environment like that ... ;-)
And there are many, many more of those kind of embedded control systems around than PCs, tablets and phones (all of them pretty powerful platforms these days) combined.
At the faster end: networking code handling 10 GBit line speeds. With latencies in the single to double digit microsecond range, you don't have the luxury of playing with nice abstract, far-away-from-the-metal code if you want your packet handling code to run at useful speeds.
And if the flight safety of an aircraft depended upon the current Lisp version of Ironclad's impenetrability, we would be in trouble.
This is another question, that of the resources invested in a software ecosystem, and that of programming language mind share. Why the cryptographists don't write their libraries in Common Lisp and choose to produce piles of C instead?
Usefulness. If I write a library in C, pretty much everything that runs on Unix can link to it (if need be, via FFI and friends) and use it. If I write a library in Common Lisp, then only code written in Common Lisp can use it, unless people are willing to do some interesting contortions (such as wrapping it in an RPC server).
Anything running on unix can link to libecl.so (which is ironically a CL implementation using gcc, but we can assume it’s a temporary solution).
Exercise for the interested: write a library in Common Lisp that does, say, some random data frobnication and try to use it from: C, Python, Perl, C++ _without_ writing new interface infrastructure.
But the point is to eliminate code written in C, Perl, C++! So your exercise is academic.
I can very confidently say: This will never happen. Just look at the amount of _COBOL_ code that is still in use. In fact, people are _still_ writing COBOL (dude I know does just that for a bank).
Kind regards, Alex.
On Tue, Apr 29, 2014 at 3:12 AM, Alexander Schreiber als@thangorodrim.de wrote:
Usefulness. If I write a library in C, pretty much everything that runs on Unix can link to it (if need be, via FFI and friends) and use it. If I write a library in Common Lisp, then only code written in Common Lisp can use it, unless people are willing to do some interesting contortions (such as wrapping it in an RPC server).
I just checked http://www.cliki.net/Common%20Lisp%20implementation and I see that nearly all currently active free implementations of CL have an FFI with "callback" support, and the commercial ones do too. Granted, there may be a "completeness" issue in most of those FFIs. But there is hardly any serious need for an RPC server solution anymore. And on that completeness issue (which may not be of much interest, but anyway) I happen to be currently hard at work. So you will soon have at least one free CL that will do full C99 interfacing, with plenty of C type inferencing and checking at runtime, as Pascal seems to appreciate. The only drag with it will be that, as in ECL, you will have to make a call to initialize the CL world/context before using the rest of the interface, and probably call a shutdown of the CL world/context at the end. I hope it is not too much overhead.
Exercise for the interested: write a library in Common Lisp that does, say, some random data frobnication and try to use it from: C, Python, Perl, C++ _without_ writing new interface infrastructure.
What "new interface infrastructure"? What is that infrastructure supposed to do?
On Sat, Apr 12, 2014 at 5:52 PM, David McClain <dbm@refined-audiometrics.com> wrote:
Just curious for other opinions... but wouldn't this (Heartbleed) sort of buffer excess read-back failure have been prevented by utilizing a "safe" language like Lisp or SML?
I used to be an "unsafe" language bigot -- having mastered C/C++ for many years, and actually producing C compilers for a living at one time. I felt there should be no barriers to me as master of my machine, and not the other way around.
But today's software systems are so complex that it boggles the mind to keep track of everything needed. I found during my transition years that I could maintain code bases no larger than an absolute max of 500 KLOC, and that I actually started losing track of details around 100 KLOC. Making the transition to a higher level language like SML or Lisp enabled greater productivity within those limits for me.
Part of the issue vis-a-vis security is that for many applications, much of the complexity is abstracted away into some library that the programmer may only be dimly aware of. While it used to be that many largish programs were more or less self-contained, often depending only on the system libraries, nowadays they tend to have a very broad set of dependencies on a large set of libraries that themselves have a similarly large set of dependencies. Indeed, the applications themselves are often little more than glue tying together a (to me, anyway) surprisingly large number of disparate libraries: transitively, the dependency graph is many times larger than it was a decade or two ago and hides an astonishing amount of complexity, even for applications that themselves appear trivial. Thus, you may "introduce" a security hole into your application simply by using a library that provides some useful bit of functionality but is implemented terribly, in a way that is not easily visible to you: that seems to be the case with services that are affected by heartbleed.
Could using a safe language (or even one that implemented array bounds checking) have prevented this particular bug? Absolutely. But in the general case, modern applications have a huge surface area for attack because of the layering of dependencies on large, complicated bits of software that the application program has little to no control over. Further, building all the requisite dependencies oneself in a safer language is such a daunting task as to be generally infeasible, even if it makes sense for specific applications. And even if I did that, eventually I am going to pop down into some layer in the operating system or a system library or the language runtime that is out of my control, and those things seem to have also increased in size and complexity by a few orders of magnitude over the last 20 or so years. And even if I write my own compiler, operating systems, libraries, etc, I still have to wonder whether the hardware itself is truly secure ("DEITYBOUNCE", anyone? Let alone actual, you know, errors in the hardware). And this is completely ignoring the value-loss proposition of targeting safer but lesser-used languages. For better or for worse, things like heartbleed just aren't going to sway many library writers to give up on a huge, existing target audience (even if they should).
So even if I as a programmer am extremely careful and diligent, I may still be burned by something rather distant from the work I myself am doing, and I have finite capacity to influence that.
Of course, that doesn't mean that one should not oneself be careful and diligent, or even reimplement things safely where one can! Only that the problem is rather more difficult than can be solved by simply switching implementation languages.
- Dan C.