(This is a new topic.)
Many times, people have presented a major problem with object-oriented programming as it is often/idiomatically used. The good thing is that OOP, when done right (in Lisp, that means, among other things, always using defgeneric explicitly), provides a great abstract interface to the caller. It's good for modularity, because if you want to change the implementation of the class, you know what you can change easily, and what you cannot because callers depend on it.
If all of the callers are under your control, you can easily make incompatible changes, but if you're providing a general-purpose library, that people all over the world are using, then it's a lot harder.
I call the set of defgenerics (plus the factory functions) the "protocol". The word "type" is sort of right but carries a lot of connotations and freight that I'd rather avoid.
Often all of the CLOS classes that implement a given protocol inherit from a common abstract base class. There are two reasons to do this. First, a common base class can provide implementations of some of the generic functions all by itself. My favorite simple example is an "output stream" protocol, that has a write-character operation and a write-string operation. The common base class provides an implementation of write-string that works by iterating over the characters of the string and calling write-character. Any output stream that can write strings in a more efficient way can override that method.
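As a rough illustration of the output-stream example above, here is a minimal sketch. The package name, the class names (basic-output-stream, collecting-stream) and the helper stream-contents are made up for this sketch and do not come from any library mentioned in this thread.

```lisp
(defpackage :output-stream-sketch
  (:use :common-lisp)
  (:shadow #:write-string))   ; so we can define our own WRITE-STRING

(in-package :output-stream-sketch)

;; The protocol: two generic functions, declared explicitly.
(defgeneric write-character (stream character)
  (:documentation "Write one CHARACTER to STREAM."))

(defgeneric write-string (stream string)
  (:documentation "Write every character of STRING to STREAM."))

;; Abstract base class (hypothetical name): an implementation convenience.
(defclass basic-output-stream () ())

;; Default WRITE-STRING, expressed in terms of WRITE-CHARACTER.
(defmethod write-string ((stream basic-output-stream) (string string))
  (loop for ch across string
        do (write-character stream ch))
  string)

;; A concrete stream that only has to supply WRITE-CHARACTER.
(defclass collecting-stream (basic-output-stream)
  ((chars :initform '() :accessor collected-chars)))

(defmethod write-character ((stream collecting-stream) (ch character))
  (push ch (collected-chars stream))
  ch)

(defun stream-contents (stream)
  "Return everything written to STREAM so far, as a string."
  (coerce (reverse (collected-chars stream)) 'string))
```

A stream that can write whole strings more efficiently would simply add its own write-string method specialized on its class, which takes precedence over the inherited default.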
The second reason is to define a Lisp type that all of the subclasses will be of.
In my mind, as far as the first reason goes, there should be NO requirement that every type that implements the protocol *must* inherit from a common abstract base class. The abstract base class is merely an implementation convenience. In the case of the fhash library, I do not have any common base class. As a matter of fact, it's not even implemented using CLOS at all. (The caller doesn't know, and should not.) From the source file:
;;; An fhash is represented as a one-dimensional array. It can have two
;;; representations: a linear search table, or a hash table (from the
;;; underlying Common Lisp implementation). An fhash is represented as
;;; a vector, starting with a header. The 0th element says whether this
;;; fhash instance is :linear or :hash. If it's :hash, the 1th element
;;; contains the hash table, and the rest of the elements are ignored.
As far as the second reason goes, fortunately Lisp has a powerful way to let you define types. In fhash:
(defun fhash-table-p (fhash)
  "Return true if the argument is an FHASH. Otherwise return NIL. This is not perfect, since some array might just happen to have these values, but it seems unlikely."
  (and (vectorp fhash)
       (member (aref fhash +fhash-kind+) '(:linear :hash))))

(deftype fhash-table () '(satisfies fhash-table-p))
This is not perfect. A programmer might just happen to create a vector and put one of those keywords into the +fhash-kind+ slot. (+fhash-kind+ is zero but that's a mere detail.) But it's sort of good enough.
OK, back to OOP. Abstraction of the object from the caller is just fine. But, the commonly-heard objection to OOP as it is really used is that there is *not* a clean modular definition isolating *subclasses*.
Smalltalk didn't even try. CLOS, I believe, does not try and there is not an idiomatic way to do it. The only language I know that makes a good stab in this direction is C++, which has "public", "protected", and "private". Whatever else you say about C++, Bjarne understood this issue and tried to fix it. The C++ solution isn't perfect but it's sure a lot better than anything else.
How can we provide the same thing in Common Lisp? And without changing CLOS? In fact, without adding any new language feature at all?
With packages. fhash has a defpackage that provides the protocol seen by callers. If you do an abstract type with CLOS, you ought to do the same thing. (You might have one package for many types.) So the callers can "use" fhash and then call putfhash and so on.
(Digression: A slightly different approach is to define a package not intended to be "use"'ed. I did this with the API for the logging facility in QRes. For example: log:add-rule. If you did a "use", it might be confusing to remember that add-rule is from logging, and indeed you might "use" another package that has an "add-rule" function. This is a name conflict. The *whole idea* of packages is to avoid name conflicts! So I think "use" is opposed to the whole idea of packages and should be used sparingly.
What people usually do in Common Lisp, in my experience, is to name their functions so that the ones in one module (or, more generally, related ones) all have names starting with a common prefix. Well, that's what packages were invented for: so that you don't have to do that! In QRes, even though there is a "qconfig" package for the configuration stuff, the functions are still all named with a "config-" prefix, and everybody uses "use". I think this is not as good as using packages as they were designed. Rather than, e.g., config-descriptor-property, use config:descriptor-property. Anyway.)
Now, there's a great thing about CL packages: you can have more than one defpackage for the *same* symbols. So we could have one package for the code that calls into our library, which would have the usual protocol, which we can call the "external protocol". Then we could have a *second package*, which presents a second protocol that subclasses would use! This gives us modular separation, just as "protected" does.
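To make the two-package idea concrete, here is one possible sketch. All the names (shape-impl, shape, shape-extension, shape-client, area, compute-area) are invented for illustration, and this particular layout, with an implementation package owning the symbols and two facade packages re-exporting subsets of them, is an assumption about how the idea could be realized, not something taken from fhash or QRes.

```lisp
;; Implementation package: owns every symbol, exports nothing itself.
(defpackage :shape-impl
  (:use :common-lisp))

(in-package :shape-impl)

(defclass shape () ())

(defgeneric compute-area (shape)
  (:documentation "Extension protocol: subclasses implement this."))

(defun area (shape)
  "External protocol: what ordinary callers use."
  (compute-area shape))

(defclass circle (shape)
  ((radius :initarg :radius :reader radius)))

(defmethod compute-area ((c circle))
  (* pi (radius c) (radius c)))

(defun make-circle (radius)
  (make-instance 'circle :radius radius))

;; First facade: the external protocol seen by callers.
(defpackage :shape
  (:use)
  (:import-from :shape-impl #:area #:make-circle)
  (:export #:area #:make-circle))

;; Second facade: the protocol seen by authors of subclasses.
(defpackage :shape-extension
  (:use)
  (:import-from :shape-impl #:shape #:compute-area)
  (:export #:shape #:compute-area))

;; A client that extends the library goes through SHAPE-EXTENSION only.
(defpackage :shape-client
  (:use :common-lisp))

(in-package :shape-client)

(defclass square (shape-extension:shape)
  ((side :initarg :side :reader side)))

(defmethod shape-extension:compute-area ((s square))
  (* (side s) (side s)))
```

Here shape:area and shape-impl::area are the same symbol, so nothing is duplicated; the two facade packages differ only in which of the implementation's symbols they make external.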
Of course, there's nothing stopping any code at all from including a symbol from this second package in its code. But we already let people go into internal symbols using "::". Common Lisp's general attitude is that we don't make these things impossible; you're just not supposed to do them. It's very often useful to do this kind of violation when doing interactive debugging, or when writing white-box unit tests. (It would be great to have a static analysis tool that flags you when you "cheat", that you could run before releasing a new version of your stuff.)
I have not actually written anything this way, but it seems to me as if it would work. And if experience shows that it does provide what I claim it will provide, I would love for that to become idiomatic/standard usage in Common Lisp.
I admit that there's a significant problem, namely the same problem that we always have with packages. Because resolution of packages happens at runtime, it is difficult-to-impossible, in some cases, to change the package declarations during interactive debugging and have the effect that the environment behaves as if you had changed the package declaration and recompiled from scratch. All I can say is that we really ought to do something about the problems with packages, even if you don't buy anything I'm saying here.
-- Dan
On 1 December 2010 10:25, Daniel Weinreb dlw@itasoftware.com wrote:
I call the set of defgenerics (plus the factory functions) the "protocol". The word "type" is sort of right but carries a lot of connotations and freight that I'd rather avoid.
Here is a way that CL sucks badly: protocols are not first-class entities. To modify a protocol is something done to the source code *outside* of the Lisp world. It cannot be done programmatically. Adherence to the protocol cannot be enforced. Discrepancies cannot be detected.
In Racket, for instance, modules are first-class (at syntax expansion time) and units are first-class (at runtime), and you can manipulate them programmatically.
First, a common base class can provide implementations of some of the generic functions all by itself. My favorite simple example is an "output stream" protocol, that has a write-character operation and a write-string operation. The common base class provides an implementation of write-string that works by iterating over the characters of the string and calling write-character. Any output stream that can write strings in a more efficient way can override that method.
In my "pure" datastructure library (currently part of fare-utils, to be spun off as lil - lisp interface library), I use mixins to provide these "methods". So instead of adding the method to a base class, I would provide a mixin "write-string-from-write-char", and then could possibly add an opposite mixin "write-char-from-write-string", without creating a paradox that will byte you.
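A small sketch of how such a pair of mixins might look in plain CLOS follows. The mixin names are the ones suggested above; the package, the two sink classes and the OUT reader are hypothetical, and this is not the code from fare-utils.

```lisp
(defpackage :mixin-sketch
  (:use :common-lisp)
  (:shadow #:write-char #:write-string))

(in-package :mixin-sketch)

(defgeneric write-char (stream char))
(defgeneric write-string (stream string))

;; Mixin 1: derive WRITE-STRING from WRITE-CHAR.
(defclass write-string-from-write-char () ())

(defmethod write-string ((stream write-string-from-write-char) (string string))
  (loop for char across string
        do (write-char stream char))
  string)

;; Mixin 2: derive WRITE-CHAR from WRITE-STRING.
(defclass write-char-from-write-string () ())

(defmethod write-char ((stream write-char-from-write-string) (char character))
  (write-string stream (string char))
  char)

;; A stream that natively writes characters picks mixin 1 ...
(defclass char-sink (write-string-from-write-char)
  ((out :initform (make-string-output-stream) :reader out)))

(defmethod write-char ((stream char-sink) (char character))
  (cl:write-char char (out stream)))

;; ... and one that natively writes strings picks mixin 2.
(defclass string-sink (write-char-from-write-string)
  ((out :initform (make-string-output-stream) :reader out)))

(defmethod write-string ((stream string-sink) (string string))
  (cl:write-string string (out stream)))
```

Because each class opts into exactly one direction, neither default can end up calling the other in an endless loop; a class that mixed in both without supplying a primitive of its own would still recurse forever, which is the disjointness concern raised later in the thread.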
This is not perfect. A programmer might just happen to create a vector and put one of those keywords into the +fhash-kind+ slot. (+fhash-kind+ is zero but that's a mere detail.) But it's sort of good enough.
Solution: don't use keywords, but private symbols. Unlike keywords, private symbols are private. They can be faked, but not accidentally. CL doesn't allow you to easily define sublanguages in which things cannot be faked. That's painful.
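For instance, a predicate in the spirit of fhash-table-p could tag its vectors with symbols that are internal to the implementing package. This is only a sketch: the package, make-table, table-p and the %linear/%hash tags are invented here, and the added length check is an extra guard of this sketch, not part of the fhash source quoted earlier.

```lisp
(defpackage :tagged-vector-sketch
  (:use :common-lisp)
  (:export #:make-table #:table-p #:table))

(in-package :tagged-vector-sketch)

(defconstant +kind-index+ 0)

(defun make-table (kind)
  "Callers still say :LINEAR or :HASH; the stored tag is a private symbol."
  (vector (ecase kind
            (:linear '%linear)
            (:hash '%hash))
          nil))

(defun table-p (object)
  "True if OBJECT is a vector whose header holds one of our private tags."
  (and (vectorp object)
       (plusp (length object))          ; guard against empty vectors
       (member (aref object +kind-index+) '(%linear %hash))
       t))

(deftype table () '(satisfies table-p))
```

A vector that merely happens to contain :hash in slot 0 no longer passes the test; to fake a table, a caller has to write tagged-vector-sketch::%hash on purpose.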
What people usually do in Common Lisp, in my experience, [...]
Patterns mean "I have run out of language." — Rich Hickey
[ François-René ÐVB Rideau | Reflection&Cybernethics | http://fare.tunes.org ] If you could kick the person in the pants responsible for most of your trouble, you wouldn't sit for a month. — Theodore Roosevelt
Faré wrote:
On 1 December 2010 10:25, Daniel Weinreb dlw@itasoftware.com wrote:
Here is a way that CL sucks badly: protocols are not first-class entities. To modify a protocol is something done to the source code *outside* of the Lisp world. It cannot be done programmatically. Adherence to the protocol cannot be enforced. Discrepancies cannot be detected.
I agree.
The relationship between the concept of "protocol" and the concept of "type" should be explored and understood. They're pretty similar but I am not sure what differences they have.
If protocols were first-class, would you be able to make a new one inherit from another?
Java has the concept of an "Interface", which is a lot like a protocol.
I have always liked the idea of having protocols say more than just "these are the functions and these are the arguments, which are optional, and maybe what their types are." I'd love it if there were a way to say "in order to fulfill this contract, doing write-string of a string must behave exactly the same as doing write-char on each." You could imagine all kinds of integrity constraints. You could specify that some function be commutative, that some be associative with respect to each other, that one have no side effects, that one be idempotent, and so on. We could start by having a well-known template for documenting/commenting the functions in a protocol to be able to say things like this.
Fare knows this, but for everyone else: we have a macro called define-struct-function that lets you specify datatypes for each argument, and expands into check-type's (basically), and also for returned values. And corresponding versions for generics and methods. We don't use them for every single definition, but we try to use them at inter-module boundaries.
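The real define-struct-function is not shown in this thread, so the following is only a guess at the general shape of such a macro. The name define-checked-function, its argument syntax and the scale-price example are all hypothetical; the sketch handles a single return value and does not support docstrings or declarations in the body.

```lisp
(defmacro define-checked-function (name typed-args return-type &body body)
  "Define NAME with TYPED-ARGS, a list of (variable type) pairs.
Each argument and the single returned value are checked with CHECK-TYPE."
  (let ((result (gensym "RESULT")))
    `(defun ,name ,(mapcar #'first typed-args)
       ;; One CHECK-TYPE per declared argument.
       ,@(loop for (var type) in typed-args
               collect `(check-type ,var ,type))
       ;; Check the returned value as well.
       (let ((,result (progn ,@body)))
         (check-type ,result ,return-type)
         ,result))))

;; Hypothetical usage at a module boundary.
(define-checked-function scale-price ((price (integer 0)) (factor (real 0)))
    (real 0)
  (* price factor))
```

Calling (scale-price -1 3) then signals a correctable error of type type-error at the boundary, rather than letting a bad value travel deeper into the module.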
In Racket, for instance, modules are first-class (at syntax expansion time) and units are first-class (at runtime), and you can manipulate them programmatically.
First, a common base class can provide implementations of some of the generic functions all by itself. My favorite simple example is an "output stream" protocol, that has a write-character operation and a write-string operation. The common base class provides an implementation of write-string that works by iterating over the characters of the string and calling write-character. Any output stream that can write strings in a more efficient way can override that method.
In my "pure" datastructure library (currently part of fare-utils, to be spun off as lil - lisp interface library), I use mixins to provide these "methods". So instead of adding the method to a base class, I would provide a mixin "write-string-from-write-char", and then could possibly add an opposite mixin "write-char-from-write-string", without creating a paradox that will byte you.
That's good. The usual abstract base class provides a set of things like that. You have to take all or none (although you can override some). Using particular mixins to provide particular helpers is more modular.
This is not perfect. A programmer might just happen to create a vector and put one of those keywords into the +fhash-kind+ slot. (+fhash-kind+ is zero but that's a mere detail.) But it's sort of good enough.
Solution: don't use keywords, but private symbols. Unlike keywords, private symbols are private. They can be faked, but not accidentally. CL doesn't allow you to easily define sublanguages in which things cannot be faked. That's painful.
Yes, I agree with you and Peter. Definitely.
What people usually do in Common Lisp, in my experience, [...]
Patterns mean "I have run out of language." — Rich Hickey
Actually even the Gang-of-Four admit that.
-- Dan
I have always liked the idea of having protocols say more than just "these are the functions and these are the arguments, which are optional, and maybe what their types are."
You can have that in Lisp to a point with Interface-Passing Style: http://fare.livejournal.com/155094.html
I'd love it if there were a way to say "in order to fulfill this contract, doing write-string of a string must behave exactly the same as doing write-char on each."
Common Lisp seems to be opposed to the idea of static checking of anything. Racket has a nice dynamic contract system, however, as well as a statically typed dialect, and the two play nice together.
You could imagine all kinds of integrity constraints. You could specify that some function be commutative, that some be associative with respect to each other, that one have no side effects, that one be idempotent, and so on. We could start by having a well-known template for documenting/commenting the functions in a protocol to be able to say things like this.
Such constraints are usually extremely expensive to verify dynamically, to the point of being prohibitively expensive for large runs, though you could verify only during test runs. They are also expensive to verify statically, but then the cost is finite and you only have to pay it once upfront, not all the time. Some languages, like Cayenne, Epigram or Omega, allow you to specify (and prove) all these constraints statically.
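The "verify only during test runs" option can be as simple as checking a property over a handful of sample inputs. The two helpers below are hypothetical names made up for this sketch, and sampling of this kind can only refute a property, never prove it.

```lisp
(defun commutative-on-p (fn samples &key (test #'eql))
  "True if (FN a b) and (FN b a) agree under TEST for every pair from SAMPLES."
  (loop for a in samples
        always (loop for b in samples
                     always (funcall test
                                     (funcall fn a b)
                                     (funcall fn b a)))))

(defun idempotent-on-p (fn samples &key (test #'eql))
  "True if applying FN twice gives the same result as applying it once."
  (loop for x in samples
        always (let ((once (funcall fn x)))
                 (funcall test once (funcall fn once)))))
```

A test suite could then assert, say, (commutative-on-p #'+ '(1 2 3)) for each function whose protocol documentation claims the property, at a cost that grows with the number of sample pairs and is paid only when the tests run.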
Patterns mean "I have run out of language." — Rich Hickey
Actually even the Gang-of-Four admit that.
Somehow, the message was lost — probably to themselves, to start with — for the popularity of their work has led to no improvement to the ability of languages not to run out.
[ François-René ÐVB Rideau | Reflection&Cybernethics | http://fare.tunes.org ] Faith, n: That quality which enables us to believe what we know to be untrue.
On 12/01/2010 03:11 PM, Faré wrote:
I have always liked the idea of having protocols say more than just "these are the functions and these are the arguments, which are optional, and maybe what their types are."
You can have that in Lisp to a point with Interface-Passing Style: http://fare.livejournal.com/155094.html
I have (ab)used this style of late as it meshes with my habit of defining protocols that are less designed than evolved. I'll have a moment of realization that a group of generic functions describe a theme which may be varied, and I'll end up with an abstract interface, some mixins, and a defining macro for subtypes. So far type hierarchies have been shallow, but I'm willing to see how far design-by-refactoring takes me.
Matt
On 1 Dec 2010, at 21:16, Daniel Weinreb wrote:
I have always liked the idea of having protocols say more than just "these are the functions and these are the arguments, which are optional, and maybe what their types are." I'd love it if there were a way to say "in order to fulfill this contract, doing write-string of a string must behave exactly the same as doing write-char on each." You could imagine all kinds of integrity constraints. You could specify that some function be commutative, that some be associative with respect to each other, that one have no side effects, that one be idempotent, and so on. We could start by having a well-known template for documenting/commenting the functions in a protocol to be able to say things like this.
I would also like to specify that methods on a specific generic function should always halt, and would like to enforce that statically. ;-)
I'm only half joking: The fact that you cannot solve the halting problem puts some boundaries on what you may and may not be able to express in such contracts. On top of that, it is extremely hard to be precise enough when specifying such contracts.
Let's take your example "write-string must behave exactly the same as doing write-char on each". Let's assume these functions are implemented as follows for the default case:
(shadow 'write-char)
(shadow 'write-string)

;; Note: the class of characters is named CHARACTER; CHAR is not a class.
(defgeneric write-char (stream char)
  (:method ((stream t) (char character))
    (cl:write-char char stream)))

(defgeneric write-string (stream string)
  (:method ((stream t) (string string))
    (loop for char across string
          do (write-char stream char))))
Now assume somebody implements their own stream class and does the following:
(defmethod write-char ((stream my-stream) (char character))
  (write-string stream (make-string 1 :initial-element char)))

(defmethod write-string ((stream my-stream) (string string))
  (cl:write-string string stream))
This breaks your suggested contract. Why? Somebody else may want to provide some form of mixin functionality like this:
(defvar *write-counter* 0)

(defmethod write-char :around (stream char)
  (incf *write-counter*)
  (call-next-method))
With the original methods, this correctly counts the written characters, but with the methods for my-stream, most characters will not be counted anymore. Your suggested contract seems to suggest that this :around method is correct and the methods for my-stream break the contract. Is that what you intended?
Pascal
On 1 Dec 2010, at 18:16, Faré wrote:
On 1 December 2010 10:25, Daniel Weinreb dlw@itasoftware.com wrote:
First, a common base class can provide implementations of some of the generic functions all by itself. My favorite simple example is an "output stream" protocol, that has a write-character operation and a write-string operation. The common base class provides an implementation of write-string that works by iterating over the characters of the string and calling write-character. Any output stream that can write strings in a more efficient way can override that method.
In my "pure" datastructure library (currently part of fare-utils, to be spun off as lil - lisp interface library), I use mixins to provide these "methods". So instead of adding the method to a base class, I would provide a mixin "write-string-from-write-char", and then could possibly add an opposite mixin "write-char-from-write-string", without creating a paradox that will byte you.
The term 'mixins' sets off my alarm bells. ;) But first a question, to better understand what you mean here: How do you reconcile the notion of mixins with multiple dispatch?
Pascal
On 7 December 2010 14:00, Pascal Costanza pc@p-cos.net wrote:
On 1 Dec 2010, at 18:16, Faré wrote: The term 'mixins' sets off my alarm bells. ;) But first a question, to better understand what you mean here: How do you reconcile the notion of mixins with multiple dispatch?
I don't know that there's any problem that requires reconciliation, but that said, I haven't tried too much. There are some intrinsic problems with multiple dispatch, but I don't know that they are at all specific to mixins. Maybe some better method linearization algorithm can help?
As for your other question to dlw about methods defined in terms of each other, my solution is to use mixins to segregate incompatible method transformations in disjoint mixins. A better question might be how do you enforce disjointness of some mixins. I suppose a heavy-handed use of MOP magic could do it, but oh well.
[ François-René ÐVB Rideau | Reflection&Cybernethics | http://fare.tunes.org ] In a family argument, if it turns out you are right — apologize at once. — Robert Heinlein, "Time Enough For Love"
On 12/07/2010 09:19 PM, Faré wrote:
A better question might be how do you enforce disjointness of some mixins. I suppose a heavy-handed use of MOP magic could do it, but oh well.
Q: Doctor, it hurts when I do this. A: Well, don't do that.
In the examples I looked at that inspired me to use IPS (<map> might have been one), sets of mixins partition (part of) a default implementation, and most of the methods specialize only on interface types. As a whole, the parts of the protocol implemented by a set of mixins depend only on the unimplemented generic functions, and not on any particular datatype. Kinda a Ruby Module-y, Smalltalk Trait-y sort of thing.
Matt
On Dec 7, 2010, at 11:10 PM, Matthew D. Swank wrote:
On 12/07/2010 09:19 PM, Faré wrote:
A better question might be how do you enforce disjointness of some mixins. I suppose a heavy-handed use of MOP magic could do it, but oh well.
Q: Doctor, it hurts when I do this. A: Well, don't do that.
In the examples I looked at that inspired me to use IPS (<map> might have been one), sets of mixins partition (part of) a default implementation, and most of the methods specialize only on interface types. As a whole, the parts of the protocol implemented by a set of mixins depend only on the unimplemented generic functions, and not on any particular datatype. Kinda a Ruby Module-y, Smalltalk Trait-y sort of thing.
This is how I use mixins, too -- as "trait"-y sort of things that provide implementation of a protocol, but don't "contribute" to the type. That is, there is a single-inheritance "backbone" that may, or may not, have implementations of the protocol(s), and zero or more mixins that provide bits of implementation. I tend to avoid having mixin classes provide any public slots, although there may be internal slots to aid in the implementation.
Just my two cents...
On Wed, 1 Dec 2010, Daniel Weinreb wrote:
Smalltalk didn't even try. CLOS, I believe, does not try and there is not an idiomatic way to do it. The only language I know that makes a good stab in this direction is C++, which has "public", "protected", and "private". Whatever else you say about C++, Bjarne understood this issue and tried to fix it. The C++ solution isn't perfect but it's sure a lot better than anything else.
If you like public, private, and protected, then you really need to study Ada a bit. CL isn't the only ~DARPA technology to have been abandoned even though it contained good ideas. All the constraints look like they could become tedious; but they seem to have good goals for defining interfaces. [No, I haven't done anything real in Ada.]
Secret on the C++ implementation: these keywords are only enforced at compile-time, and softly at best. On almost all compilers that matter, you can change them in the header files, completely violate the intended contract, and still link just fine with code compiled under the contract.
Java has a similar contractual model, but the JVM does try to enforce such boundaries at runtime.
All that said, I've started to believe that the approach is fundamentally wrong. There are always times when someone has to modify the "intrinsics" of a library. This is often needed to fix bugs, extend functionality in unanticipated directions, etc.
CL actually does have a decent way to do this, and a standard way to detect violations. Like C++, put each class in its own namespace (package). Only export symbols representing the public API. Everything else is part of the implementation/protected API. Violations of the public API are found the same way you find any other use of non-exported symbols.
Private is a good way of saying "are you really sure"; but the implementation is a misfeature. I suspect scary naming (a la the _underscore _prefix conventions commonly used in C/C++ and Python) is quite sufficient.
Later, Daniel
P.S. Other cute C++ tricks to get "private" data. These are strictly for debug/experimentation. Any sane coder would shoot you on sight for leaving them in a codebase.
- Protected data is easily accessed by creating a subclass and casting other pointers into it.
- LD_PRELOAD can be used to define :around methods on non-inlined functions, including methods. RTLD_NEXT can be helpful here.
- Calculate the offset of the value of interest, do a little pointer arithmetic, and cast.
- Reimplement a public member function in the same compilation unit (source file) as some code that needs to access private data. Simply copy the definition from the original source file; but add in hooks that expose private members (e.g. set globals pointing to them).
On Wed, Dec 1, 2010 at 8:29 PM, Daniel Herring dherring@tentpost.com wrote:
On Wed, 1 Dec 2010, Daniel Weinreb wrote:
Smalltalk didn't even try. CLOS, I believe, does not try and there is not an idiomatic way to do it. The only language I know that makes a good stab in this direction is C++, which has "public", "protected", and "private". Whatever else you say about C++, Bjarne understood this issue and tried to fix it. The C++ solution isn't perfect but it's sure a lot better than anything else.
[...]
First, my apologies. I didn't have Dan Weinreb's original message to respond to directly.
I disagree with Dan here. Bjarne didn't understand the issue, and public/private/protected are not the answer. He, as many others have, conflated objects with ADTs. Worse, he exacerbated the subtyping vs. subclassing issues with an early-bound type system. William Cook explains clearly the difference and its importance [1].
Gilad Bracha's Newspeak improves on the encapsulation and modularity of Smalltalk. More importantly, however, with no type system and a fully late-bound approach, Newspeak supports, in its own pure OO way, the notion of type families that early-bound ADT-oriented languages struggled to reach. C++, with its mixed bag of underpinnings, is hopeless.
So, my argument would be that you can't accomplish with CLOS what Newspeak has with classes. Instead, you will have to treat ADT-like encapsulation and modularity separate from that of CLOS. That's what OCaml had to do. And you won't have to alter CLOS.
Cheers,
Mike
On 1 Dec 2010, at 16:25, Daniel Weinreb wrote:
I call the set of defgenerics (plus the factory functions) the "protocol". The word "type" is sort of right but carries a lot of connotations and freight that I'd rather avoid.
In the Haskell community, these beasts are called 'type classes'.
How can we provide the same thing in Common Lisp? And without changing CLOS? In fact, without adding any new language feature at all?
Your suggestion to use packages for this kind of separation is good. There is another alternative that is actually already in use as well: In the CLOS MOP you have a distinction between user-level functions and extension functions. A typical example is the slot instance protocol which is specified like this:
(defun slot-value (object slot)
  (let* ((class (class-of object))
         (slot-definition (find slot (class-slots class)
                                :key #'slot-definition-name)))
    (if slot-definition
        (slot-value-using-class class object slot-definition)
        (slot-missing class object slot 'slot-value))))
etc. for the other slot accessors. The important point for the sake of this discussion is that there is a user-level function slot-value that is supposed to be used as the front-end API, whereas slot-value-using-class exists for extensions of the CLOS functionality. Slot-value is actually a plain function, while slot-value-using-class is a generic function. I think this is a good way to organize your code and makes a clear distinction between user APIs and extension APIs. It may not work in all cases, but I think it's good to try this in your own projects. There is no real cost involved if the user-level functions can be inlined.
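The same split can be applied to an ordinary library. In the sketch below, lookup is the plain user-level function and lookup-using-store is the generic extension function; both names, and the alist-store class, are made up for illustration by analogy with slot-value and slot-value-using-class.

```lisp
;; Extension API: a generic function that new store classes specialize.
(defgeneric lookup-using-store (store key default)
  (:documentation "Return the value for KEY in STORE, or DEFAULT if absent."))

;; User API: a plain function, which may be inlined as suggested above.
(declaim (inline lookup))
(defun lookup (store key &optional default)
  "Front-end entry point; callers never touch LOOKUP-USING-STORE."
  (lookup-using-store store key default))

;; One extension: a store backed by an association list.
(defclass alist-store ()
  ((entries :initarg :entries :initform '() :accessor entries)))

(defmethod lookup-using-store ((store alist-store) key default)
  (let ((entry (assoc key (entries store))))
    (if entry
        (cdr entry)
        default)))
```

Callers depend only on lookup, so the extension function's argument list can carry extra information for implementors without changing what users see.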
I admit that there's a significant problem, namely the same problem that we always have with packages. Because resolution of packages happens at runtime, it is difficult-to-impossible, in some cases, to change the package declarations during interactive debugging and have the effect that the environment behaves as if you had changed the package declaration and recompiled from scratch. All I can say is that we really ought to do something about the problems with packages, even if you don't buy anything I'm saying here.
A really strong point of packages is that they resolve symbols at read-time, while most module systems in the wild resolve identifiers at a later stage. This is a strong point because this means that packages can (a) make finer distinctions than module systems, and (b) can provide information hiding mechanisms (public vs. private symbols) completely orthogonal to whatever language constructs you may come up with. (a) is good because it's the main reason why macros are actually safe with regard to hygiene in all relevant practical cases, and (b) is good because you don't have to reinvent the wheel with regard to information hiding over and over again.
Any suggestions to change/improve packages should ensure that these strong points remain, IMHO.
Pascal