Anton,
Thank you very much for your mail. It prompted me to explain what I'm thinking of in a much clearer (I hope!!) way. Here we go:
Anton Vodonosov wrote:
What is the difference for modularity in the case of subclassing, from other cases where we apply modularity? I can't find the difference.
Here's the greater idea.
If you have a module, by using CL packages (without any change to CL), it is possible to have more than one set of exports. So, you can offer different interfaces to different callers. By interfaces in this context I mean the set of symbols that are exported.
If you offer interface I1 to callers C1, C2, and C3, and interface I2 to callers C4, C5, and C6, then you can be sure that C1, C2, and C3 will keep working as long as you do not make incompatible changes to interface I1. Similarly with I2.
Now, moving specifically to OOP:
Traditionally, e.g. in Smalltalk 80, it was a big deal that the caller of an instance of a class does not know the internals of the class. It only uses methods, and does not look internally at the instance variables (slots).
(I can't remember whether ST-80 had the concept of private methods that are only intended to be used from methods of the class and not from callers.)
OOP was often compared to, or even conflated with, the concept of "abstract datatypes", in the sense meant by Barbara Liskov. In CLU and such languages, similarly there was a sharp distinction between the advertised interface presented to callers of the ADT (abstract datatype, corresponding to a class), and whatever was going on inside it.
This was a modularity boundary that allowed a separation of concerns between the users of the ADT and the implementation of the ADT.
Recently, this was considered such a big deal that Liskov was awarded a Turing Award.
ADT's did not, per se, contain the concept of inheritance. Inheritance was first exposed to the general public with ST-80.
However, there wasn't even a semblance of an effort to provide any modular separation between a class and its subclasses. So, whenever the implementation of a class changed in any way at all, you risked breaking subclasses. This was particularly a problem when library 1, under control of group-of-people-1, provided a class, and library-2 subclassed it. If a new release of library-1 came out, there was in general no way for g-o-p-1 to find everybody who had subclassed their classes; if the lib-2 people installed the new release of lib-1, they had no idea whether anything would work.
Now, you could just say that there should be internals and externals, and the externals exposed to the callers and the externals exposed to the subclassers are exactly the same set.
This does not work for most non-trivial cases. I will assume that you know why, since I can't explain it briefly.
So, what I am proposing is to be explicit about what is being exposed to the subclassers, so that if the base class is changed in a new release but the subclasser interface is compatible, the lib-2 people can rest assured that their library will continue to work. (Unless there are bugs, of course.)
In C++ and Java, this isn't done with Lisp packages, of course. The "public" members are the interface to the caller, and the union of the "public" and "protected" members are the interface provided to the subclassers.
Whether the subclassers should be given permission to override the public things, in all cases, is an interesting but separate discussion. With Lisp packages, you can control this any way you want.
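A minimal Java sketch of the two interfaces described above. The class names (Report, ShoutingReport) and the render hook are hypothetical, chosen only for illustration. The point is that callers see only print(), while subclassers additionally see, and may depend on, the fact that print() calls render().

```java
// Hypothetical "library 1" class; all names here are illustrative.
class Report {
    private final String title;   // internal: hidden from callers and subclassers

    Report(String title) { this.title = title; }

    // Caller interface: the only thing ordinary clients rely on.
    public final String print() {
        return "[" + render() + "]";
    }

    // Subclasser interface: a hook that subclasses may override.
    // That print() calls render() is part of this second contract only.
    protected String render() {
        return title;
    }

    // Also part of the subclasser interface: read access to the title.
    protected final String title() { return title; }
}

// Hypothetical "library 2" class, written by a different group of people.
class ShoutingReport extends Report {
    ShoutingReport(String title) { super(title); }

    @Override
    protected String render() {
        return title().toUpperCase();
    }
}

class TwoInterfacesDemo {
    public static void main(String[] args) {
        Report r = new ShoutingReport("quarterly");
        System.out.println(r.print());   // prints "[QUARTERLY]"
    }
}
```

If a new release of Report stopped calling render() from print(), callers would notice nothing, but ShoutingReport would silently break. That is why the subclasser interface needs its own compatibility promise.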
We may note that "protected" is actually public, in the sense that when we specify something as "protected", we specify that the library has external clients which can rely on this protected method or field.
In that sense, yes, absolutely.
Anyone may inherit from our public class, and access the "protected" members, therefore these members are available to public. If we change the protected member, we affect the library clients.
What I'm doing, however, is to distinguish between two sets of "public", as explained above. The callers, which would be a "big public" (lots of callers) must cope with incompatible changes relatively less often, whereas the "small public" (those who subclass) must cope relatively more often since more is revealed.
This drives me to the conclusion that a single language feature is sufficient: one that allows us to separate the protocol for interacting with a module from the part of the module we expect to change without affecting clients.
Taking you literally, we do not need a new CL feature. But translating what you're saying into the terms I am using, you're saying that we need only one protocol, whereas I am saying that it's better to have two. There is no "language feature" at the CL level, but there is a practice, or a "design pattern", that I am recommending. Having :documentation in the defpackage to explain what's going on, especially who is intended to use this package, would be very helpful. Design patterns are essentially a way to extend a language (in a nutshell).
I have observed in many Java systems that having public/protected/private for classes, in addition to packages, has a bad influence on system modularity.
Many java programmers tend to hide attributes of every single class behind get/set accessors, even for simple classes which just hold values passed between methods and do not contain any logic; to create hierarchies of interface/abstract class/concrete for various minor classes, employing protected methods.
There are at least two reasons to put in such methods.
First, changing the implementation while keeping compatible behavior for the public methods.
Imagine that you have a class called Point representing a point in two-space. You could use Cartesian representation: two data members x and y, and getX and getY (and, if desired, setX and setY). These are the simple getters (and setters) that you mention above.
But what if, for some reason (numerical methods, I don't know) you decide that internally using a polar representation, with two data members storing an angle and a distance from the origin, is better. You can make getX continue to work using the appropriate trigonometry operations.
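As a rough sketch of that change of representation: the Point below stores a distance and an angle internally while still answering getX and getY. The factory name fromCartesian and the field names are assumptions made for this illustration, not anything taken from a real library.

```java
// Sketch of a Point whose internal representation is polar,
// while callers keep using the Cartesian getters.
class Point {
    private final double radius;  // distance from the origin
    private final double angle;   // angle in radians, measured from the x axis

    private Point(double radius, double angle) {
        this.radius = radius;
        this.angle = angle;
    }

    // Hypothetical factory: callers still think in x and y.
    static Point fromCartesian(double x, double y) {
        return new Point(Math.hypot(x, y), Math.atan2(y, x));
    }

    // The public getters are unchanged from the caller's point of view.
    public double getX() { return radius * Math.cos(angle); }
    public double getY() { return radius * Math.sin(angle); }
}

class PointDemo {
    public static void main(String[] args) {
        Point p = Point.fromCartesian(3.0, 4.0);
        // Values match the inputs up to floating-point rounding.
        System.out.println(p.getX() + ", " + p.getY());
    }
}
```

Because the conversion goes through trigonometric functions, the getters return the original coordinates only up to floating-point rounding, which is a compatibility detail a real library would have to document.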
Second, these days there are lots of tools that use "dependency injection" and whatnot that only work if you have such methods. Extending "ant", using Spring, and so on work this way, so it has become part of the usual Java design pattern for many classes. (See "Java Bean".)
But at the same time, people do not care to create structure at a larger level, to divide the system into modules. It is not uncommon to see large systems where all these classes and subclasses in all the packages are public: everything is available to everything.
Sure.
Separate access control for classes obscures the idea of modularity (especially in the case of an OO-centric language such as Java, where programmers mostly concentrate on classes/objects and often don't even think about packages;).
I don't know what you mean by this. It depends what you mean by "access control". But if you mean limiting the access that callers of a module have, that is exactly the way modularity is usually realized.
A class is too small an entity to form a module. A module API (protocol) usually consists of a set of classes, methods, factory functions, maybe some global variables.
Ah, yes. That's why both Common Lisp and Java have packages: to be able to make groups exactly as you are saying.
Another thought is that the interface between base class and its subclasses is not a second interface, but a part of the first interface. As you said:
So we could have one package for the code that calls into our library, which would have the usual protocol, which we can call the "external protocol". Then we could have a *second package*, which presents a second protocol that subclasses would use!
The second package IS for the second interface!
What would be the name for the second protocol? I.e. what is the interaction the base class and the subclass perform via this protocol?
I'm not sure what the name would be. If we were to adopt the practice I'm advocating, it would indeed be a good idea to have a standard naming practice.
If you look at some of the major Java libraries, you can see them dealing with that. For example, look at JNDI, the standard Java interface for dealing with any kind of naming service, particularly hierarchical ones such as file system directories and LDAP servers.
First, there is what they call the "API". This is used for modules that want to do things like "look up this name and tell me the contents" or "for this name, tell me all the children".
Then there is what they call the "SPI", the "Service Provider Interface". In the API, the calling module calls JNDI. For the "SPI", JNDI calls the service provider module. A service provider module is what is often called a "driver". You'd have one for file systems, one for LDAP, and so on. Java comes with some useful ones like those two, and you can add your own if you want to allow people who use the JNDI API to access some previously unsupported naming system.
This is very much like what I'm talking about. The API and the SPI are in different Java packages (well, as far as I know), and writing a module that calls the API is extremely different from writing a service provider.
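The API/SPI split can be sketched with a toy naming service in Java. This is not JNDI and does not use its classes; every name below (NamingProvider, Naming, InMemoryProvider, the "mem:" scheme) is invented for the illustration.

```java
import java.util.HashMap;
import java.util.Map;

// SPI: implemented by service providers ("drivers") and called by the framework.
interface NamingProvider {
    String scheme();               // which names this provider handles, e.g. "mem"
    String resolve(String path);   // contents bound to the path, or null if absent
}

// API: called by client modules, which never touch NamingProvider directly.
final class Naming {
    private final Map<String, NamingProvider> providers = new HashMap<>();

    // Registration belongs to the provider side of the framework.
    void register(NamingProvider provider) {
        providers.put(provider.scheme(), provider);
    }

    // API entry point: "look up this name and tell me the contents".
    public String lookup(String name) {
        int colon = name.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("no scheme in " + name);
        }
        NamingProvider provider = providers.get(name.substring(0, colon));
        if (provider == null) {
            throw new IllegalArgumentException("no provider for " + name);
        }
        return provider.resolve(name.substring(colon + 1));
    }
}

// A toy provider backed by an in-memory map.
final class InMemoryProvider implements NamingProvider {
    private final Map<String, String> entries = new HashMap<>();

    InMemoryProvider bind(String path, String value) {
        entries.put(path, value);
        return this;
    }

    public String scheme() { return "mem"; }
    public String resolve(String path) { return entries.get(path); }
}

class ApiSpiDemo {
    public static void main(String[] args) {
        Naming naming = new Naming();
        naming.register(new InMemoryProvider().bind("a/b", "hello"));
        System.out.println(naming.lookup("mem:a/b"));   // prints "hello"
    }
}
```

A client written against Naming.lookup and a driver written against NamingProvider can each be upgraded independently, as long as the interface each one depends on stays compatible.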
The SPI might or might not work by overriding classes or adding :before methods or whatever.
In fact, the whole thing that I am advocating here is NOT ONLY for OOP and subclassing. I just presented it that way because that's where I see the main "pain point". What REALLY matters is the distinction between the API and the SPI.
As you point out, it may be reuse of the implementation (derived streams reuse the write-string implementation from the base class).
Looks like a thing intended for reusing can rightly be called an "external protocol" too.
Well, this is something different, I think; a utility class whose behavior has to be documented. But this is a side issue and this email is alarmingly long already. :)
Another view on the interaction in subclassing is that the base class serves as a template of some functionality, and we can customize its logic by overriding some methods.
In Java, however, the two are separated: there is an Interface that specifies the contract, and then there might or might not be a standard abstract base class that provides some helpful stuff. You are not required to use the utility abstract base class, although in practice I think I've always seen it used.
Java allows you to skip having the Interface, which is too bad from the point of view of clean and simple language semantics, and for clarity of code. It's a lot like CLOS's not requiring defgeneric, and probably for the same reason; to let you be lazy or more succinct. But I think it's less clear. I try to always use explicit defgenerics. In our own code here, we particularly stress using defgenerics for functions that are felt to be exposed to other modules. (I say "felt" because our code does not isolate modules as well as it ought to.)
The output stream example may be viewed in this perspective: we plug-in our write-character implementation into the functionality implemented by the base class in order to customise it to work with different character source.
Yes.
The means to parametrize/customize the protocol behavior can also be considered a part of the protocol. In the simple case it's method parameters; in a more complex case we provide, for example, custom strategies, and can do it by defining a method for the subclass.
OK.
Therefore it looks to me that in many cases we deal with a single interface, not two distinct ones.
Well, I think I've explained my point about that above.
Sometimes a module can interact with the outside world via several protocols, but this is not necessarily related to subclassing, and it's not necessarily the case that both (all) the protocols are defined by this module.
Yes, indeed! That goes beyond the scope of what I've said so far. You can indeed have more than one external API, regardless of what you do with SPI's.
I believe Ada lets you do all of this. More about Ada and the experience with compatible upgrades, below.
Now, there's a great thing about CL packages: you can have more than one defpackage for the *same* symbols.
Exactly. You can have more than one API, you can have a separate SPI, you can have multiple SPI's and so on. This is crucial to everything I am proposing. (By "proposing" I do not mean to claim that what I'm saying is novel and original. It may well have been done before. I'm not trying to take credit for anything here.)
It's a great feature. I think you mean something like when we want to define one protocol as an extension of another protocol. Although in many cases I suppose it's possible, instead of extending an interface, to have separate interfaces. I.e. if one API is (func1, func2, func3), and another API is (func1, func2, func3, extra-func), we could have the second API just as (extra-func).
Exactly.
A good example is needed. Maybe something like database connection API, where implementation for particular database is provided by a pluggable "driver"; or maybe swank, and swank backends for different lisps; or something simpler.
Right, exactly, as I discussed above.
For your reading pleasure, here is some stuff about Ada. It's from Tucker Taft, one of the world's foremost experts on Ada (no kidding). Note that Ada 95 is a later version of Ada, with improvements to the original Ada definition. Tucker was very heavily involved with the definition of Ada 95, for years.
The point he makes in the first paragraph corresponds to what I've been saying: the fact that "print" calls "to_string" is NOT apparent to the caller, but IS apparent to anyone writing a subclass.
----------------------------------------
Anton,
These are all very good questions. I'm going to try to write up an essay about these topics suitable for posting on my blog, since there's so much to say.
The use of packages that I have been advocating here is not used a lot in practice in Common Lisp, but the more I think about it, the more I feel that it ought to be, or at least that it's worth a try. Using packages for shared libraries *is* used, very properly.
The use of packages for having more than one interface to a module, for separating the "call" interface from the "subclass" interface, and talking about getting away from the idea of using CLOS types as if they were CLU abstract types as far as modularity goes, have not been discussed much.
I don't know if I'm the first person to discuss this, or at least the first to address it from the point of view I'm writing from. If anybody knows of something I ought to be citing, please let me know.
I'll send mail to "pro" when I have something.
Thanks for providing encouragement for me to think about these things more deeply!
-- Dan
Anton Vodonosov wrote:
Daniel,
I think I understand the issue better now.
Not that your idea - how base class provides implementation for reuse by subclasses and using packages to specify this protocol - was impossible to understand from your first mail.
I see that my doubts and the form of my previous email are caused by the fact that I still don't completely understand OOP and classes, despite having used them for years and having spent some effort on understanding them.
I understand how the program will behave when I use this or that language feature; I have partly an understanding and partly a "feeling" about what is right, and I know the common patterns of how OOP is applied in different situations. But I still don't have a mental model which clearly defines all the concepts and fits them together: classes, protocols, modularity... how do these concepts overlap? what does it mean to "implement a class"?.. what belongs to a class and what does not, and so on.
BTW, I've been constantly thinking for maybe 3 days and made several steps further in clarifying my mental model (although open questions remain).
I am taking back my words that public and protected are a single protocol - I understand the nature of this separation better now.
Minor clarification of the previous mail:
Taking you literally, we do not need a new CL feature. But translating what you're saying into the terms I am using, you're saying that we need only one protocol, whereas I am saying that it's better to have two.
Actually I meant that even for Java it could be better to have only package public and package private, as Common Lisp has; without public/private/protected for classes. In your example with the Point class, getX, getY would be package public, and the data members x and y would be package private (and at any time may be changed to an angle and a distance from the origin, without breaking the clients). Maybe in Java it will cause some inconvenience: e.g. with single inheritance, in some situations reusing an implementation may require a lot of typing if we want to delegate method implementations to an "implementation utility" class from another package: void m() {implForReuse.m();}. But Java is not the subject here, so we may leave it aside.
The "print" and "to_string" example is illustrative; although the people who relied on the fact that the "print" uses "to_string" were relying on an undocumented feature; it's their own bug, not the library vendor's. Therefore IMHO the Ada 95 solution is not obligatory, I agree that people should expect only documented behaviour to be preserved in future versions. But of course, reducing the number of such undocumented entry points into the implementation may help them to avoid such bugs.
The ParaSail link is really interesting. BTW, I suppose the idea of plugging programmer-defined static checks into the compiler does not intrinsically contradict Common Lisp - maybe it's possible to implement something in this fashion for Lisp too (with certain limitations, but useful in practice). I've been thinking a little about that before (if we can introduce new language constructs expanded at compile time, why can't we implement compile-time checks?); and it's interesting to see somebody actually implementing it for a language.
To not lose the point of this thread: as I said, the idea of using packages to specify the protocol for subclasses seems right. It's necessary to try it in practice.
Best regards,
- Anton