Anton,
Thank you very much for your mail. It prompted me
to explain what I'm thinking of in a much
clearer (I hope!!) way. Here we go:
Anton Vodonosov wrote:
> What is the difference
> for modularity in case of subclassing, from other cases where we apply modularity?
> I can't find the difference.
>
Here's the greater idea.
If you have a module, by using CL packages (without
any change to CL), it is possible to have more than
one set of exports. So, you can offer different
interfaces to different callers. By interfaces
in this context I mean the set of symbols
that are exported.
If you offer interface I1 to callers C1, C2, and C3,
and interface I2 to callers C4, C5, and C6, then
you can be sure that C1, C2, and C3 will keep
working as long as you do not make incompatible
changes to interface I1. Similarly with I2.
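Here is a minimal sketch of that arrangement, using only standard
defpackage options. All the names (shapes-impl, shapes-i1, shapes-i2,
area, perimeter, scale) are made up for illustration; the point is just
that one implementation package can sit behind two interface packages
that re-export different subsets of its symbols.

```lisp
;;; Hypothetical module: every name below is invented for illustration.

;; The implementation package owns the symbols and exports nothing.
(defpackage :shapes-impl
  (:use :common-lisp))

(in-package :shapes-impl)

(defun area (w h) (* w h))
(defun perimeter (w h) (* 2 (+ w h)))
(defun scale (w h k) (list (* w k) (* h k)))

;; Interface I1: imports two of the implementation's symbols and
;; re-exports them. (:use) with no arguments keeps the package
;; otherwise empty.
(defpackage :shapes-i1
  (:use)
  (:import-from :shapes-impl #:area #:perimeter)
  (:export #:area #:perimeter))

;; Interface I2: a different, overlapping set of exports.
(defpackage :shapes-i2
  (:use)
  (:import-from :shapes-impl #:area #:scale)
  (:export #:area #:scale))

(in-package :common-lisp-user)
```

A caller that uses shapes-i1 sees area and perimeter but has no scale
at all, while a caller that uses shapes-i2 sees area and scale. Both
interfaces hand out the very same AREA symbol, so there is only one
function behind it.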
Now, moving specifically to OOP:
Traditionally, e.g. in Smalltalk 80, it was a big
deal that the caller of an instance of a class
does not know the internals of the class.
It only uses methods, and does not look
internally at the instance variables (slots).
(I can't remember whether ST-80 had the concept
of private methods that are only intended to
be used from methods of the class and not
from callers.)
OOP was often compared to, or even conflated
with, the concept of "abstract datatypes", in
the sense meant by Barbara Liskov. In CLU
and such languages, similarly there was a
sharp distinction between the advertised
interface presented to callers of the ADT
(abstract datatype, corresponding to a class),
and whatever was going on inside it.
This was a modularity boundary that allowed
a separation of concerns between the users
of the ADT and the implementation of the ADT.
Recently, this was considered such a big deal
that Liskov was awarded a Turing Award.
ADT's did not, per se, contain the concept of
inheritance. Inheritance was first exposed to
the general public with ST-80.
However, there wasn't even a semblance of
an effort to provide any modular separation
between a class and its subclasses. So,
whenever the implementation of a class
changed in any way at all, you risked
breaking subclasses. This was particularly
a problem when library 1, under control
of group-of-people-1, provided a class,
and library-2 subclassed it. If a new
release of library-1 came out, there was
in general no way for g-o-p-1 to find
everybody who had subclassed their
classes; if the lib-2 people installed
the new release of lib-1, they had
no idea whether anything would work.
Now, you could just say that there should
be internals and externals, and the externals
exposed to the callers and the externals
exposed to the subclassers are exactly
the same set.
This does not work for most non-trivial cases.
I will assume that you know why, since I can't
explain it briefly.
So, what I am proposing is to be explicit
about what is being exposed to the
subclassers, so that if the base class
is changed in a new release but
the subclasser interface is compatible,
the lib-2 people can rest assured that
their library will continue to work.
(Unless there are bugs, of course.)
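As a rough sketch of what being explicit could look like with
packages: the class lives in an implementation package, one package is
offered to callers, and a second one is offered to subclassers. The
names here (widget-impl, widget, widget-extension, lib-2, show,
describe-to-string, loud-widget) are all hypothetical, and this is only
one possible way to lay it out.

```lisp
;;; Hypothetical library-1: every name below is invented for illustration.
(defpackage :widget-impl
  (:use :common-lisp))

(in-package :widget-impl)

(defclass widget ()
  ((name :initarg :name :reader widget-name)))

;; Subclassers may specialize this; callers never see it.
(defgeneric describe-to-string (widget)
  (:method ((w widget))
    (format nil "widget ~A" (widget-name w))))

;; Callers use this. That it is implemented by calling
;; DESCRIBE-TO-STRING is part of the subclasser interface only.
(defun show (widget &optional (stream *standard-output*))
  (write-string (describe-to-string widget) stream)
  widget)

(defpackage :widget
  (:documentation "Caller interface: for code that creates and shows widgets.")
  (:use)
  (:import-from :widget-impl #:widget #:show)
  (:export #:widget #:show))

(defpackage :widget-extension
  (:documentation "Subclasser interface: for code that subclasses WIDGET.")
  (:use)
  (:import-from :widget-impl #:widget #:widget-name #:describe-to-string)
  (:export #:widget #:widget-name #:describe-to-string))

;;; Hypothetical library-2, written only against the subclasser interface.
(defpackage :lib-2
  (:use :common-lisp :widget-extension))

(in-package :lib-2)

(defclass loud-widget (widget) ())

(defmethod describe-to-string ((w loud-widget))
  (string-upcase (call-next-method)))

(in-package :common-lisp-user)
```

As long as a new release of the widget library keeps everything
exported from widget-extension compatible, including the fact that show
goes through describe-to-string, loud-widget should keep working no
matter what else changes inside widget-impl.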
In C++ and Java, this isn't done with
Lisp packages, of course. The "public"
members are the interface to the caller,
and the union of the "public" and "protected"
members is the interface provided to
the subclassers.
Whether the subclassers should be given
permission to override the public things,
in all cases, is an interesting but separate
discussion. With Lisp packages, you
can control this any way you want.
>
> We may note that "protected" is actually public, in the sense
> that when we specify something as "protected", we specify
> that the library has external clients which can rely on this protected
> method or field.
In that sense, yes, absolutely.
> Anyone may inherit from our public class, and
> access the "protected" members, therefore these members are
> available to public. If we change the protected member, we affect
> the library clients.
>
What I'm doing, however, is to distinguish
between two sets of "public", as explained
above. The callers, which would be a
"big public" (lots of callers) must cope
with incompatible changes relatively
less often, whereas the "small public"
(those who subclass) must cope
relatively more often since more is
revealed.
> This drives me to a conclusion, that a single language feature is sufficient
> which allows to separate a protocol for interaction with a module
> from the module part we expect to change without affecting clients.
>
Taking you literally, we do not need a new CL feature.
But translating what you're saying into the terms
I am using, you're saying that we need only one
protocol, whereas I am saying that it's better
to have two. There is no "language feature"
at the CL level, but there is a practice, or
a "design pattern", that I am recommending.
Having :documentation in the defpackage
to explain what's going on, especially who
is intended to use this package, would be
very helpful. Design patterns are essentially
a way to extend a language (in a nutshell).
> I made observation on many java systems, that addition to packages
> the public/protected/private for classes has bad influence on the
> system modularity.
>
> Many java programmers tend to hide attributes of every single class
> behind get/set accessors, even for simple classes which just hold
> values passed between methods and do not contain any logic; to
> create hierarchies of interface/abstract class/concrete for various
> minor classes, employing protected methods.
>
There are at least two reasons to put in such methods.
First, changing the implementation while keeping
compatible behavior for the public methods.
Imagine that you have a class called Point representing
a point in two-space. You could use Cartesian
representation: two data members x and y,
and getX and getY (and, if desired, setX
and setY). These are the simple getters
(and setters) that you mention above.
But what if, for some reason (numerical
methods, I don't know) you decide that
internally using a polar representation,
with two data members storing an
angle and a distance from the origin, is
better. You can make getX continue
to work using the appropriate
trigonometry operations.
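The same idea, sketched in CLOS rather than Java: the slots hold a
polar representation, while the exported point-x and point-y keep
answering in Cartesian terms. The package name point-demo and the
function names are assumptions made for this example only.

```lisp
;;; Hypothetical example: every name below is invented for illustration.
(defpackage :point-demo
  (:use :common-lisp)
  (:export #:point #:make-point #:point-x #:point-y))

(in-package :point-demo)

(defclass point ()
  ;; Internal representation: polar coordinates. The slot names are
  ;; not exported, so callers cannot depend on them.
  ((radius :initarg :radius)
   (angle  :initarg :angle)))

(defun make-point (x y)
  "Build a point from Cartesian coordinates X and Y."
  (make-instance 'point
                 :radius (sqrt (+ (* x x) (* y y)))
                 ;; Two-argument ATAN is not defined for (0, 0),
                 ;; so pick angle 0 for the origin.
                 :angle (if (and (zerop x) (zerop y))
                            0
                            (atan y x))))

(defgeneric point-x (point)
  (:documentation "Return the Cartesian x coordinate of POINT."))

(defgeneric point-y (point)
  (:documentation "Return the Cartesian y coordinate of POINT."))

(defmethod point-x ((p point))
  (with-slots (radius angle) p
    (* radius (cos angle))))

(defmethod point-y ((p point))
  (with-slots (radius angle) p
    (* radius (sin angle))))

(in-package :common-lisp-user)
```

Callers written against make-point, point-x, and point-y cannot tell
which representation is in use, apart from floating-point rounding in
the values that come back.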
Second, these days there are lots of tools
that use "dependency injection" and whatnot
that only work if you have such methods.
Extending "ant", using Spring, and so on
work this way, so it has become part of
the usual Java design pattern for many
classes. (See "JavaBeans".)
> But at the same time, people do not care to create structure at larger
> level, to divide the system into modules. It is not uncommon to see
> large systems where all these classes, subclasses in all the
> packages are public: everything is available to everything.
>
Sure.
> Separate access control for classes obscures the idea of modularity
> (especially in case of such an OO centric language like java, the
> programmers are mostly concentrated on classes/object, and often
> don't even think about packages;).
>
I don't know what you mean by this. It depends what
you mean by "access control". But if you mean limiting
the access that callers of a module have, that is
exactly the way modularity is usually realized.
> Class is too small entity to form a module. A module API (protocol)
> usually consists of set of classes, methods, factory functions, maybe
> some global variables.
>
Ah, yes. That's why Common Lisp and Java
both have packages; to be able to make groups
exactly as you are saying.
> Another thought is that the interface between base class and
> its subclasses is not a second interface, but a part of the first interface.
> As you said:
>
>
>> So we could have one package
>> for the code that calls into our library, which would have the usual
>> protocol, which we can call the "external protocol". Then we could
>> have a *second package*, which presents a second protocol that
>> subclasses would use!
>>
The second package IS for the second interface!
>
> What would be the name for the second protocol? I.e. what is the
> interaction the base class and the subclass perform via this protocol?
>
I'm not sure what the name would be. If we were
to adopt the practice I'm advocating, it would
indeed be a good idea to have a standard
naming practice.
If you look at some of the major Java libraries, you
can see them dealing with that. For example,
look at JNDI, the standard Java interface for
dealing with any kind of naming service,
particularly hierarchical ones such as file
system directories and LDAP servers.
First, there is what they call the "API".
This is used for modules that want to do things
like "look up this name and tell me the contents"
or "for this name, tell me all the children".
Then there is what they call the "SPI",
the "Service Provider Interface". In the
API, the calling module calls JNDI.
For the "SPI", JNDI calls the service
provider module. A service provider
module is what is often called a "driver".
You'd have one for file systems, one for
LDAP, and so on. Java comes with some useful
ones like those two, and you can add your
own if you want to allow people who use
the JNDI API to access some previously
unsupported naming system.
This is very much like what I'm talking about.
The API and the SPI are in different Java
packages (well, as far as I know),
and writing a module that calls the API
is extremely different from writing a
service provider.
The SPI might or might not work by overriding
classes or adding :before methods or whatever.
In fact, the whole thing that I am advocating
here is NOT ONLY for OOP and subclassing.
I just presented it that way because that's
where I see the main "pain point". What
REALLY matters is the distinction between
the API and the SPI.
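To make the API/SPI split concrete, here is a toy naming service in
Common Lisp. It is not JNDI and does not imitate its actual classes;
the package names (naming-impl, naming-api, naming-spi,
memory-provider) and the functions (lookup, provider-lookup,
register-provider) are all invented for this sketch.

```lisp
;;; Hypothetical naming service: every name below is invented.
(defpackage :naming-impl
  (:use :common-lisp))

(in-package :naming-impl)

;; SPI side: the library calls this generic function, and each
;; service provider ("driver") supplies a method for it.
(defgeneric provider-lookup (provider name)
  (:documentation "Return the contents bound to NAME in PROVIDER, or NIL."))

(defvar *providers* (make-hash-table :test #'equal)
  "Maps a scheme string to a provider object.")

(defun register-provider (scheme provider)
  (setf (gethash scheme *providers*) provider))

;; API side: ordinary callers use this and never see the providers.
(defun lookup (scheme name)
  (let ((provider (gethash scheme *providers*)))
    (unless provider
      (error "No provider registered for scheme ~S." scheme))
    (provider-lookup provider name)))

(defpackage :naming-api
  (:use)
  (:import-from :naming-impl #:lookup)
  (:export #:lookup))

(defpackage :naming-spi
  (:use)
  (:import-from :naming-impl #:provider-lookup #:register-provider)
  (:export #:provider-lookup #:register-provider))

;; A toy provider, written only against the SPI package.
(defpackage :memory-provider
  (:use :common-lisp :naming-spi))

(in-package :memory-provider)

(defclass memory-provider ()
  ((entries :initarg :entries :reader entries)))

(defmethod provider-lookup ((provider memory-provider) name)
  (cdr (assoc name (entries provider) :test #'string=)))

(register-provider "mem"
                   (make-instance 'memory-provider
                                  :entries '(("a" . 1) ("b" . 2))))

(in-package :common-lisp-user)
```

A calling module would use naming-api and write something like
(lookup "mem" "a"); a module adding support for a new naming system
would use naming-spi instead. Neither needs to see the other's
symbols.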
> As you point, it may be reusing of the implementation (derived streams
> reuse write-string implementation from the base class).
>
> Looks like a thing intended for reusing can rightly be called
> an "external protocol" too.
>
Well, this is something different, I think; a utility
class whose behavior has to be documented.
But this is a side issue and this email is
alarmingly long already. :)
> Another view on the interaction in subclassing is that the base class serves
> as a template of some functionality, and we can customize its logic by
> overriding some methods.
In Java, however, the two are separated: there is an
Interface that specifies the contract, and then
there might or might not be a standard abstract
base class that provides some helpful stuff.
You are not required to use the utility
abstract base class, although in practice I
think I've always seen it used.
Java allows you to skip having the Interface,
which is too bad from the point of view
of clean and simple language semantics,
and for clarity of code. It's a lot like
CLOS's not requiring defgeneric, and
probably for the same reason; to let
you be lazy or more succinct. But I
think it's less clear. I try to always use
explicit defgenerics. In our own code
here, we particularly stress using
defgenerics for functions that are felt
to be exposed to other modules.
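A small illustration of the difference, with a made-up generic
function called write-record: the explicit defgeneric gives the
contract a place to live (argument list and documentation), separate
from any one method.

```lisp
;;; Hypothetical example: WRITE-RECORD is invented for illustration.

;; The contract, stated once, independent of any method.
(defgeneric write-record (record stream)
  (:documentation
   "Write RECORD to STREAM in its external text form. Returns RECORD."))

;; One implementation of the contract. Without the DEFGENERIC above,
;; this DEFMETHOD alone would create the generic function implicitly,
;; and the contract would be left unstated.
(defmethod write-record ((record string) stream)
  (write-line record stream)
  record)
```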
(I say "felt" because our code does not isolate
modules as well as it ought to.)
> The output stream example may be viewed
> in this perspective: we plug-in our write-character implementation into
> the functionality implemented by the base class in order to customise it
> to work with different character source.
>
Yes.
> The means to parametrize/customize the protocol behavior can also
> be considered as a part of the protocol. In simple case it's method parameters,
> in more complex case we provide, for example, custom strategies, and can do it
> by defining a method for the subclass.
>
OK.
> Therefore it looks to me that in many cases we deal with a single interface,
> not two distinct ones.
>
Well, I think I've explained my point about that above.
> Sometimes a module can interact with the outside world via several protocols, but
> it does not necessarily relate to subclassing, and it's not necessary that both (all)
> the protocols are defined by this module.
>
Yes, indeed! That goes beyond the scope of what I've
said so far. You can indeed have more than one external
API, regardless of what you do with SPI's.
I believe Ada lets you do all of this. More about
Ada and the experience with compatible
upgrades, below.
>
>> Now, there's a great thing about CL packages: you can have more than
>> one defpackage for the *same* symbols.
>>
Exactly. You can have more than one API, you can
have a separate SPI, you can have multiple SPI's
and so on. This is crucial to everything I am
proposing. (By "proposing" I do not mean to
claim that what I'm saying is novel and original.
It may well have been done before. I'm not trying
to take credit for anything here.)
>
> It's a great feature. I think you mean something like when we want to define one
> protocol as an extension of another protocol. Although in many cases I suppose
> it's possible instead of extending interface to have a separate interfaces. I.e.
> if one API is (func1, func2, func3), and another API is (func1, func2, func3, extra-func),
> we could have the second API just as (extra-func).
>
Exactly.
> A good example is needed. Maybe something like database connection API, where implementation
> for particular database is provided by a pluggable "driver"; or maybe swank, and swank
> backends for different lisps; or something simpler.
>
Right, exactly, as I discussed above.
For your reading pleasure, here is some stuff
about Ada. It's from Tucker Taft, one of the
world's foremost experts on Ada (no kidding).
Note that Ada 95 is a later version of Ada,
with improvements to the original Ada
definition. Tucker was very heavily involved
with the definition of Ada 95, for years.
The point he makes in the first paragraph
corresponds to what I've been saying:
the fact that "print" calls "to_string" is
NOT apparent to the caller, but IS
apparent to anyone writing a subclass.
----------------------------------------