Hello list,
I'd like to talk a bit about our Java Collections API integration, and especially a topic that has been lumbering in my mind for quite a lot of time and which has been raised today by Rudi Schlatte on #abcl. Warning: the mail turned out to be quite long.
As you probably know, some time ago I added to ABCL an implementation of Christophe Rhodes' extensible sequences proposal[1], until then only implemented in SBCL (as far as I know). It's not loaded by default, but you can get it with (require :extensible-sequences). Some time later I used that API to integrate ABCL with the Java Collections framework, so that (certain) Java collections are Lisp SEQUENCEs. Again, to get it you need to (require :java-collections).
However, there are, I think, two interesting limitations in the ext. seq. protocol, which I'm going to discuss.
Since it was designed as an extension to CL's sequences, the protocol builds directly upon the concept of CL sequences (lists and vectors) - and in particular it assumes sequences are ordered (accessible by index) and mutable. The only basic operations one must absolutely define to provide a custom sequence implementation are ELT, (SETF ELT), and LENGTH; anything else is defined by default upon those operations, albeit inefficiently. That concept of sequences roughly matches that of Java Lists (except the fact that mutability of Lists is optional). List is a subtype of Collection which implies ordering and access by index. Collection is basically an Iterable thing with a length and optional mutability (in the form of adding elements to the end and removing elements destructively). The concept of a Collection or better an Iterable is missing from the extensible sequences protocol, although it provides an iterator sub-protocol. This makes it unsuitable for those kinds of sequences that do not fully respect CL's definition of sequence (e.g. because they are not mutable of because they can only be iterated once and not accessed by index, like a stream), but that would nevertheless benefit from adhering at least in part to the sequence protocol (because then you might LOOP or, shameless plug, DO+ over them, for example). That would also allow to expose arbitrary Java Collections, and not just Lists, as Lisp sequences - including thus Sets and various specialized Collection types.
So the first point is: I'd like to extend the protocol with the concept of an Iterable. How? Ideally, Iterable would be a more basic concept than Sequence, but we cannot modify the class hierarchy of SEQUENCE because it's defined by the standard. So, either ITERABLE must be completely disjoint from SEQUENCE, or it must be a subtype of it. Of the two, I favor the latter, because many sequence operations are supposed to signal errors in case their argument is not a proper sequence, and deviating from that would probably constitute a violation of the standard. So I would define an Iterable as a special SEQUENCE that is defined in terms of an iterator; by default ELT, (SETF ELT) and LENGTH create a new iterator and, very inefficiently, use it to walk the Iterable and do their work. Of course specialized, more efficient implementations are possible. Java Collections would be Iterables, and Java Lists would retain their specialized implementation of ELT and (SETF ELT) which do not necessarily walk the list each time (depending on the type of list).
The second idea is that I think it would be beneficial to explicitly state that mutability of sequences is optional. The CL standard doesn't say anything about it (or so it seems to me), although you can clearly see that in certain passages it assumes that list and vector (which are both mutable) are the only subtypes of sequence, whereas list and vector are explicitly said not to constitute an exhaustive partition of the type sequence[2]. Currently a number of sequence functions in the protocol implementation assume mutability, for example SUBSEQ by default creates a copy of the same type and sets elements one by one using an iterator. However, to eliminate the mutability requirement, at least another piece is missing: a generalization of CONS, i.e. an operation that appends a single element to the beginning of a sequence nondestructively. This is roughly what Clojure calls conj (except that conj does not specify where the element is added, which is a very awkward decision to me) and could be defined as (concatenate (class-of sequence) (list item) sequence)... if only concatenate wasn't implemented in terms of (SETF ELT) :P The possible negative sides of such a decision are: 1. it could in the end be an aspect of non-conformity to the ANSI standard (although only relevant to non-standard sequence subtypes) 2. it would make the default implementation of certain sequence operations even more inefficient for custom mutable sequences (that could be mitigated by checking for a marker mixin class MUTABLE-NONSTANDARD-SEQUENCE) 3. it would make ABCL's implementation of sequences a bit more complex. Note also that lifting mutability would NOT benefit Java collections because, inconsistently, they provide no general means to extend a collection without mutating it.
What do you think?
Cheers, Alessio
[1] doc.gold.ac.uk/~mas01cr/papers/ilc2007/sequences-20070301.pd [2] http://www.lispworks.com/documentation/HyperSpec/Body/t_seq.htm