The separate pathname matcher for jars looks odd to me. I'd expect the listing to give whatever it gives, and the matching condition in directory.lisp to filter it. Is jar filtering so different that it requires a different matcher?
Same question applies to the wildcard matching, jar listing seems to do the wildcard matching in java, rather than in lisp? That's also different from the way directory listings are handled.
The list-directory primitive sorely needs to be split into two functions (listJar and listDirectory), it's getting long-winded. That's not a high-priority issue, but we need to mind function length, it's a huge readability issue.
On 2/22/10 6:21 PM, Ville Voutilainen wrote:
The separate pathname matcher for jars looks odd to me. I'd expect the listing to give whatever it gives, and the matching condition in directory.lisp to filter it. Is jar filtering so different that it requires a different matcher?
LIST-DIRECTORY lists the jar directory contents including directories, while MATCH-WILD-JAR-PATHNAME simply uses PATHNAME-MATCH-P to determine what to return. Jar entries which are directories always have a trailing "/", which is not true for pathnames on the filesystem (#p"/tmp" could be a file or a directory), so the two are not always equivalent.
The jar pathname part of LIST-DIRECTORY is currently unused. I implemented it first, tried to patch the Lisp in "directory.lisp" to use it, but ran into problems that I couldn't make sense of. I stepped back, and noticed that the algorithm for wildcard matching for filesystems was fundamentally different from the one for jar files (see next comment), implemented that algorithm as MATCH-WILD-JAR-PATHNAME, saw that it worked well enough, and went with that for a commit.
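The trailing "/" point can be seen directly in the JDK's zip API. The following is only a minimal sketch, not code from ABCL: the class name JarEntryKinds and the helper isDirectoryEntry are hypothetical, and the only library behavior relied on is that java.util.zip.ZipEntry.isDirectory() treats an entry as a directory when its name ends with "/".

```java
import java.util.zip.ZipEntry;

class JarEntryKinds {
    // Hypothetical helper: a zip/jar entry counts as a directory
    // exactly when its name ends with "/".
    public static boolean isDirectoryEntry(String entryName) {
        return new ZipEntry(entryName).isDirectory();
    }

    public static void main(String[] args) {
        // The name alone decides the kind of entry inside a jar ...
        System.out.println(isDirectoryEntry("org/armedbear/lisp/"));
        System.out.println(isDirectoryEntry("org/armedbear/lisp/Pathname.class"));
        // ... whereas a filesystem name such as "/tmp" says nothing by itself;
        // one has to ask the filesystem (e.g. java.io.File.isDirectory()).
    }
}
```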
Overall, I do suspect that the way I implemented jar pathnames is not totally optimal, but in the last six weeks I have not been able to improve on the basic design of using a list for DEVICE. Often there were points in reworking 'Pathname.java' where I felt "Why am I writing this same sort of code again? Surely this is a sign of a fundamental problem in abstraction." Sometimes I found a better way, sometimes not, but I was never able to come up with a better basic assumption (to use DEVICE as a list of pathnames for the jar file, DIRECTORY as the relative path within that jar). I have come to the conclusion that implementing jar pathnames the way I did pushes a lot of complexity to the associated primitives in Pathname.java, but ultimately makes things quite a bit easier on the user of this abstraction. As evidence for this, I would argue that my approach *has* dramatically simplified the code in 'Load.java' (and 'Lisp.java' and 'AutoloadedFunctionProxy.java'). A weak point is that code which assumes that the DEVICE field is always a string, or that (truename (pathname-directory (truename pathname))) always yields a pathname if (truename pathname) succeeds, fails. Since a lot of PATHNAME behavior in ANSI is implementation dependent, we are still an ANSI CL, but we have very different usage of the DEVICE pathname component than is commonly assumed.
An alternative might have been to subclass PATHNAME as PATHNAME-JAR, but when I analyzed that approach it seemed to involve a lot more (if (pathname-jar-p pathname) option1 option2) than I wanted. If all the system code taking a PATHNAME as an argument were to be defined with generic functions this would be considerably more attractive (and easier). But the dirty secret of CLOS is that it's a bolt-on via macros, which all CL implementations that I have studied bootstrap after the base system is in place. CLOS isn't even present in ABCL when the user gets to "CL-USER>", right?
Same question applies to the wildcard matching, jar listing seems to do the wildcard matching in java, rather than in lisp? That's also different from the way directory listings are handled.
DIRECTORY involves wildcards for non-trivial use (its non-wildcard use actually doesn't even distinguish a directory from a file!). The algorithm for wildcard DIRECTORY is fundamentally different for the filesystem than for a jar, as follows. For a filesystem, you have to branch at each wildcard in the pathname. For a jar file, you are simply running down the list of all entries in the jar file contents. One could probably implement the second (jar pathname directory listing) in terms of the first, but it wouldn't make much sense and wouldn't be necessary. I couldn't do it easily, as I ran into problems with my LIST-DIRECTORY implementation, although I did give it about an hour's effort.
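To make the "running down the list" half of that contrast concrete, here is a rough sketch of a single linear pass over a flat list of jar entry names. It is not MATCH-WILD-JAR-PATHNAME or Pathname.wildcardMatches(); the class JarWildcardScan, both helpers, and the wildcard syntax ("*" within one path component, "**" across "/") are assumptions made up for illustration, and real CL pathname wildcards are richer than this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class JarWildcardScan {
    // Hypothetical helper: translate a simplified wildcard into a regex.
    // "**" matches across "/" separators, "*" stays within one component.
    public static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < wildcard.length()) {
            if (wildcard.startsWith("**", i)) {
                sb.append(".*");
                i += 2;
            } else if (wildcard.charAt(i) == '*') {
                sb.append("[^/]*");
                i += 1;
            } else {
                // Quote every literal character so "." etc. are not regex syntax.
                sb.append(Pattern.quote(String.valueOf(wildcard.charAt(i))));
                i += 1;
            }
        }
        return Pattern.compile(sb.toString());
    }

    // One pass over the flat entry list: no branching at each wildcard,
    // unlike a filesystem walk that must descend per wild component.
    public static List<String> matchEntries(List<String> entryNames, String wildcard) {
        Pattern pattern = toRegex(wildcard);
        List<String> result = new ArrayList<>();
        for (String name : entryNames) {
            if (pattern.matcher(name).matches()) {
                result.add(name);
            }
        }
        return result;
    }
}
```

A filesystem version would instead have to list each directory that matches the wild components seen so far and recurse, which is why the two do not share an obvious common core.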
The list-directory primitive sorely needs to be split into two functions (listJar and listDirectory), it's getting long-winded. That's not a high-priority issue, but we need to mind function length, it's a huge readability issue.
I am a "if the function doesn't fit into one 80x25 Emacs buffer it should be split" kinda guy, but the ABCL codebase violates that maxim at so many points (q.v. compiler-pass2.lisp) that I don't try to religiously follow that principle here. I'd be happy to do such splitting, but would have thought that you of all people would have jumped on my back about the penalty for a further push to the stack. My rule of thumb is that for code refactoring like you have done with the string function where the codepath is used more than once, such splitting is worth it. But for functions like LIST-DIRECTORY, we should keep it all in one method call for efficiency. For what it's worth, I *did* try to figure out how to factor the common code between LIST-DIRECTORY and PATHNAME-MATCH-P out into something separate, but Pathname.wildcardMatches() was the only thing that looked plausible to my brain.
Hopefully I understood your questions: push back if I haven't!
yers in cons, Mark
On 22 February 2010 22:20, Mark Evenson evenson@panix.com wrote:
First of all, an excellent explanation for the implementation choices. Thanks!
I am a "if the function doesn't fit into one 80x25 Emacs buffer it should be split" kinda guy, but the ABCL codebase violates that maxim at so many points (q.v. compiler-pass2.lisp) that I don't try to
The violation is cruft in our codebase, and is something we are semi-actively getting rid of. I've been trying to shorten things wherever I touch, Closure.java being the prime example of such work. compiler-pass2 is actually another example. Note that both pass2 and Closure still have long functions in places, but we have cut a lot of that cruft into better abstractions whenever time has allowed.
religiously follow that principle here. I'd be happy to do such splitting, but would have thought that you of all people would have jumped on my back about the penalty for a further push to the stack. My
Oh, not at all - I expect the compiler to inline the calls to the (hopefully forthcoming) two helper functions. I prefer readability to micro-optimizations any given day. :) I prefer readability to even _real_ optimizations.
splitting is worth it. But for functions like LIST-DIRECTORY, we should keep it all in one method call for efficiency. For what it's worth, I
That would sound like a premature optimization to me. Having LIST-DIRECTORY as
if (jar) { listJar(); } else { listDir(); }
should be just fine, if the separate helper functions are private. As I said, I expect the compiler to inline such calls.
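A slightly fuller sketch of that shape, for illustration only: the class DirectoryLister, its method names, and the "regular file ending in .jar" test are assumptions, not the actual LIST-DIRECTORY primitive in Pathname.java, which works on Lisp pathnames rather than java.io.File. Whether the JIT really inlines the helpers is not something the sketch demonstrates.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

class DirectoryLister {
    // Thin dispatcher; each branch lives in its own short private helper.
    public static List<String> list(File target) {
        if (isJar(target)) {
            return listJar(target);
        } else {
            return listDir(target);
        }
    }

    // Assumption for this sketch: a jar is a regular file named *.jar.
    private static boolean isJar(File f) {
        return f.isFile() && f.getName().endsWith(".jar");
    }

    // Entry names come back flat, with directories marked by a trailing "/".
    private static List<String> listJar(File jar) {
        List<String> names = new ArrayList<>();
        try (JarFile jarFile = new JarFile(jar)) {
            Enumeration<JarEntry> entries = jarFile.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    // Immediate children only; File.list() returns null for non-directories.
    private static List<String> listDir(File dir) {
        List<String> names = new ArrayList<>();
        String[] children = dir.list();
        if (children != null) {
            Collections.addAll(names, children);
        }
        return names;
    }
}
```

With the helpers private and static, the dispatcher stays a three-line method and each listing strategy can be read on one screen.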
armedbear-devel@common-lisp.net