I've [committed to an initial design][1] for URLs to be used as Pathnames, which I am in the process of implementing. The primary use of this functionality will be to eventually express OSGi bundles within ASDF system definitions.
[1]: http://trac.common-lisp.net/armedbear/browser/trunk/abcl/doc/design/url-path...
The gist of the proposal is to represent a URL with a Pathname whose HOST component is a list. The list will be a property list with the key :SCHEME containing the URL scheme and :AUTHORITY containing the URL authority. The DIRECTORY, NAME, and TYPE components will be used to construct the PATH of the URL.
As an example, "http://example.org:8080/org/armedbear/lisp/boot.lisp" would be converted to a Pathname as follows:
pathname: { host: (:SCHEME "http" :AUTHORITY "example.org:8080"), directory: (:ABSOLUTE "org" "armedbear" "lisp"), name: "boot", type: "lisp" }
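For illustration, the mapping in this example can be sketched in Python with the standard `urllib.parse.urlsplit`. The function name, and the use of a dict and flat lists as stand-ins for the Lisp Pathname and its HOST property list, are assumptions of this sketch rather than ABCL's implementation:

```python
from urllib.parse import urlsplit

def url_to_pathname_parts(url: str) -> dict:
    """Split a URL into pathname-style components (illustrative sketch).

    HOST becomes a property list of :SCHEME and :AUTHORITY; the URL path
    is split into DIRECTORY, NAME and TYPE.
    """
    parts = urlsplit(url)
    # The last path segment is the file; everything before it is the directory.
    *dirs, filename = parts.path.split("/")
    # With an authority present the URL path is empty or starts with "/",
    # so the directory is treated as :ABSOLUTE; empty segments are dropped.
    directory = [":ABSOLUTE"] + [d for d in dirs if d]
    name, dot, ext = filename.rpartition(".")
    if not dot:  # no "." in the filename: all of it is the NAME
        name, ext = filename, None
    return {
        "host": [":SCHEME", parts.scheme, ":AUTHORITY", parts.netloc],
        "directory": directory,
        "name": name or None,
        "type": ext,
    }
```

Called on the URL above, this yields a HOST of `[":SCHEME", "http", ":AUTHORITY", "example.org:8080"]`, a DIRECTORY of `[":ABSOLUTE", "org", "armedbear", "lisp"]`, NAME `"boot"` and TYPE `"lisp"`, matching the pathname shown.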
As an answer to Alessio's question (long ago) about whether URLs should really be a Pathname or whether we are better off using a class, I think that they really are better abstracted as a Pathname because a) they have a hierarchical path, b) the network location has a strong resemblance to how HOST is used for UNC mounts under Windows, and c) being able to express OSGi bundles in ASDF has a chance of working.
Critique welcome.
On Fri, Mar 26, 2010 at 9:34 AM, Mark Evenson evenson@panix.com wrote:
[…]
FWIW, CLForJava also represents URLs as pathnames: http://clforjava.org/Documents/ELS%202008%20-%20Abstraction%20of%20Pathnames...
Bye, Alessio
On 26 March 2010 10:44, Alessio Stalla alessiostalla@gmail.com wrote:
FWIW, CLForJava also represents URLs as pathnames: http://clforjava.org/Documents/ELS%202008%20-%20Abstraction%20of%20Pathnames...
In that case it would be good to be compatible with their scheme<sic>. Unless there's some problem with that.
On Mar 26, 2010, at 10:16 AM, Ville Voutilainen wrote:
[…]
From what I can understand from the paper Alessio referenced (thanks!), I don't think we want to use their abstraction in terms of Pathname components for the following reasons:
1. CLforJava *always* fills in the DEVICE component with the URI scheme, meaning that even ordinary files get a DEVICE containing the :FILE symbol. We would have to retrofit all the existing code for ordinary pathnames, which already works just fine, for no apparent gain. This use of a symbol in DEVICE is also incompatible with ABCL's use of DEVICE under Windows to contain the drive letter, and with jar pathnames' use of DEVICE to contain the Pathname(s) of the jar(s).
2. I don't see an easy way to represent a jar pathname, which needs to contain both one or two Pathnames for the enclosing jars and the reference to the entry in the jar. Part of the reason for this is that "jar URLs" are actually not URLs but a larger protocol that uses URLs to specify the location of the jar. Since, as I understand it, the CLforJava design dictates that all Pathnames have a DEVICE which is a symbol and a HOST which is a string, we don't have much room left to implement jar pathnames. We *could* wedge the jar URL into the DIRECTORY along with the entry path, but it wouldn't make much sense when someone tries to MERGE-PATHNAMES such a beast without a lot of special casing.
3. The incorporation of the URL fragment and query into the DIRECTORY is just wrong. Logically these elements are subordinate to the NAME: for "http://example.org/command/search?s=demons" and "http://example.org/command/expel?s=demons" the query "?s=demons" should be associated with the NAME ("search" or "expel"), rather than pushed into the DIRECTORY as (:ABSOLUTE "command" "?s=demons"). A fragment is even more strongly associated with the semantics of being a subaddress of the URL. It would make more sense to wedge the URL fragment and query components into TYPE, if anything. And I don't expect MERGE-PATHNAMES to react sensibly to this design without a lot of special casing.
4. I don't see any real use of mapping URIs like "mailto:username@example.org" or "urn:some.really-opaque-234234234-string" to Pathnames. If one can't OPEN and LOAD a Pathname, what's the point? If you want a generic URI interface use PURI or something.
5. The use of HOST as a String which contains the URL authority portion is ambiguous with respect to the use of HOST for logical hosts. It is not immediately obvious from the paper how one distinguishes the two cases. I presume CLforJava does not have an implementation of logical pathnames to worry about.
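The placement argued for in point 3, keeping the query (and fragment) attached to the NAME, can be sketched in Python as follows. The function is hypothetical, and NAME and TYPE are deliberately not separated here, to keep the example small:

```python
from urllib.parse import urlsplit

def split_with_query_in_name(url: str):
    """Return (directory, name), keeping any query/fragment on the NAME.

    Illustrative only: NAME and TYPE are not separated in this sketch.
    """
    parts = urlsplit(url)
    # Path segments after the leading "/"; the last one is the NAME.
    *dirs, name = parts.path.lstrip("/").split("/")
    if parts.query:
        name += "?" + parts.query
    if parts.fragment:
        name += "#" + parts.fragment
    return [":ABSOLUTE", *dirs], name
```

With this placement the "search" and "expel" URLs from point 3 both get the DIRECTORY `[":ABSOLUTE", "command"]` and differ only in NAME (`"search?s=demons"` versus `"expel?s=demons"`), which is the shape MERGE-PATHNAMES already expects from ordinary file pathnames.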
What I *like* from the CLforJava paper:
1. It validates my abstraction that a URL can be meaningfully decomposed into three parts, namely the scheme, the authority, and the path.
2. The use of additional "PATHNAME-*" functions to extract parts of the URL (like PATHNAME-SCHEME to extract the scheme). The paper doesn't indicate whether these are SETF-able places, but that would be an obvious implementation choice.
3. It validates the choice of :ABSOLUTE in DIRECTORY URL components.
Since the namestrings from the two implementations will be compatible, which is mostly how I can imagine them sharing code in ASDF, I would propose that we do not adopt the CLforJava Pathname component scheme. Adopting it would require a) refitting "ordinary" Pathnames, b) figuring out some method of expressing jar pathnames, and c) just using a broken idea of fragments and queries in the DIRECTORY. What I would propose to do is make a series of SETFable PATHNAME-URL-SCHEME, PATHNAME-URL-PATH, etc. methods to facilitate working on the Pathname structure.
--
"A screaming comes across the sky. It has happened before, but there is nothing to compare to it now."
On Fri, Mar 26, 2010 at 12:29 PM, Mark Evenson evenson@panix.com wrote:
[…]
I completely agree. I actually just skimmed the CLforJava paper; I'm glad you found it useful. The broken fragment handling struck me too, but I didn't go much deeper than that.
Incidentally, I learned that the maintainer/main developer of CLforJava will give a talk at the ELS. I will probably be there; I hope to learn something useful for ABCL, too.
Bye, Alessio
On 26 March 2010 13:29, Mark Evenson evenson@panix.com wrote:
In that case it would be good to be compatible with their scheme<sic>. Unless there's some problem with that.
Rebuttal pruned. :)
Mark, that was.. ..very convincing. ;) If cl4java has that many problems, we don't want to be compatible.
On Mar 26, 2010, at 4:34, Mark Evenson wrote:
[…]
Two general comments without having looked at the code or the mentioned paper:
* Make sure you're not assuming *too much* about URL syntax; stick to what's stated in the URI Generic Syntax RFC and/or particular scheme RFCs.
In particular, make sure you're correctly handling the 'reserved' characters and not making too many assumptions about what is equivalent, or about what cannot occur.
* On the other hand, it would seem natural that a pathname with, say, directory (:ABSOLUTE "qty" "1/2 lb") should become the URL whatever://.../qty/1%2F2%20lb/, i.e., the user should be able to ignore the encoding/syntax/escaping issues of URLs when working with them in a data structure. But this may conflict with being able to handle the reserved characters unambiguously. The central question is "What does % mean in a PATHNAME-component string?". Make sure that the answer is consistent.
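One consistent answer to the "%" question is to treat PATHNAME component strings as always unencoded, so "%" is an ordinary character, and to escape each component separately when building the URL. A minimal Python sketch under that assumption, using the standard `urllib.parse.quote`/`unquote`; the function names are illustrative:

```python
from urllib.parse import quote, unquote

def encode_segment(component: str) -> str:
    """Percent-encode one PATHNAME component for use as a URL path segment.

    safe="" escapes "/" as well, so a component can never add a path level,
    and "%" always becomes "%25", so "%" in a component is a literal percent.
    """
    return quote(component, safe="")

def decode_segment(segment: str) -> str:
    """Inverse of encode_segment."""
    return unquote(segment)

def directory_to_url_path(directory: list) -> str:
    """Turn [":ABSOLUTE", "qty", "1/2 lb"] into "/qty/1%2F2%20lb/"."""
    kind, *names = directory
    if kind != ":ABSOLUTE":
        raise ValueError("only absolute directories are handled in this sketch")
    return "/" + "".join(encode_segment(n) + "/" for n in names)
```

Under this convention the directory (:ABSOLUTE "qty" "1/2 lb") becomes the path `/qty/1%2F2%20lb/` from the example, and a component "50%" becomes the segment `50%25`, so the round trip never changes a component.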
On Friday, March 26, 2010, Kevin Reid kpreid@mac.com wrote:
[…]
The reserved characters are scheme-specific. I'd suggest that there be a generic function with EQL dispatch on the scheme to compute namestrings.
Will the logical pathname system interact with URL pathnames? I'm thinking I could probably make use of the analogy between prefixes and logical hosts.
-Alan
-- Kevin Reid http://switchb.org/kpreid/
armedbear-devel mailing list armedbear-devel@common-lisp.net http://common-lisp.net/cgi-bin/mailman/listinfo/armedbear-devel
On 3/28/10 5:26 PM, Alan Ruttenberg wrote: […]
The reserved characters are scheme-specific. I'd suggest that there be a generic function with EQL dispatch on the scheme to compute namestrings.
Since URL handlers can be added at runtime, which is the case with the OSGi handler for the "bundle" scheme that we are interested in supporting, we can't make an EQL specializer ahead of time for every scheme we will encounter. I was hoping to use the Java implementations of URLStreamHandler for the schemes to help out here, but I don't see much there. At the moment I'm favoring just implementing such a specializer for "http" and "bundle", creating a protocol for a user to add a specializer, and calling it a day.
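The combination discussed here (dispatch on the scheme, plus a way for users to register new schemes at runtime) can be sketched in Python with a plain registry dict standing in for EQL specializers. The registry, decorator and function names are hypothetical; the http rule relies on RFC 3986 allowing sub-delims, ":" and "@" unescaped inside a path segment:

```python
from typing import Callable, Dict
from urllib.parse import quote

# Hypothetical registry: scheme name (lower case) -> namestring function.
# Entries can be added at runtime, standing in for EQL specializers
# that cannot all be written ahead of time.
NAMESTRING_METHODS: Dict[str, Callable[[dict], str]] = {}

def define_namestring_method(scheme: str):
    """Decorator that registers FN as the namestring method for SCHEME."""
    def register(fn: Callable[[dict], str]) -> Callable[[dict], str]:
        NAMESTRING_METHODS[scheme.lower()] = fn
        return fn
    return register

def _build(url: dict, safe: str) -> str:
    path = "/".join(quote(seg, safe=safe) for seg in url["segments"])
    return f"{url['scheme']}://{url['authority']}/{path}"

def default_namestring(url: dict) -> str:
    # Conservative fallback: escape every reserved character in each segment.
    return _build(url, safe="")

def url_namestring(url: dict) -> str:
    """Dispatch on the scheme (case-insensitively), falling back to the default."""
    method = NAMESTRING_METHODS.get(url["scheme"].lower(), default_namestring)
    return method(url)

@define_namestring_method("http")
def http_namestring(url: dict) -> str:
    # RFC 3986 pchar: sub-delims, ":" and "@" may appear unescaped in a segment.
    return _build(url, safe="!$&'()*+,;=:@")
```

For segments `["a;b=c", "x y"]` the http method produces `http://example.org/a;b=c/x%20y`, while an unregistered scheme such as ftp falls back to `ftp://example.org/a%3Bb%3Dc/x%20y`. Adding "bundle" support would mean registering one more function the same way.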
Do we want the namestring to always be the encoded form? I was planning to have the namestring be what the user used to create the URL-PATHNAME, but always encode before going to the network. I need to think about this more.
Will the logical pathname system interact with URL pathnames? I'm thinking I could probably make use of the analogy between prefixes and logical hosts.
By prefix you mean schemes like "http" or "ftp", right? I am currently planning to have URL pathnames explicitly not be allowed as logical hosts, because if one defined a logical host named "http", one could then never unambiguously construct an "http" scheme URL via #P"http://example.org/foo.lisp". We could require that a logical host *has* to be in upper case? In that case, I guess the two could co-exist.
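The upper-case suggestion could be turned into a parsing rule roughly like the following Python sketch. The set of logical hosts, the function name and the three-way result are hypothetical, and ordinary file namestrings (including Windows drive letters) are simply lumped into "other":

```python
import re

# Hypothetical set of defined logical hosts; under the rule suggested above
# they must be spelled in upper case.
LOGICAL_HOSTS = {"SYS", "HTTP"}

# A scheme-like prefix: a letter followed by letters, digits, "+", "." or "-".
_PREFIX = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*):")

def classify_namestring(namestring: str) -> str:
    """Return 'logical', 'url' or 'other' for a namestring (sketch)."""
    m = _PREFIX.match(namestring)
    if m is None:
        return "other"
    prefix = m.group(1)
    # A registered upper-case logical host always wins.
    if prefix.isupper() and prefix in LOGICAL_HOSTS:
        return "logical"
    # Otherwise "scheme://" introduces a URL with an authority.
    if namestring[m.end():].startswith("//"):
        return "url"
    return "other"
```

Even with a logical host "HTTP" defined, `http://example.org/foo.lisp` still classifies as a URL, while `HTTP:src;boot.lisp` is logical. The price of this rule is that a scheme must be written in lower case whenever an upper-case logical host of the same name exists.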
Could you give an example of the sort of logical pathname definitions you would use? Incorporating something like wildcards for the URI authority is going to need some thought, but wouldn't be impossible.
On Mon, Mar 29, 2010 at 5:07 AM, Mark Evenson evenson@panix.com wrote:
[…]
At the moment I'm favoring just implementing such a specializer for "http" and "bundle", creating a protocol for a user to add a specializer, and calling it a day.
+ ftp, https, urn, info would all be common in my world.
btw, I'm not sure mixing the authority and port is the best choice - they tend to be separate items. Also, how to do fragments ("#anchor") and queries (?q=blah) for http URIs?
Do we want the namestring to always be the encoded form? I was planning to have the namestring be what the user used to create the URL-PATHNAME, but always encode before going to the network. I need to think about this more.
It's a good question. The thing is that with an http URI you want to be able to format it for inclusion in a web page, or to execute an http get on it. So you need to access the encoded form in an easy way. I'm leaning toward namestring as the way of retrieving that, but I suppose it doesn't matter as long as there's some function.
Will the logical pathname system interact with URL pathnames? I'm thinking I could probably make use of the analogy between prefixes and logical hosts.
By prefix you mean schemes like "http" or "ftp" right?
No, I mean qnames/curies http://www.w3.org/TR/curie/ http://www.w3.org/2001/tag/doc/qnameids-2004-03-17 http://www.w3.org/TeamSubmission/turtle/#terms "@prefix" http://www.w3.org/TR/rdf-sparql-query/#QSynIRI
[…]
Could you give an example of the sort of logical pathname definitions you would use? Incorporating something like wildcards for the URI authority is going to need some thought, but wouldn't be impossible.
I hadn't thought about wild cards at all. Maybe the use of logical pathnames for this is overambitious and I should think of this facility for URLs and retain my current system for dealing with URIs.
http://mumble.net:8080/svn/lsw/trunk/util/namespace.lisp http://mumble.net:8080/svn/lsw/trunk/util/uri.lisp
Best,
-Alan
On 4/1/10 5:08 AM, Alan Ruttenberg wrote:
On Mon, Mar 29, 2010 at 5:07 AM, Mark Evenson evenson@panix.com wrote:
[…]
At the moment I'm favoring just implementing such a specializer for "http" and "bundle", creating a protocol for a user to add a specializer, and calling it a day.
+ ftp, https, urn, info would all be common in my world.
I currently intend to implement only URIs that can be dereferenced with respect to the network authority, which is why I call these things "URL pathnames". Schemes like "urn" or "info" that need a resolution mechanism are not such good fits for CL Pathnames as far as I can tell, as the primary motivation here is to get LOAD and OPEN working for objects that java.net.URL (and its extensions via java.net.URLStreamHandler) can construct. If we want real URIs, I think we should look at using [PURI][1], as I don't think that java.net.URI has a scheme-specific escaping mechanism. If I implemented a method to convert PURI <--> ABCL Pathname where possible, would that be suitable for your needs?
[1]: http://puri.b9.com/
btw, I'm not sure mixing the authority and port is the best choice - they tend to be separate items.
As I understand it, "authority" is "host" plus an optional (or implied) "port". I intend to implement SETFable functions like URL-PATHNAME-PORT, URL-PATHNAME-AUTHORITY, and URL-PATHNAME-HOST to allow the user to manipulate this value.
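A rough Python sketch of the authority accessors described here. RFC 3986 also allows an optional `userinfo@` prefix in the authority, so the sketch carries it along. The function names mirror the proposed URL-PATHNAME-* functions but are hypothetical, SETF is modelled as returning a new authority string, and the default-port table covers only a few well-known schemes:

```python
# Well-known default ports, used when the authority omits one.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}

def split_authority(authority: str):
    """Split 'user@host:port' into (userinfo, host, port); port is int or None."""
    userinfo, at, hostport = authority.rpartition("@")
    userinfo = userinfo if at else None
    if hostport.startswith("["):  # IPv6 literal, e.g. "[::1]:8080"
        end = hostport.index("]") + 1
        host, rest = hostport[:end], hostport[end:]
        port = rest[1:] if rest.startswith(":") else ""
    else:
        host, _, port = hostport.partition(":")
    return userinfo, host, (int(port) if port else None)

def join_authority(userinfo, host, port) -> str:
    auth = host if port is None else f"{host}:{port}"
    return auth if userinfo is None else f"{userinfo}@{auth}"

def url_pathname_host(authority: str) -> str:
    return split_authority(authority)[1]

def url_pathname_port(scheme: str, authority: str):
    """Explicit port if present, otherwise the scheme's implied default."""
    port = split_authority(authority)[2]
    return port if port is not None else DEFAULT_PORTS.get(scheme.lower())

def set_url_pathname_port(authority: str, port) -> str:
    """Return a new authority with PORT replaced (None removes it)."""
    userinfo, host, _ = split_authority(authority)
    return join_authority(userinfo, host, port)
```

For example, `url_pathname_port("http", "example.org")` returns the implied 80, and `set_url_pathname_port("example.org:8080", 9090)` returns `"example.org:9090"`.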
Also, how to do fragments ("#anchor") and queries (?q=blah) for http URIs?
A good question for which I'm not entirely happy with my current answer: just incorporate them in NAME, while providing functions like URL-PATHNAME-QUERY and URL-PATHNAME-FRAGMENT to allow manipulation. The other possibility I considered was somehow overloading the information in TYPE (make TYPE a LIST?) but that seems to have ambiguity problems as well.
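The NAME-based approach could be sketched in Python as below: hypothetical analogues of URL-PATHNAME-QUERY and URL-PATHNAME-FRAGMENT that parse a NAME string, with SETF modelled as returning a new NAME. Splitting on "#" first matters, because RFC 3986 allows "?" inside a fragment:

```python
def _split_name(name: str):
    """Split a NAME like 'search?s=demons#top' into (base, query, fragment)."""
    base, hash_, fragment = name.partition("#")  # fragment may contain "?"
    base, qmark, query = base.partition("?")
    return base, (query if qmark else None), (fragment if hash_ else None)

def url_pathname_query(name: str):
    return _split_name(name)[1]

def url_pathname_fragment(name: str):
    return _split_name(name)[2]

def set_url_pathname_query(name: str, query) -> str:
    """Return a new NAME with the query replaced (or removed if None)."""
    base, _, fragment = _split_name(name)
    new = base if query is None else f"{base}?{query}"
    return new if fragment is None else f"{new}#{fragment}"
```

For example, `set_url_pathname_query("search?s=demons#top", "s=angels")` returns `"search?s=angels#top"`, leaving the fragment in place.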
Do we want the namestring to always be the encoded form? I was planning to have the namestring be what the user used to create the URL-PATHNAME, but always encode before going to the network. I need to think about this more.
It's a good question. The thing is that with an http URI you want to be able to format it for inclusion in a web page, or to execute an http get on it. So you need to access the encoded form in an easy way. I'm leaning toward namestring as the way of retrieving that, but I suppose it doesn't matter as long as there's some function.
We'll make functions then. I suppose we need four: two variants to handle the inputs of string or pathname crossed with decoding or encoding.
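The four conversions could look roughly like this in Python. The names are hypothetical, and a "pathname" is reduced to a list of decoded path segments:

```python
from urllib.parse import quote, unquote

# Hypothetical names for {string, pathname} x {encode, decode}.

def encode_path_string(path: str) -> str:
    """Encode a user-typed path string; '/' is kept as the segment separator."""
    return quote(path, safe="/")

def decode_path_string(path: str) -> str:
    """Decode an on-the-wire path string into human-readable form."""
    return unquote(path)

def encode_pathname(segments: list) -> str:
    """Encode each segment separately, so a '/' inside a segment becomes %2F."""
    return "/" + "/".join(quote(s, safe="") for s in segments)

def decode_pathname(segments: list) -> str:
    """Readable form; a '/' inside a segment is no longer distinguishable."""
    return "/" + "/".join(segments)
```

Once a "/" has been decoded, the string variants cannot recover the segment boundaries: `decode_path_string("/qty/1%2F2%20lb")` gives `"/qty/1/2 lb"`. That is one argument for producing the network form from the pathname components rather than from a decoded namestring.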
[…]
I hadn't thought about wild cards at all. Maybe the use of logical pathnames for this is overambitious and I should think of this facility for URLs and retain my current system for dealing with URIs.
http://mumble.net:8080/svn/lsw/trunk/util/namespace.lisp http://mumble.net:8080/svn/lsw/trunk/util/uri.lisp
My preference would be to get dereferenceable URIs (aka URLs) working first, and then look at implementing further stuff on top of that. I believe such things are probably best handled in libraries external to ABCL. I think that as long as we can get 'http', 'https', 'ftp', and 'bundle' URLs working with LOAD and OPEN (and MERGE-PATHNAMES, PROBE-FILE, PATHNAME-MATCH-P, etc.), we're going to cover most of the usage out there.