I noticed that splittist reported on #abcl that abcl-0.23.0 fails to start if it is in a directory whose name contains #\Space characters.
The problem is related to my "fix" for loading Pathnames that contain #\+ characters, which reveals that we have not settled on how we encode a JAR Pathname.
The problem arises from not rigorously treating Pathnames with "jar:" and "file:" prefixes as having URLEncoded DIRECTORY, NAME, and TYPE components, which seems to be how the java.net.URL classes treat them. Although this seems innocent, ABCL always uses this code when it loads its FASLs, so it deeply affects system operation, even though normally no one actually constructs such URLs.
My proposed fix would be to treat all inputs to Pathname that are constructed with the "jar:" and "file:" schemes as being URLEncoded. Likewise, we would fix the namestring routines to provide such encoding on the "way out" to string representations.
There would therefore be a difference between (pathname-directory "file:/containing%20a%20space/") ==> (:ABSOLUTE "containing a space")
and (pathname-directory #p"/containing%20a%20space/") ==> (:ABSOLUTE "containing%20a%20space")
There would be a bit of a surprise that
(pathname-directory "file:/cl+ssl!/foo.lisp") ==> (:absolute "cl ssl")
It will take some time to chase down all the code paths here, so please comment if this solution is still ambiguous in some way that I haven't anticipated, or if someone has a different proposal.
On 11/26/10 1:26 PM, Mark Evenson wrote: […]
There would be a bit of a surprise that
(pathname-directory "file:/cl+ssl!/foo.lisp") ==> (:absolute "cl ssl")
No, '+' should always be treated as '+'. What I want is the concept of URI encoding here, not URL encoding.
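For readers following along, the difference between the two encodings can be sketched in Java. This is only an illustration using the standard java.net.URLDecoder and java.net.URI classes, not the code in ABCL; the class and method names (PlusDecodingSketch, formDecode, uriPath) are made up for this example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;

// Hypothetical illustration class; not part of ABCL.
class PlusDecodingSketch {
    // HTML-form ("URL") decoding: '+' is turned into a space.
    static String formDecode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    // URI decoding: only %XX escapes are decoded, '+' stays '+'.
    static String uriPath(String uri) {
        return URI.create(uri).getPath();
    }

    public static void main(String[] args) {
        System.out.println(formDecode("/cl+ssl/foo.lisp"));               // /cl ssl/foo.lisp
        System.out.println(uriPath("file:/cl+ssl/foo.lisp"));             // /cl+ssl/foo.lisp
        System.out.println(uriPath("file:/containing%20a%20space/"));     // /containing a space/
    }
}
```

The first line of output shows the surprise discussed above; the second shows the behavior wanted here, where '+' represents itself.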
Attached are patches that enable pathnames created with the "file" scheme to contain #\Space characters. The causes of this are a little deeper, involving a mistaken use of URL encoding rules suitable for HTML forms rather than the more proper URI encoding rules.
The patches work well under UNIX (actually OS X), but run into problems on Windows.
Comments and feedback on this direction solicited.
The plan would be to release an abcl-0.23.1 sometime in the next few days, once the patches have had some independent testing.
After testing under Windows, I've committed my proposed solution to the URI escaping issues in pathnames to the trunk as [r13056][1]. If fellow developers could build and test the ABCL-TEST-LISP system on their respective systems, I would appreciate it.
From the commit message:
Fix problems with #\Space characters in JAR pathnames.
We now require that inputs to the PATHNAME routines that have the URI scheme "jar:file" or "file" properly encode themselves as URIs according to RFC2396. Mainly this means that #\Space and #\? characters in such strings should be percent encoded (i.e. "jar:file:/path%20with%20/space/and%3fquestion-mark"). The corresponding namestring routines have been adjusted to output such URI encoded representations, although the underlying PATHNAME objects contain unescaped values. The routines for loading FASLs have been adjusted to URI encode their inputs as well.
The #\+ character is no longer an escape for #\Space (this was a bug).
[1]: http://trac.common-lisp.net/armedbear/changeset/13056
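As a rough illustration of the encoding "on the way out" that the commit message describes, the multi-argument java.net.URI constructor quotes characters that are illegal in a path. This sketch is not the code from the changeset; the class and method names (FileUriEncodingSketch, encodeFilePath, decodeFilePath) are assumptions made for the example.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical illustration class; not the ABCL implementation.
class FileUriEncodingSketch {
    // Build a "file" URI string from an unescaped absolute path.
    // The constructor percent-encodes characters that are illegal in a
    // URI path, such as ' ' and '?', and leaves '+' alone.
    static String encodeFilePath(String path) {
        try {
            return new URI("file", null, path, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Recover the unescaped path from an encoded "file" URI string.
    static String decodeFilePath(String uri) {
        return URI.create(uri).getPath();
    }

    public static void main(String[] args) {
        String encoded = encodeFilePath("/path with /space/and?question-mark");
        System.out.println(encoded);                  // file:/path%20with%20/space/and%3Fquestion-mark
        System.out.println(decodeFilePath(encoded));  // /path with /space/and?question-mark
        System.out.println(encodeFilePath("/cl+ssl/foo.lisp")); // file:/cl+ssl/foo.lisp
    }
}
```

Note that Java emits uppercase hex digits (%3F), which is equivalent to the lowercase %3f in the commit message.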
Hi Mark,
You asked me to verify on Windows too. Having updated to the latest and greatest of trunk, I see that I'm now able to run ABCL from a path with spaces in it, so far so good. However, you said we require spaces to be %-encoded in pathnames now. Or was that in [jar:]file: pathnames only?
Because, when I do (DIRECTORY #p"") from "D:\abcl\my docs", I see
(#P"D:/abcl/my docs/")
which is not a file: pathname, so it's to be expected?
Anyway, the immediate regression seems gone. I suppose I need to test much more, but it's getting too late for it. I'd appreciate some directions, if you have time.
Thanks!
Bye,
Erik.
armedbear-devel mailing list armedbear-devel@common-lisp.net http://common-lisp.net/cgi-bin/mailman/listinfo/armedbear-devel
On Nov 28, 2010, at 23:28 , Erik Huelsmann wrote:
Hi Mark,
You asked me to verify on Windows too. Having updated to the latest and greatest of trunk, I see that I'm now able to run ABCL from a path with spaces in them, so far so good. However, you said we require spaces to be %-encoded in pathnames now. Or was that in [jar:]file: pathnames only?
Because, when I do (DIRECTORY #p"") from "D:\abcl\my docs", I see
(#P"D:/abcl/my docs/")
which is not a file: pathname, so it's to be expected
Correct, it is the behavior I expect.
Perhaps another few sentences of explanation might help everyone check my reasoning:
The underlying Lisp PATHNAME has no concept of URI encoding, so all characters "represent themselves". A problem arises in the places within the ABCL FASL loading routines where we get a java.net.URL that we need to interpret as a PATHNAME. If this URL represents a filepath, i.e. has the scheme "file", it is URI encoded according to RFC3986 (actually probably RFC2396, but the two should be identical from what I can tell). To be consistent, I implemented changes so that all input to the PATHNAME routines using a namestring containing the "file" scheme is decoded as a URI encoding, mainly meaning that sequences of the form '%[hexdigit][hexdigit]' are translated to the corresponding ISO-8859-1 character. When a namestring is output that contains the "file" scheme, we encode it as a URI. Currently, the only case in which we output a "file" scheme in namestrings is when we compute the value for an entry in a JAR contained on the local filesystem, i.e.
CL-USER> #p"jar:file:/dir%20with%20spaces/some.jar!/foo.lisp"
"jar:file:/dir%20with%20spaces/some.jar!/foo.lisp"
CL-USER> #p"file:/dir%20with%20spaces/some.jar"
"/dir with spaces/some.jar"
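The decoding step described above, where each '%[hexdigit][hexdigit]' sequence becomes the ISO-8859-1 character with that code, might look roughly like the following in Java. This is a minimal sketch under that reading of the description, not the routine actually committed; PercentDecodeSketch and decodeLatin1 are hypothetical names.

```java
// Hypothetical illustration class; not the ABCL implementation.
class PercentDecodeSketch {
    // Replace every "%XX" (two hex digits) with the character whose
    // code is 0xXX, i.e. its ISO-8859-1 interpretation. All other
    // characters, including '+', are copied through unchanged.
    static String decodeLatin1(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '%' && i + 2 < s.length()) {
                int hi = Character.digit(s.charAt(i + 1), 16);
                int lo = Character.digit(s.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    out.append((char) (hi * 16 + lo));
                    i += 3;
                    continue;
                }
            }
            // Not a well-formed escape: keep the character as it is.
            out.append(c);
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decodeLatin1("/dir%20with%20spaces/some.jar")); // /dir with spaces/some.jar
        System.out.println(decodeLatin1("/cl+ssl/foo.lisp"));              // /cl+ssl/foo.lisp
    }
}
```

A malformed escape such as a trailing '%' is passed through untouched in this sketch; whether to signal an error instead is a design choice the thread does not settle.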
We could maybe implement a special variable like *PATHNAME-STRICT-URIS* that always emits the URI encoded version, but I think that would confuse too many people.
-- "A screaming comes across the sky. It has happened before, but there is nothing to compare to it now."
I've [backported the fix for spaces in directory names to the 0.23.x branch][1], so those interested in testing can just shake out the proposed abcl-0.23.1 version. To correct a possible misunderstanding, I am interested in testing in all developer environments, not just Windows, although my confidence has grown under all environments (WinXP, Solaris, OSX) that I have managed to test.
[1]: http://trac.common-lisp.net/armedbear/changeset/13065