On Fri, Jun 18, 2010 at 12:37 PM, Mark Evenson <evenson@panix.com> wrote:
On 6/17/10 7:38 PM, Alessio Stalla wrote: […]
Proposed solution: let's invent an ABCL-specific way to print arbitrary pathnames. I proposed #P"abcl:(make-pathname ...)" which is ANSI-compatible and similar enough to what the current code in Pathname.writeToString can produce. Let's use that to print pathnames[1]. Reading them back is as simple as (eval (read-from-string ...)), and no other code needs to be modified.
What do you think?
[1] if not always, at least with dump-form, and when *print-readably* is T and the namestring can't be used. In the latter case currently the #P(...) syntax is used. Btw, probably dump-form *should* bind *print-readably* to T...
Hmmm, I'm not sure I totally like it if your proposal is that a PATHNAME is always printed in the #P"abcl:(make-pathname …)" form. If you are just proposing that this be used when a namestring can't be produced, then I support it (weakly).
No, this is precisely the problem we have today. On Windows, for #P"/foo/bar" a namestring *can* be produced, but it is "\foo\bar", which then can't be read back in on non-Windows.
I would claim that users are pretty cognitively wired at this point to expect a path to be a string containing directory separators, so we want to obey the "principle of least surprise" here.
In fact, I'm in doubt whether to always use the ugly form, or to use it only when serializing to fasls and in the cases where today we would have used #P(...).
Your proposal still doesn't say what ABCL on non-Windows should do with a deserialized PATHNAME that represents a UNC path or has a drive letter in DEVICE.
What could it possibly do but fail if you try to OPEN the pathname? The problem is not Windows pathnames on non-Windows. Those cannot work, and it's the user's responsibility to use them only if she knows that the software will only run on Windows. The problem is Unix pathnames that, when printed under Windows, have their slashes converted to backslashes.
The current code (as I understand it) on non-Windows would treat a UNC pathname encoded as a string, e.g. "\\a\b\c\d", as an error, since '\c' isn't a valid Java char escape sequence, although a case like "\\n\n\n\baz" would name the file with the char sequence ('\' 'n' LF LF BS 'a' 'z').
That's the same under Windows. String escaping has nothing to do with pathnames.
Counter-proposal #1: Note that java.io.File *does* correctly accept "/" as a directory separator under Windows. So, we could potentially just declare "/" as our directory separator in the #P representation on all platforms. I would then make UNC pathnames not have a printable namestring, so inadvertent doubling of path separators doesn't cause confusion. For UNC and drive letters on non-Windows, I would signal a condition when a de-reference was attempted with a restart that tried to DWIM (ignore the UNC share, drop the drive letter reference, or allow a user supplied correction).
I wouldn't do this DWIM thing. A UNC or drive-letter pathname on non-Windows is a user error, imho. As for always using / as the separator, it would solve the problem with asdf, but it wouldn't solve the other problem: not all pathnames are currently printable by ABCL as #P"...", so it sometimes uses #P(...) to list their components, which is not ANSI-compatible.
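To make the separator problem concrete, here is a minimal Java sketch. It is not ABCL's Pathname code: the class SeparatorSketch and its print/parse methods are hypothetical, and they only model an absolute directory list joined with a single separator character. It shows why a namestring printed with "\" cannot be read back where "/" is the separator, and why a fixed "/" round-trips.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is NOT ABCL's Pathname class.
final class SeparatorSketch {
    // Print an absolute directory list as a namestring using the given separator.
    static String print(List<String> dirs, char sep) {
        StringBuilder sb = new StringBuilder();
        for (String d : dirs) {
            sb.append(sep).append(d);
        }
        return sb.toString();
    }

    // Read a namestring back, splitting only on the given separator.
    static List<String> parse(String namestring, char sep) {
        List<String> dirs = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (char c : namestring.toCharArray()) {
            if (c == sep) {
                if (cur.length() > 0) {
                    dirs.add(cur.toString());
                    cur.setLength(0);
                }
            } else {
                cur.append(c);
            }
        }
        if (cur.length() > 0) {
            dirs.add(cur.toString());
        }
        return dirs;
    }

    public static void main(String[] args) {
        List<String> dirs = Arrays.asList("foo", "bar");

        // Printed with the Windows separator, read back with the Unix one:
        // the whole string comes back as a single component.
        String printedOnWindows = print(dirs, '\\');
        System.out.println(printedOnWindows);              // \foo\bar
        System.out.println(parse(printedOnWindows, '/'));  // [\foo\bar]

        // Printed and read with a fixed "/" on every platform: round-trips.
        System.out.println(parse(print(dirs, '/'), '/'));  // [foo, bar]
    }
}
```

As noted above, a fixed separator only addresses the round-trip issue; it does nothing for pathnames that have no namestring at all.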
Counter-proposal #2: Use URI (IRI?) for the namestring representation. This fits the nature of ABCL pathnames better, i.e. they aren't really just about filesystems at this point. '/' would again be the standard directory path separator. UNCs would get their own scheme (or we would enforce RFC 3986 so that 'file://server/share/dir/file' means UNC whereas 'file:///not/a/server/but/absolute' means an absolute pathname). Drive letters would be part of the URI authority ('file://c:/windows/path'). The same sort of DWIM condition/restarts for Windows-specific semantics on non-Windows would be available. As a user convenience, we might make the 'file://' prefix be optionally inferred.
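For illustration, the server-versus-absolute distinction in counter-proposal #2 can be observed with the standard java.net.URI class. This is only a sketch: the class FileUriSketch and its looksLikeUnc method are hypothetical names, and treating a non-empty authority as "UNC" is the reading proposed above, not something ABCL is known to implement. The drive-letter form is deliberately left out.

```java
import java.net.URI;

// Hypothetical illustration only; not part of ABCL.
final class FileUriSketch {
    // Assumption from the counter-proposal: a file: URI with a non-empty
    // authority names a UNC server; an empty authority means a local
    // absolute path.
    static boolean looksLikeUnc(URI uri) {
        String authority = uri.getAuthority();
        return "file".equalsIgnoreCase(uri.getScheme())
                && authority != null
                && !authority.isEmpty();
    }

    public static void main(String[] args) {
        URI unc = URI.create("file://server/share/dir/file");
        URI abs = URI.create("file:///not/a/server/but/absolute");

        System.out.println(unc.getAuthority()); // server
        System.out.println(unc.getPath());      // /share/dir/file
        System.out.println(abs.getAuthority()); // null
        System.out.println(abs.getPath());      // /not/a/server/but/absolute

        System.out.println(looksLikeUnc(unc));  // true
        System.out.println(looksLikeUnc(abs));  // false
    }
}
```

Note that nothing in such a URI carries a CL pathname VERSION component, so an extra encoding would still be needed for that.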
The problem with URIs is that they cannot represent all of CL's pathname components (version?). We'd need to invent an encoding. My proposal is simpler because an encoding is basically already there. However, I'm not against using URIs with an appropriate encoding.
Bye, Alessio