While addressing ASDF [issue #140](https://gitlab.common-lisp.net/asdf/asdf/-/issues/140) I stumbled on what seems to me to be an error: `(ensure-directory-pathname "")`, at least on SBCL, returns a `pathname` that has `NIL` in all its slots except `HOST`.
To me, this seemed wrong, and looking further, I found that `directory-pathname-p` returns `T` when given this pathname, because `directory-pathname-p` is defined (in its docstring) as:
> A directory-pathname is a pathname _without_ a filename.
I would have assumed that a directory pathname was *also* a pathname with a non-`NIL` `directory` component.
I am considering making this (incompatible) change, but am interested in any feedback about whether this is the correct thing to do.
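For concreteness, here is a minimal sketch of what the stricter definition under consideration might look like. The name `strict-directory-pathname-p` is hypothetical and is not part of ASDF or UIOP; the sketch also assumes that "without a filename" means the name and type components are `NIL` or `:UNSPECIFIC`.

```lisp
;; Hypothetical helper for illustration only; not part of ASDF/UIOP.
(defun strict-directory-pathname-p (pathname)
  "True if PATHNAME has no name, no type, and a non-NIL directory component."
  (let ((p (pathname pathname)))
    (flet ((absent-p (x)
             ;; Assumption: a missing component is NIL or :UNSPECIFIC.
             (member x '(nil :unspecific))))
      (and (absent-p (pathname-name p))
           (absent-p (pathname-type p))
           ;; The extra requirement: the directory component must be present.
           (pathname-directory p)
           t))))
```

Under this definition, a pathname with `NIL` in its directory, name, and type components would no longer count as a directory pathname.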
Robert Goldman <rpgoldman@sift.net> wrote:
> While addressing ASDF issue #140...
Hi Robert,
I don't know if you ever saw my writeup about this operation (sometimes called by its LispM name, PATHNAME-AS-DIRECTORY) on sbcl-devel, but here you go:
https://sourceforge.net/p/sbcl/mailman/message/37699633/
The root of the problem for ASDF issue #140 is that CMUCL-descended pathname implementations expose a subtle, low-prevalence (and IMO pointless [*]) trap for users: string-valued pathname components, and string-valued arguments to MAKE-PATHNAME, are in "native" syntax, not namestring syntax:
    ;; on Unix, where #\\ is the escape character
    * (setq p (pathname "a\\\\b"))
    #P"a\\\\b"
    * (pathname-name p)
    "a\\b"
    * (file-namestring p)
    "a\\\\b"
The "native" syntax is the one you must use as arguments to MAKE-PATHNAME, but the namestring syntax is the one you're implicitly using if you pass strings around as pathname designators, and that you explicitly receive from namestring functions.
IOW, juggling pathname components (like ENSURE-DIRECTORY-PATHNAME does) or concatenating strings to compose portions of file specifications is formally "unsound" around the edge cases unless you've considered the "provenance" or "intended syntax" of each string. In particular, using the result of FILE-NAMESTRING as an element of the directory list to MAKE-PATHNAME is unsound.
Note: although I'm saying that certain things are notionally unsound, in reality of course filenames having asterisk, question mark, left-bracket, or backslash are exceedingly rare on Unix. So the problem I'm describing here doesn't occur very often in reality. On the other hand, the very rareness of such things probably makes it /more likely/ that folks have low-prevalence, unknown-severity bugs lurking in their programs. :-(
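The two syntaxes can be seen side by side using a wildcard character. The following is a minimal sketch that assumes SBCL on Unix; other implementations may behave differently, and the variable names are only for illustration.

```lisp
;; Assumes SBCL on Unix; behavior on other implementations may differ.

;; "Native" syntax: the string given to MAKE-PATHNAME is taken literally,
;; so the asterisk is an ordinary character in the name.
(defparameter *literal* (make-pathname :name "a*b"))

;; Namestring syntax: an unescaped asterisk is parsed as a wildcard.
(defparameter *wild* (pathname "a*b"))

(wild-pathname-p *literal*)   ; false: the name is the plain string "a*b"
(wild-pathname-p *wild*)      ; true: the name is a wild pattern

;; Converting back to namestring syntax escapes the asterisk, so the
;; result is not the same string that was passed to MAKE-PATHNAME.
(file-namestring *literal*)
```

This is why a string obtained from a namestring function cannot be passed to MAKE-PATHNAME unchanged without first checking for the special characters.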
Anyhow, in the sbcl-devel message I've linked to above, I dissected a few further semantic issues in the rendition of this operation in SB-COVER; some of those issues might also be relevant to ENSURE-DIRECTORY-PATHNAME, I'm not sure.
Regards, Richard
[*] The underlying issue is that Unix has no wildcard syntax, and so no need for escape syntax either. So somebody at CMU circa 1990 had to decide how to represent
    (pathname-name "a*")     ;; wild, by custom
    (pathname-name "a\\*")   ;; the not-wild analogue of the preceding
    (pathname-name "a\\\\")  ;; not wild, a logical result given the preceding
They could've just said "a pathname's component strings are subsequences of the namestring" and been done with it, but instead they decided
    (pathname-name "a*")     => #<PATTERN "a" :MULTI-CHAR-WILD>
    (pathname-name "a\\*")   => "a*"
    (pathname-name "a\\\\")  => "a\\"
I imagine that these representations were chosen to speed up manufacturing filenames for system calls: you can concatenate components' strings without examining those strings' elements.
If that was the rationale, that use case could have been addressed differently: had they stored subsequences of namestrings, they could have stored 3 bits somewhere in the pathname indicating whether the name, type and directory contained strings that needed "examination" (i.e., they contain either wildcard characters or the escape character) when making a filename. This would have allowed all reasonable filenames to be composed by simple concatenation, without exposing two almost-always-equivalent but formally subtly incompatible string syntaxes to users.
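As a rough illustration of that alternative, here is a minimal sketch in portable Common Lisp. Every name in it (`sketch-pathname`, `needs-examination-p`, and so on) is hypothetical and exists in neither CMUCL nor SBCL; it handles only the name and type components, and it leaves out the slow path entirely.

```lisp
;; Hypothetical sketch; none of these names exist in CMUCL or SBCL.
(defparameter *special-characters* "*?[\\"
  "The Unix wildcard characters plus the escape character.")

(defun needs-examination-p (string)
  "True if STRING contains a wildcard character or the escape character."
  (and (find-if (lambda (char) (find char *special-characters*)) string)
       t))

(defstruct (sketch-pathname (:constructor %make-sketch-pathname))
  name type
  ;; One flag per component; the directory flag is omitted for brevity.
  name-special-p type-special-p)

(defun make-sketch-pathname (name type)
  "Store NAME and TYPE as namestring subsequences and record the flags once."
  (%make-sketch-pathname :name name :type type
                         :name-special-p (needs-examination-p name)
                         :type-special-p (needs-examination-p type)))

(defun sketch-filename (p)
  "Compose a filename; when no flag is set this is plain concatenation."
  (if (or (sketch-pathname-name-special-p p)
          (sketch-pathname-type-special-p p))
      (error "Component needs unescaping or wildcard handling (omitted here).")
      (concatenate 'string
                   (sketch-pathname-name p) "." (sketch-pathname-type p))))
```

With this layout, `(sketch-filename (make-sketch-pathname "foo" "lisp"))` takes the concatenation path without looking at the individual characters again, and users only ever see namestring syntax.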
(Also, PATTERNs are pointless, space-wasting, inefficient, user-hostile nonsense. But that's a different story.)