Hi,
I'm using cxml as of 2006-01-05 and am running into problems with documents that refer to a local DTD. The important constraint here is that I'm working on a system on which all catalogs are old-style SGML catalogs, so using XML catalogs is out. Before rolling my own entity resolver or working out a general solution using cxml:make-extid, I simply wanted to test SYSTEM ids. However, this doesn't work as expected: I get an error from xstream-open-extid, which seems to be passed an absolute path instead of the relative path given in the doctype declaration. That is,
<!DOCTYPE review SYSTEM "review.dtd" >
results in
  2: (XSTREAM-OPEN-EXTID #S(EXTID :PUBLIC NIL :SYSTEM #<URI file://+/review.dtd>))
       Locals:
         SB-DEBUG::ARG-0 = #S(EXTID :PUBLIC NIL :SYSTEM #<URI file://+/review.dt>)
which is afterwards translated to /review.dtd.
I did some fairly low-level digging through the code, and think that the error is related to the following piece of code in p/doctype-decl:
  (when extid
    (let* ((effective-extid
            (extid-using-catalog (absolute-extid input extid)))
I don't understand why the effective-extid is computed based on an absolute-extid; I would have expected some test of whether the path is absolute or relative. However, maybe I'm misinterpreting something, and the comment below is related to the problem:
  (defun absolute-uri (sysid source-stream)
    (let ((base-sysid (zstream-base-sysid source-stream)))
      ;; XXX is the IF correct?
      (if base-sysid
          (puri:merge-uris sysid base-sysid)
          sysid)))
I have an additional theory: base-sysid is "file://+/", and inspecting source-stream shows that the URI associated with it is indeed just:
  The object is a STRUCTURE-OBJECT of type STREAM-NAME.
  [type: STREAM-NAME]
  --------------------
  ENTITY-NAME: "main document"
  ENTITY-KIND: :MAIN
  URI: #<URI file://+>
This doesn't make much sense to me either.
However, I'm just poking around without much of a clue, so ultimately I'm just asking for help. :-)
With kind regards,
Holger
Quoting Holger.Schauer@gmx.de (Holger.Schauer@gmx.de):
> I don't understand why the effective-extid is computed based on an absolute-extid; I would have expected some test of whether the path is absolute or relative. However, maybe I'm misinterpreting something, and the comment below is related to the problem:
The bug is, I think, that a relative base URI made it into the zstream at all. CXML is meant to deal with absolute base URIs internally; otherwise, merging URIs isn't going to work.
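To illustrate why the base has to be absolute, here is an analogy using plain Common Lisp pathnames rather than the puri URIs that CXML actually merges (via puri:merge-uris in absolute-uri). The helper resolve-sysid is a hypothetical name for this sketch, and the directory names are invented; results assume a Unix-like implementation such as SBCL.

```lisp
;; Analogy only: CXML merges URIs with PURI, but MERGE-PATHNAMES shows
;; the same effect.  RESOLVE-SYSID is a hypothetical helper.
(defun resolve-sysid (sysid base)
  "Resolve the relative system id SYSID against the document location BASE."
  (merge-pathnames sysid base))

;; Absolute base: the DTD is looked up next to the document.
(resolve-sysid "review.dtd" #p"/home/holger/docs/review.xml")
;; => #P"/home/holger/docs/review.dtd"

;; Relative base: the result is still relative, so where the DTD is
;; eventually looked for depends on whatever code handles it later.
(resolve-sysid "review.dtd" #p"review.xml")
;; => #P"review.dtd"
```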
So I have just committed the following patch that merges *d-p-d* into pathnames passed to CXML before turning them into URIs. (Please test!)
Thanks for the report, David
--- /project/cxml/cvsroot/cxml/xml/xml-parse.lisp  2006/01/23 21:45:48  1.59
+++ /project/cxml/cvsroot/cxml/xml/xml-parse.lisp  2006/03/20 12:42:26  1.60
@@ -2978,7 +2978,7 @@
          (make-stream-name
           :entity-name "main document"
           :entity-kind :main
-          :uri (pathname-to-uri filename)))
+          :uri (pathname-to-uri (merge-pathnames filename))))
     (apply #'parse-xstream input handler args)))
 
 (defun resolve-synonym-stream (stream)
@@ -2991,7 +2991,7 @@
            ;; ignore-errors, because sb-bsd-sockets creates instances of
            ;; FILE-STREAMs that aren't
            (ignore-errors (pathname stream)))
-      (pathname-to-uri (pathname stream))
+      (pathname-to-uri (merge-pathnames (pathname stream)))
       nil))
 
 (defun parse-stream (stream handler &rest args)
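With the patch, the file name is merged with *default-pathname-defaults* before pathname-to-uri sees it. A sketch of that single step in isolation, assuming *default-pathname-defaults* holds an absolute directory (the directory names are invented). It also shows that an already absolute file name passes through unchanged, so callers that already pass absolute pathnames should not be affected.

```lisp
;; Assumption: *DEFAULT-PATHNAME-DEFAULTS* is an absolute directory.
(let ((*default-pathname-defaults* #p"/home/holger/docs/"))
  (list (merge-pathnames "review.xml")           ; relative: gains the directory
        (merge-pathnames #p"/data/review.xml"))) ; absolute: left alone
;; => (#P"/home/holger/docs/review.xml" #P"/data/review.xml")
```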
On Mar 20, 2006, at 2:00 PM, David Lichteblau wrote:
> So I have just committed the following patch that merges *d-p-d* into pathnames passed to CXML before turning them into URIs. (Please test!)
Just to nitpick: merging *d-p-d* (which I assume is *DEFAULT-PATHNAME-DEFAULTS*) isn't guaranteed to work, because *d-p-d* can be a relative pathname. Wouldn't that cause the same problem?
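The objection can be checked directly: if *default-pathname-defaults* is itself relative, or empty, merging it in leaves the file name relative. A small sketch with invented directory names.

```lisp
;; Relative defaults: the merged name is still relative.
(let ((*default-pathname-defaults* #p"docs/"))
  (merge-pathnames "review.xml"))
;; => #P"docs/review.xml"

;; Empty defaults: nothing is added, and the name stays as given.
(let ((*default-pathname-defaults* #p""))
  (merge-pathnames "review.xml"))
;; => #P"review.xml"
```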
Cheers
Marco
-- 
Marco Antoniotti                    http://bioinformatics.nyu.edu/~marcoxa
NYU Courant Bioinformatics Group    tel. +1 - 212 - 998 3488
715 Broadway 10th FL                fax. +1 - 212 - 998 3484
New York, NY, 10003, U.S.A.
Quoting Marco Antoniotti (marcoxa@cs.nyu.edu):
> Just to nitpick: merging *d-p-d* (which I assume is *DEFAULT-PATHNAME-DEFAULTS*) isn't guaranteed to work, because *d-p-d* can be a relative pathname. Wouldn't that cause the same problem?
Well, ISTR that CMUCL starts with an empty *default-pathname-defaults*, so I recognize that CMUCL users might indeed still have this problem even with the patch.
And since the spec is terribly vague on this issue (as on most issues relating to pathnames), the choice made by the CMUCL maintainers is technically correct.
However,
- the spec does say that *default-pathname-defaults* is "typically in the working directory that was current when Common Lisp was started up".
- it's the only approximation of the concept of a "working directory" I am aware of in portable Common Lisp, and many Lisps implement it like that.
- if the implementation chooses not to set it to the working directory, users can easily fix that by setting or binding it themselves. (Or, of course, by passing an absolute pathname to cxml:parse-file in the first place...)
So using it seems like a reasonable compromise to me, and improves the situation on some Lisps while not making it worse on the others.
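The last option, passing an absolute pathname to cxml:parse-file in the first place, can also be enforced on the caller's side. A minimal sketch: both function names are hypothetical (not part of CXML), and checking for an :absolute directory component is just one reasonable test of absoluteness.

```lisp
(defun absolute-pathname-p (pathname)
  "True if PATHNAME's directory component is absolute."
  (eq (first (pathname-directory pathname)) :absolute))

(defun ensure-absolute-pathname (pathname
                                 &optional (base *default-pathname-defaults*))
  "Merge BASE into PATHNAME; signal an error if the result is still
relative, instead of silently handing a relative name to the parser."
  (let ((merged (merge-pathnames pathname base)))
    (unless (absolute-pathname-p merged)
      (error "Cannot make ~S absolute: base ~S is relative." pathname base))
    merged))

;; Usage (requires cxml):
;; (cxml:parse-file (ensure-absolute-pathname "review.xml")
;;                  (cxml-dom:make-dom-builder))
```

Signalling an error here trades convenience for an early, explicit failure when the defaults turn out to be relative.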
If there is a better way to fix this I'm open to suggestions.
d.
On Mar 20, 2006, at 5:29 PM, David Lichteblau wrote:
> So using it seems like a reasonable compromise to me, and improves the situation on some Lisps while not making it worse on the others.
I still think it has problems. Consider

  (let ((*default-pathname-defaults* (make-fiendishly-relative-pathname)))
    ;; run CXML code.
    )
It is true that *d-p-d* is usually what you think, but since it is a variable that can be manipulated, it is not safe to assume any value for it.
I would either introduce a CXML variable, or use USER-HOMEDIR-PATHNAME, which is guaranteed to be "constant".
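The first suggestion could look roughly like the sketch below. *cxml-base-directory* and the functions are hypothetical names, not CXML API, and the fallback order (the variable, then *default-pathname-defaults* if absolute, then USER-HOMEDIR-PATHNAME) is one possible design rather than something spelled out in this thread.

```lisp
;; Hypothetical: CXML defines no such variable; the name is invented here.
(defvar *cxml-base-directory* nil
  "If non-NIL, an absolute directory used to resolve relative file names.")

(defun absolute-directory-p (pathname)
  "True if PATHNAME's directory component is absolute."
  (eq (first (pathname-directory pathname)) :absolute))

(defun cxml-base-directory ()
  "Return an absolute directory, trying in turn *CXML-BASE-DIRECTORY*,
*DEFAULT-PATHNAME-DEFAULTS* (only if absolute) and USER-HOMEDIR-PATHNAME."
  (cond ((and *cxml-base-directory*
              (absolute-directory-p *cxml-base-directory*))
         *cxml-base-directory*)
        ((absolute-directory-p *default-pathname-defaults*)
         *default-pathname-defaults*)
        (t (user-homedir-pathname))))

(defun resolve-file-name (filename)
  "Merge the chosen base directory into FILENAME."
  (merge-pathnames filename (cxml-base-directory)))
```

Falling back to the home directory avoids relative results, but it silently changes where relative names resolve, which may surprise users who bound *default-pathname-defaults* on purpose.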
Alternatively, I would use a package that masks the implementation vagaries and gives you the notion of a "working directory". CLOCC port is one of them (or, shameless plug, cl-environment).
Cheers
Marco
David Lichteblau wrote:
> Quoting Holger.Schauer@gmx.de (Holger.Schauer@gmx.de):
> So I have just committed the following patch that merges *d-p-d* into pathnames passed to CXML before turning them into URIs. (Please test!)
It's failing in make-xstream, called from xstream-parse-extid, with an error message that doesn't make much sense to me. Somehow SBCL thinks that the stream it's operating on (when accessing the DTD) has become a directory. When I inspect the pathname, everything looks fine, e.g., the pathname components look alright. However, I do have funny-looking characters all over the place, so maybe the problem is Unicode/runes-related. FWIW, I'm using SBCL 0.9.10 with :sb-unicode in *features*.
Holger
Quoting Holger Schauer (Holger.Schauer@gmx.de):
> It's failing in make-xstream, called from xstream-parse-extid, with an error message that doesn't make much sense to me. Somehow SBCL thinks that the stream it's operating on (when accessing the DTD) has become a directory. When I inspect the pathname, everything looks fine, e.g., the pathname components look alright. However, I do have funny-looking characters all over the place, so maybe the problem is Unicode/runes-related. FWIW, I'm using SBCL 0.9.10 with :sb-unicode in *features*.
Funny looking characters could indicate a compatibility problem involving puri.
Can you please try the same test using cxml from CVS and the recently released puri 1.4?
(If that still fails, a stacktrace would be helpful.)
d.
David Lichteblau <david@lichteblau.com> wrote:
>> When I inspect the pathname, everything looks fine, e.g., the pathname components look alright. However, I do have funny-looking characters all over the place, so maybe the problem is Unicode/runes-related.
> Funny looking characters could indicate a compatibility problem involving puri.
> Can you please try the same test using cxml from CVS and the recently released puri 1.4?
Yep, did just that, and it's working now. Thanks, David! Installing puri 1.4 solved the remaining problems. So, on to the next problem: making SGML catalogs work. :-)
Holger