[cxml-devel] inquiry and proposals - XML schema support

20 Aug 2006

      Hello,

I would like to inquire whether, and how, a means of support for XML schemas
may have been proposed for CXML.

Certainly, I can direct my inquiry towards a search across the cxml-devel
list @ clnet. I had thought that a more direct inquiry would be appropriate.

* Approaching an implementation

I am aware that CXML supports a representation of DTDs.

I intend to become more familiar with the code by which CXML provides support
for DTD representation. I consider that I will have to become more familiar
with it, in order to continue about some intended projects.

I have been pressing along with some projects to the Tioga project -- now
having the documentation done for the first, albeit trivial release to the
Tioga Auxiliary Library's tal-base system. Before releasing the documentation,
I have been trying to get some things designed-out, for supporting how that
documentation will be built -- across all the documentation to the project.

Shortly before beginning to write this message, I had not been sure if I would
be able to use CXML, immediately, in managing the system of documentary items
to the Tioga project. It had appeared that it was going to result in a bunch
of DTD-hacking -- at some point, hopefully dovetailing with an application
of CXML.

Presently, I have recalled a matter, upon which I had wanted to raise the
above inquiry.

I do not mean to sound angered about SGML or XML. It is the stuff of a big
system, SGML and XML, whether in integration of the both, or disjunctly.

In all, XML is widely popular -- even at such as xml.house.gov for instance,
let alone in ebXML, WebDAV, and furthermore -- and SGML has been an object to
quite some popularity and quite some work, in its time, as at HyTime,
DSSSL, ISMID, and furthermore.

Upon the prospect of having an efficient, Common Lisp programming system that
supports XML -- namely, CXML -- one thing that I recall, now, that I can
appreciate of it: That I may be able to stop using the generally awkward (i.e.
involving) DTD syntax, and not have to use the more awkward (i.e. involving,
and not in CL code) XSD syntax, so in order to represent structure in
markup. We can use Lisp for it.

Upon my observing how the following appears -- it was an item, put like on a
restaurant napkin, within a resource-references document being written to the
Tioga project:

<!-- e.g.
	<seg><cvs-archive
	cvsroot=":pserver:common-lisp.net:/project/cxml/cvsroot"
        user="anonymous"
        password="anonymous"
        id="sccm.cvs.clnet.cxml"/></seg>

	<seg><archive-component name="cxml"><archive><xref linkend="sccm.cvs.clnet.cxml"/></archive></archive-component></seg>

<!ELEMENT archive-component ((archive | xref)?)>
<!ATTLIST archive-component
   name CDATA #REQUIRED>

-->

(That may not be a valid DTD-segment, there; I'm still a bit rusty on DTD
syntax. The segment would be applied as in extension onto the DocBook DTD.)

It was on looking at that text that I came to recall how much I would
rather use CL for it, even for the initialization of DTD information.
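
As a rough sketch of what "using CL for it" could look like: the declaration
above held as a Lisp list, with one function to write it back out as DTD
text. The list layout (name, content model, attributes) and the names
*ARCHIVE-COMPONENT* and WRITE-DTD-DECLARATION are assumptions made for
illustration; none of this is CXML API. The attribute default is written as
#REQUIRED, as DTD syntax expects.

```lisp
;; Hypothetical s-expression form of one element declaration:
;; (name content-model ((attribute-name type default) ...))
(defparameter *archive-component*
  '("archive-component"
    "((archive | xref)?)"
    (("name" "CDATA" "#REQUIRED"))))

(defun write-dtd-declaration (decl &optional (stream *standard-output*))
  "Write DECL as an ELEMENT declaration followed by its ATTLIST, if any."
  (destructuring-bind (name content-model attributes) decl
    (format stream "<!ELEMENT ~A ~A>~%" name content-model)
    (when attributes
      (format stream "<!ATTLIST ~A" name)
      (dolist (attribute attributes)
        ;; One attribute definition per line: name, type, default.
        (format stream "~%   ~{~A~^ ~}" attribute))
      (format stream ">~%"))))
```

Calling (write-dtd-declaration *archive-component*) would print the ELEMENT
and ATTLIST declarations to standard output.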

* Constraining the Markup Definition - DTD or XSD

For representation of information apart from the markup-element names,
attribute names, and content models -- e.g. representation of attribute value
types more discrete than 'cdata' -- it may be done in a format compatible to
a DTD, using <?processing instructions?> within the DTD markup. It may be
brought towards a more succinct representation of the information,
however, if it would be approached as to be implemented onto XML schemas.

So, I thought I would inquire, as for whether and how XML schemas may have
been proposed to be supported in CXML.

If there is existing work about such, and if there are proposals toward how it
would be approached, I should be glad to direct my attention to it.

If there is no existing work about such, I would propose that I can try to
"take the matter on".

* Initial proposals for a design of XSD support onto CXML

** Shared Functionality - DTDs and XSDs, as documentary schemas

I would propose that the CXML DTD support code be reviewed, as for what
degree of its functionality could be shared between DTD support and XSD
support.

** Operations in the Parser

In the operations of the XSD parser, there may be applied some operations for
a type-actuated value-translation mechanism, e.g.

  UNMARSHAL-VALUE TYPE IN-SUBSTRATE  [generic function]
  MARSHAL-VALUE TYPE OUT-SUBSTRATE IN-SUBSTRATE    [generic function]

Regarding UNMARSHAL-VALUE, in the case of an XSD parser:

 -  the TYPE would be an object representative of a type -- a class
    metaobject, if not such as a CMUCL/SBCL CTYPE object (a CTYPE driven
    approach would be implementation-specific; I am not aware as for whether
    or how the code for it may be ported to other implementations. I do not
    know how any implementations beside CMUCL and SBCL would approach 'type
    handling' and 'type representation', at any level more finite than of
    a CLASS metaobject. Regardless of whether it would be
    implementation-specific, it would be very convenient w.r.t. type
    translation, to use CTYPE classes as specializers).

    The TYPE value would indicate the type of the object that must be
    initialized of the method. (Some extension onto MOP method specialization
    may be approached as for some optimization -- as for to ensure that the
    method's return-value type, and the return-value type of the effective
    method resulting with an end at that method, would be denoted to the
    compiler as being of the same type as the TYPE argument; this optimization
    would require that only a class-typed specialization of methods would be
    supported on the generic function. I would propose this optimization as a
    "feasible, though not directly necessary" kind of 'milestone' step, as in a
    project 'roadmap'.)

  - the IN-SUBSTRATE value, I suspect it would be such as either:

     (1, w.r.t a SAX approach) a stream, with the cursor positioned somewhere
     about the input that would have resulted in a certain SAX event

     (2, w.r.t a DOM approach) an instance of a class in the DOM type system;
       an object that would be representative of an XML element.
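
A minimal sketch of the two generic functions, under some loud assumptions:
TYPE is taken to be a class metaobject (so the methods use EQL specializers),
and IN-SUBSTRATE is a plain string standing in for the text content of an
element, since no DOM or SAX object is available in a stand-alone example.
Nothing here is existing CXML code.

```lisp
(defgeneric unmarshal-value (type in-substrate)
  (:documentation "Return a Lisp object of TYPE, read from IN-SUBSTRATE."))

(defgeneric marshal-value (type out-substrate in-substrate)
  (:documentation "Write IN-SUBSTRATE onto OUT-SUBSTRATE, encoded per TYPE."))

;; TYPE is a class metaobject, so each method is specialized with an
;; EQL specializer on the class object itself.
(defmethod unmarshal-value ((type (eql (find-class 'integer)))
                            (in-substrate string))
  ;; PARSE-INTEGER ignores leading and trailing whitespace.
  (parse-integer in-substrate))

(defmethod unmarshal-value ((type (eql (find-class 'string)))
                            (in-substrate string))
  in-substrate)
```

With those methods, (unmarshal-value (find-class 'integer) "42") would return
the integer 42. A CTYPE-driven variant would replace the class metaobjects
with implementation-specific type objects.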

*** DOM or SAX? (Proposed: DOM)

A directly SAX-driven approach may be appropriate, there.

I consider that a DOM-driven approach may be the more appropriate. I consider
that it would be easier to make a DOM-driven approach, and easier to make that
approach, in parallel to other systems that may use a DOM-driven mechanism,
on the same input information.

In the case of an approach directly utilizing the SAX API, the input infoset
-- the in-substrate -- would be a stream, parsed-across, once (and it
would not be determinable until "late" if the infoset would be valid and
well-formed). Then, the in-substrate object would still be available, as a
stream (whether or not that stream would support repositioning of the stream
cursor -- as a socket stream may not, and in "Linux space", would not).

In the case of a DOM-driven approach, the input infoset would be parsed into a
DOM representation -- ensured, then, to be well-formed (though not
necessarily valid), if it may even be represented in a DOM node-tree. To the
UNMARSHAL-VALUE method, the IN-SUBSTRATE would then be a DOM object.

After processing in UNMARSHAL-VALUE, the DOM representation of the initial
infoset may then be disregarded, or may be retained for other uses; perhaps it
may be used as to process the original DOM information, for representation of
that same information onto a CLIM pane.

I would propose that an XSD parser in CXML would be addressed onto a
DOM-driven approach.

*** UNMARSHAL-VALUE

In the case of the UNMARSHAL-VALUE operation, the TYPE value would be
representative of something about the type (and element-name) of the "target
DOM node". The IN-SUBSTRATE would be something similar to what would have been
produced of MARSHAL-VALUE (e.g. a class, or a CTYPE). The OUT-SUBSTRATE would
be the DOM node supposed to contain the generated DOM node, i.e. the DOM node
that UNMARSHAL-VALUE would generate.

Barring a non-local return from the method, the process calling the
UNMARSHAL-VALUE method would be responsible for "linking" the resulting DOM
node into the containing node. This would involve a step, outside of the
UNMARSHAL-VALUE method; it should result in a greater modularization of the
code.

*** MARSHAL-VALUE

I would approach the implementation of MARSHAL-VALUE as it being a later
'milestone'.

*** The calling process

What would call UNMARSHAL-VALUE, in unmarshaling of an XML schema object :
 another UNMARSHAL-VALUE method, specialized on such that would represent the
 'root document' of an infoset, and a TYPE representative of a
 "container" for the XSD information, such as an XML-SCHEMA object.

What would call MARSHAL-VALUE, in the marshaling of CL information onto an XML
schema object :
 another MARSHAL-VALUE method, with the following arguments :
 - IN-SUBSTRATE  representing a container of XSD information
 - OUT-SUBSTRATE representing either a DOM node, a stream, or a pathname; the
   DOM-area specialization would be the first I'd suggest to take
 - TYPE object being NULL; the type information for operations of methods
    resulting from the call should be determinable on the IN-SUBSTRATE.
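
A sketch of that top-level MARSHAL-VALUE call, assuming a stream and a
pathname as the OUT-SUBSTRATE (the DOM-area specialization would need a DOM
implementation, so it is left out here). The XML-SCHEMA class, its
TARGET-NAMESPACE slot, and the empty xs:schema element written out are
illustrative assumptions only; the namespace value is not escaped.

```lisp
(defclass xml-schema ()
  ((target-namespace :initarg :target-namespace
                     :reader schema-target-namespace)))

(defgeneric marshal-value (type out-substrate in-substrate))

;; TYPE is NIL at the top level: the type information is taken from
;; the IN-SUBSTRATE, here an XML-SCHEMA instance.
(defmethod marshal-value ((type null) (out-substrate stream)
                          (in-substrate xml-schema))
  (format out-substrate
          "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\" targetNamespace=\"~A\"/>"
          (schema-target-namespace in-substrate)))

;; The pathname specialization opens the file, then delegates to the
;; stream specialization.
(defmethod marshal-value ((type null) (out-substrate pathname)
                          (in-substrate xml-schema))
  (with-open-file (stream out-substrate :direction :output
                                        :if-exists :supersede)
    (marshal-value type stream in-substrate)))
```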

*** regarding the TYPE argument in MARSHAL-VALUE

To retain the TYPE value as an argument to MARSHAL-VALUE, it would serve to
retain some consistency if the method would be specialized onto CFFI system
objects.

**** Example onto CFFI

A CL INTEGER value may be represented as a value written into a buffer of any
given width at or beyond the integer-length of the value -- with allowance for
the encoding of a negative value. To export an INTEGER onto a raw, malloc'd
memory block, it would be necessary to specify a numeric type for the
output, so in order to ensure that the value could be correctly
marshaled (i.e. encoded onto the external substrate).
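
The width requirement can be sketched in plain CL, without CFFI: the
function below encodes an INTEGER into an octet vector of a stated width,
and refuses a width too small for the value and its sign. The name
ENCODE-INTEGER, the little-endian byte order, and the two's-complement
encoding are assumptions for the example; an octet vector stands in for the
malloc'd block.

```lisp
(defun encode-integer (value width-in-octets)
  "Return VALUE as a little-endian, two's-complement octet vector."
  (let ((bits (* 8 width-in-octets)))
    ;; INTEGER-LENGTH does not count the sign bit, so the value fits
    ;; only when its length is strictly less than the buffer width.
    (unless (< (integer-length value) bits)
      (error "~D does not fit in ~D octet~:P" value width-in-octets))
    (let ((octets (make-array width-in-octets
                              :element-type '(unsigned-byte 8))))
      (dotimes (i width-in-octets octets)
        ;; LDB treats a negative integer as two's complement.
        (setf (aref octets i) (ldb (byte 8 (* 8 i)) value))))))
```

For instance, (encode-integer -2 2) would yield the octets 254 and 255, while
(encode-integer 128 1) would signal an error, since a signed single octet
tops out at 127.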

**** Example onto CXML

An XML schema may include any number of type definitions, and may depend on
any number of type definitions from another schema.

In example: A US citizen's SSN may be represented as a nine-digit
value. That nine-digit value may be represented -- typically -- in a
conventional CL environment, using a (VECTOR (UNSIGNED-BYTE 4) 9).

Typically, in an XML schema, that nine-digit value may be represented as it
being an object of type 'SSN'.

An object may be initialized so as to represent an XSD-defined type 'SSN'.
Given a mechanism for it, that SSN type may be mapped, explicitly, onto the
type (VECTOR (UNSIGNED-BYTE 4) 9). Such a mechanism may be implemented
directly onto an XML schema, but would have to be operable without requiring
modification of an XML schema.

Perhaps the XSD-to-CL type-translation mechanism would be sufficiently
operable, when operating in an automated manner. Some specialization might
still be appropriate.
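
A sketch of such an explicit mapping, kept outside of any schema document:
a table from the XSD type name to a CL parsing function. The table
*XSD-TYPE-PARSERS*, the function names, and the hyphenated input format are
all assumptions for illustration.

```lisp
(deftype ssn ()
  '(vector (unsigned-byte 4) 9))

(defun parse-ssn (text)
  "Collect the nine decimal digits of TEXT, skipping hyphens."
  (let ((digits (loop for char across text
                      unless (char= char #\-)
                        collect (or (digit-char-p char)
                                    (error "Unexpected character ~S" char)))))
    (unless (= (length digits) 9)
      (error "Expected nine digits, found ~D" (length digits)))
    (make-array 9 :element-type '(unsigned-byte 4)
                  :initial-contents digits)))

;; The mapping lives on the Lisp side, so the schema itself needs no
;; modification.
(defvar *xsd-type-parsers* (make-hash-table :test #'equal))
(setf (gethash "SSN" *xsd-type-parsers*) #'parse-ssn)

(defun unmarshal-by-type-name (type-name text)
  (let ((parser (or (gethash type-name *xsd-type-parsers*)
                    (error "No mapping for XSD type ~A" type-name))))
    (funcall parser text)))
```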

** XML-SCHEMA, slot TYPES ; class XML-SCHEMA-TYPE

To represent a schema's contained body of type definitions within
a single unit, it may be approached with an index initialized into an
XML-SCHEMA instance, that index containing a set of values all of the same
type -- perhaps, of a class XML-SCHEMA-TYPE.

Given an appropriate mechanism for it, such an index may be defined directly
onto a VECTOR typed object, with appropriate key functions being cached in the
thing.

** Class XML-SCHEMA-TYPE

An XML-SCHEMA-TYPE object would contain information representative of:
  1) the schema-local representation of the value -- supportive of
     marshaling of the XML-SCHEMA-TYPE onto an XSD document
  2) the CL-local representation of the value -- supportive of unmarshaling of
     an XML node in a document using the associated schema.
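
The two classes might be sketched as follows, with the TYPES index held in
an adjustable vector and searched by name. The slot names, the accessor
names, and the choice of a type specifier for the CL-local representation
are assumptions; the schema-local representation is reduced to a name.

```lisp
(defclass xml-schema-type ()
  (;; Schema-local representation: the name used in the XSD document.
   (name :initarg :name :reader schema-type-name)
   ;; CL-local representation: a type specifier.
   (lisp-type :initarg :lisp-type :reader schema-type-lisp-type)))

(defclass xml-schema ()
  ((types :initform (make-array 0 :adjustable t :fill-pointer t)
          :reader schema-types)))

(defun add-schema-type (schema type)
  "Append TYPE to the index of SCHEMA and return TYPE."
  (vector-push-extend type (schema-types schema))
  type)

(defun find-schema-type (schema name)
  "Return the XML-SCHEMA-TYPE named NAME in SCHEMA, or NIL."
  (find name (schema-types schema)
        :key #'schema-type-name :test #'string=))
```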

** Questions

1) How would a document and a schema be associated, in the CL environment?

Each document is mapped to zero or one DTDs.

Each *element* in a document *may* be mapped to one schema (NB: I'm not sure
if that's officially of the XSD spec, but it would be feasible. It would
require some sort of a conventional approach for the identification of the
schema that would be intended as to be assigned to a node; one could propose
an xml:schema attribute for it. One could use any namespace, in the
development of the proposal. One should have to require that
the element containing the foo:schema attribute would be valid on the
identified schema; something would have to be done, in regards to namespaces,
to ensure that the element would also be valid within the schema for the node
containing the element.)

(You know, I've tried to use Trac for project whiteboarding. I was thrown at
the syntax used in the Trac wiki pages; it doesn't appear to support HTML
markup, either, in the Trac wiki pages. I stopped short of requesting that the
CLNet maintainers would consider providing a Cliki instance for each project;
perhaps they would consider it to be a viable proposal, but I have not wanted
to increase their workload, in any.)

I think there's an XML processing instruction, specifically applicable for
mapping an XML document to an XSD schema -- similar to the DTD declaration,
though using a different syntax, like an <?xfoo whatfoo="URI"?> PI.

** Possible adaptation on a ??? class -- slot SCHEMA

A ??? class would have to be modified so as to contain a slot SCHEMA.

slot value type for the SCHEMA slot : (OR NULL XML-SCHEMA)
initial form: NIL
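
As a slot definition, that would read roughly as below. The class name
SCHEMA-AWARE-DOCUMENT is a placeholder for the ??? class, and XML-SCHEMA is
stubbed so that the example stands alone.

```lisp
;; Stub, standing in for the XML-SCHEMA class proposed above.
(defclass xml-schema () ())

;; Placeholder for the ??? class.
(defclass schema-aware-document ()
  ((schema :initarg :schema
           :initform nil
           :type (or null xml-schema)
           :accessor document-schema)))
```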

** Possible Adaptation on a DOM Parser

To handle an XML schema on a document -- to associate an XML schema with an
object representing the document, or something within the document -- a DOM
parser would have to watch for that <?xfoo?> XML-schema PI.

One could specialize a method about XML processing-instruction DOM objects;
one could check the name of the processing instruction, then dispatch when
that name would match the name of the <?xfoo?> XML-schema PI.

At that point, the XML schema would have to be already initialized -- then
retrieved -- or would have to be initialized, newly, if the whatfoo="URI"
identified schema would be available.

*** Exceptional Situation

If the whatfoo="URI" identified schema would *not* be available, a condition
should be signaled for it -- just a type of CONDITION.

The document must still be parsed, though it could not be validated.
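
A sketch of that behaviour, with the PI reduced to its target name and the
URI already extracted from it. SCHEMA-UNAVAILABLE, *KNOWN-SCHEMAS*, and
HANDLE-PROCESSING-INSTRUCTION are hypothetical names, and "xfoo" is the
placeholder PI name used above. The point of using SIGNAL on a plain
CONDITION is that an unhandled signal simply returns, so the parse goes on.

```lisp
(define-condition schema-unavailable (condition)
  ((uri :initarg :uri :reader schema-unavailable-uri))
  (:report (lambda (condition stream)
             (format stream "No schema available for ~A"
                     (schema-unavailable-uri condition)))))

(defvar *known-schemas* (make-hash-table :test #'equal)
  "Schemas already initialized, keyed by URI string.")

(defun handle-processing-instruction (target uri)
  "Return the schema for URI when TARGET names the schema PI, else NIL."
  (when (string= target "xfoo")
    (or (gethash uri *known-schemas*)
        ;; SIGNAL returns NIL unless a handler transfers control, so
        ;; the caller continues parsing, without validation.
        (signal 'schema-unavailable :uri uri))))
```

A caller that wants to know about the missing schema can establish a handler
with HANDLER-BIND; a caller that does not care needs to do nothing.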

** Identification of XML schema objects - XML-SCHEMA, slot URI

 # related items : PURI; object indexing; CXML XML catalogues API

An XML-SCHEMA object may be identified according to a URI.

The mechanism for that identification -- for indexing an XML-SCHEMA object by
its identity, and retrieving an object by its identity (or triggering the
initialization of a new object, then) -- that should be approached in
integration with the XML catalogues mechanism.
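
The index-and-retrieve part might be sketched as a table keyed on the URI
string. This sketch does not consult an XML catalogue and does not use PURI;
ENSURE-SCHEMA and *SCHEMAS* are hypothetical names, and "initialization" is
reduced to MAKE-INSTANCE.

```lisp
(defclass xml-schema ()
  ((uri :initarg :uri :reader schema-uri)))

(defvar *schemas* (make-hash-table :test #'equal)
  "XML-SCHEMA objects indexed by URI string.")

(defun ensure-schema (uri)
  "Return the XML-SCHEMA registered for URI, initializing one on first use."
  (or (gethash uri *schemas*)
      (setf (gethash uri *schemas*)
            (make-instance 'xml-schema :uri uri))))
```

Two calls with the same URI string would return the same object, which is
the property the DOM parser adaptation would depend on.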

* Onto Implementation

I'd like to approach this as into a prototype in the TAL codebase to the Tioga
project. I would propose to approach it as so, in order to facilitate:

1) that I would use the TAL-base system with it, with which I am familiar
2) that I would be able to implement it, without requiring any more work to
   the project administrators on the CXML project
3) that it would be integral with the documentary system proposed (*cough*)
   and being developed onto the Tioga projects

Of the said documentary system, the design of it is the last hangup before I
may make a first release of the TAL codebase. Without a sufficient mechanism
for processing the documentation and presenting it in HTML form, the release
would be incomplete.

* Onto Conclusion

At that point, I am reminded of why I had wanted to address the initial
inquiry, regarding support for XML schemas in CXML. I want to extend the
DocBook DTD; I intend to make the extension, firstly, with Common Lisp -- then
generating any DTD contents, in the end.

If the code I propose to implement of it would require an application of code
that I have not released, I must take the implementation as it being,
effectively, nonoperable.

To hold-up a release in concern about the documentation, then upon this item,
I cannot. To approach this item, it would serve to support development of the
system by which the documentation would be handled.

I will have to take the text, above, as it constituting a plain/text edition,
as a first draft towards some items of reference documentation.

To approach it into implementation, I will have to have migrated some
measure more of my own code into the TAL archives -- code that I am familiar
with, and which I have tested and know the intended efficiency of, besides
that it may require some minor refactoring and cleanup, before the source will
be checked into the project's source archive.

Upon doing so, I must document each item, as in now, as while my attention is
directed to it. I can glue the reference documents together, later -- will be
producing one reference document per each distinct system definition. Each
system definition may be associated with a roadmap about milestones proposed
for the implementation of the system, furthermore. Then, there will be a body
of resource reference pages, a body of glossary entry pages, a body of
bibliographic entries, and I may extract the code-item refentry pages into
individual files.

I should address the above proposal about an XML-schema system as it being the
body of a system that would be named tdoc-xml-xsd.

I will have to make an index-reference pointing to the documentation that
would be made about it.

I can send across what will be the Arch archive identifier for it, and what
information would be necessary for making sure that one can access the
archives (there's a point in regards to GPG signing on the archives, requiring
that a certain script will be available in one's Arch configuration, with
some range as for how that would be approached -- side-barred with a point
about something called 'agpg' and the 'quintuple-agent' system containing
it. Then, there's a point in regards to available interfaces on Arch --
e.g. xtla, xetla -- and a sidebar about key management, e.g. via Seahorse, and
there's the clnet keyring, such that one should want to have imported into
one's GPG configuration, in order to access the archives.)

That documentation should include explanation of each of those points,
about how one would access and use the archives to the Tioga project -- using
Arch, using some stuff in regards to GPG signing on the archives, and using
probably XTLA or XETLA as an interface.  Once those tools are installed and
configured, then they are readily usable.

I will transpose the above design proposals into some DocBook SGML. The
documents can be converted to XML, with no lossage in the conversion -- no
SGML-specific features being used in the documents. As being SGML, the
documents may be processed without issue, using the DSSSL stylesheets for
DocBook.

After the material, denoted above, will be finally transposed into SGML, then
I should want to put the material onto a side shelf; I should then move-in
the code that I would intend to use for it, finally checking-in that code into
the TAL codebase. That would be a more viable approach than if I was to use
source code from archives that I have not published, in the XSD
implementation.

After that TAL-area material will be imported into the TAL archives, then I
may side-shelve that material and return my attention to the XSD work.

At some point, I will have to make a roadmap-milestone for testing the work onto
the W3C's test code.

I may apologize if I have not edited the above to enough of a state of
"doneness". I consider that it may be to the interest of the CXML project, if
I would mention the design proposed for it, openly.

I regard the proposal, above, as it being material like onto a whiteboard. It
is material about to be drafted into a working document. I consider that I
have figured out some of how I may approach the matter. I should welcome
response about the proposals.

If there is work already proposed about XSD support in CXML, I hope to denote:
I do not intend that this would seem as if it had run over that work. I would
be glad to hear of it, in consideration towards the design of an approach for
it.

In regards to how the tdoc-xml-xsd system would be made most finally
available, I propose to develop it in the TDoc codebase. In so doing, I will
be able to develop it, using the tools that I will be using onto other Tioga
projects. It could be made available from within that codebase, as a
stand-alone distribution -- asdf-installable, if not debian-packaged. 

Upon all consideration of the licensing terms about the item -- Franz LLGPL,
namely -- it may be mirrored from within that codebase, furthermore.

I've an errand to follow-up about, before I will continue with the above.

Good evening

--
Sean Champ