Hi,
I need to transform every | character into a <bar/> tag in all text blocks of a big XML file. That is, whenever I find
<test att="one|two">content | something more | and done</test>
I need to transform it into
<test att="one|two">content <bar/> something more <bar/> and done</test>
Note that | can also occur in attribute values and, in that case, it must be kept unchanged. After reading the slide http://common-lisp.net/project/cxml/saxoverview/pages/11.html, I wrote:
===
(defclass preproc (cxml:sax-proxy) ())

(defmethod sax:characters ((handler preproc) data)
  ;; "\\|" is the regex \| (a literal bar); a bare "\|" would read as "|".
  (call-next-method handler (cl-ppcre:regex-replace "\\|" data "<bar/>")))
===
But of course, it produces an escaped string, not a tag, in the final XML:
WML> (cxml:parse "<test>content | ola</test>"
                 (make-instance 'preproc :chained-handler (cxml:make-string-sink)))
"<?xml version=\"1.0\" encoding=\"UTF-8\"?>
<test>content &lt;bar/&gt; ola</test>"
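Why the string replacement cannot yield a tag: by the time the text reaches the sink it is character data, and a serializer escapes markup characters in character data so that the output stays well-formed. A minimal sketch of that escaping in plain Common Lisp (escape-char-data is a hypothetical helper for illustration, not cxml's internal function):

```lisp
(defun escape-char-data (text)
  "Return TEXT with &, < and > replaced by entity references, the way an
XML serializer escapes character data. Hypothetical helper for illustration."
  (with-output-to-string (out)
    (loop for ch across text
          do (case ch
               (#\& (write-string "&amp;" out))
               (#\< (write-string "&lt;" out))
               (#\> (write-string "&gt;" out))
               (t (write-char ch out))))))

;; The replacement runs before serialization, so the inserted tag is escaped:
;; (escape-char-data "content <bar/> ola")
;; => "content &lt;bar/&gt; ola"
```

So any markup has to reach the sink as element events, not as text.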
Any ideas or directions?
Best,
---- Alexandre Rademaker http://arademaker.github.com
Howdy,
You will need to issue sax:start-element and sax:end-element calls instead of doing a string replace. Essentially, you replace the single sax:characters call with a series of sax:characters and element calls.
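EG:

(defclass preproc (cxml:sax-proxy) ())

(defmethod sax:characters ((handler preproc) data)
  ;; "\\|" is the regex \| (a literal bar). :LIMIT keeps trailing empty
  ;; chunks, so text that ends in | still gets its <bar/>.
  (let ((chunks (cl-ppcre:split "\\|" data :limit (1+ (length data)))))
    (if (= 1 (length chunks))
        (call-next-method)              ; no bar: forward the text unchanged
        (loop for c in chunks
              for first? = t then nil
              do (unless first?         ; a <bar/> between consecutive chunks
                   (sax:start-element handler nil nil "bar" nil)
                   (sax:end-element handler nil nil "bar"))
                 (sax:characters handler c)))))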
(cxml:parse "<test>content | ola</test>"
            (make-instance 'preproc :chained-handler (cxml:make-string-sink)))
=> "<?xml version=\"1.0\" encoding=\"UTF-8\"?>
<test>content <bar/> ola</test>"
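The idea in the reply can be sketched without cxml as a pure function that turns one text run into the sequence of events the proxy would emit. text->events and its event lists are hypothetical stand-ins for illustration: a real handler would call sax:characters for each text run and a sax:start-element / sax:end-element pair for each bar. Unlike a plain cl-ppcre:split call, which drops trailing empty strings when no :limit is given, this version also emits the bar for text that ends in a separator.

```lisp
(defun text->events (text &optional (separator #\|))
  "Split TEXT at every SEPARATOR character and return a list of pseudo-SAX
events: (:characters string) for each non-empty run of text and
(:element \"bar\") for each separator, in document order."
  (let ((events '())
        (start 0))
    (loop for pos = (position separator text :start start)
          do (let ((chunk (subseq text start (or pos (length text)))))
               ;; Skip empty runs, e.g. before a leading or after a trailing bar.
               (when (plusp (length chunk))
                 (push (list :characters chunk) events)))
             (if pos
                 (progn
                   (push (list :element "bar") events)
                   (setf start (1+ pos)))
                 (return)))
    (nreverse events)))

;; (text->events "content | ola")
;; => ((:CHARACTERS "content ") (:ELEMENT "bar") (:CHARACTERS " ola"))
;; (text->events "a|")
;; => ((:CHARACTERS "a") (:ELEMENT "bar"))
```

Attribute values never pass through sax:characters, so a | inside att="one|two" is left alone by this approach.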
Hope this helps,
Russ Tyndall
Acceleration.net
On 11/03/2014 07:47 AM, Alexandre Rademaker wrote:
Cxml-devel mailing list Cxml-devel@common-lisp.net http://mailman.common-lisp.net/cgi-bin/mailman/listinfo/cxml-devel
Thank you very much Russ! It works as expected! I have one last question. Running the parser with the command:
(with-open-file (out #P"teste.xml" :if-exists :supersede :direction :output)
  (let ((h (make-instance 'preproc
                          :chained-handler (cxml:make-character-stream-sink out))))
    (cxml:parse #P"harem.xml" h :validate t)))
where the file harem.xml begins with (see the doctype):
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE colHAREM SYSTEM "harem.dtd">
<colHAREM versao="Segundo_dourada_com_relacoes_14Abril2010">
<DOC DOCID="H2-dftre765">
<p>...
the command writes the following to the output file teste.xml:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE colHAREM SYSTEM "harem.dtd"<!ELEMENT EM #PCDATA>
<!ATTLIST EM ID CDATA #REQUIRED>
<!ATTLIST EM CATEG CDATA #IMPLIED>
<!ATTLIST EM TIPO CDATA #IMPLIED>
<!ATTLIST EM COMENT CDATA #IMPLIED>
<!ATTLIST EM SUBTIPO CDATA #IMPLIED>
<!ELEMENT ALT (#PCDATA|EM)*>
<!ELEMENT OMITIDO (#PCDATA|EM|ALT|p)*>
<!ELEMENT colHAREM (DOC)*>
<!ATTLIST colHAREM versao CDATA #REQUIRED>
<!ELEMENT p (#PCDATA|EM|OMITIDO|ALT)*>
<!ATTLIST p xml:space (default|preserve) "default">
<!ELEMENT DOC (#PCDATA|p|OMITIDO)*>
<!ATTLIST DOC DOCID CDATA #REQUIRED>
<colHAREM versao="Segundo_dourada_com_relacoes_14Abril2010"> ...
That is, the handler writes the DTD declarations into the output, but in the wrong way: without the enclosing [ ]. Is this a bug in the library or in my code?
Thank you very much for this additional help!
Best,
---- Alexandre Rademaker http://arademaker.github.com
On Nov 3, 2014, at 1:35 PM, Russ Tyndall russ@acceleration.net wrote:
To clarify my last question, I know that I can use
(with-open-file (out #P"teste.xml" :if-exists :supersede :direction :output)
  (flet ((resolver (pubid sysid)
           (declare (ignore pubid sysid))
           (flexi-streams:make-in-memory-input-stream nil)))
    (let ((h (make-instance 'preproc
                            :chained-handler (cxml:make-character-stream-sink out))))
      (cxml:parse #P"CDSegundoHAREMclassico.xml" h
                  :validate nil :entity-resolver #'resolver))))
to skip loading the DTD, but that would also force me to skip validating the input. It would be nicer to control how the declarations and the DOCTYPE definition are written to the output. Anyway, the code in my last message produces invalid XML.
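One possible way to keep validation on and still leave the declarations out of the output, sketched under two assumptions I have not verified: that the declarations in the output above reach the sink as sax:element-declaration and sax:attribute-declaration events, and that the validating parser checks the document on its own, whatever the handler chain does with those events. Under those assumptions, a proxy whose methods for these two events do not forward them would filter the declarations out. The class name drop-dtd-declarations is hypothetical.

```lisp
;; Assumes Quicklisp is installed; evaluate this first so the cxml packages exist.
(ql:quickload :cxml)

;; Hypothetical proxy: like any cxml:sax-proxy it forwards SAX events to its
;; chained handler, except the two declaration events specialized below.
(defclass drop-dtd-declarations (cxml:sax-proxy) ())

(defmethod sax:element-declaration ((handler drop-dtd-declarations) name model)
  ;; No CALL-NEXT-METHOD: the <!ELEMENT ...> declaration is not forwarded.
  (declare (ignore name model)))

(defmethod sax:attribute-declaration
    ((handler drop-dtd-declarations) element-name attribute-name type default)
  ;; No CALL-NEXT-METHOD: the <!ATTLIST ...> declaration is not forwarded.
  (declare (ignore element-name attribute-name type default)))
```

To combine it with the bar replacement, it would sit between preproc and the sink, e.g. (make-instance 'preproc :chained-handler (make-instance 'drop-dtd-declarations :chained-handler (cxml:make-character-stream-sink out))), with :validate t kept. If the DTD also declared entities or notations, those events would presumably need similar methods."
                (cxml:parse "<test>a</test>"
                            (make-instance 'drop-dtd-declarations
                                           :chained-handler (cxml:make-string-sink)))))
</test>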
Another idea would be to use a DOM builder as the chained handler of the proxy and serialize the result with dom:map-document, leaving out the doctype declarations:
(with-open-file (out #P"teste.xml" :if-exists :supersede :direction :output)
  (let* ((h (make-instance 'preproc :chained-handler (cxml-dom:make-dom-builder)))
         (dom (cxml:parse #P"CDSegundoHAREMclassico.xml" h :validate t)))
    (dom:map-document out dom :include-doctype nil)))
But this code produces many warnings like the one below and writes nothing to the output file.
WARNING: deprecated SAX default method used by a handler that is not a subclass of SAX:ABSTRACT-HANDLER or HAX:ABSTRACT-HANDLER
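A likely cause of these warnings, if I read cxml's DOM API correctly: the first argument of dom:map-document is the SAX handler that receives the serialization events, not an output stream. Passing the file stream itself makes map-document call SAX generic functions on an object that is not a handler, which would match both the deprecation warning and the empty file. A sketch that wraps the stream in a sink first (write-dom-without-doctype is a hypothetical helper name):

```lisp
;; Assumes Quicklisp is installed; evaluate this first so the cxml packages exist.
(ql:quickload :cxml)

(defun write-dom-without-doctype (document stream)
  "Serialize the DOM DOCUMENT as XML to the character STREAM, omitting the
DOCTYPE. Hypothetical helper: DOM:MAP-DOCUMENT expects a SAX handler, so the
stream is wrapped in a cxml character-stream sink first."
  (dom:map-document (cxml:make-character-stream-sink stream)
                    document
                    :include-doctype nil))

;; Usage in the code above (PREPROC as defined earlier in the thread):
;; (with-open-file (out #P"teste.xml" :if-exists :supersede :direction :output)
;;   (let* ((h (make-instance 'preproc :chained-handler (cxml-dom:make-dom-builder)))
;;          (dom (cxml:parse #P"CDSegundoHAREMclassico.xml" h :validate t)))
;;     (write-dom-without-doctype dom out)))
```" (cxml-dom:make-dom-builder)))
       (xml (with-output-to-string (out) (write-dom-without-doctype doc out))))
  (assert (search "<test>content | ola</test>" xml)))
</test>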
Best,
---- Alexandre Rademaker http://arademaker.github.com
On Nov 3, 2014, at 2:33 PM, Alexandre Rademaker arademaker@gmail.com wrote: