On 9/5/05, Ian Clelland clelland@gmail.com wrote:
On 9/4/05, Vehbi Sinan Tunalioglu vehbisinan@gmail.com wrote:
Thanks Ian...
I could not find the #'esc documentation either... I'm trying to fix it, too.
I looked into it a bit more yesterday, there is no "esc" function per se; rather cl-who interprets a form like (esc "some string") as if it were an actual function call to (escape-char "some string")
It's the cl-who:escape-char function which is giving us these problems. I'm looking at possible ways of shadowing that function with our own version, or possibly just bypassing it and writing a utf-8 decoding function for cl-wiki.
After a lot of hassle dealing with cl-who, after determining that the sbcl that comes with Debian Sarge does not, in fact, support unicode, and after thinking a lot about where to position a custom UTF-8 decoder, I have realised that the best solution to this problem is simply not to use the form (esc content). Instead, we can use (str (escape-for-html content)), and use exactly the same formatting rules in the editing textarea as we do for the standard page output.
------------------------------------------------------------
Quick patch file:

--- /home/ian/cl-wiki-0.0.3/wiki.lisp 2005-09-05 21:07:10.000000000 -0700
+++ wiki.lisp 2005-09-05 21:05:14.000000000 -0700
@@ -127,7 +127,7 @@
 (:legend "Edit page")
 (:input :type "hidden" :name "action" :value "save")
 (:p
- (:textarea :name "content" :rows "15" :cols "60" (esc content)))
+ (:textarea :name "content" :rows "15" :cols "60" (str (escape-for-html content))))
 (:input :type "submit" :value "Save"))))))
 (execute-main-template page body :edit t)))
------------------------------------------------------------
I've tried this on our running site, and it loads and saves unicode documents with no problems.
Let me know if this works for you.
Regards, Ian Clelland clelland@gmail.com
Ian Clelland wrote:
After a lot of hassle dealing with cl-who, and after determining that the sbcl that comes with Debian Sarge does not, in fact, support unicode,
But the one that comes with Debian SID (unstable) does support Unicode, as far as I know. I had similar problems with older versions of SBCL on my Debian GNU/Linux PC at home and the solution was to upgrade to the unstable package (which also needed an upgrade from linux 2.4 to 2.6.11).
And I think CLISP supports it too. So is it a problem only related to the Lisp compiler used for cl-wiki?
and after thinking a lot about where to position a custom UTF-8 decoder, I have realised that the best solution to this problem is simply not to use the form (esc content). Instead, we can use (str (escape-for-html content)), and use exactly the same formatting rules in the editing textarea as we do for the standard page output.
... I've tried this on our running site, and it loads and saves unicode documents with no problems.
Let me know if this works for you.
I hope it works for us, too. (Sinan will probably apply and try and inform everybody today) :)
Istanbul Bilgi University's Lisp User's Group trusts you:
http://church.cs.bilgi.edu.tr/lcg
On 9/6/05, Emre Sevinç emres@bilgi.edu.tr wrote:
But the one that comes with Debian SID (unstable) does support Unicode, as far as I know. I had similar problems with older versions of SBCL on my Debian GNU/Linux PC at home and the solution was to upgrade to the unstable package (which also needed an upgrade from linux 2.4 to 2.6.11).
And I think CLISP supports it too. So is it a problem only related to the Lisp compiler used for cl-wiki?
Well, the wiki pages on disk are not loaded in any special way, so I believe they are read in the default 8-bit mode, one character per byte. Because of this, cl-who's escape-string function is inappropriate for handling UTF-8-encoded data: it encodes each byte of a multi-byte character as a separate character.
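To make the byte-per-character problem concrete, here is a minimal standalone sketch in Common Lisp (it does not load cl-who). It assumes an escaper that turns &, <, > and every character above code 127 into a numeric entity; `escape-all-high` and `bytes-as-chars` are hypothetical helpers, not cl-who functions, and the real escape-string may differ in detail. The UTF-8 encoding of "ç" is two bytes, so reading it one character per byte produces two entities, which a browser renders as "Ã§".

```lisp
(defun escape-all-high (string)
  "Hypothetical escaper: numeric entity for & < > and any char above 127."
  (with-output-to-string (out)
    (loop for c across string
          do (if (or (> (char-code c) 127) (find c "<>&"))
                 (format out "&#~D;" (char-code c))
                 (write-char c out)))))

(defun bytes-as-chars (bytes)
  "Mimic reading a file in an 8-bit mode: one character per byte."
  (map 'string #'code-char bytes))

;; U+00E7 is encoded in UTF-8 as the two bytes #xC3 #xA7.
(defparameter *utf8-bytes* '(#xC3 #xA7))

;; Read byte-per-character: two separate entities.
(escape-all-high (bytes-as-chars *utf8-bytes*))   ; => "&#195;&#167;"

;; Decoded properly into one character: a single, correct entity.
(escape-all-high (string (code-char #xE7)))       ; => "&#231;"
```

Only the properly decoded string yields the single entity the page actually needs.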
If there were full unicode support in my version of sbcl, we might be able to pass :external-format :utf-8 to WITH-OPEN-FILE and get multi-byte support internally. If that were the case, escape-string would probably work correctly and would create the proper SGML entities for unicode characters.
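A sketch of the :external-format approach, assuming an SBCL built with Unicode support. The :utf-8 designator is implementation-dependent, which is exactly the portability question raised next, so treat this as SBCL-specific. It writes the two UTF-8 bytes of "ç" to a scratch file (the name "utf8-demo.tmp" is arbitrary) and reads them back as text.

```lisp
;; Assumes SBCL with Unicode support; :external-format values are
;; implementation-dependent, so this is not portable as written.
(defparameter *demo-path* "utf8-demo.tmp")   ; arbitrary scratch file

;; Write the raw UTF-8 bytes for U+00E7 (#xC3 #xA7) plus a newline.
(with-open-file (out *demo-path* :direction :output
                                 :element-type '(unsigned-byte 8)
                                 :if-exists :supersede)
  (dolist (b '(#xC3 #xA7 #x0A))
    (write-byte b out)))

;; Read it back as characters, decoding UTF-8.
(defparameter *decoded*
  (with-open-file (in *demo-path* :external-format :utf-8)
    (read-line in)))

(delete-file *demo-path*)

(length *decoded*)                 ; => 1, not 2
(char-code (char *decoded* 0))     ; => 231
```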
I don't know if that technique is portable at all across the different implementations of common lisp.
The change that I made just makes cl-wiki leave high-ASCII characters alone and escape only <, >, ", ', and &. Since HTTP is an 8-bit-safe protocol, there is no problem sending the UTF-8-encoded characters directly.
(The only other change I made, which I forgot to mention earlier, is that I added the line <meta http-equiv="Content-type" content="text/html; charset=UTF-8" /> to the head section of my main template.)
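For reference, the escaping rule described above can be sketched as follows. This is a standalone illustration of that rule (only <, >, ", ' and & are replaced), not the actual cl-wiki source; the name `escape-markup-only` is hypothetical.

```lisp
(defun escape-markup-only (string)
  "Escape only the HTML-significant characters < > \" ' &.
Every other character, including non-ASCII ones, is copied through unchanged."
  (with-output-to-string (out)
    (loop for c across string
          do (case c
               (#\< (write-string "&lt;" out))
               (#\> (write-string "&gt;" out))
               (#\" (write-string "&quot;" out))
               (#\' (write-string "&#39;" out))
               (#\& (write-string "&amp;" out))
               (t (write-char c out))))))

(escape-markup-only "a<b & 'c'")   ; => "a&lt;b &amp; &#39;c&#39;"
```

Because every other character is copied through untouched, the UTF-8 bytes of a page pass straight to the browser, and the charset declared for the response tells it how to decode them.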
I hope it works for us, too. (Sinan will probably apply and try and inform everybody today) :)
Istanbul Bilgi University's Lisp User's Group trusts you:
Thanks; I'll try not to let you down :)
Regards,
Ian Clelland clelland@gmail.com
It works!!!
No need for <meta http-equiv="Content-type" content="text/html; charset=UTF-8" />, since the HTTP response is already marked as UTF-8 in its Content-Type header...
I just changed (esc content) to (str (escape-for-html content))
Thanks...
_______________________________________________
cl-wiki-devel mailing list
cl-wiki-devel@common-lisp.net
http://common-lisp.net/cgi-bin/mailman/listinfo/cl-wiki-devel
On 2005-09-07 08:58:19, Vehbi Sinan Tunalioglu wrote:
I just changed (esc content) to (str (escape-for-html content))
Hmm. It seems this could be fixed by setting CL-WHO's *ESCAPE-CHAR-P* to another test function, which then lets CL-WHO's ESCAPE-STRING behave like CL-WIKI's ESCAPE-FOR-HTML.
I'll give it a try after work.
On 2005-09-07 14:39:37, Stefan Scholl wrote:
On 2005-09-07 08:58:19, Vehbi Sinan Tunalioglu wrote:
I just changed (esc content) to (str (escape-for-html content))
Hmm. It seems this could be fixed by setting CL-WHO's *ESCAPE-CHAR-P* to another test function, which then lets CL-WHO's ESCAPE-STRING behave like CL-WIKI's ESCAPE-FOR-HTML.
Yeah, right.
I'm setting it in INIT
*escape-char-p* #'(lambda (c) (find c "<>&"))
Now it's possible to use the symbol ESC with CL-WHO like intended.
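The idea of swapping the test function can be sketched without cl-who: the escaper below consults a special variable holding a character predicate, in the spirit of *ESCAPE-CHAR-P*. The names `*my-escape-char-p*` and `escape-with-predicate` are hypothetical, and the default predicate here is an assumption, not cl-who's actual default. Note that the predicate used in INIT, (find c "<>&"), leaves double quotes alone, which is fine inside a textarea but not inside a double-quoted attribute value.

```lisp
(defvar *my-escape-char-p*
  (lambda (c) (or (find c "<>&'\"") (> (char-code c) 127)))
  "Hypothetical default: escape markup characters and everything non-ASCII.")

(defun escape-with-predicate (string)
  "Replace each character satisfying *MY-ESCAPE-CHAR-P* with a numeric entity."
  (with-output-to-string (out)
    (loop for c across string
          do (if (funcall *my-escape-char-p* c)
                 (format out "&#~D;" (char-code c))
                 (write-char c out)))))

;; Narrowing the predicate, as done in INIT: non-ASCII now passes through.
(let ((*my-escape-char-p* (lambda (c) (find c "<>&"))))
  (escape-with-predicate "x<y"))   ; => "x&#60;y"
```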
On 9/7/05, Stefan Scholl sscholl@common-lisp.net wrote:
I'm setting it in INIT
*escape-char-p* #'(lambda (c) (find c "<>&"))
Now it's possible to use the symbol ESC with CL-WHO like intended.
So, at this point, is there any difference at all between #'escape-for-html in cl-wiki and #'escape-string in cl-who?
As far as I can tell, they do exactly the same thing (though in slightly different ways). #'escape-string is a bit more flexible, but is about 45% slower on my test machine. It seems a bit odd to have both of these functions available now, just to call one at display time and the other at edit time.
Regards, Ian Clelland clelland@gmail.com
On 2005-09-07 11:47:19, Ian Clelland wrote:
On 9/7/05, Stefan Scholl sscholl@common-lisp.net wrote:
I'm setting it in INIT
*escape-char-p* #'(lambda (c) (find c "<>&"))
Now it's possible to use the symbol ESC with CL-WHO like intended.
So, at this point, is there any difference at all between #'escape-for-html in cl-wiki and #'escape-string in cl-who?
ESCAPE-STRING gets called when the symbol ESC is found inside CL-WHO's WITH-HTML-OUTPUT-TO-STRING
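Roughly, the macro treats (esc ...) and (str ...) as markers while walking the HTML tree and rewrites them into output code, which is why there is no ESC function to look up. The sketch below shows that kind of rewrite on a single form; `expand-special-form` is a hypothetical name and the expansions are illustrative, not copied from cl-who's source.

```lisp
(defun expand-special-form (form stream-var)
  "Hypothetical sketch: rewrite (STR x) and (ESC x) into code that writes to
STREAM-VAR. Symbols are matched by name, so the markers work from any package.
Any other form is returned unchanged."
  (if (and (consp form) (symbolp (first form)))
      (let ((op (symbol-name (first form))))
        (cond ((string= op "STR")
               ;; Print the value as-is, skipping NIL.
               `(let ((result ,(second form)))
                  (when result (princ result ,stream-var))))
              ((string= op "ESC")
               ;; Escape the value first; ESCAPE-STRING must exist at run
               ;; time (cl-who provides a function of that name).
               `(let ((result ,(second form)))
                  (when result (write-string (escape-string result) ,stream-var))))
              (t form)))
      form))

(expand-special-form '(esc content) 's)
;; => (LET ((RESULT CONTENT)) (WHEN RESULT (WRITE-STRING (ESCAPE-STRING RESULT) S)))
```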
As far as I can tell, they do exactly the same thing (though in slightly different ways). #'escape-string is a bit more flexible, but is about 45% slower on my test machine. It seems a bit odd to have both of these functions available now, just to call one at display time and the other at edit time.
ESC is the common style when using CL-WHO. I don't want it to look more complicated than necessary at the moment.
All the examples for CL-WHO use ESC.
Speed isn't that important at the moment. The page is displayed more often than edited. We can tweak this later when needed. Maybe with a patch for CL-WHO itself.
Have you tested both functions with a larger string? About 20 KiB or something? From the code I've seen, ESCAPE-STRING seems to be optimized so that it outputs larger chunks of text between the characters that need escaping.
Regards, Stefan
On 9/8/05, Stefan Scholl sscholl@common-lisp.net wrote:
ESCAPE-STRING gets called when the symbol ESC is found inside CL-WHO's WITH-HTML-OUTPUT-TO-STRING
That's what I discovered while looking at the Unicode issue a couple of days ago. Does this mean that we don't need #'escape-for-html at all, and can just call #'escape-string in its place?
Assuming, of course, that you don't have bigger and better plans for #'escape-for-html in the future.
Have you tested both functions with a larger string? About 20 KiB or something? From the code I've seen, ESCAPE-STRING seems to be optimized so that it outputs larger chunks of text between the characters that need escaping.
The code looks like it's optimised for large blocks of characters which don't need to be escaped, with just a few escaped characters here and there. Of course, since we're not using it to escape all chars > 127, that should describe most text that it gets called on.
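The chunking strategy described here can be sketched in standalone Common Lisp. `escape-in-chunks` is a hypothetical name and this is not cl-who's code: it looks for the next character that needs escaping and copies the whole safe run before it with a single WRITE-STRING call, so text with few escapable characters costs few calls.

```lisp
(defun escape-in-chunks (string &optional (escape-p (lambda (c) (find c "<>&"))))
  "Sketch of chunked escaping: copy each run of safe characters with one
WRITE-STRING call and handle escapable characters one at a time."
  (with-output-to-string (out)
    (loop with start = 0
          ;; Re-evaluated each iteration: next escapable char at or after START.
          for pos = (position-if escape-p string :start start)
          do (write-string string out :start start :end pos)
             (if pos
                 (progn
                   (format out "&#~D;" (char-code (char string pos)))
                   (setf start (1+ pos)))
                 (return)))))

(escape-in-chunks "a<b&c")   ; => "a&#60;b&#38;c"
```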
I only tested it on a small string, ~100B. 1.5e6 iterations of #'escape-for-html ran in 282s, averaged over several runs, while #'escape-string took 412s. Not the most scientific test, I'm sure, but I was just looking for big, order-of-magnitude differences.
On 2005-09-08 11:06:08, Ian Clelland wrote:
On 9/8/05, Stefan Scholl sscholl@common-lisp.net wrote:
ESCAPE-STRING gets called when the symbol ESC is found inside CL-WHO's WITH-HTML-OUTPUT-TO-STRING
That's what I discovered while looking at the Unicode issue a couple of days ago. Does this mean that we don't need #'escape-for-html at all, and can just call #'escape-string in its place?
Yes, we can. :-)
Assuming, of course, that you don't have bigger and better plans for #'escape-for-html in the future.
Not at the moment. ESCAPE-FOR-HTML came first -- before CL-WHO.
I've just tested ESCAPE-FOR-HTML and ESCAPE-STRING on my slow (Dual Pentium III, 650 MHz, 512 MiB RAM) system:
WIKI> (let ((data (contents-of-file "Testpage"))) (time (test-escape-string data 10000)))
Warning: TIME form in a non-null environment, forced to interpret. Compiling entire form will produce more accurate times.
; Evaluation took:
;   106.06 seconds of real time
;   102.52042 seconds of user run time
;   3.545461 seconds of system run time
;   68,755,459,972 CPU cycles
;   [Run times include 5.84 seconds GC run time]
;   0 page faults and
;   820,724,440 bytes consed.
;
NIL
WIKI> (let ((data (contents-of-file "Testpage"))) (time (test-escape-for-html data 10000)))
Warning: TIME form in a non-null environment, forced to interpret. Compiling entire form will produce more accurate times.
; Evaluation took:
;   106.14 seconds of real time
;   100.6897 seconds of user run time
;   5.441173 seconds of system run time
;   68,810,304,097 CPU cycles
;   [Run times include 7.21 seconds GC run time]
;   0 page faults and
;   1,140,486,136 bytes consed.
;
NIL
(Test loop is inside the _compiled_ test functions.)
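A harness in the same spirit, with the loop inside a function that gets compiled rather than typed at the REPL (which is what CMUCL's warning about an interpreted TIME form is about), might look like this. `time-escaper` is a hypothetical name; it measures run time with the standard GET-INTERNAL-RUN-TIME.

```lisp
(defun time-escaper (escape-fn data iterations)
  "Hypothetical harness: call ESCAPE-FN on DATA ITERATIONS times inside this
function and return the elapsed run time in seconds as a float."
  (declare (type fixnum iterations))
  (let ((start (get-internal-run-time)))
    (dotimes (i iterations)
      (declare (ignorable i))
      (funcall escape-fn data))
    (float (/ (- (get-internal-run-time) start)
              internal-time-units-per-second))))
```

For example, (time-escaper #'escape-string data 10000) and (time-escaper #'escape-for-html data 10000) on the same DATA would give a side-by-side comparison, assuming both functions are accessible from the current package and the harness has been compiled.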
"Testpage" has 32 KiB and 4 #& to escape. In lines 18, 42, 103, and 368 of 433 lines. CMUCL 18e, Linux 2.6.
ESCAPE-STRING seems to do better with this kind of text. It's a bit early to optimize the program, but not too early to decide that a function is obsolete when there's an alternative.
Regards, Stefan
PS: Removed ESCAPE-FOR-HTML in development repository.
On 9/9/05, Stefan Scholl sscholl@common-lisp.net wrote:
On 2005-09-08 11:06:08, Ian Clelland wrote:
On 9/8/05, Stefan Scholl sscholl@common-lisp.net wrote:
ESCAPE-STRING seems to do better with this kind of text. It's a bit early to optimize the program, but not too early to decide that a function is obsolete when there's an alternative.
It's all still working well over here (that darcs repository is already coming in handy :)
I agree, it's never too early to remove redundant code and simplify the system.