On Tue, 29 Nov 2005 00:18:08 +0200, Ignas Mikalajunas ignas.mikalajunas@gmail.com wrote:
Content length is calculated by calling (length content), which produces wrong results when the string contains Unicode characters. Piso on #lisp proposed a solution: using (length (string-to-octets string :external-format :utf-8)), which translates to just (length (string-to-octets string :external-format)) in the code.
I won't do that because it's most likely a terrible performance hog if you convert each page to octets by default (assuming that most users already send octets).
Sorry, I was not aware of that. If I understand you correctly, the right way is to convert all of my pages (they are all UTF-8) to octets before sending them to tbnl?
I also don't understand why
(length (string-to-octets string :external-format :utf-8))
translates to
(length (string-to-octets string :external-format))
Because the first one is cl-user:string-to-octets and the second one is tbnl:string-to-octets.
Because with the current setup, browsers that strictly adhere to the Content-Length (IE 6.0, Opera) would trim 1 character of the response's body for each UTF-8 character in it.
Nope, that's not how UTF-8 works.
What I meant was: (length "ąčęė") returns 4, though (length (string-to-octets "ąčęė")) is 8. This means that tbnl would send an 8-octet body with a Content-Length of 4, and IE/Opera would display that as "ąč". That's how it works on SBCL.
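
To make the mismatch concrete, here is a minimal sketch. It assumes SBCL and uses sb-ext:string-to-octets and sb-ext:octets-to-string rather than tbnl's own string-to-octets. The helper names utf-8-content-length and truncate-to-length are made up for illustration, and the source is assumed to be read as UTF-8.

```lisp
;; Sketch only: assumes SBCL. Uses sb-ext:string-to-octets, not
;; tbnl:string-to-octets. Helper names are hypothetical.

(defun utf-8-content-length (string)
  "Return the number of octets STRING occupies when encoded as UTF-8."
  (length (sb-ext:string-to-octets string :external-format :utf-8)))

(defun truncate-to-length (string n)
  "Encode STRING as UTF-8, keep only the first N octets, and decode them
again. This mimics a client that stops reading after N octets."
  (let ((octets (sb-ext:string-to-octets string :external-format :utf-8)))
    (sb-ext:octets-to-string (subseq octets 0 (min n (length octets)))
                             :external-format :utf-8)))

(defparameter *body* "ąčęė")

(length *body*)                 ; character count: 4
(utf-8-content-length *body*)   ; octet count: 8 (two octets per character here)

;; What a strict client sees if Content-Length is the character count:
(truncate-to-length *body* (length *body*))   ; "ąč"
```

So if the body goes out as a string, the Content-Length would have to come from the octet count, or the page would have to be converted to octets before it is handed over.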
Ignas