On Tue, 29 Nov 2005 13:24:44 +0200, Ignas Mikalajunas ignas.mikalajunas@gmail.com wrote:
Sorry, I was not aware of that. If I understand you correctly, the right way is to convert all of my pages (they are all UTF-8) to octets before sending them to TBNL?
Yep. Either that or (with the new version) figure out the octet length by other, less expensive means and set it directly - if SBCL allows you to return an arbitrary Unicode string to TBNL.
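One such cheaper way, as a minimal sketch: count the octets directly from the character codes instead of building the octet vector first. UTF-8-OCTET-LENGTH is a made-up name, not something TBNL or SBCL provides, and it assumes that CHAR-CODE returns Unicode code points (as it does on a Unicode-enabled SBCL). It does not consider how an implementation treats lone surrogate code points.

```lisp
(defun utf-8-octet-length (string)
  "Return the number of octets STRING occupies when encoded as UTF-8.
Hypothetical helper; assumes CHAR-CODE yields Unicode code points."
  (loop for char across string
        for code = (char-code char)
        sum (cond ((< code #x80) 1)       ; ASCII
                  ((< code #x800) 2)      ; e.g. Latin letters with diacritics
                  ((< code #x10000) 3)    ; rest of the Basic Multilingual Plane
                  (t 4))))                ; supplementary planes
```

The result could then be used as the content length without allocating a second copy of the page.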
I also don't understand why
(length (string-to-octets string :external-format :utf-8))
translates to
(length (string-to-octets string :external-format))
Because the first one is cl-user:string-to-octets and the second one is tbnl:string-to-octets.
Ah, OK. Assuming you meant :UTF-8 instead of :EXTERNAL-FORMAT in the second form, it's rather the other way around, though. A call to TBNL::STRING-TO-OCTETS will be translated to a call to the corresponding function in SB-EXT.
because with the current setup, browsers that strictly adhere to the Content-Length (IE 6.0, Opera) would trim 1 character of the response body for each UTF-8 character in it.
Nope, that's not how UTF-8 works.
What I meant was: (length "ąčęė") returns 4, though (length (string-to-octets "ąčęė")) is 8. Which means that TBNL would try to fit an 8-octet body into a content length of 4, and IE/Opera would display that as "ąč". That's how it works on SBCL.
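For the record, the truncation can be reproduced at the REPL. This is only a sketch: it assumes SBCL with Unicode support, calls SB-EXT:STRING-TO-OCTETS and SB-EXT:OCTETS-TO-STRING directly rather than going through TBNL, and builds the string from character codes so that it does not depend on the source file encoding. The variable names are made up for the illustration.

```lisp
;; The four characters from the example above: U+0105, U+010D, U+0119, U+0117.
(defparameter *body*
  (map 'string #'code-char '(#x105 #x10D #x119 #x117)))

(defparameter *octets*
  (sb-ext:string-to-octets *body* :external-format :utf-8))

;; Character count vs. octet count.
(list (length *body*) (length *octets*))
;; => (4 8)

;; What a client gets if it reads only (length *body*) octets of the body:
;; the first two characters, because each one takes two octets.
(defparameter *seen*
  (sb-ext:octets-to-string (subseq *octets* 0 (length *body*))
                           :external-format :utf-8))
```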
But you said "each" character. UTF-8 is a variable-length encoding where one character can take from one to four octets (the original design allowed up to six, but RFC 3629 limits it to four). For example, if your characters are all within the ASCII charset you won't lose any octets at all.
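To make the variable length concrete, here is a small sketch (again assuming a Unicode-enabled SBCL; OCTETS-PER-CHAR is a made-up helper and the sample characters are arbitrary) that encodes single characters and an all-ASCII string:

```lisp
(defun octets-per-char (code)
  "Number of UTF-8 octets SBCL produces for the character with code point CODE.
Hypothetical helper for illustration only."
  (length (sb-ext:string-to-octets (string (code-char code))
                                   :external-format :utf-8)))

;; U+0061 "a", U+0105 "ą", U+20AC (euro sign), U+1F600 (outside the BMP).
(mapcar #'octets-per-char '(#x61 #x105 #x20AC #x1F600))
;; => (1 2 3 4)

;; An all-ASCII body: character count and octet count agree,
;; so nothing would be cut off.
(let ((ascii "Hello, TBNL"))
  (= (length ascii)
     (length (sb-ext:string-to-octets ascii :external-format :utf-8))))
;; => T
```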
Cheers, Edi.