Content length is calculated by calling (length content) which produces wrong results with unicode characters in the string. Piso on #lisp proposed a solution - using (length (string-to-octets string :external-format :utf-8)) which translates to just (length (string-to-octets string :external-format)) in the code. The true way to solve this would be using (file-string-length), but the function is not working properly on sbcl yet. So could you please fix the (send-output), because with current setup browsers that strictly adhere to the content-length (IE 6.0, Opera) would trim 1 character of the response body for each UTF-8 character in it.
Ignas
On Tue, 29 Nov 2005 00:18:08 +0200, Ignas Mikalajunas ignas.mikalajunas@gmail.com wrote:
Content length is calculated by calling (length content) which produces wrong results with unicode characters in the string. Piso on #lisp proposed a solution - using (length (string-to-octets string :external-format :utf-8)) which translates to just (length (string-to-octets string :external-format)) in the code.
I won't do that because it's most likely a terrible performance hog if you convert each page to octets by default (assuming that most users already send octets).
I also don't understand why
(length (string-to-octets string :external-format :utf-8))
translates to
(length (string-to-octets string :external-format))
The true way to solve this would be using (file-string-length), but the function is not working properly on sbcl yet.
Huh? How is that supposed to work (even if it would work on SBCL)? *TBNL-STREAM* is a binary stream which accepts octets, isn't it?
So could you please fix the (send-output),
IMHO there's nothing to "fix" because TBNL works as expected. The docs clearly say that you're supposed to send octets, see for example here:
Note that the UTF-8 example that comes with TBNL sends a correct header.
FWIW, I've just released a new version where you can manually set the CONTENT-LENGTH slot of the REPLY object. If it is not NIL TBNL won't bother to compute the content length so you can set it to any value you want. Note, though, that you'll run into trouble w.r.t. TBNL/Apache interaction if you set a wrong value there.
because with current setup browsers that strictly adhere to the content-length (IE 6.0, Opera) would trim 1 character of the response body for each UTF-8 character in it.
Nope, that's not how UTF-8 works.
Cheers, Edi.
On Tue, 29 Nov 2005 00:18:08 +0200, Ignas Mikalajunas ignas.mikalajunas@gmail.com wrote:
Content length is calculated by calling (length content) which produces wrong results with unicode characters in the string. Piso on #lisp proposed a solution - using (length (string-to-octets string :external-format :utf-8)) which translates to just (length (string-to-octets string :external-format)) in the code.
I won't do that because it's most likely a terrible performance hog if you convert each page to octets by default (assuming that most users already send octets).
Sorry, I was not aware of that. If I understand you correctly, the right way is to convert all of my pages (they are all UTF-8) to octets before sending them to TBNL?
I also don't understand why
(length (string-to-octets string :external-format :utf-8))
translates to
(length (string-to-octets string :external-format))
Because the first one is cl-user:string-to-octets and the second one is tbnl:string-to-octets.
because with current setup browsers that strictly adhere to the content-length (IE 6.0, Opera) would trim 1 character of the response body for each UTF-8 character in it.
Nope, that's not how UTF-8 works.
What I meant was: (length "ąčęė") returns 4 though (length (string-to-octets "ąčęė")) is 8. Which means that TBNL would try to fit an 8 octet body with a content length of 4 and IE/Opera would display that as "ąč". That's how it works on SBCL.
Ignas
On Tue, 29 Nov 2005 13:24:44 +0200, Ignas Mikalajunas ignas.mikalajunas@gmail.com wrote:
Sorry, I was not aware of that. If I understand you correctly, the right way is to convert all of my pages (they are all UTF-8) to octets before sending them to TBNL?
Yep. Either that or (with the new version) figure out the octet length by other, less expensive means and set it directly - if SBCL allows you to return a random Unicode string to TBNL.
I also don't understand why
(length (string-to-octets string :external-format :utf-8))
translates to
(length (string-to-octets string :external-format))
Because the first one is cl-user:string-to-octets and the second one is tbnl:string-to-octets.
Ah, OK. Assuming you meant :UTF-8 instead of :EXTERNAL-FORMAT in the second form it's rather the other way around, though. A call to TBNL::STRING-TO-OCTETS will be translated to a call to the corresponding function in SB-EXT.
because with current setup browsers that strictly adhere to the content-length (IE 6.0, Opera) would trim 1 character of the response body for each UTF-8 character in it.
Nope, that's not how UTF-8 works.
What I meant was: (length "ąčęė") returns 4 though (length (string-to-octets "ąčęė")) is 8. Which means that TBNL would try to fit an 8 octet body with a content length of 4 and IE/Opera would display that as "ąč". That's how it works on SBCL.
But you said "each" character. UTF-8 is a variable-length encoding where one character can have any length from one to six octets. For example, if your characters are all within the ASCII charset you won't lose any octets at all.
Cheers, Edi.
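For readers following the thread, here is a minimal sketch of the "less expensive means" mentioned above: counting the octets a string would occupy in UTF-8 without building the octet vector first. The function name UTF-8-OCTET-LENGTH is hypothetical and not part of TBNL or SBCL. The sketch assumes an implementation where each character is a full Unicode code point (as in SBCL), and it follows the current UTF-8 definition (RFC 3629), which uses at most four octets per character.

```lisp
;; Hypothetical helper, not part of TBNL: returns the number of octets
;; STRING would occupy when encoded as UTF-8, without allocating the
;; octet vector. Assumes every character is a full Unicode code point.
(defun utf-8-octet-length (string)
  (loop for ch across string
        for code = (char-code ch)
        sum (cond ((< code #x80) 1)      ; ASCII range: one octet
                  ((< code #x800) 2)     ; e.g. Latin letters with diacritics
                  ((< code #x10000) 3)   ; rest of the Basic Multilingual Plane
                  (t 4))))               ; supplementary planes

;; The four Lithuanian letters from the thread take two octets each,
;; while a pure ASCII string needs no extra octets at all.
(utf-8-octet-length "ąčęė") ; 4 characters, 8 octets
(utf-8-octet-length "abc")  ; 3 characters, 3 octets
```

The result could then be stored in the CONTENT-LENGTH slot of the REPLY object described earlier in the thread. How that slot is set is not shown here, so check the TBNL documentation for the exact accessor.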