Hello folks,
I have bumped into the following error while playing with Hunchentoot. (It originates from URL-decoding GET parameters with *hunchentoot-default-external-format*.)
(let ((flex:*substitution-char* #\?))
  (flex:octets-to-string #(#xC1 #xC2 #xC3 #xC4) :external-format :utf-8))
=> "??"
(let ((flex:*substitution-char* #\?))
  (flex:octets-to-string #(#xC0 #xC1 #xC2 #xC3 #xC4) :external-format :utf-8))
-> signals: This sequence can't be decoded using UTF-8 as it is too short. 1 octet missing at then end.
The reason is rather "simple": the decoder invokes the following chain of calls: compute-number-of-chars -> check-end -> signal-encoding-error
This contrasts with most of the decoder code, which calls recover-from-encoding-error directly instead of signal-encoding-error.
--
Sincerely, Dmitriy Ivanov
lisp.ystok.ru
Sorry for the delay. I think this is more or less "on purpose." (It's been a while since I wrote that stuff...)
The recover-from-encoding-error helper function is used when during decoding we encounter something which "looks like" a character (so to say) but isn't one - in which case we can e.g. replace it with the substitution character.
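The recovery pattern described here (return a substitution character if one is configured, otherwise signal an error that offers a use-value restart) can be sketched in plain Common Lisp. This is only an illustration under stated assumptions, not the flexi-streams source: the names decoding-problem, *substitute* and recover are hypothetical stand-ins.

```lisp
;; Hypothetical condition type; flexi-streams has its own condition classes.
(define-condition decoding-problem (error)
  ((octets :initarg :octets :reader decoding-problem-octets))
  (:report (lambda (condition stream)
             (format stream "Cannot decode octets ~S."
                     (decoding-problem-octets condition)))))

;; Hypothetical stand-in for a substitution character.
;; NIL means "signal an error instead of substituting".
(defvar *substitute* nil)

(defun recover (octets)
  "Return a replacement character for the undecodable OCTETS."
  (or *substitute*
      ;; No substitution character: signal, but let a handler
      ;; supply a replacement through the USE-VALUE restart.
      (restart-case (error 'decoding-problem :octets octets)
        (use-value (char)
          :report "Supply a character to use instead."
          char))))

;; With a substitution character bound, nothing is signalled.
(let ((*substitute* #\?))
  (recover #(#xC1 #xC2)))

;; Without one, a handler can choose the replacement.
(handler-bind ((decoding-problem
                 (lambda (condition)
                   (declare (ignore condition))
                   (use-value #\_))))
  (recover #(#xC1 #xC2)))
```

If no handler invokes the restart, the error simply propagates to the caller, which is the behaviour the length check shows in the report above.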
I think the error you mention happens earlier - when the length is checked.
Of course, one could argue that one could just as well use the same restart here. Maybe you can just submit a patch (including documentation if needed and ideally with new tests) and convince Hans to make a new release?
Thanks, Edi.
On Sat, Jan 21, 2012 at 1:06 PM, Dmitriy Ivanov divanov11@gmail.com wrote:
flexi-streams-devel mailing list flexi-streams-devel@common-lisp.net http://lists.common-lisp.net/cgi-bin/mailman/listinfo/flexi-streams-devel
[Sorry, accidentally hit Enter and sent an unfinished letter. So, once again:]
To make these two aspects - length calculation and error recovery - consistent, the following approach may be good:
Length calculation never signals an encoding error. Instead, it takes into account that wrong byte sequences may be replaced by a character, provided via *substitution-char* or the use-value restart. I.e. every wrong byte sequence is counted as one character.
In the decoding process which follows the length calculation, two cases are possible:
1. Some error is not recovered (no *substitution-char* provided and no use-value restart invoked). The decoding fails completely and it doesn't matter what length was calculated.
2. All the wrong sequences were substituted. In this case the length, with every wrong sequence counted as one character, exactly matches what the decoding process needs.
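The counting rule proposed above can be sketched in plain Common Lisp. This is only an illustration, not the flexi-streams implementation: the names utf-8-sequence-length and lenient-char-count are hypothetical, and the sketch assumes that the lead octet alone determines how many octets a sequence claims, so a sequence cut off at the end of the input still counts as one character.

```lisp
(defun utf-8-sequence-length (lead-octet)
  "Number of octets a UTF-8 sequence starting with LEAD-OCTET claims.
Octets that cannot start a multi-octet sequence count as length 1."
  (cond ((< lead-octet #xC0) 1)   ; ASCII or a stray continuation octet
        ((< lead-octet #xE0) 2)
        ((< lead-octet #xF0) 3)
        ((< lead-octet #xF8) 4)
        (t 1)))                   ; #xF8..#xFF never occur in UTF-8

(defun lenient-char-count (octets)
  "Count the characters OCTETS would decode to, never signalling.
A sequence truncated at the end of OCTETS counts as one character."
  (let ((end (length octets)))
    (loop with i = 0
          while (< i end)
          count t
          ;; Skip the claimed length; running past END just ends the loop.
          do (incf i (utf-8-sequence-length (aref octets i))))))

;; The two octet vectors from the original report:
(lenient-char-count #(#xC1 #xC2 #xC3 #xC4))       ; two 2-octet sequences
(lenient-char-count #(#xC0 #xC1 #xC2 #xC3 #xC4))  ; two sequences plus a truncated one
```

For the two vectors from the original report this gives 2 and 3, so a string sized this way would have room for one substitution character per bad sequence, including the truncated one at the end.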
Unfortunately I cannot work on a patch for this now or in the near future.
Best regards, - Anton
10.02.2012, 01:21, "Edi Weitz" edi@agharta.de: