11 May 2011, 7:32 a.m.
Hello,

Sorry for the late reply.

On Thu, Apr 21, 2011 at 10:36 PM, Rob Blackwell <rob.blackwell@aws.net> wrote:
> I'm still a little confused as to why the length is 4 and not 3 - shouldn’t the byte order mark have been discarded?

I'm not sure. I couldn't find any clear indication of how a leading BOM should be handled for UTF-8. The BOM FAQ seems to indicate that it should be converted to a ZERO WIDTH NO-BREAK SPACE (U+FEFF), maybe. Any comments? It would perhaps be interesting to check what well-established libraries such as ICU do.

Cheers,

--
Luís Oliveira
http://r42.eu/~luis/
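
As one point of comparison, here is a minimal Python sketch of the two behaviours under discussion. The sample string "abc" is only an assumption for illustration, not the input from the original report. Python's plain "utf-8" codec keeps a leading BOM as U+FEFF, while its separate "utf-8-sig" codec discards it:

```python
import codecs

# Hypothetical sample input: a UTF-8 BOM (EF BB BF) followed by "abc".
data = codecs.BOM_UTF8 + "abc".encode("utf-8")

# The plain "utf-8" codec decodes the BOM as the character U+FEFF.
kept = data.decode("utf-8")

# The "utf-8-sig" codec skips a single leading BOM if one is present.
stripped = data.decode("utf-8-sig")

print(len(kept), repr(kept[0]))  # 4 '\ufeff'
print(len(stripped))             # 3
```

So at least one widely used implementation keeps the character by default and makes discarding it an explicit choice of codec.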