Hello,
Sorry for the late reply.
On Thu, Apr 21, 2011 at 10:36 PM, Rob Blackwell <rob.blackwell@aws.net> wrote:
> I'm still a little confused as to why the length is 4 and not 3 - shouldn’t the byte order mark have been discarded?
I'm not sure. I couldn't find any clear indication of how a leading BOM should be handled in UTF-8. The BOM FAQ seems to indicate that it should be converted to a ZERO WIDTH NON-BREAKING SPACE, maybe. Any comments? It would perhaps be interesting to check what well-established libraries such as ICU do.
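For comparison, here is a small Python sketch of the two behaviours in question. Python is only my choice for illustration, not the library we are discussing, and the input (a UTF-8 BOM followed by "abc") is a made-up stand-in for your data. Python's plain "utf-8" codec keeps a leading BOM as U+FEFF, while its separate "utf-8-sig" codec discards it, so at least one well-known implementation leaves the decision to the caller:

```python
import codecs

# Hypothetical input: the UTF-8 BOM (bytes EF BB BF) followed by "abc".
data = codecs.BOM_UTF8 + b"abc"

# The plain "utf-8" codec keeps the BOM as the character U+FEFF.
kept = data.decode("utf-8")

# The "utf-8-sig" codec strips a single leading BOM, if one is present.
stripped = data.decode("utf-8-sig")

print(len(kept), repr(kept[0]))  # 4 '\ufeff'
print(len(stripped))             # 3
```

That would account for a length of 4 rather than 3 if our decoder behaves like the first case, though I haven't checked that this is what actually happens here.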
Cheers,