#36: file-position broken for utf16 and utf32
---------------------+------------------------------------------------------
  Reporter:  rtoy    |       Owner:  somebody
      Type:  defect  |      Status:  new
  Priority:  minor   |   Milestone:
 Component:  Core    |     Version:  2010-01
Resolution:          |    Keywords:
---------------------+------------------------------------------------------
Comment(by rtoy):
Keeping track of the octets is probably the only "correct" solution. There's no guarantee that the input (octet-to-code) state has any relationship to the output (code-to-octet) state, so there may be no consistent way to run string-encode correctly.
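A small illustration of that mismatch, written in Python only because it is easy to run; it is not the Lisp stream code, and the variable names are made up for the example. It assumes Python's `utf-16` codec, whose decoder accepts input without a BOM (assuming native byte order) while its encoder always writes one, so re-encoding the decoded characters does not give back the number of octets that were read:

```python
import sys

# Build BOM-less UTF-16 input in the machine's native byte order.
native = "utf-16-le" if sys.byteorder == "little" else "utf-16-be"
octets = "hi".encode(native)          # 4 octets, no BOM

# Input side (octet-to-code): no BOM present, so native order is assumed.
text = octets.decode("utf-16")

# Output side (code-to-octet): the encoder starts fresh and emits a BOM.
reencoded = text.encode("utf-16")

print(len(octets), len(reencoded))    # the two lengths differ
```

The same kind of disagreement can show up wherever the decoder carries state that the encoder cannot see, which is why counting the octets actually consumed looks like the safer basis for file-position.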
Some tests that keep track of the octet length of each character indicate that the cost is fairly low, at least when reading characters one at a time (the conversion is still done a block at a time, with the characters doled out one at a time).
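A rough sketch of the bookkeeping, again in Python rather than Lisp and not the actual stream implementation. `TrackedReader`, `read_char`, and `file_position` are hypothetical names invented here. It assumes an incremental decoder that can be fed one octet at a time; a block is decoded in one go, the octet count of each character is recorded alongside it, and the position advances only as characters are handed out:

```python
import codecs


class TrackedReader:
    """Hypothetical reader: decode a block at a time, hand out one char at a time."""

    def __init__(self, data, encoding, block_size=4096):
        self._data = data
        self._decoder = codecs.getincrementaldecoder(encoding)()
        self._block_size = block_size
        self._next_octet = 0   # offset of the next octet not yet decoded
        self._pending = 0      # octets fed since the last completed character
        self._chars = []       # characters decoded from the current block
        self._lengths = []     # octets consumed by each of those characters
        self._index = 0        # next character in the block to hand out
        self._position = 0     # octet offset just past the last char handed out

    def _fill(self):
        """Decode the next block, recording how many octets each char took."""
        self._chars, self._lengths, self._index = [], [], 0
        end = min(self._next_octet + self._block_size, len(self._data))
        for i in range(self._next_octet, end):
            self._pending += 1
            # The decoder buffers incomplete sequences and returns "" for them.
            for ch in self._decoder.decode(self._data[i:i + 1]):
                self._chars.append(ch)
                self._lengths.append(self._pending)
                self._pending = 0
        self._next_octet = end

    def read_char(self):
        """Return the next character, or None at end of input."""
        while self._index >= len(self._chars):
            if self._next_octet >= len(self._data):
                return None
            self._fill()
        ch = self._chars[self._index]
        self._position += self._lengths[self._index]
        self._index += 1
        return ch

    def file_position(self):
        """Octet offset corresponding to the characters handed out so far."""
        return self._position


# One BMP character (2 octets in UTF-16) followed by one outside the BMP (4 octets).
reader = TrackedReader("a\U0001F600".encode("utf-16-le"), "utf-16-le", block_size=3)
reader.read_char()
print(reader.file_position())   # octets consumed by the first character
reader.read_char()
print(reader.file_position())   # octets consumed by both characters
```

The extra cost per character is one list append while filling the block and one addition when the character is handed out, which fits the observation that the overhead is small when reading one character at a time.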