#58: UTF-16 buffering problem
---------------------+------------------------------------------------------
 Reporter:  rtoy     |       Owner:
     Type:  defect   |      Status:  new
 Priority:  major    |   Milestone:
Component:  Unicode  |     Version:  2012-04
 Keywords:           |
---------------------+------------------------------------------------------
The following code should not cause errors:

{{{
(with-open-file (s "test.txt" :direction :output :external-format :utf-16)
  (dotimes (i 300)
    (write-char (code-char i) s)))

(with-open-file (s "test.txt" :direction :input :external-format :utf-16)
  (dotimes (i 300)
    (let ((ch (read-char s nil nil)))
      (unless (= i (char-code ch))
        (format t "Error at ~D: ~S, ~4X~%" i ch (char-code ch))))))
}}}
Comment(by rtoy):
The issue is caused by the BOM (byte-order mark) that is written at the start of the test file. Writing the BOM is fine, but when the file is read back in, the fast stream buffering code gets confused: the BOM is consumed without producing a character, so for all intents and purposes it doesn't exist as far as the buffering code can tell. The buffering code needs to know that the BOM was there so that it can update its internal buffers correctly.
The easiest solution is to disable the fast buffering code for utf16 and utf32. The BOM is not used for other encodings.
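To make the mismatch concrete, here is a minimal sketch in portable Common Lisp. It does not use CMUCL's stream internals or its `:utf-16` external format; the helper names `encode-utf16be-with-bom` and `decode-utf16be` are hypothetical, big-endian byte order is assumed, and surrogate pairs are ignored for brevity. It only shows that the octet count and the character count differ by the size of the BOM, which is the bookkeeping the buffering code has to get right.

```lisp
;; Hypothetical helpers for illustration only; not CMUCL's implementation.

(defun encode-utf16be-with-bom (string)
  "Return a vector of octets: a big-endian BOM followed by STRING as UTF-16BE.
Assumes every character code is below #x10000 (no surrogate pairs)."
  (let ((octets (make-array (+ 2 (* 2 (length string)))
                            :element-type '(unsigned-byte 8)))
        (pos 0))
    (flet ((put16 (code)
             ;; Store the high octet first, then the low octet.
             (setf (aref octets pos) (ldb (byte 8 8) code)
                   (aref octets (1+ pos)) (ldb (byte 8 0) code))
             (incf pos 2)))
      (put16 #xFEFF)                    ; the byte-order mark
      (loop for ch across string
            do (put16 (char-code ch))))
    octets))

(defun decode-utf16be (octets)
  "Decode OCTETS as UTF-16BE, skipping a leading BOM if present.
Returns the decoded string and the number of BOM octets that were skipped."
  (let* ((bom-p (and (>= (length octets) 2)
                     (= (aref octets 0) #xFE)
                     (= (aref octets 1) #xFF)))
         (start (if bom-p 2 0)))
    (values (with-output-to-string (out)
              (loop for i from start below (1- (length octets)) by 2
                    do (write-char (code-char (+ (ash (aref octets i) 8)
                                                 (aref octets (1+ i))))
                                   out)))
            start)))

(let ((octets (encode-utf16be-with-bom "abc")))
  (multiple-value-bind (text bom-octets) (decode-utf16be octets)
    (format t "~D octets, ~D characters, ~D BOM octets~%"
            (length octets) (length text) bom-octets)))
;; prints: 8 octets, 3 characters, 2 BOM octets
```

Any code that infers how many octets were consumed from the number of characters delivered (two octets per character here) will be off by two unless it is also told that a BOM was skipped, which is why the second return value is reported separately in this sketch.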
#58: UTF-16 buffering problem
----------------------+-----------------------------------------------------
  Reporter:  rtoy     |       Owner:
      Type:  defect   |      Status:  closed
  Priority:  major    |   Milestone:
 Component:  Unicode  |     Version:  2012-04
Resolution:  fixed    |    Keywords:
----------------------+-----------------------------------------------------
Changes (by toy.raymond@…):

 * status:  new => closed
 * resolution:  => fixed
Comment:
commit f3db74d49bf24c108053873f06905dbb2ed3cebd
Author: Raymond Toy <toy.raymond@gmail.com>
Date:   Wed Apr 18 23:53:31 2012 -0700

    Fix ticket:58.  Handle the BOM character for utf-16 and utf-32.
    This is a bit of a hack.

    * src/code/stream.lisp:
      * Check the state to see if a BOM was read.  This critically
        depends on knowing the format of the state variable for utf16
        and utf32 formats, but the stream code shouldn't have to know
        the state internals.
    * src/general-info/release-20d.txt
      * Update.