Raymond Toy pushed to branch issue-367-count-octets-for-encoding at cmucl / cmucl

Commits:

31a76ff7 by Raymond Toy at 2025-01-12T14:08:01-08:00

Include BOM in octet count for UTF-16 and UTF-32

`string-to-octets` includes the BOM when encoding strings. To be
consistent, update `string-octet-count` to include the BOM when
computing the number of octets. This is only needed for :UTF-16 and
:UTF-32 formats. The other utf-16 and utf-32 formats don't include
the BOM.

Enable tests for these two formats too.

- - - - -

3 changed files:

- src/pcl/simple-streams/external-formats/utf-16.lisp
- src/pcl/simple-streams/external-formats/utf-32.lisp
- tests/external-formats.lisp

Changes:

=====================================
src/pcl/simple-streams/external-formats/utf-16.lisp
=====================================
@@ -158,16 +158,16 @@ Unicode replacement character.")
       ;; The state is list. Copy it
       `(copy-list ,state))
     (octet-count (code state error)
-      `(progn
-        #+nil
+      `(let ((bom-count 0))
         (unless ,state
           ;; Output BOM
-          (output #xFEFF)
+          (setf bom-count 2)
           (setf ,state t))
-        (cond ((< ,code #x10000)
-               2)
-              ((< ,code #x110000)
-               4)
-              (t
-               ;; Replacement character is 2 octets
-               2)))))
+        (+ bom-count
+           (cond ((< ,code #x10000)
+                  2)
+                 ((< ,code #x110000)
+                  4)
+                 (t
+                  ;; Replacement character is 2 octets
+                  2))))))


=====================================
src/pcl/simple-streams/external-formats/utf-32.lisp
=====================================
@@ -116,11 +116,9 @@ Unicode replacement character.")
       ;; The state is either NIL or T, so we can just return that.
       `(progn ,state))
     (octet-count (code state error)
-      `(progn
-        ;; Should we count the BOM?
-        #+nil
+      `(let ((bom-count 0))
         (unless ,state
-          (out #xFEFF)
+          (setf bom-count 4)
           (setf ,state t))
         (cond ((lisp::surrogatep ,code)
                (if ,error
@@ -130,6 +128,6 @@ Unicode replacement character.")
                      (funcall ,error "Surrogate code #x~4,'0X is illegal for UTF32 output"
                               ,code))
                    ;; Replacement character is 2 octets
-                   2))
+                   (+ 2 bom-count)))
              (t
-              4)))))
+              (+ 4 bom-count))))))


=====================================
tests/external-formats.lisp
=====================================
@@ -36,7 +36,6 @@
   (:tag :octet-count)
   (test-octet-count *test-unicode* :utf-8))
 
-#+nil
 (define-test octet-count.utf-16
   (:tag :octet-count)
   (test-octet-count *test-unicode* :utf-16))
@@ -49,7 +48,6 @@
   (:tag :octet-count)
   (test-octet-count *test-unicode* :utf-16-le))
 
-#+nil
 (define-test octet-count.utf-32
   (:tag :octet-count)
   (test-octet-count *test-unicode* :utf-32))


View it on GitLab: https://gitlab.common-lisp.net/cmucl/cmucl/-/commit/31a76ff7e674b0dac9f89d9b...
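The following standalone sketch illustrates the counting rule this commit adopts. `sketch-octet-count` is a hypothetical helper written for illustration only and is not part of CMUCL. It takes a list of code points instead of a string, which avoids depending on how an implementation stores characters outside the BMP. It also assumes the input contains no surrogate code points, which the real UTF-32 macro either passes to the error handler or counts as a replacement character. As in the `octet-count` macros in the diff, the BOM is counted only once, when the first code point is processed.

```lisp
;;; SKETCH-OCTET-COUNT is a hypothetical helper for illustration; it is
;;; not part of CMUCL.  It applies the per-code-point rules of the
;;; OCTET-COUNT macros in the diff to a list of code points.
(defun sketch-octet-count (codes external-format)
  "Return the number of octets needed to encode the list of code points
CODES in EXTERNAL-FORMAT (:UTF-16 or :UTF-32), including a BOM that is
counted once, when the first code point is processed."
  (let ((state nil)                     ; NIL until the BOM has been counted
        (total 0))
    (dolist (code codes total)
      (let ((bom-count 0))
        (unless state
          ;; First code point: add the size of the #xFEFF BOM.
          (setf bom-count (ecase external-format (:utf-16 2) (:utf-32 4)))
          (setf state t))
        (incf total
              (+ bom-count
                 (ecase external-format
                   (:utf-16
                    (cond ((< code #x10000) 2)    ; one 16-bit unit
                          ((< code #x110000) 4)   ; surrogate pair
                          (t 2)))                 ; replacement character
                   ;; Assumes CODES holds no surrogates; the real UTF-32
                   ;; macro treats those specially.
                   (:utf-32 4))))))))
```

For example, `(sketch-octet-count (list #x41 #x1F600) :utf-16)` returns 8. That total is 2 octets for the BOM, 2 for U+0041, and 4 for the surrogate pair that encodes U+1F600. The same input under :UTF-32 gives 12, which is 4 octets for the BOM plus 4 per code point.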