On Fri, Oct 18, 2013 at 6:11 PM, Faré fahree@gmail.com wrote:
I'm curious about what happens to strings with non-latin1 characters in them: do they cause the loading to abort, or are they interned as strings with different lengths depending on the unicode support? (Similarly for latin1 strings that are malformed as utf-8.)
For attached file unicode.lisp:
on Linux:
In mlisp:
=======

cl-user> (excl:locale-external-format excl:*locale*)
#<external-format :utf8 [(crlf-base-ef :utf8)] @ #x2013f322>
cl-user> (load (compile-file "/tmp/unicode.lisp"))
;;; Compiling file /tmp/unicode.lisp
;;; Writing fasl file /tmp/unicode.fasl
;;; Fasl write complete
; Fast loading /tmp/unicode.fasl
t
cl-user> *ch-string*
"选项"
cl-user> (aref *ch-string* 0)
#\选
cl-user> (aref *ch-string* 1)
#\项
cl-user> (char-code (aref *ch-string* 0))
36873
cl-user> (char-code (aref *ch-string* 1))
39033
cl-user>
then in alisp8:
============

CL-USER> (excl:locale-external-format excl:*locale*)
#<EXTERNAL-FORMAT :LATIN1 [(CRLF-BASE-EF :LATIN1)] @ #x20094152>
CL-USER> (load "/tmp/unicode.fasl")
; Fast loading /tmp/unicode.fasl
T
CL-USER> *ch-string*
"??"
CL-USER> (aref *ch-string* 0)
#\?
CL-USER> (aref *ch-string* 1)
#\?
CL-USER> (char-code (aref *ch-string* 0))
63
CL-USER> (char-code (aref *ch-string* 1))
63
CL-USER> (char-code (aref *ch-string* 2))
; Evaluation aborted on #<TYPE-ERROR @ #x224ad05a>.
CL-USER> (load (compile-file "/tmp/unicode.lisp"))
;;; Compiling file /tmp/unicode.lisp
;;; Writing fasl file /tmp/unicode.fasl
;;; Fasl write complete
; Fast loading /tmp/unicode.fasl
T
CL-USER> *ch-string*
"选项"
CL-USER> (aref *ch-string* 0)
#\é
CL-USER> (aref *ch-string* 1)
#\%null
CL-USER> (aref *ch-string* 2)
#\%tab
CL-USER> (aref *ch-string* 3)
#\é
CL-USER> (aref *ch-string* 4)
#\¡
CL-USER> (aref *ch-string* 5)
#\¹
CL-USER> (aref *ch-string* 6)
; Evaluation aborted on #<TYPE-ERROR @ #x220d29f2>.
CL-USER> (char-code (aref *ch-string* 0))
233
CL-USER> (char-code (aref *ch-string* 1))
128
CL-USER> (char-code (aref *ch-string* 2))
137
CL-USER> (char-code (aref *ch-string* 3))
233
CL-USER> (char-code (aref *ch-string* 4))
161
CL-USER> (char-code (aref *ch-string* 5))
185
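The six codes from the second alisp8 run (233 128 137 233 161 185) look like the UTF-8 octets of the two characters, read one octet per character. As a rough check, here is a minimal sketch in portable Common Lisp that reassembles those octets into code points. The name `utf8-octets-to-codes` is made up for illustration and is not part of Allegro CL, and the function assumes well-formed input.

```lisp
;; Hypothetical helper, not part of Allegro CL: decode a list of UTF-8
;; octets into a list of Unicode code points. Assumes well-formed input
;; and does no validation.
(defun utf8-octets-to-codes (octets)
  (let ((codes '()))
    (loop while octets
          do (let* ((lead (pop octets))
                    ;; Number of continuation octets implied by the lead octet.
                    (extra (cond ((< lead #x80) 0)
                                 ((< lead #xE0) 1)
                                 ((< lead #xF0) 2)
                                 (t 3)))
                    ;; Payload bits carried by the lead octet itself.
                    (code (case extra
                            (0 lead)
                            (1 (logand lead #x1F))
                            (2 (logand lead #x0F))
                            (t (logand lead #x07)))))
               (dotimes (i extra)
                 ;; Each continuation octet contributes its low 6 bits.
                 (setf code (logior (ash code 6)
                                    (logand (pop octets) #x3F))))
               (push code codes)))
    (nreverse codes)))

;; The octets printed by alisp8 after recompiling:
(utf8-octets-to-codes '(233 128 137 233 161 185))
;; => (36873 39033)
```

Those are the same two char-codes that mlisp reports for the string, which suggests the 8-bit image kept the file's octets intact and simply never combined them.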
On Windows:
in mlisp:
=======

cl-user> (excl:locale-external-format excl:*locale*)
#<external-format :|1252| ['(:e-crlf :1252-base)] @ #x202bda2a>
cl-user> (load (compile-file "~/genworks/tmp/unicode.lisp" :external-format :utf-8))
;;; Compiling file C:\Users\dcooper8\genworks\tmp\unicode.lisp
;;; Writing fasl file C:\Users\dcooper8\genworks\tmp\unicode.fasl
;;; Fasl write complete
; Fast loading C:\Users\dcooper8\genworks\tmp\unicode.fasl
t
;;;
;;;
;;;
;; The rest is the same as Linux -- but note that on Windows you have to
;; say (compile-file ... :external-format :utf-8), because the default
;; external-format is '(:e-crlf :1252-base) while on Linux it's already
;; (crlf-base-ef :utf8) (at least in my locale).
;;
I'm not sure how to make a latin1 string which is malformed as utf-8.
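One possibility, offered as a guess rather than something tested here: in latin1, é is the single octet 233 (#xE9), which UTF-8 treats as the lead of a three-octet sequence. So a latin1 string such as é followed by a plain "a" (octets 233 97) should be malformed as utf-8. The sketch below checks only the lead/continuation structure of an octet list. The name `utf8-well-formed-p` is hypothetical, and the check is deliberately simplified: it does not reject overlong forms or surrogates.

```lisp
;; Hypothetical helper: true if OCTETS has valid UTF-8 lead/continuation
;; structure. Simplified: overlong forms and surrogates are not rejected.
(defun utf8-well-formed-p (octets)
  (loop while octets
        do (let* ((lead (pop octets))
                  ;; How many continuation octets must follow this lead.
                  (extra (cond ((< lead #x80) 0)
                               ;; 10xxxxxx cannot start a sequence.
                               ((< lead #xC0)
                                (return-from utf8-well-formed-p nil))
                               ((< lead #xE0) 1)
                               ((< lead #xF0) 2)
                               ((< lead #xF8) 3)
                               (t (return-from utf8-well-formed-p nil)))))
             (dotimes (i extra)
               (let ((next (pop octets)))
                 ;; A continuation octet must exist and look like 10xxxxxx.
                 (unless (and next (= (logand next #xC0) #x80))
                   (return-from utf8-well-formed-p nil)))))
        finally (return t)))

(utf8-well-formed-p '(233 97))        ; latin1 "é" then "a"
(utf8-well-formed-p '(233 128 137))   ; the three octets of the first character
```

The first call should return NIL and the second T, so a file containing the first octet pair would be a candidate test case for the malformed-utf-8 question.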