On Fri, Oct 18, 2013 at 6:11 PM, Faré <fahree@gmail.com> wrote:

I'm curious about what happens to strings with non-latin1 characters in them:
do they cause the loading to abort, or are they interned as strings with
different lengths depending on the unicode support? (Similarly for latin1
strings that are malformed as utf-8.)


For the attached file unicode.lisp:

On Linux:

In mlisp:
=======
cl-user> (excl:locale-external-format excl:*locale*)
#<external-format :utf8 [(crlf-base-ef :utf8)] @ #x2013f322>
cl-user> (load (compile-file "/tmp/unicode.lisp"))
;;; Compiling file /tmp/unicode.lisp
;;; Writing fasl file /tmp/unicode.fasl
;;; Fasl write complete
; Fast loading /tmp/unicode.fasl
t
cl-user> *ch-string*
"选项"
cl-user> (aref *ch-string* 0)
#\选
cl-user> (aref *ch-string* 1)
#\项
cl-user> (char-code (aref *ch-string* 0))
36873
cl-user> (char-code (aref *ch-string* 1))
39033
cl-user> 



Then in alisp8:
============
CL-USER> (excl:locale-external-format excl:*locale*)
#<EXTERNAL-FORMAT :LATIN1 [(CRLF-BASE-EF :LATIN1)] @ #x20094152>
CL-USER> (load "/tmp/unicode.fasl")
; Fast loading /tmp/unicode.fasl
T
CL-USER> *ch-string*
"??"
CL-USER> (aref *ch-string* 0)
#\?
CL-USER> (aref *ch-string* 1)
#\?
CL-USER> (char-code (aref *ch-string* 0))
63
CL-USER> (char-code (aref *ch-string* 1))
63
CL-USER> (char-code (aref *ch-string* 2))
; Evaluation aborted on #<TYPE-ERROR @ #x224ad05a>.
CL-USER> (load (compile-file "/tmp/unicode.lisp"))
;;; Compiling file /tmp/unicode.lisp
;;; Writing fasl file /tmp/unicode.fasl
;;; Fasl write complete
; Fast loading /tmp/unicode.fasl
T
CL-USER> *ch-string*
"选项"
CL-USER> (aref *ch-string* 0)
#\é
CL-USER> (aref *ch-string* 1)
#\%null
CL-USER> (aref *ch-string* 2)
#\%tab
CL-USER> (aref *ch-string* 3)
#\é
CL-USER> (aref *ch-string* 4)
#\¡
CL-USER> (aref *ch-string* 5)
#\¹
CL-USER> (aref *ch-string* 6)
; Evaluation aborted on #<TYPE-ERROR @ #x220d29f2>.
CL-USER> (char-code (aref *ch-string* 0))
233
CL-USER> (char-code (aref *ch-string* 1))
128
CL-USER> (char-code (aref *ch-string* 2))
137
CL-USER> (char-code (aref *ch-string* 3))
233
CL-USER> (char-code (aref *ch-string* 4))
161
CL-USER> (char-code (aref *ch-string* 5))
185
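
The alisp8 numbers above are consistent with plain octet arithmetic: 233 128 137 233 161 185 is the UTF-8 encoding of the two characters (codes 36873 and 39033) with each octet read back as one Latin-1 character, and 63 is the code of "?". A minimal sketch in Python, used here only to check the octets; it does not claim to show what Allegro does internally, and the "?" substitution is Python's own replacement behavior, assumed to resemble what the fasl loader does:

```python
# The two characters from unicode.lisp, built from the codes mlisp reported.
s = chr(36873) + chr(39033)

# UTF-8 encoding: three octets per character.
utf8_octets = list(s.encode("utf-8"))
print(utf8_octets)

# Reading those octets one per character as Latin-1 gives a 6-character
# string, matching the char-codes seen in alisp8 after recompiling there.
as_latin1 = s.encode("utf-8").decode("latin-1")
print(len(as_latin1), [ord(c) for c in as_latin1])

# Forcing the 2-character string into Latin-1 with substitution gives "??",
# matching what alisp8 showed when loading the fasl compiled by mlisp.
substituted = s.encode("latin-1", errors="replace")
print(substituted, list(substituted))
```

If that reading is right, the recompiled string in alisp8 probably still prints as the two Chinese characters only because the terminal interprets the six raw octets as UTF-8.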


On Windows:

In mlisp:
=======

cl-user> (excl:locale-external-format excl:*locale*)
#<external-format :|1252| ['(:e-crlf :1252-base)] @ #x202bda2a>
cl-user> (load (compile-file "~/genworks/tmp/unicode.lisp" :external-format :utf-8))
;;; Compiling file C:\Users\dcooper8\genworks\tmp\unicode.lisp
;;; Writing fasl file C:\Users\dcooper8\genworks\tmp\unicode.fasl
;;; Fasl write complete
; Fast loading C:\Users\dcooper8\genworks\tmp\unicode.fasl
t


;;
;;  The rest is the same as on Linux. Note that on Windows you have to
;;  pass :external-format :utf-8 to compile-file, because the default
;;  external format there is '(:e-crlf :1252-base), while on Linux it is
;;  already (crlf-base-ef :utf8) (at least in my locale).
;;
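
Why the explicit external format matters on Windows can be sketched outside Lisp. Assuming the source file is saved as UTF-8, reading it under a 1252 code page would turn every octet into its own character. Python's "cp1252" codec stands in for Allegro's 1252 external format here, which is an assumption; the two may differ in details:

```python
# Octets of the two-character string as they would sit in a UTF-8 source file.
source_octets = (chr(36873) + chr(39033)).encode("utf-8")

# Decoded with the right external format: two characters.
decoded_as_utf8 = source_octets.decode("utf-8")

# Decoded as Windows-1252: six unrelated characters, one per octet.
decoded_as_1252 = source_octets.decode("cp1252")

print(len(decoded_as_utf8), [ord(c) for c in decoded_as_utf8])
print(len(decoded_as_1252), [hex(ord(c)) for c in decoded_as_1252])
```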



I'm not sure how to make a latin1 string that is malformed as utf-8.
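
One candidate, not tried in Allegro: Latin-1 text that ends in an accented letter, written out as single octets. An octet such as 233 (é) announces a multi-octet UTF-8 sequence, and here nothing follows it. The sample word and the use of Python to check it are illustrative assumptions:

```python
# "café" stored as Latin-1: the final é is the single octet 233 (0xE9).
latin1_octets = "café".encode("latin-1")
print(list(latin1_octets))

# 0xE9 would start a three-octet UTF-8 sequence, but the data ends there,
# so a strict UTF-8 decoder rejects the input.
try:
    latin1_octets.decode("utf-8")
    malformed = False
except UnicodeDecodeError as err:
    malformed = True
    print("not valid UTF-8:", err.reason)
```

Writing those four octets to a file and compiling it with a UTF-8 external format would be one way to try the case asked about.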



-- 
My Best,

Dave Cooper, Genworks Support
david.cooper@genworks.com, dave.genworks.com(skype)