On Fri, Oct 18, 2013 at 6:11 PM, Faré <fahree@gmail.com> wrote:

I'm curious about what happens to strings with non-latin1 characters in
them: do they cause the loading to abort, or are they interned as strings
with different lengths depending on the unicode support? (Similarly for
latin1 strings that are malformed as utf-8.)

For the attached file unicode.lisp:

On Linux:

In mlisp:

=======

cl-user> (excl:locale-external-format excl:*locale*)

#<external-format :utf8 [(crlf-base-ef :utf8)] @ #x2013f322>

cl-user> (load (compile-file "/tmp/unicode.lisp"))

;;; Compiling file /tmp/unicode.lisp

;;; Writing fasl file /tmp/unicode.fasl

;;; Fasl write complete

; Fast loading /tmp/unicode.fasl

cl-user> *ch-string*

"选项"

cl-user> (aref *ch-string* 0)

#\选

cl-user> (aref *ch-string* 1)

#\项

cl-user> (char-code (aref *ch-string* 0))

36873

cl-user> (char-code (aref *ch-string* 1))

39033

cl-user>

Then in alisp8:

============

CL-USER> (excl:locale-external-format excl:*locale*)

#<EXTERNAL-FORMAT :LATIN1 [(CRLF-BASE-EF :LATIN1)] @ #x20094152>

CL-USER> (load "/tmp/unicode.fasl")

; Fast loading /tmp/unicode.fasl

CL-USER> *ch-string*

"??"

CL-USER> (aref *ch-string* 0)

#\?

CL-USER> (aref *ch-string* 1)

#\?

CL-USER> (char-code (aref *ch-string* 0))

CL-USER> (char-code (aref *ch-string* 1))

CL-USER> (char-code (aref *ch-string* 2))

; Evaluation aborted on #<TYPE-ERROR @ #x224ad05a>.

CL-USER> (load (compile-file "/tmp/unicode.lisp"))

;;; Compiling file /tmp/unicode.lisp

;;; Writing fasl file /tmp/unicode.fasl

;;; Fasl write complete

; Fast loading /tmp/unicode.fasl

CL-USER> *ch-string*

"é€‰é¡¹"

CL-USER> (aref *ch-string* 0)

#\é

CL-USER> (aref *ch-string* 1)

#\%null

CL-USER> (aref *ch-string* 2)

#\%tab

CL-USER> (aref *ch-string* 3)

#\é

CL-USER> (aref *ch-string* 4)

#\¡

CL-USER> (aref *ch-string* 5)

#\¹

CL-USER> (aref *ch-string* 6)

; Evaluation aborted on #<TYPE-ERROR @ #x220d29f2>.

CL-USER> (char-code (aref *ch-string* 0))

233

CL-USER> (char-code (aref *ch-string* 1))

128

CL-USER> (char-code (aref *ch-string* 2))

137

CL-USER> (char-code (aref *ch-string* 3))

233

CL-USER> (char-code (aref *ch-string* 4))

161

CL-USER> (char-code (aref *ch-string* 5))

185
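
A note on the six codes above: they appear to be the UTF-8 bytes of the two original characters, each read back as one latin1 character. The following is a minimal, portable sketch for checking that arithmetic. The function name utf8-octets is made up for illustration and is not part of Allegro CL; the sketch only handles code points below #x10000 and does not reject surrogates.

```lisp
;; Hypothetical helper, not an Allegro CL function:
;; encode one code point (below #x10000) as a list of UTF-8 octets.
(defun utf8-octets (code)
  "Return the UTF-8 encoding of CODE as a list of octets."
  (cond ((< code #x80)
         ;; 1 byte: 0xxxxxxx
         (list code))
        ((< code #x800)
         ;; 2 bytes: 110xxxxx 10xxxxxx
         (list (logior #xC0 (ash code -6))
               (logior #x80 (logand code #x3F))))
        ((< code #x10000)
         ;; 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
         (list (logior #xE0 (ash code -12))
               (logior #x80 (logand (ash code -6) #x3F))
               (logior #x80 (logand code #x3F))))
        (t
         (error "Code point ~D is outside this sketch's range." code))))

(utf8-octets 36873) ; => (233 128 137)
(utf8-octets 39033) ; => (233 161 185)
```

The two results, taken together, are the codes 233 128 137 233 161 185 that alisp8 reports for the six-character string.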

On Windows:

In mlisp:

=======

cl-user> (excl:locale-external-format excl:*locale*)

#<external-format :|1252| ['(:e-crlf :1252-base)] @ #x202bda2a>

cl-user> (load (compile-file "~/genworks/tmp/unicode.lisp" :external-format :utf-8))

;;; Compiling file C:\Users\dcooper8\genworks\tmp\unicode.lisp

;;; Writing fasl file C:\Users\dcooper8\genworks\tmp\unicode.fasl

;;; Fasl write complete

; Fast loading C:\Users\dcooper8\genworks\tmp\unicode.fasl

;; The rest is the same as on Linux. Note that on Windows you have to
;; say (compile-file ... :external-format :utf-8), because the default
;; external-format there is '(:e-crlf :1252-base), while on Linux it is
;; already (crlf-base-ef :utf8) (at least in my locale).

I'm not sure how to make a latin1 string which is malformed as utf-8.
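
One possible way to build such a test file, offered as an untested sketch: write the source as raw bytes, so that a string literal contains the single latin1 byte #xE9 (é). In UTF-8 that byte opens a three-byte sequence, but here it is followed by a double quote rather than a continuation byte, so the file is not valid UTF-8. The function names, the variable *l1-string*, and the word "caf" are all made up for illustration, and the sketch assumes char-code returns ASCII codes for the ASCII characters used.

```lisp
;; Hypothetical helpers, not part of Allegro CL.

(defun malformed-utf8-source-octets ()
  "Octets of a one-line Lisp source file that is valid latin1 but not valid UTF-8."
  (append (map 'list #'char-code "(defparameter *l1-string* \"caf")
          (list #xE9)                     ; latin1 e-acute, a lone UTF-8 lead byte
          (map 'list #'char-code "\")")   ; closing quote and paren
          (list 10)))                     ; trailing newline

(defun write-octets (octets path)
  "Write OCTETS to PATH as raw bytes, bypassing any external format. Returns PATH."
  (with-open-file (out path :direction :output
                            :if-exists :supersede
                            :element-type '(unsigned-byte 8))
    (dolist (octet octets path)
      (write-byte octet out))))
```

Calling (write-octets (malformed-utf8-source-octets) "/tmp/latin1.lisp") and then compiling that file with :external-format :utf-8 should exercise the malformed case. What each Lisp does with it is exactly the open question.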

My Best,

Dave Cooper, Genworks Support
david.cooper@genworks.com, dave.genworks.com(skype)