Raymond Toy pushed to branch issue-139-set-filename-encoding-to-utf8 at cmucl / cmucl
Commits: 689db03d by Raymond Toy at 2022-12-10T11:45:53-08:00 Convert runtime strings to utf-16
The C runtime initializes several variables from either the environment or the command line. These are, of course, encoded according to the current locale, so we need to convert these strings into the utf-16 format that Lisp uses. Add `decode-runtime-strings` to do just that.
This problem showed up when installing cmucl in /tmp/αβ and trying to run lisp. We get an error when printing the herald for the core file that is being used because we call truename on the path. Since the path is still in its encoded form, truename can't find the file. We need to convert the path to a utf-16 string that we can use.
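To make the failure mode concrete, here is a minimal sketch in portable Common Lisp. It is not the code in this commit: `decode-utf8-octet-string` is a hypothetical helper, and it assumes the runtime hands Lisp a string in which each character holds one raw octet and that the locale encoding is UTF-8. The actual conversion is done by `stream:string-decode`. The sketch does no validation of malformed input and assumes every decoded code point is below the implementation's `char-code-limit`.

```lisp
;; Hypothetical helper for illustration only (not part of CMUCL).
;; RAW is a string whose characters each carry one UTF-8 octet
;; (char-code < 256).  Returns a string of the decoded code points.
(defun decode-utf8-octet-string (raw)
  (let ((out (make-string-output-stream))
        (i 0)
        (n (length raw)))
    (flet ((next-octet ()
             (prog1 (char-code (char raw i))
               (incf i))))
      (loop while (< i n)
            do (let* ((b0 (next-octet))
                      ;; Number of continuation octets implied by the lead octet.
                      (extra (cond ((< b0 #x80) 0)
                                   ((< b0 #xE0) 1)
                                   ((< b0 #xF0) 2)
                                   (t 3)))
                      ;; Payload bits carried by the lead octet itself.
                      (code (case extra
                              (0 b0)
                              (1 (logand b0 #x1F))
                              (2 (logand b0 #x0F))
                              (t (logand b0 #x07)))))
                 ;; Each continuation octet contributes its low six bits.
                 (loop repeat extra
                       do (setf code (logior (ash code 6)
                                             (logand (next-octet) #x3F))))
                 (write-char (code-char code) out))))
    (get-output-stream-string out)))

;; The octets of "/tmp/αβ" in UTF-8, stored one octet per character.
(let* ((raw (map 'string #'code-char
                 '(#x2F #x74 #x6D #x70 #x2F #xCE #xB1 #xCE #xB2)))
       (decoded (decode-utf8-octet-string raw)))
  (list (length raw) (length decoded)))   ; => (9 7)
```

The undecoded string has nine characters and names a directory that does not exist, while the decoded string has seven characters ending in α and β, which is why truename only succeeds after decoding.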
- - - - - 6720015c by Raymond Toy at 2022-12-10T11:55:03-08:00 Remove lisp:: package specifier
We're in the lisp package, so we don't need the prefix.
- - - - - 234d86a7 by Raymond Toy at 2022-12-10T16:03:53-08:00 Handle decode-runtime-strings and environment more carefully.
`decode-runtime-strings` needs to be more careful in converting the strings from the C runtime. For the command line parameters and the environment list, we need to use the locale when converting to Lisp strings. But `*cmucl-lib*`, `*cmucl-core-path*`, and `*unidata-path*` need to use the file encoding to convert the C result to Lisp.
We also need to call `environment-init` after calling `decode-runtime-strings` because the strings could have changed. This is important for the "library:" search-list, which contains file names. Without this, the search-list is mangled so we can't find anything.
Finally, call `intl::setlocale` after `environment-init` has been called again. `intl::setlocale` needs the paths to the pot files, so the "library:" search-list must already contain valid pathnames. Previously, translations could be accessed early and error out because the paths were not correct.
This was tested by installing in "/tmp/αβ" and running lisp. Before these changes, we errored out printing the path to the core file because that path was not properly decoded. Now it works.
Then we tried `(set-system-external-format :euc-kr)`. We can find the format implementation file correctly.
We also tested with
```
LANG=ko_KR.EUC_KR bin/lisp -noinit
```
This correctly loads up the euc-kr file and sets the external format. There are no errors. (But of course the printed path for the core file is wrong because euc-kr can't handle the greek letters.)
- - - - - f43c6517 by Raymond Toy at 2022-12-10T16:26:21-08:00 Add some comments
- - - - -
2 changed files:
- src/code/extfmts.lisp - src/code/save.lisp
Changes:
=====================================
src/code/extfmts.lisp
=====================================
@@ -493,7 +493,7 @@
         ;; encoding to NIL because we don't need any special
         ;; encoding to open the format files.
         (let* ((*print-readably* nil)
-               (unix::*filename-encoding* nil)
+               ;;(unix::*filename-encoding* nil)
                (*package* (find-package "STREAM"))
                (lisp::*enable-package-locked-errors* nil)
                (s (open (format nil "ext-formats:~(~A~).lisp" name)
=====================================
src/code/save.lisp
=====================================
@@ -164,20 +164,48 @@
               *default-external-format*))))
   (values))
 
-
+(defun decode-runtime-strings (locale file-locale)
+  ;; The C runtime can initialize the following strings from the
+  ;; command line or the environment.  We need to decode these into
+  ;; the utf-16 strings that Lisp uses.
+  (setf lisp-command-line-list
+        (mapcar #'(lambda (s)
+                    (stream:string-decode s locale))
+                lisp-command-line-list))
+  (setf lisp-environment-list
+        (mapcar #'(lambda (s)
+                    (stream:string-decode s locale))
+                lisp-environment-list))
+  ;; This needs more work..  *cmucl-lib* could be set from the the envvar
+  ;; "CMUCLLIB" or from the "-lib" command-line option, and thus
+  ;; should use the LOCALE to decode the string.
+  (when *cmucl-lib*
+    (setf *cmucl-lib*
+          (stream:string-decode *cmucl-lib* file-locale)))
+  ;; This also needs more work since the core path could come from the
+  ;; "-core" command-line option and should thus use LOCALE to decode
+  ;; the string.  It could also come from the "CMUCLCORE" envvar.
+  (setf *cmucl-core-path*
+        (stream:string-decode *cmucl-core-path* file-locale))
+  ;; *unidata-path* defaults to a pathname object, but the user can
+  ;; specify a path, so we need to decode the string path if given.
+  (when (and *unidata-path* (stringp *unidata-path*))
+    (setf *unidata-path*
+          (stream:string-decode *unidata-path* file-locale))))
+
 (defun save-lisp (core-file-name &key
-                                 (purify t)
-                                 (root-structures ())
-                                 (environment-name "Auxiliary")
-                                 (init-function #'%top-level)
-                                 (load-init-file t)
-                                 (site-init "library:site-init")
-                                 (print-herald t)
-                                 (process-command-line t)
-                                 #+:executable
-                                 (executable nil)
-                                 (batch-mode nil)
-                                 (quiet nil))
+                                 (purify t)
+                                 (root-structures ())
+                                 (environment-name "Auxiliary")
+                                 (init-function #'%top-level)
+                                 (load-init-file t)
+                                 (site-init "library:site-init")
+                                 (print-herald t)
+                                 (process-command-line t)
+                                 #+:executable
+                                 (executable nil)
+                                 (batch-mode nil)
+                                 (quiet nil))
   "Saves a CMU Common Lisp core image in the file of the specified name.  The
   following keywords are defined:
@@ -278,13 +306,18 @@
              ;; Load external format aliases now so we can aliases to
              ;; specify the external format.
              (stream::load-external-format-aliases)
-             ;; Set the locale for lisp
-             (intl::setlocale)
              ;; Set up :locale format
              (set-up-locale-external-format)
              ;; Set terminal encodings to :locale and filename encoding to :utf-8.
              ;; (This needs more work on Darwin.)
              (set-system-external-format :locale :utf-8)
+             (decode-runtime-strings :locale :utf-8)
+             ;; Need to reinitialize the environment again because
+             ;; we've possibly changed the environment variables and
+             ;; pathnames.
+             (environment-init)
+             ;; Set the locale for lisp
+             (intl::setlocale)
              (ext::process-command-strings process-command-line)
              (setf *editor-lisp-p* nil)
              (macrolet ((find-switch (name)
@@ -340,14 +373,14 @@
       (unix:unix-exit
        (catch '%end-of-the-world
          (unwind-protect
-             (if *batch-mode*
-                 (handler-case
-                     (%restart-lisp)
-                   (error (cond)
-                     (format *error-output* (intl:gettext "Error in batch processing:~%~A~%")
-                             cond)
-                     (throw '%end-of-the-world 1)))
-                 (%restart-lisp))
+             (if *batch-mode*
+                 (handler-case
+                     (%restart-lisp)
+                   (error (cond)
+                     (format *error-output* (intl:gettext "Error in batch processing:~%~A~%")
+                             cond)
+                     (throw '%end-of-the-world 1)))
+                 (%restart-lisp))
            (finish-standard-output-streams))))))
 
     ;; Record dump time and host
@@ -357,7 +390,7 @@
     (let ((initial-function (get-lisp-obj-address #'restart-lisp))
           (core-name (unix-namestring core-file-name nil)))
       (without-gcing
-       #+:executable
+       #+:executable
        (if executable
            (save-executable core-name initial-function)
            (save core-name initial-function #+sse2 1 #-sse2 0))
View it on GitLab: https://gitlab.common-lisp.net/cmucl/cmucl/-/compare/354f94f5be60e66e139a09d...