
Raymond Toy pushed to branch issue-363-add-version-number at cmucl / cmucl Commits: 9ea51cb2 by Raymond Toy at 2025-01-28T14:14:09+00:00 Fix #364: Add interface to mkstemp and mkdtemp - - - - - 98961a8b by Raymond Toy at 2025-01-28T14:14:10+00:00 Merge branch 'issue-364-add-mkstemp-mkdtemp' into 'master' Fix #364: Add interface to mkstemp and mkdtemp Closes #364 See merge request cmucl/cmucl!260 - - - - - 5f94e404 by Raymond Toy at 2025-01-28T15:50:59+00:00 Fix #369: Better docstring for unix-setlocale - - - - - 4bc3e3f4 by Raymond Toy at 2025-01-28T15:50:59+00:00 Merge branch 'issue-369-unix-setlocale-docstring' into 'master' Fix #369: Better docstring for unix-setlocale Closes #369 See merge request cmucl/cmucl!263 - - - - - bf461224 by Raymond Toy at 2025-01-29T13:59:15-08:00 Update release notes with more closed issues. - - - - - fd391099 by Raymond Toy at 2025-01-31T14:39:40+00:00 Fix #367: Add method to count number of octets to encode a string - - - - - 12e15d8b by Raymond Toy at 2025-01-31T14:39:40+00:00 Merge branch 'issue-367-count-octets-for-encoding' into 'master' Fix #367: Add method to count number of octets to encode a string Closes #367 See merge request cmucl/cmucl!264 - - - - - 739de261 by Raymond Toy at 2025-01-31T06:57:45-08:00 Update release notes to add issue 367 - - - - - 46f8aca7 by Raymond Toy at 2025-02-03T13:47:04-08:00 Merge branch 'master' into issue-363-add-version-number - - - - - c88d93f3 by Raymond Toy at 2025-02-03T13:49:25-08:00 Apply suggested changes to .gitlab-ci.yml - - - - - bcea3e15 by Raymond Toy at 2025-02-03T13:54:57-08:00 Write "$TESTDIR" instead of just $TESTDIR In case TESTDIR has spaces or other weird characters. - - - - - c95ff4ba by Raymond Toy at 2025-02-03T14:22:20-08:00 Apply suggestions from review - - - - - dc084d13 by Raymond Toy at 2025-02-04T07:40:42-08:00 Create a header file for the cmucl version Modify bin/cmucl-version.sh to either echo out the version or to echo out text that can be used as the contents of cmucl-version.h. Use the `-f` flag to determine what to output. `-f` means the contents for a file. Also fix a typo in `if` statement in cmucl-version.sh Modify GNUmakefile to create cmucl-version.h automatically from cmucl-version.sh. We don't need "-DCMUCL_VERSION=..." anymore. Also, always build cmucl-version.h to make sure lisp has the right version info. In lisp.c, include cmucl-version.h to get the version. - - - - - 27 changed files: - .gitlab-ci.yml - bin/build.sh - bin/cmucl-version.sh - bin/make-dist.sh - bin/run-unit-tests.sh - + src/bootfiles/21e/boot-2024-08.lisp - src/code/exports.lisp - src/code/extfmts.lisp - src/code/unix.lisp - src/general-info/release-21f.md - src/i18n/locale/cmucl-unix.pot - src/i18n/locale/cmucl.pot - src/lisp/GNUmakefile - src/lisp/lisp.c - src/pcl/simple-streams/external-formats/ascii.lisp - src/pcl/simple-streams/external-formats/euc-kr.lisp - src/pcl/simple-streams/external-formats/iso8859-1.lisp - src/pcl/simple-streams/external-formats/iso8859-2.lisp - src/pcl/simple-streams/external-formats/mac-roman.lisp - src/pcl/simple-streams/external-formats/utf-16-be.lisp - src/pcl/simple-streams/external-formats/utf-16-le.lisp - src/pcl/simple-streams/external-formats/utf-16.lisp - src/pcl/simple-streams/external-formats/utf-32-be.lisp - src/pcl/simple-streams/external-formats/utf-32-le.lisp - src/pcl/simple-streams/external-formats/utf-32.lisp - src/pcl/simple-streams/external-formats/utf-8.lisp - + tests/external-formats.lisp Changes: ===================================== .gitlab-ci.yml ===================================== @@ -1,7 +1,7 @@ variables: download_url: "https://common-lisp.net/project/cmucl/downloads/snapshots/2024/08" version: "2024-08-x86" - bootstrap: "" + bootstrap: "-B boot-2024-08" stages: @@ -51,8 +51,8 @@ linux:build: # instead of clang. - bin/build.sh $bootstrap -R -C "x86_linux" -o snapshot/bin/lisp # - bin/build.sh $bootstrap -R -C "x86_linux" -o snapshot/bin/lisp - # If needed use -V to specify the version in case some tag makes git - # describe return something that make-dist.sh doesn't like. + # When the result of `git describe` cannot be used as a version + # string, an alternative can be provided with the -V flag - bin/make-dist.sh -I dist linux-4 linux:cross-build: ===================================== bin/build.sh ===================================== @@ -51,6 +51,7 @@ export GIT_FILE_COMMENT SKIPUTILS=no DEFAULT_VERSION="`bin/cmucl-version.sh`" export DEFAULT_VERISON +echo DEFAULT_VERSION = $DEFAULT_VERSION # If gmake exists, assume it is GNU make and use it. if [ -z "$MAKE" ]; then ===================================== bin/cmucl-version.sh ===================================== @@ -1,18 +1,44 @@ #!/bin/sh +# If no, just print out the version. If yes, print out the version as +# a C file #define. + +FILE="" + +while getopts "f" arg +do + case $arg in + f) FILE=yes + ;; + esac +done + # Script to determine the cmucl version based on git describe GIT_HASH="`(git describe --dirty 2>/dev/null || git describe 2>/dev/null)`" # echo GIT_HASH = ${GIT_HASH} -if expr "X${GIT_HASH}" : 'Xsnapshot-[0-9][0-9][0-9][0-9]-[01][0-9]' > /dev/null; then +if [ `expr "X${GIT_HASH}" : 'Xsnapshot-[0-9][0-9][0-9][0-9]-[01][0-9]'` != 0 ]; then # The git hash looks like snapshot-yyyy-mm-<stuff>. Remove the # "snapshot-" part. DEFAULT_VERSION=`expr "${GIT_HASH}" : "snapshot-\(.*\)"` fi -if expr "X${GIT_HASH}" : 'X[0-9][0-9][a-f]' > /dev/null; then +if [ `expr "X${GIT_HASH}" : 'X[0-9][0-9][a-f]'` != 0 ]; then # The git hash looks like a release which is 3 hex digits. Use it as is. DEFAULT_VERSION="${GIT_HASH}" fi -echo $DEFAULT_VERSION +if [ -z "$FILE" ]; then + echo $DEFAULT_VERSION +else + cat <<EOF +/* + * Cmucl version + * + * DO NOT EDIT! This file is auto-generated via bin/cmucl-version.sh. + */ + +#define CMUCL_VERSION "$DEFAULT_VERSION" +EOF +fi + ===================================== bin/make-dist.sh ===================================== @@ -182,9 +182,6 @@ fi TARGET="`echo $1 | sed 's:/*$::'`" echo INSTALL_DIR = $INSTALL_DIR -#if [ -n "$INSTALL_DIR" ]; then -# VERSION="today" -#fi echo cmucl-$VERSION-$ARCH-$OS ROOT=`dirname $0` ===================================== bin/run-unit-tests.sh ===================================== @@ -51,7 +51,7 @@ trap cleanup EXIT #set -x if [ -n "${TESTDIR}" ]; then - TESTDIRARG=" :test-directory \"${TESTDIR}/\"" + TESTDIRARG=" :test-directory \"$TESTDIR/\"" else TESTDIR="tests/" TESTDIRARG="" @@ -59,12 +59,12 @@ fi # Compile up the C file that is used for testing alien funcalls to # functions that return integer types of different lengths. We use # gcc since clang isn't always available. -(cd $TESTDIR; gcc -m32 -O3 -c test-return.c) +(cd "$TESTDIR"; gcc -m32 -O3 -c test-return.c) if [ $# -eq 0 ]; then # Test directory arg for run-all-tests if a non-default # No args so run all the tests - $LISP -nositeinit -noinit -load $TESTDIR/run-tests.lisp -eval "(cmucl-test-runner:run-all-tests ${TESTDIRARG})" + $LISP -nositeinit -noinit -load "$TESTDIR"/run-tests.lisp -eval "(cmucl-test-runner:run-all-tests ${TESTDIRARG})" else # Run selected files. Convert each file name to uppercase and append "-TESTS" result="" @@ -73,6 +73,6 @@ else new=`echo $f | tr '[a-z]' '[A-Z]'` result="$result "\"$new-TESTS\" done - $LISP -nositeinit -noinit -load $TESTDIR/run-tests.lisp -eval "(progn (cmucl-test-runner:load-test-files) (cmucl-test-runner:run-test $result))" + $LISP -nositeinit -noinit -load "$TESTDIR"/run-tests.lisp -eval "(progn (cmucl-test-runner:load-test-files) (cmucl-test-runner:run-test $result))" fi ===================================== src/bootfiles/21e/boot-2024-08.lisp ===================================== @@ -0,0 +1,10 @@ +;; Bootstrap file for the ef-octet-count changes. Just need to change +;; the value of +ef-max+ + +(in-package "STREAM") + +(handler-bind + ((error (lambda (c) + (declare (ignore c)) + (invoke-restart 'continue)))) + (defconstant +ef-max+ 14)) ===================================== src/code/exports.lisp ===================================== @@ -1456,7 +1456,8 @@ (name '("STRING-TO-OCTETS" "OCTETS-TO-STRING" "*DEFAULT-EXTERNAL-FORMAT*" "STRING-ENCODE" "STRING-DECODE" "SET-SYSTEM-EXTERNAL-FORMAT" - "LIST-ALL-EXTERNAL-FORMATS" "DESCRIBE-EXTERNAL-FORMAT")) + "LIST-ALL-EXTERNAL-FORMATS" "DESCRIBE-EXTERNAL-FORMAT" + "STRING-OCTET-COUNT")) (intern name "STREAM")) (defpackage "EXTENSIONS" ===================================== src/code/extfmts.lisp ===================================== @@ -19,7 +19,8 @@ string-encode string-decode set-system-external-format +replacement-character-code+ list-all-external-formats - describe-external-format)) + describe-external-format + string-octet-count)) (defvar *default-external-format* :utf-8 @@ -52,6 +53,7 @@ flush ; flush state copy-state ; copy state osc ; octets to string, counted + oc ; number of octets to encode string max) ;; Unicode replacement character U+FFFD @@ -96,6 +98,11 @@ (copy-state nil :type (or null function) :read-only t) (cache nil :type (or null simple-vector)) ;; + ;; Function to count the number of octets needed to encode a + ;; codepoint. Basically like code-to-octets, except we return the + ;; number of octets needed instead of the octets themselves. + (octet-count #'%efni :type (or null function) :read-only t) + ;; ;; Minimum number of octets needed to form a codepoint (min 1 :type kernel:index :read-only t) ;; @@ -126,7 +133,8 @@ (setf (gethash (ef-name ef) *external-formats*) ef)) (declaim (inline ef-octets-to-code ef-code-to-octets ef-flush-state ef-copy-state - ef-cache ef-min-octets ef-max-octets)) + ef-cache ef-min-octets ef-max-octets + ef-octet-count)) (defun ef-octets-to-code (ef) (efx-octets-to-code (ef-efx ef))) @@ -143,6 +151,9 @@ (defun ef-cache (ef) (efx-cache (ef-efx ef))) +(defun ef-octet-count (ef) + (efx-octet-count (ef-efx ef))) + (defun ef-min-octets (ef) (efx-min (ef-efx ef))) @@ -166,7 +177,7 @@ ;;; DEFINE-EXTERNAL-FORMAT -- Public ;;; ;;; name (&key base min max size documentation) (&rest slots) octets-to-code -;;; code-to-octets flush-state copy-state +;;; code-to-octets flush-state copy-state octet-count ;;; ;;; Define a new external format. If base is specified, then an ;;; external format is defined that is based on a previously defined @@ -228,6 +239,15 @@ ;;; This should probably be a deep copy so that if the original ;;; state is modified, the copy is not. ;;; +;;; octet-count (code state error &rest vars) +;;; Defines a form to determine the number of octets needed to +;;; encode the given CODE using the external format. This is +;;; essentially the same as CODE-TO-OCTETS, except the encoding is +;;; not saved anywhere. ERROR is the same as in CODE-TO-OCTETS. +;;; +;;; This should return one value: the number of octets needed to +;;; encode the given code. +;;; ;;; Note: external-formats work on code-points, not ;;; characters, so that the entire 31 bit ISO-10646 range can be ;;; used internally regardless of the size of a character recognized @@ -238,7 +258,7 @@ (defmacro define-external-format (name (&key base min max size (documentation "")) (&rest slots) &optional octets-to-code code-to-octets - flush-state copy-state) + flush-state copy-state octet-count) (let* ((tmp (gensym)) (min (or min size 1)) (max (or max size 6)) @@ -282,7 +302,17 @@ (declare (ignorable ,state)) (let (,@',slotb ,@(loop for var in vars collect `(,var (gensym)))) - ,body)))) + ,body))) + (octet-count ((code state error &rest vars) body) + `(lambda (,',tmp ,state ,error) + (declare (ignorable ,state ,error) + (optimize (ext:inhibit-warnings 3))) + (let (,@',slotb + (,code ',code) + ,@(loop for var in vars collect `(,var (gensym)))) + `(let ((,',code (the lisp:codepoint ,,',tmp))) + (declare (ignorable ,',code)) + ,,body))))) (%intern-ef (make-external-format ,name ,(if base `(ef-efx (find-external-format ,(ef-name base))) @@ -291,7 +321,8 @@ :flush-state ,flush-state :copy-state ,copy-state :cache (make-array +ef-max+ - :initial-element nil) + :initial-element nil) + :octet-count ,octet-count :min ,(min min max) :max ,(max min max))) nil @@ -688,7 +719,20 @@ character and illegal outputs are replaced by a question mark.") (intl:gettext "Cannot output codepoint #x~X to ISO8859-1 stream") ,code 1)) #x3F) - ,code)))) + ,code))) + () + () + (octet-count (code state error) + `(if (> ,code 255) + (if ,error + (locally + ;; No warnings about fdefinition + (declare (optimize (ext:inhibit-warnings 3))) + (funcall ,error + (intl:gettext "Cannot output codepoint #x~X to ISO8859-1 stream") + ,code 1)) + 1) + 1))) ;;; OCTETS-TO-CODEPOINT, CODEPOINT-TO-OCTETS -- Semi-Public ;;; @@ -709,6 +753,10 @@ character and illegal outputs are replaced by a question mark.") (let ((ef (find-external-format external-format))) (funcall (ef-code-to-octets ef) code state output error))) +(defmacro count-codepoint-octets (external-format code state &optional error) + (let ((ef (find-external-format external-format))) + (funcall (ef-octet-count ef) code state error))) + (defvar *ef-base* +ef-max+) @@ -878,6 +926,42 @@ character and illegal outputs are replaced by a question mark.") (when f (funcall f state)))) +(defmacro octet-count (external-format char state &optional error) + (let ((nchar (gensym)) + (nstate (gensym)) + (count-it (gensym)) + (ch (gensym))) + `(let ((,nchar ,char) + (,nstate ,state)) + (when (null ,nstate) (setq ,nstate (setf ,state (cons nil nil)))) + (if (lisp::surrogatep (char-code ,nchar) :high) + (setf (car ,nstate) ,nchar) + (flet ((,count-it (,ch) + (count-codepoint-octets ,external-format ,ch (cdr ,nstate) ,error))) + (if (car ,nstate) + (prog1 + (,count-it (if (lisp::surrogatep (char-code ,nchar) :low) + (surrogates-to-codepoint (car ,nstate) ,nchar) + (if ,error + (locally + (declare (optimize (ext:inhibit-warnings 3))) + (funcall ,error + (intl:gettext "Cannot convert invalide surrogate #~x~X to character") + ,nchar)) + +replacement-character-code+))) + (setf (car ,nstate) nil)) + ;; A lone trailing (low surrogate gets replaced with + ;; the replacement character. + (,count-it (if (lisp::surrogatep (char-code ,nchar) :low) + (if ,error + (locally + (declare (optimize (ext:inhibit-warnings 3))) + (funcall ,error + (intl:gettext "Cannot convert lone trailing surrogate #x~X to character") + ,nchar)) + +replacement-character-code+) + (char-code ,nchar))))))))) + (def-ef-macro ef-string-to-octets (extfmt lisp::lisp +ef-max+ +ef-so+) `(lambda (string start end buffer buffer-start buffer-end error bufferp &aux (ptr buffer-start) (state nil) (last-octet buffer-start)) @@ -1071,6 +1155,31 @@ character and illegal outputs are replaced by a question mark.") (values (if stringp string (lisp::shrink-vector string pos)) (- pos s-start) last-octet new-state)))) +(def-ef-macro ef-string-octet-count (extfmt lisp::lisp +ef-max+ +ef-oc+) + `(lambda (string start end error &aux (total 0) (state nil)) + (dotimes (i (- end start) total) + (incf total + (octet-count ,extfmt (schar string (+ start i)) state error))))) + +(defun string-octet-count (string &key (start 0) end (external-format :default) error) + "Compute the number of octets needed to convert String using the + specified External-format. The string is bound by Start (defaulting + to 0) and End (defaulting to the end of the string)." + (let ((composing-format-p + ;; Determine is the external format is a composing format + ;; which we determine by seeing that the name of the format + ;; is a cons. Probably not the best way. + (consp (ef-name (find-external-format external-format))))) + ;; We currently don't know how to get just the number of octets + ;; when a composing external format is used. As a workaround, use + ;; STRING-TO-OCTETS to find the number of octets. + (if composing-format-p + (nth-value 1 + (string-to-octets string :start start :end end + :external-format external-format)) + (lisp::with-array-data ((string string) (start start) (end end)) + (funcall (ef-string-octet-count external-format) + string start end error))))) (def-ef-macro ef-encode (extfmt lisp::lisp +ef-max+ +ef-en+) `(lambda (string start end result error &aux (ptr 0) (state nil)) @@ -1186,10 +1295,11 @@ character and illegal outputs are replaced by a question mark.") (#.+ef-so+ (%ef-string-to-octets ef)) (#.+ef-en+ (%ef-encode ef)) (#.+ef-de+ (%ef-decode ef)) - (#.+ef-osc+ (%ef-octets-to-string-counted ef)))) + (#.+ef-osc+ (%ef-octets-to-string-counted ef)) + (#.+ef-oc+ (%ef-octet-count ef)))) `(setf (aref (ef-cache (find-external-format ,(ef-name ef))) ,slot) ,(subst (ef-name ef) ef - (function-lambda-expression (aref (ef-cache ef) slot)))))) + (function-lambda-expression (aref (ef-cache ef) slot))))))) ;;; Builtin external formats. @@ -1307,7 +1417,17 @@ replacement character.") ((< ,code #x800) (utf8 ,code 1)) ((< ,code #x10000) (utf8 ,code 2)) ((< ,code #x110000) (utf8 ,code 3)) - (t (error "How did this happen? Codepoint U+~X is illegal" ,code)))))) + (t (error "How did this happen? Codepoint U+~X is illegal" ,code))))) + () + () + (octet-count (code state error) + `(locally + (declare (optimize (ext:inhibit-warnings 3))) + (cond ((< ,code #x80) 1) + ((< ,code #x800) 2) + ((< ,code #x10000) 3) + ((< ,code #x110000) 4) + (t (error "How did this happen? Codepoint U+~X is illegal" ,code)))))) (define-external-format :ascii (:size 1 :documentation "US ASCII 7-bit encoding. Illegal input sequences are replaced with @@ -1333,4 +1453,14 @@ replaced with a question mark.") (declare (optimize (ext:inhibit-warnings 3))) (funcall ,error "Cannot output codepoint #x~X to ASCII stream" ,code)) #x3F) - ,code)))) + ,code))) + () + () + (octet-count (code state error) + `(if (> ,code #x7f) + (if ,error + (locally + (declare (optimize (ext:inhibit-warnings 3))) + (funcall ,error "Cannot output codepoint #x~X to ASCII stream" ,code)) + 1) + 1))) ===================================== src/code/unix.lisp ===================================== @@ -2873,7 +2873,10 @@ (int-syscall ("fork"))) (defun unix-setlocale () - _N"Call setlocale(3c) with fixed args. Returns 0 on success." + _N"Set all the categories of the locale according to the values of + the environment variables by calling setlocale(LC_ALL, \"\"). + + Returns 0 on success and -1 if setlocale failed." (alien:alien-funcall (alien:extern-alien "os_setlocale" (function c-call:int)))) @@ -2900,3 +2903,32 @@ (extern-alien "os_get_locale_codeset" (function (* char)))) c-string)) + +(defun unix-mkstemp (template) + _N"Generates a unique temporary file name from TEMPLATE, and creates + and opens the file. On success, the corresponding file descriptor + and name of the file is returned. + + The last six characters of the template must be \"XXXXXX\"." + ;; Hope this buffer is large enough! + (let ((octets (%name->file template))) + (syscall ("mkstemp" c-call:c-string) + (values result + ;; Convert the file name back to a Lisp string. + (%file->name octets)) + octets))) + +(defun unix-mkdtemp (template) + _N"Generate a uniquely named temporary directory from Template, + which must have \"XXXXXX\" as the last six characters. The + directory is created with permissions 0700. The name of the + directory is returned." + (let* ((octets (%name->file template)) + (result (alien-funcall + (extern-alien "mkdtemp" + (function (* char) + c-call:c-string)) + octets))) + (if (null-alien result) + (values nil (unix-errno)) + (%file->name octets)))) ===================================== src/general-info/release-21f.md ===================================== @@ -30,6 +30,7 @@ public domain. * ANSI compliance fixes: * Bug fixes: * Gitlab tickets: + * ~~#135~~ `(unix-namestring ".")` returns "" instead of "." * ~~#154~~ piglatin translation does not work anymore * ~~#171~~ Readably print `(make-pathname :name :unspecfic)` * ~~#180~~ Move `get-page-size` to C @@ -52,6 +53,7 @@ public domain. available for Hemlock * ~~#261~~ Remove `get-system-info` from "bsd-os.lisp" * ~~#268~~ Can't clone ansi-test repo on Mac OS CI box + * ~~#262~~ [arch_skip_inst invalid code -55] * ~~#265~~ CI for mac os is broken * ~~#266~~ Support "~user" in namestrings * ~~#269~~ Add function to get user's home directory @@ -77,17 +79,42 @@ public domain. * ~~#299~~ Enable xoroshiro assembly routine * ~~#303~~ Variable `*assert-not-standard-readtable*` defined but not used. + * ~~#309~~ obj_run_linker does unnecessary allocations * ~~#312~~ Compiler error building motif server on Fedora 40 * ~~#314~~ tanh incorrect for large args * ~~#316~~ Support roundtrip character casing * ~~#320~~ Motif variant not defaulted for `x86_linux_clang` config * ~~#321~~ Rename Motif Config.x86 to Config.linux * ~~#323~~ Make string casing functions compliant + * ~~#327~~ Fix up weird CLRLF and LF line terminators in the same file * ~~#329~~ Fix compiler warnings in os.lisp * ~~#330~~ Fix typos in unicode.lisp * ~~#333~~ `load` doesn't accept generalized boolean for `:if-does-not-exist` arg + * ~~#338~~ Solaris/x86 build + * ~~#336~~ Clean up some compiler notes + * ~~#337~~ Cross-compile from x86 (linux) to x86 fails + * ~~#339~~ Solaris/x86 `nl_langinfo` returns "646" + * ~~#340~~ Use `+ascii-limit+` instead of `#x7f` in srctran.lisp + for consistency + * ~~#341~~ Update version feature in cross-compile script + * ~~#342~~ Add CI job to run gcc static analyer + * ~~#348~~ Solaris/x86: u_int64_t vs uint64_t + * ~~#347~~ Solaris/x86: Update cross-compile script + * ~~#350~~ Export warnings on Solaris + * ~~#351~~ Solaris does not recognize `-E` option for grep + * ~~#352~~ Always use bzip2 compression for tarballs + * ~~#353~~ Automatically use gtar on Solaris when making a distribution + * ~~#354~~ Check that executables can be created in CI + * ~~#356~~ Return value from `vm::x87-floating-point-modes` should + have status word in low part of result + * ~~#357~~ Solaris needs limits.h to get `PATH_MAX` in elf.c + * ~~#360~~ Adding site-init file * ~~#361~~ Add herald item to mention where to report issues + * ~~#362~~ Simplify "library:" search-list + * ~~#364~~ Add interface to `mkdtemp` and `mkstemp` + * ~~#367~~ Add stream:string-count-octets to count octets in a string + * ~~#369~~ Improve docstring for `unix::unix-setlocale` * Other changes: * Improvements to the PCL implementation of CLOS: * Changes to building procedure: ===================================== src/i18n/locale/cmucl-unix.pot ===================================== @@ -1418,7 +1418,11 @@ msgid "" msgstr "" #: src/code/unix.lisp -msgid "Call setlocale(3c) with fixed args. Returns 0 on success." +msgid "" +"Set all the categories of the locale according to the values of\n" +" the environment variables by calling setlocale(LC_ALL, \"\").\n" +"\n" +" Returns 0 on success and -1 if setlocale failed." msgstr "" #: src/code/unix.lisp @@ -1432,3 +1436,20 @@ msgstr "" msgid "Get the codeset from the locale" msgstr "" +#: src/code/unix.lisp +msgid "" +"Generates a unique temporary file name from TEMPLATE, and creates\n" +" and opens the file. On success, the corresponding file descriptor\n" +" and name of the file is returned.\n" +"\n" +" The last six characters of the template must be \"XXXXXX\"." +msgstr "" + +#: src/code/unix.lisp +msgid "" +"Generate a uniquely named temporary directory from Template,\n" +" which must have \"XXXXXX\" as the last six characters. The\n" +" directory is created with permissions 0700. The name of the\n" +" directory is returned." +msgstr "" + ===================================== src/i18n/locale/cmucl.pot ===================================== @@ -9300,6 +9300,13 @@ msgid "" " external format." msgstr "" +#: src/code/extfmts.lisp +msgid "" +"Compute the number of octets needed to convert String using the\n" +" specified External-format. The string is bound by Start (defaulting\n" +" to 0) and End (defaulting to the end of the string)." +msgstr "" + #: src/code/extfmts.lisp msgid "" "Encode the given String using External-Format and return a new\n" ===================================== src/lisp/GNUmakefile ===================================== @@ -41,7 +41,7 @@ OBJS = $(patsubst %.c,%.o,$(patsubst %.S,%.o,$(patsubst %.s,%.o,$(SRCS)))) ### Don't look in RCS for the files, because we might not want the latest. %: RCS/%,v -DEFAULT_VERSION = $(shell ../../bin/cmucl-version.sh) +#DEFAULT_VERSION = $(shell ../../bin/cmucl-version.sh) lisp.nm: lisp lisp.a echo 'Map file for lisp version ' `cat version` > ,lisp.nm @@ -53,8 +53,11 @@ version.o : version.c version mv ,version version $(CC) ${CFLAGS} $(CPPFLAGS) -DVERSION=`cat version` -c $< -lisp.o : lisp.c - $(CC) ${CFLAGS} $(CPPFLAGS) -DCMUCL_VERSION="\"${DEFAULT_VERSION}\"" -c $< +lisp.o : lisp.c cmucl-version.h + $(CC) ${CFLAGS} $(CPPFLAGS) -c $< + +cmucl-version.h: + ../../bin/cmucl-version.sh -f > cmucl-version.h lisp: ${OBJS} version.o $(CC) -g ${OS_LINK_FLAGS} -o ,lisp $^ ${OS_LIBS} -lm @@ -126,7 +129,7 @@ translations: done; done # Always build lisp.o so that the embedded version is updated. -.PHONY : translations lisp.o +.PHONY : translations cmucl-version.h lisp.o # Like translations, but we don't compute the diff. We just overwrite # the po files in the repository so that we can tell if the ===================================== src/lisp/lisp.c ===================================== @@ -42,8 +42,10 @@ #include <time.h> #endif +#include "cmucl-version.h" + #ifndef CMUCL_VERSION -#define CMUCL_VERSION "alpha" +#error CMUCL_VERSION not defined! #endif ===================================== src/pcl/simple-streams/external-formats/ascii.lisp ===================================== @@ -33,5 +33,15 @@ replaced with a question mark.") (declare (optimize (ext:inhibit-warnings 3))) (funcall ,error "Cannot output codepoint #x~X to ASCII stream" ,code)) #x3F) - ,code)))) + ,code))) + () + () + (octet-count (code state error) + `(if (> ,code #x7f) + (if ,error + (locally + (declare (optimize (ext:inhibit-warnings 3))) + (funcall ,error "Cannot output codepoint #x~X to ASCII stream" ,code)) + 1) + 1))) ===================================== src/pcl/simple-streams/external-formats/euc-kr.lisp ===================================== @@ -1007,4 +1007,16 @@ character and illegal outputs are replaced by a question mark.") (t (if ,error (funcall ,error "Cannot output codepoint #x~X to EUC-KR format." ,code) - (,output #X3f)))))))) + (,output #X3f))))))) + () + () + (octet-count (code state error present) + `(if (<= ,code #x7f) + 1 + (let ((,present (get-inverse ,itable ,code))) + (cond (,present + 2) + (t + (if ,error + (funcall ,error "Cannot output codepoint #x~X to EUC-KR format." ,code) + 1))))))) ===================================== src/pcl/simple-streams/external-formats/iso8859-1.lisp ===================================== @@ -31,4 +31,17 @@ character and illegal outputs are replaced by a question mark.") (funcall ,error "Cannot output codepoint #x~X to ISO8859-1 stream" ,code 1)) #x3F) - ,code)))) + ,code))) + () + () + (octet-count (code state error) + `(if (> ,code 255) + (if ,error + (locally + ;; No warnings about fdefinition + (declare (optimize (ext:inhibit-warnings 3))) + (funcall ,error + (intl:gettext "Cannot output codepoint #x~X to ISO8859-1 stream") + ,code 1)) + 1) + 1))) ===================================== src/pcl/simple-streams/external-formats/iso8859-2.lisp ===================================== @@ -47,4 +47,19 @@ character and illegal outputs are replaced by a question mark.") (declare (optimize (ext:inhibit-warnings 3))) (funcall ,error "Cannot output codepoint #x~X to ISO8859-2 stream" ,code)) - #x3F))))))) + #x3F)))))) + () + () + (octet-count (code state error present) + `(if (< ,code 160) + 1 + (let ((,present (get-inverse ,itable ,code))) + (if ,present + 1 + (if ,error + (locally + ;; No warnings about fdefinition + (declare (optimize (ext:inhibit-warnings 3))) + (funcall ,error "Cannot output codepoint #x~X to ISO8859-2 stream" + ,code)) + 1)))))) ===================================== src/pcl/simple-streams/external-formats/mac-roman.lisp ===================================== @@ -49,4 +49,19 @@ character and illegal outputs are replaced by a question mark.") (declare (optimize (ext:inhibit-warnings 3))) (funcall ,error "Cannot output codepoint #x~X to MAC-ROMAN stream" ,code)) - #x3F))))))) + #x3F)))))) + () + () + (octet-count (code state error present) + `(if (< ,code 128) + 1 + (let ((,present (get-inverse ,itable ,code))) + (if ,present + 1 + (if ,error + (locally + ;; No warnings about fdefinition + (declare (optimize (ext:inhibit-warnings 3))) + (funcall ,error "Cannot output codepoint #x~X to MAC-ROMAN stream" + ,code)) + 1)))))) ===================================== src/pcl/simple-streams/external-formats/utf-16-be.lisp ===================================== @@ -110,4 +110,12 @@ Unicode replacement character.") (copy-state (state) ;; The state is either NIL or a codepoint, so nothing really ;; special is needed to copy it. - `(progn ,state))) + `(progn ,state)) + (octet-count (code state error) + `(cond ((< ,code #x10000) + 2) + ((< ,code #x110000) + 4) + (t + ;; Replacement character is 2 octets + 2)))) ===================================== src/pcl/simple-streams/external-formats/utf-16-le.lisp ===================================== @@ -111,4 +111,12 @@ Unicode replacement character.") (copy-state (state) ;; The state is either NIL or a codepoint, so nothing really ;; special is needed. - `(progn ,state))) + `(progn ,state)) + (octet-count (code state error) + `(cond ((< ,code #x10000) + 2) + ((< ,code #x110000) + 4) + (t + ;; Replacement character is 2 octets + 2)))) ===================================== src/pcl/simple-streams/external-formats/utf-16.lisp ===================================== @@ -156,4 +156,18 @@ Unicode replacement character.") ,c)))))) (copy-state (state) ;; The state is list. Copy it - `(copy-list ,state))) + `(copy-list ,state)) + (octet-count (code state error) + `(let ((bom-count 0)) + (unless ,state + ;; Output BOM + (setf bom-count 2) + (setf ,state t)) + (+ bom-count + (cond ((< ,code #x10000) + 2) + ((< ,code #x110000) + 4) + (t + ;; Replacement character is 2 octets + 2)))))) ===================================== src/pcl/simple-streams/external-formats/utf-32-be.lisp ===================================== @@ -61,4 +61,18 @@ Unicode replacement character.") ,code)) +replacement-character-code+))) (t - (out ,code)))))) + (out ,code))))) + () + () + (octet-count (code state error) + `(cond ((lisp::surrogatep ,code) + (if ,error + (locally + ;; No warnings about fdefinition + (declare (optimize (ext:inhibit-warnings 3))) + (funcall ,error "Surrogate code #x~4,'0X is illegal for UTF32 output" + ,code)) + ;; Replacement character is 2 octets + 2)) + (t + 4)))) ===================================== src/pcl/simple-streams/external-formats/utf-32-le.lisp ===================================== @@ -62,4 +62,18 @@ Unicode replacement character.") ,code)) +replacement-character-code+))) (t - (out ,code)))))) + (out ,code))))) + () + () + (octet-count (code state error) + `(cond ((lisp::surrogatep ,code) + (if ,error + (locally + ;; No warnings about fdefinition + (declare (optimize (ext:inhibit-warnings 3))) + (funcall ,error "Surrogate code #x~4,'0X is illegal for UTF32 output" + ,code)) + ;; Replacement character is 2 octets + 2)) + (t + 4)))) ===================================== src/pcl/simple-streams/external-formats/utf-32.lisp ===================================== @@ -114,4 +114,20 @@ Unicode replacement character.") nil (copy-state (state) ;; The state is either NIL or T, so we can just return that. - `(progn ,state))) + `(progn ,state)) + (octet-count (code state error) + `(let ((bom-count 0)) + (unless ,state + (setf bom-count 4) + (setf ,state t)) + (cond ((lisp::surrogatep ,code) + (if ,error + (locally + ;; No warnings about fdefinition + (declare (optimize (ext:inhibit-warnings 3))) + (funcall ,error "Surrogate code #x~4,'0X is illegal for UTF32 output" + ,code)) + ;; Replacement character is 2 octets + (+ 2 bom-count))) + (t + (+ 4 bom-count)))))) ===================================== src/pcl/simple-streams/external-formats/utf-8.lisp ===================================== @@ -127,4 +127,14 @@ replacement character.") ((< ,code #x800) (utf8 ,code 1)) ((< ,code #x10000) (utf8 ,code 2)) ((< ,code #x110000) (utf8 ,code 3)) - (t (error "How did this happen? Codepoint U+~X is illegal" ,code)))))) + (t (error "How did this happen? Codepoint U+~X is illegal" ,code))))) + () + () + (octet-count (code state error) + `(locally + (declare (optimize (ext:inhibit-warnings 3))) + (cond ((< ,code #x80) 1) + ((< ,code #x800) 2) + ((< ,code #x10000) 3) + ((< ,code #x110000) 4) + (t (error "How did this happen? Codepoint U+~X is illegal" ,code)))))) ===================================== tests/external-formats.lisp ===================================== @@ -0,0 +1,135 @@ +;;; Tests for external formats + +(defpackage :external-formats-tests + (:use :cl :lisp-unit)) + +(in-package "EXTERNAL-FORMATS-TESTS") + +(defparameter *test-iso8859-1* + (let ((rs (kernel::make-random-object :state (kernel::init-random-state 27182828)))) + (lisp::codepoints-string + (loop for k from 0 below 1000 + collect (random 256 rs)))) + "Random test string with ISO8859-1 characters") + +(defparameter *test-unicode* + (let ((rs (kernel::make-random-object :state (kernel::init-random-state 27182828)))) + (lisp::codepoints-string + (loop for k from 0 below 1000 + collect (random 20000 rs)))) + "Random test string with codepoints below 20000") + + + +(defmacro test-octet-count (string format) + "Test that STRING-OCTET-COUNT returns the correct number of octets" + ;; We expect STRING-OCTET-COUNT returns the same number of octets + ;; that are produced by STRING-TO-OCTETS. + `(multiple-value-bind (octets count converted) + (stream:string-to-octets ,string :external-format ,format) + ;; While we're at it, make sure that the length of the octet + ;; buffer matches returned count. And make sure we converted all + ;; the characters in the string. + (assert-equal (length octets) count) + (assert-equal (length ,string) converted) + ;; Finally, make sure that STRING-OCTET-COUNT returns the same + ;; number of octets from STRING-TO-OCTETS. + (assert-equal (length octets) + (stream::string-octet-count ,string :external-format ,format)))) + +(define-test octet-count.iso8859-1 + (:tag :octet-count) + (test-octet-count *test-iso8859-1* :iso8859-1)) + +(define-test octet-count.ascii + (:tag :octet-count) + (test-octet-count *test-iso8859-1* :ascii)) + +(define-test octet-count.ascii.error + (:tag :octet-count) + (assert-error 'simple-error + (stream::string-octet-count *test-iso8859-1* + :external-format :ascii + :error 'error))) + +(define-test octet-count.utf-8 + (:tag :octet-count) + (test-octet-count *test-unicode* :utf-8)) + +(define-test octet-count.utf-16 + (:tag :octet-count) + (test-octet-count *test-unicode* :utf-16)) + +(define-test octet-count.utf-16-be + (:tag :octet-count) + (test-octet-count *test-unicode* :utf-16-be)) + +(define-test octet-count.utf-16-le + (:tag :octet-count) + (test-octet-count *test-unicode* :utf-16-le)) + +(define-test octet-count.utf-32 + (:tag :octet-count) + (test-octet-count *test-unicode* :utf-32)) + +(define-test octet-count.utf-32-le + (:tag :octet-count) + (test-octet-count *test-unicode* :utf-32-le)) + +(define-test octet-count.utf-32-le + (:tag :octet-count) + (test-octet-count *test-unicode* :utf-32-le)) + +(define-test octet-count.euc-kr + (:tag :octet-count) + (test-octet-count *test-unicode* :euc-kr)) + +(define-test octet-count.iso8859-2 + (:tag :octet-count) + (test-octet-count *test-iso8859-1* :iso8859-2)) + +(define-test octet-count.iso8859-3 + (:tag :octet-count) + (test-octet-count *test-iso8859-1* :iso8859-3)) + +(define-test octet-count.iso8859-4 + (:tag :octet-count) + (test-octet-count *test-iso8859-1* :iso8859-4)) + +(define-test octet-count.iso8859-5 + (:tag :octet-count) + (test-octet-count *test-iso8859-1* :iso8859-5)) + +(define-test octet-count.iso8859-6 + (:tag :octet-count) + (test-octet-count *test-iso8859-1* :iso8859-6)) + +(define-test octet-count.iso8859-7 + (:tag :octet-count) + (test-octet-count *test-iso8859-1* :iso8859-7)) + +(define-test octet-count.iso8859-8 + (:tag :octet-count) + (test-octet-count *test-iso8859-1* :iso8859-8)) + +(define-test octet-count.iso8859-10 + (:tag :octet-count) + (test-octet-count *test-iso8859-1* :iso8859-10)) + +(define-test octet-count.iso8859-13 + (:tag :octet-count) + (test-octet-count *test-iso8859-1* :iso8859-13)) + +(define-test octet-count.iso8859-14 + (:tag :octet-count) + (test-octet-count *test-iso8859-1* :iso8859-14)) + +(define-test octet-count.iso8859-15 + (:tag :octet-count) + (test-octet-count *test-iso8859-1* :iso8859-15)) + +(define-test octet-count.mac-roman + (:tag :octet-count) + (test-octet-count *test-iso8859-1* :mac-roman)) + + View it on GitLab: https://gitlab.common-lisp.net/cmucl/cmucl/-/compare/c60a9a250050d3c19b3a673... -- View it on GitLab: https://gitlab.common-lisp.net/cmucl/cmucl/-/compare/c60a9a250050d3c19b3a673... You're receiving this email because of your account on gitlab.common-lisp.net.
participants (1)
-
Raymond Toy (@rtoy)