cmucl-cvs
Threads by month
- ----- 2026 -----
- June
- May
- April
- March
- February
- January
- ----- 2025 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2024 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2023 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2022 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2021 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2020 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2019 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2018 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2017 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2016 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2015 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2014 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2013 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2012 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2011 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2010 -----
- December
- November
- October
- September
- August
- 1 participants
- 3689 discussions
[Git][cmucl/cmucl][master] 2 commits: Fix #504: Read denormals with correct rounding
by Raymond Toy (@rtoy) 26 Jun '26
by Raymond Toy (@rtoy) 26 Jun '26
26 Jun '26
Raymond Toy pushed to branch master at cmucl / cmucl
Commits:
20b8c4eb by Raymond Toy at 2026-06-26T14:12:04-07:00
Fix #504: Read denormals with correct rounding
- - - - -
5926490b by Raymond Toy at 2026-06-26T14:12:04-07:00
Merge branch 'issue-504-read-denormals-with-rounding' into 'master'
Fix #504: Read denormals with correct rounding
Closes #504
See merge request cmucl/cmucl!380
- - - - -
3 changed files:
- src/code/float.lisp
- src/general-info/release-22a.md
- tests/float.lisp
Changes:
=====================================
src/code/float.lisp
=====================================
@@ -885,40 +885,57 @@
;;; denormalized or underflows to 0.
;;;
(defun scale-float-maybe-underflow (x exp)
- (multiple-value-bind (sig old-exp)
- (integer-decode-float x)
+ (declare (type (or single-float double-float) x)
+ (fixnum exp))
+ (multiple-value-bind (sig old-exp float-sign)
+ (integer-decode-float x)
(let* ((digits (float-digits x))
+ (1+digits (1+ digits))
(new-exp (+ exp old-exp digits
(etypecase x
(single-float vm:single-float-bias)
(double-float vm:double-float-bias))))
- (sign (if (minusp (float-sign x)) 1 0)))
+ (sign (if (minusp float-sign) 1 0)))
(cond
- ((< new-exp
- (etypecase x
- (single-float vm:single-float-normal-exponent-min)
- (double-float vm:double-float-normal-exponent-min)))
- (when (vm:current-float-trap :inexact)
- (error 'floating-point-inexact :operation 'scale-float
- :operands (list x exp)))
- (when (vm:current-float-trap :underflow)
- (error 'floating-point-underflow :operation 'scale-float
- :operands (list x exp)))
- (let ((shift (1- new-exp)))
- ;; Is it necessary to have this IF here? Is there any case
- ;; where (ash sig shift) won't return 0 when
- ;; shift < -(digits-1)?
- (if (< shift (- (1- digits)))
+ ((< new-exp
+ (etypecase x
+ (single-float vm:single-float-normal-exponent-min)
+ (double-float vm:double-float-normal-exponent-min)))
+ (when (vm:current-float-trap :inexact)
+ (error 'floating-point-inexact :operation 'scale-float
+ :operands (list x exp)))
+ (when (vm:current-float-trap :underflow)
+ (error 'floating-point-underflow :operation 'scale-float
+ :operands (list x exp)))
+ ;; To round correctly, let the hardware multiplier do the
+ ;; rounding: build a normal float whose stored exponent is
+ ;; bumped up by 1+DIGITS (which puts it safely in the normal
+ ;; range), then multiply by 2^-(1+DIGITS). The multiplier is
+ ;; an exact power of two, so the multiplication is exact
+ ;; apart from the unavoidable rounding step that expresses
+ ;; the product as a denormal, which the FPU performs in the
+ ;; current rounding mode. If the bumped exponent is zero or
+ ;; negative the bumped float would itself be a denormal --
+ ;; losing the implicit 1 bit of SIG -- so handle that case
+ ;; explicitly by returning signed zero.
+ (let ((bumped-exp (+ new-exp 1+digits)))
+ (cond
+ ((<= bumped-exp 0)
(etypecase x
(single-float (single-from-bits sign 0 0))
- (double-float (double-from-bits sign 0 0)))
+ (double-float (double-from-bits sign 0 0))))
+ (t
(etypecase x
- (single-float (single-from-bits sign 0 (ash sig shift)))
- (double-float (double-from-bits sign 0 (ash sig shift)))))))
- (t
- (etypecase x
- (single-float (single-from-bits sign new-exp sig))
- (double-float (double-from-bits sign new-exp sig))))))))
+ (single-float
+ (* (single-from-bits sign bumped-exp sig)
+ (scale-float 1f0 (- 1+digits))))
+ (double-float
+ (* (double-from-bits sign bumped-exp sig)
+ (scale-float 1d0 (- 1+digits)))))))))
+ (t
+ (etypecase x
+ (single-float (single-from-bits sign new-exp sig))
+ (double-float (double-from-bits sign new-exp sig))))))))
;;; SCALE-FLOAT-MAYBE-OVERFLOW -- Internal
@@ -1135,42 +1152,74 @@
(assert (= len (the fixnum (1+ digits))))
(multiple-value-bind (f0)
(floatit (ash bits -1))
- #+nil
- (progn
- (format t "x = ~A~%" x)
- (format t "1: f0, f1 = ~A~%" f0)
- (format t " scale = ~A~%" (1+ scale)))
-
(scale-float f0 (1+ scale))))
(t
(multiple-value-bind (f0)
(floatit bits)
- #+nil
- (progn
- (format t "2: f0, f1 = ~A~%" f0)
- (format t " scale = ~A~%" scale)
- (format t "scale-float f0 = ~A~%" (scale-float f0 scale)))
- (let ((min-exponent
- ;; Compute the min (unbiased) exponent
- (ecase format
- (single-float
- (- vm:single-float-normal-exponent-min
- vm:single-float-bias
- vm:single-float-digits))
- (double-float
- (- vm:double-float-normal-exponent-min
- vm:double-float-bias
- vm:double-float-digits)))))
- ;; F0 is always between 0.5 and 1. If
- ;; SCALE is the min exponent, we have a
- ;; denormal number just less than the
- ;; least-positive float. We want to
- ;; return the least-positive-float so
- ;; multiply F0 by 2 (without adjusting
- ;; SCALE) to get the nearest float.
- (if (= scale min-exponent)
- (scale-float (* 2 f0) scale)
- (scale-float f0 scale))))))))
+ (scale-float f0 scale))))))
+ (denormal-excess ()
+ ;; How many bits of precision the result loses by being
+ ;; denormal instead of normal. A normal-precision return
+ ;; would be BITS*2^(SCALE-DIGITS) with BITS having DIGITS
+ ;; bits. Once it's known that this representation will
+ ;; produce a denormal -- equivalently, that SCALE-FLOAT-
+ ;; MAYBE-UNDERFLOW would take the underflow branch --
+ ;; (1 - BIAS) - SCALE bits of the mantissa fall below the
+ ;; denormal's narrower storage and must be rounded off.
+ ;; Zero in the normal range.
+ (let ((bias
+ (ecase format
+ (single-float vm:single-float-bias)
+ (double-float vm:double-float-bias))))
+ (declare (fixnum bias))
+ (max 0 (the fixnum
+ (- (the fixnum (- 1 bias))
+ scale)))))
+ (round-denormal (fraction-and-guard rem excess)
+ ;; FRACTION-AND-GUARD has (1+ DIGITS) bits with one guard
+ ;; bit; round it to (- DIGITS EXCESS) bits using round-to-
+ ;; nearest, ties to even, with REM as the sticky tail.
+ ;; Drops EXCESS+1 low bits in a single step. This is the
+ ;; one rounding the denormal result undergoes; no
+ ;; subsequent SCALE-FLOAT call is needed, so there is no
+ ;; double rounding.
+ (declare (type unsigned-byte fraction-and-guard rem)
+ (fixnum excess))
+ (let* ((shift (1+ excess))
+ (low (ldb (byte shift 0) fraction-and-guard))
+ (quot (ash fraction-and-guard (- shift)))
+ (halfway (ash 1 excess)))
+ (declare (fixnum shift))
+ (cond ((< low halfway) quot)
+ ((> low halfway) (1+ quot))
+ ((not (zerop rem)) (1+ quot))
+ ((oddp quot) (1+ quot))
+ (t quot))))
+ (denormal-from-bits (mantissa excess)
+ ;; MANTISSA has at most (- DIGITS EXCESS) bits and is the
+ ;; stored significand of a denormal result. Denormal
+ ;; storage holds (1- DIGITS) bits, so rounding can carry
+ ;; into the smallest normal only when EXCESS = 1, in
+ ;; which case MANTISSA can be exactly (ASH 1 (1- DIGITS)).
+ (declare (fixnum excess))
+ (let ((sign (if plusp 0 1)))
+ (case format
+ (single-float
+ (cond ((and (= excess 1)
+ (= mantissa
+ (ash 1 (1- vm:single-float-digits))))
+ (single-from-bits
+ sign vm:single-float-normal-exponent-min 0))
+ (t
+ (single-from-bits sign 0 mantissa))))
+ (double-float
+ (cond ((and (= excess 1)
+ (= mantissa
+ (ash 1 (1- vm:double-float-digits))))
+ (double-from-bits
+ sign vm:double-float-normal-exponent-min 0))
+ (t
+ (double-from-bits sign 0 mantissa)))))))
(floatit (bits)
(let ((sign (if plusp 0 1)))
(case format
@@ -1188,16 +1237,36 @@
(declare (fixnum extra))
(cond ((/= extra 1)
(assert (> extra 1)))
- ((oddp fraction-and-guard)
- (return
- (if (zerop rem)
- (float-and-scale
- (if (zerop (logand fraction-and-guard 2))
- fraction-and-guard
- (1+ fraction-and-guard)))
- (float-and-scale (1+ fraction-and-guard)))))
(t
- (return (float-and-scale fraction-and-guard)))))
+ (return
+ (let ((excess (denormal-excess)))
+ (cond
+ ((zerop excess)
+ ;; Normal result: original odd/even tie-break.
+ (cond ((oddp fraction-and-guard)
+ (if (zerop rem)
+ (float-and-scale
+ (if (zerop
+ (logand fraction-and-guard 2))
+ fraction-and-guard
+ (1+ fraction-and-guard)))
+ (float-and-scale
+ (1+ fraction-and-guard))))
+ (t
+ (float-and-scale fraction-and-guard))))
+ (t
+ ;; Denormal result: re-round directly to the
+ ;; denormal's narrower precision so the only
+ ;; rounding step happens here. Rounding to
+ ;; DIGITS first and re-rounding via
+ ;; SCALE-FLOAT-MAYBE-UNDERFLOW would double-
+ ;; round (e.g. 7.290983e-39 would land on an
+ ;; artifical tie at the 24-bit boundary).
+ (let ((mantissa
+ (round-denormal fraction-and-guard rem
+ excess)))
+ (declare (type unsigned-byte mantissa))
+ (denormal-from-bits mantissa excess)))))))))
(setq shifted-num (ash shifted-num -1))
(incf scale)))))))
=====================================
src/general-info/release-22a.md
=====================================
@@ -58,6 +58,8 @@ public domain.
* #463: `double-double-float` is missing comparison operations
between `double-double-float` and `double-float`
* #474: Add functions to print and parse C-style hex floats.
+ * #504: Do correct rounding in `scale-float-maybe-underflow`.
+ This was causing some denormals to be read incorrectly.
* Other changes:
* Improvements to the PCL implementation of CLOS:
* Changes to building procedure:
=====================================
tests/float.lisp
=====================================
@@ -179,7 +179,9 @@
(kernel::float-ratio-float (* 4 expo) 'double-float))
(assert-equal least-positive-double-float
(kernel::float-ratio-float (* 494/100 expo) 'double-float))
- (assert-equal least-positive-double-float
+ ;; 988/100*10^-324 is very close to 2*least-positive (the exact ratio
+ ;; is 1.9997 * least-positive), so it rounds to 2*least-positive.
+ (assert-equal (* 2 least-positive-double-float)
(kernel::float-ratio-float (* 988/100 expo) 'double-float)))))
(define-test reader-error.small-single-floats
@@ -678,8 +680,11 @@
(frob cdiv.maxima-case
#c(5.43d-10 1.13d-100)
#c(1.2d-311 5.7d-312)
- #c(3.691993880674614517999740937026568563794896024143749539711267954d301
- -1.753697093319947872394996242210428954266103103602859195409591583d301)
+ ;; Compute the expected value using rational arithmetic after
+ ;; converting the complex numbers above to the equivalent
+ ;; complex rationals.
+ (/ (complex (rational 5.43d-10) (rational 1.13d-100))
+ (complex (rational 1.2d-311) (rational 5.7d-312)))
52)
;; 12
;;
@@ -766,3 +771,258 @@
(coerce y '(complex single-float)))
x
y)))
+
+(define-test scale-float-underflow-rounding.single
+ (:tag :issues)
+ ;; SCALE-FLOAT into the denormal range must round to nearest, ties to
+ ;; even, instead of truncating the discarded bits. Each (X EXP BITS)
+ ;; triple gives a normal X, an exponent EXP to scale by, and the IEEE
+ ;; bits of the expected single-float result.
+ (ext:with-float-traps-masked (:underflow :inexact)
+ (dolist (case (list
+ ;; 1.7f0 * 2^-149: between denormals 1 and 2, closer to 2.
+ (list 1.7f0 -149 #x00000002)
+ ;; 1.5f0 * 2^-149: halfway between denormals 1 and 2;
+ ;; ties round to even -> 2.
+ (list 1.5f0 -149 #x00000002)
+ ;; 1.1f0 * 2^-149: closer to denormal 1.
+ (list 1.1f0 -149 #x00000001)
+ ;; 1.0001f0 * 2^-150: just above halfway between 0 and
+ ;; smallest denormal; rounds up to 1.
+ (list 1.0001f0 -150 #x00000001)
+ ;; 1.0f0 * 2^-150: exactly halfway between 0 and smallest
+ ;; denormal; ties round to even -> 0.
+ (list 1.0f0 -150 #x00000000)
+ ;; Largest single < 2 scaled by 2^-127: rounding carries
+ ;; into the implicit-1 position and produces the smallest
+ ;; normal number.
+ (list (kernel:make-single-float #x3fffffff)
+ -127 #x00800000)))
+ (destructuring-bind (x exp bits) case
+ (let ((result (scale-float x exp)))
+ (assert-equal bits (kernel:single-float-bits result)
+ x exp result))))))
+
+(define-test scale-float-underflow-rounding.double
+ (:tag :issues)
+ ;; Like SCALE-FLOAT-UNDERFLOW-ROUNDING.SINGLE but for double-floats.
+ ;; Each (X EXP HI LO) gives a normal X, an exponent EXP, and the IEEE
+ ;; high and low bits of the expected double-float result.
+ (ext:with-float-traps-masked (:underflow :inexact)
+ (dolist (case (list
+ ;; 1.7d0 * 2^-1074: between denormals 1 and 2, closer to 2.
+ (list 1.7d0 -1074 0 2)
+ ;; 1.5d0 * 2^-1074: tie, rounds to even -> 2.
+ (list 1.5d0 -1074 0 2)
+ ;; 1.1d0 * 2^-1074: closer to denormal 1.
+ (list 1.1d0 -1074 0 1)
+ ;; 1.0001d0 * 2^-1075: just above halfway, rounds up.
+ (list 1.0001d0 -1075 0 1)
+ ;; 1.0d0 * 2^-1075: tie at the bottom, rounds to even -> 0.
+ (list 1.0d0 -1075 0 0)
+ ;; Largest double < 2 scaled by 2^-1023: rounding carries
+ ;; into the implicit-1 position and produces the smallest
+ ;; normal number.
+ (list (kernel:make-double-float #x3fffffff #xffffffff)
+ -1023 #x00100000 0)))
+ (destructuring-bind (x exp hi lo) case
+ (let ((result (scale-float x exp)))
+ (assert-equal hi (kernel:double-float-high-bits result)
+ x exp result)
+ (assert-equal lo (kernel:double-float-low-bits result)
+ x exp result))))))
+
+(define-test scale-float-underflow-rounding.reader-single
+ (:tag :issues)
+ ;; The reader uses FLOAT-RATIO-FLOAT, which calls SCALE-FLOAT and
+ ;; hence SCALE-FLOAT-MAYBE-UNDERFLOW on denormal results. Reading
+ ;; small float literals must therefore also round to nearest.
+ (ext:with-float-traps-masked (:underflow :inexact)
+ ;; 1.1e-44 is closer to 8 * least-positive-single-float than to 7.
+ (assert-equal #x00000008
+ (kernel:single-float-bits 1.1e-44))
+ ;; 1.121e-44 (essentially the IEEE representation of 1.1e-44) reads
+ ;; as the same denormal.
+ (assert-equal #x00000008
+ (kernel:single-float-bits 1.121e-44))))
+
+(define-test scale-float-underflow-rounding.reader-double
+ (:tag :issues)
+ ;; Like the single-float reader test, in the double denormal range.
+ ;; 1.1d-322 is closest to denormal #x16 (= 22).
+ (ext:with-float-traps-masked (:underflow :inexact)
+ (assert-equal 0 (kernel:double-float-high-bits 1.1d-322))
+ (assert-equal #x16 (kernel:double-float-low-bits 1.1d-322))))
+
+(define-test float-ratio-float.denormal-double-rounding.single
+ (:tag :issues)
+ ;; Regression for the double-rounding bug exposed by reading
+ ;; "7.290983e-39". The exact rational lies at 5203019.332 * 2^-149,
+ ;; below halfway between denormals #x4f644b and #x4f644c. An earlier
+ ;; fix rounded to 24 bits first (lifting it to the artificial tie
+ ;; 5203019.5 * 2^-149) and then re-rounded ties-to-even to #x4f644c.
+ ;; FLOAT-RATIO-FLOAT now re-rounds directly to denormal precision in a
+ ;; single step, yielding the correctly-rounded #x4f644b.
+ (assert-equal #x4f644b
+ (kernel:single-float-bits 7.290983e-39)))
+
+(define-test float-ratio-float.denormal-low-below-halfway.single
+ (:tag :issues)
+ ;; Exercise ROUND-DENORMAL's `low < halfway' branch via FLOAT-RATIO-
+ ;; FLOAT. Value 11/10 * least-positive is 1.1 denormal-units; the
+ ;; loop sees several bits of low and the comparison must take the
+ ;; round-down path so the mantissa stays at 1.
+ (let ((x (* 11/10 (expt 2 -149))))
+ (assert-equal #x00000001
+ (kernel:single-float-bits
+ (kernel::float-ratio-float x 'single-float))
+ x)))
+
+(define-test float-ratio-float.denormal-low-above-halfway.single
+ (:tag :issues)
+ ;; ROUND-DENORMAL's `low > halfway' branch: 17/10 * least-positive
+ ;; (= 1.7 denormal-units), past halfway between 1 and 2, rounds up.
+ (let ((x (* 17/10 (expt 2 -149))))
+ (assert-equal #x00000002
+ (kernel:single-float-bits
+ (kernel::float-ratio-float x 'single-float))
+ x)))
+
+(define-test float-ratio-float.denormal-tie-to-even.single
+ (:tag :issues)
+ ;; ROUND-DENORMAL's tie path, REM = 0: exact halfway between two
+ ;; denormals rounds to even. 3/2 * least-positive ties between
+ ;; denormals 1 and 2; the even neighbour is 2.
+ (let ((x (* 3/2 (expt 2 -149))))
+ (assert-equal #x00000002
+ (kernel:single-float-bits
+ (kernel::float-ratio-float x 'single-float))
+ x))
+ ;; 5/2 * least-positive ties between 2 and 3; even is 2.
+ (let ((x (* 5/2 (expt 2 -149))))
+ (assert-equal #x00000002
+ (kernel:single-float-bits
+ (kernel::float-ratio-float x 'single-float))
+ x))
+ ;; 7/2 * least-positive ties between 3 and 4; even is 4.
+ (let ((x (* 7/2 (expt 2 -149))))
+ (assert-equal #x00000004
+ (kernel:single-float-bits
+ (kernel::float-ratio-float x 'single-float))
+ x))
+ ;; 1/2 * least-positive ties between 0 and 1; even is 0.
+ (let ((x (* 1/2 (expt 2 -149))))
+ (assert-equal #x00000000
+ (kernel:single-float-bits
+ (kernel::float-ratio-float x 'single-float))
+ x)))
+
+(define-test float-ratio-float.denormal-tie-with-sticky.single
+ (:tag :issues)
+ ;; ROUND-DENORMAL's tie path with REM != 0: when the loop's
+ ;; FRACTION-AND-GUARD reaches an exact halfway pattern but the
+ ;; division has a nonzero remainder, the rounding must go up because
+ ;; the original rational is strictly above halfway. 3/2 * least-
+ ;; positive + a tiny fraction of least-positive lands just past tie 1-2,
+ ;; rounds up to 2.
+ (let ((x (+ (* 3/2 (expt 2 -149))
+ (* (expt 2 -149) 1/1000000000))))
+ (assert-equal #x00000002
+ (kernel:single-float-bits
+ (kernel::float-ratio-float x 'single-float))
+ x))
+ ;; 1/2 + tiny: just past tie 0-1, rounds up to 1.
+ (let ((x (+ (* 1/2 (expt 2 -149))
+ (* (expt 2 -149) 1/1000000000))))
+ (assert-equal #x00000001
+ (kernel:single-float-bits
+ (kernel::float-ratio-float x 'single-float))
+ x)))
+
+(define-test float-ratio-float.denormal-excess.single
+ (:tag :issues)
+ ;; Exercise DENORMAL-EXCESS over a range of magnitudes. EXCESS
+ ;; varies inversely with the result's magnitude; each case has the
+ ;; result land on a chosen denormal so an off-by-one in the EXCESS
+ ;; computation would shift the result by a factor of two.
+ ;; EXCESS = 1: value near smallest normal; pick (2^23 - 1) * 2^-149,
+ ;; the largest denormal.
+ (let ((x (* (1- (expt 2 23)) (expt 2 -149))))
+ (assert-equal #x007fffff
+ (kernel:single-float-bits
+ (kernel::float-ratio-float x 'single-float))
+ x))
+ ;; EXCESS = 4: small denormal, mantissa 2^19.
+ (let ((x (expt 2 -130)))
+ (assert-equal (ash 1 19)
+ (kernel:single-float-bits
+ (kernel::float-ratio-float x 'single-float))
+ x))
+ ;; EXCESS = 21: very small denormal, mantissa 8.
+ (let ((x (* 8 (expt 2 -149))))
+ (assert-equal #x00000008
+ (kernel:single-float-bits
+ (kernel::float-ratio-float x 'single-float))
+ x))
+ ;; EXCESS = 23 (the maximum): only the bottom three denormals are
+ ;; reachable. 3 * least-positive gives mantissa 3.
+ (let ((x (* 3 (expt 2 -149))))
+ (assert-equal #x00000003
+ (kernel:single-float-bits
+ (kernel::float-ratio-float x 'single-float))
+ x)))
+
+(define-test float-ratio-float.denormal-carry-to-normal.single
+ (:tag :issues)
+ ;; DENORMAL-FROM-BITS's carry-into-smallest-normal branch. Rounding
+ ;; promotes the denormal mantissa to 2^(DIGITS-1), which doesn't fit
+ ;; in the denormal's stored mantissa width; the result must be the
+ ;; smallest normal (stored exponent = 1, mantissa = 0).
+ ;;
+ ;; (2^24 - 1)/2 * 2^-149 is an exact tie between the largest denormal
+ ;; (mantissa 2^23 - 1) and the smallest normal (2^-126); the latter
+ ;; has stored mantissa 0 which is even, so the tie rounds to it.
+ (let ((x (* (1- (expt 2 24)) 1/2 (expt 2 -149))))
+ (assert-equal #x00800000
+ (kernel:single-float-bits
+ (kernel::float-ratio-float x 'single-float))
+ x))
+ ;; Just past that tie, also rounds up to smallest normal.
+ (let ((x (+ (* (1- (expt 2 24)) 1/2 (expt 2 -149))
+ (* (expt 2 -149) 1/1000))))
+ (assert-equal #x00800000
+ (kernel:single-float-bits
+ (kernel::float-ratio-float x 'single-float))
+ x))
+ ;; Just below that tie, rounds down to the largest denormal.
+ (let ((x (- (* (1- (expt 2 24)) 1/2 (expt 2 -149))
+ (* (expt 2 -149) 1/1000))))
+ (assert-equal #x007fffff
+ (kernel:single-float-bits
+ (kernel::float-ratio-float x 'single-float))
+ x)))
+
+(define-test float-ratio-float.denormal-double-rounding.double
+ (:tag :issues)
+ ;; Double-float equivalents. Pick a rational that lands strictly
+ ;; below halfway between two denormals after 53-bit rounding would
+ ;; otherwise lift to the halfway position. Verify a few denormal
+ ;; cases also work end-to-end through FLOAT-RATIO-FLOAT.
+ ;;
+ ;; 1.1d-322 -> mantissa #x16 (= 22).
+ (assert-equal 0 (kernel:double-float-high-bits 1.1d-322))
+ (assert-equal #x16 (kernel:double-float-low-bits 1.1d-322))
+ ;; Tie-to-even, double: 3/2 * least-positive ties between denormals
+ ;; 1 and 2; even neighbour is 2.
+ (let ((x (* 3/2 (expt 2 -1074))))
+ (assert-equal 0 (kernel:double-float-high-bits
+ (kernel::float-ratio-float x 'double-float)))
+ (assert-equal 2 (kernel:double-float-low-bits
+ (kernel::float-ratio-float x 'double-float))))
+ ;; Carry into smallest normal: (2^53 - 1)/2 * 2^-1074 exactly halfway
+ ;; between the largest double denormal and the smallest normal;
+ ;; rounds up to the smallest normal #x00100000_00000000.
+ (let* ((x (* (1- (expt 2 53)) 1/2 (expt 2 -1074)))
+ (r (kernel::float-ratio-float x 'double-float)))
+ (assert-equal #x00100000 (kernel:double-float-high-bits r) x)
+ (assert-equal 0 (kernel:double-float-low-bits r) x)))
View it on GitLab: https://gitlab.common-lisp.net/cmucl/cmucl/-/compare/06740b4098b7d3f22bb8ea…
--
View it on GitLab: https://gitlab.common-lisp.net/cmucl/cmucl/-/compare/06740b4098b7d3f22bb8ea…
You're receiving this email because of your account on gitlab.common-lisp.net. Manage all notifications: https://gitlab.common-lisp.net/-/profile/notifications | Help: https://gitlab.common-lisp.net/help
1
0
[Git][cmucl/cmucl][issue-317-string-compare-by-code-unit] 8 commits: Fix #508: Save ansi-test log even if it fails CI
by Raymond Toy (@rtoy) 26 Jun '26
by Raymond Toy (@rtoy) 26 Jun '26
26 Jun '26
Raymond Toy pushed to branch issue-317-string-compare-by-code-unit at cmucl / cmucl
Commits:
f373711b by Raymond Toy at 2026-05-31T17:57:04-07:00
Fix #508: Save ansi-test log even if it fails CI
- - - - -
1b0a461c by Raymond Toy at 2026-05-31T17:57:04-07:00
Merge branch 'issue-508-ansi-test-log-artifact-on-failure' into 'master'
Fix #508: Save ansi-test log even if it fails CI
Closes #508
See merge request cmucl/cmucl!382
- - - - -
fad6dd9e by Raymond Toy at 2026-06-02T17:03:21-07:00
Fix #512: CI retries select stages
- - - - -
47971b15 by Raymond Toy at 2026-06-02T17:03:21-07:00
Merge branch 'issue-512-ci-retries-select-stages' into 'master'
Fix #512: CI retries select stages
Closes #512
See merge request cmucl/cmucl!386
- - - - -
2aa2eff9 by Raymond Toy at 2026-06-26T09:56:12-07:00
Fix #318: Add standard-char type
- - - - -
06740b40 by Raymond Toy at 2026-06-26T09:56:13-07:00
Merge branch 'issue-318-add-concrete-standard-char-type' into 'master'
Fix #318: Add standard-char type
Closes #318
See merge request cmucl/cmucl!377
- - - - -
15598355 by Raymond Toy at 2026-06-26T12:59:09-07:00
Merge branch 'master' into issue-317-string-compare-by-code-unit
- - - - -
7cfe5c42 by Raymond Toy at 2026-06-26T12:59:58-07:00
Use normal cmucl-expected-failures branch for ansi-test
Since #318 has been fixed, the `etypecase.15` error shouldn't be
happening anymore, so we can use the normal cmucl-expected-failures
branch for the ansi-tests.
- - - - -
7 changed files:
- .gitlab-ci.yml
- bin/run-ansi-tests.sh
- src/code/exports.lisp
- src/code/pred.lisp
- src/code/type.lisp
- src/i18n/locale/cmucl.pot
- + tests/standard-char.lisp
Changes:
=====================================
.gitlab-ci.yml
=====================================
@@ -29,6 +29,7 @@ workflow:
# for building.
.install:
stage: install
+ retry: 1
artifacts:
paths:
- snapshot/
@@ -85,15 +86,22 @@ workflow:
.ansi-test:
stage: ansi-test
artifacts:
+ # Always save artifacts even when the job fails.
+ when: always
paths:
- ansi-test.out
+ retry: 1
script:
- bin/run-ansi-tests.sh -l dist/bin/lisp
+ after_script:
+ # This after_script always runs to save the log file even if
+ # run-ansi-tests.sh fails.
- cp ../ansi-test/test.out ansi-test.out
# Default configuration for running unit tests.
.unit-test:
stage: test
+ retry: 1
artifacts:
paths:
- test.log
=====================================
bin/run-ansi-tests.sh
=====================================
@@ -36,7 +36,7 @@ shift $((OPTIND - 1))
# Use branch cmucl-expected-failures in general since this branch
# generally has the list of expected failures. This is the branch to
# use on cmucl master in general.
-BRANCH=cmucl-expected-failures-issue-317
+BRANCH=cmucl-expected-failures
set -x
if [ -d ../ansi-test ]; then
=====================================
src/code/exports.lisp
=====================================
@@ -2190,7 +2190,11 @@
"STANDARD-PPRINT-DISPATCH-TABLE-MODIFIED-ERROR"
"%IEEE754-REM-PI/2"
- "%SINCOS")
+ "%SINCOS"
+
+ "STANDARD-CHAR-TYPE"
+ "MAKE-STANDARD-CHAR-TYPE"
+ "STANDARD-CHAR-TYPE-P")
#+heap-overflow-check
(:export "DYNAMIC-SPACE-OVERFLOW-WARNING-HIT"
"DYNAMIC-SPACE-OVERFLOW-ERROR-HIT"
=====================================
src/code/pred.lisp
=====================================
@@ -291,6 +291,9 @@
(and (consp object)
(%%typep (car object) (cons-type-car-type type))
(%%typep (cdr object) (cons-type-cdr-type type))))
+ (kernel::standard-char-type
+ (and (characterp object)
+ (standard-char-p object)))
(unknown-type
;; Parse it again to make sure it's really undefined.
(let ((reparse (specifier-type (unknown-type-specifier type))))
=====================================
src/code/type.lisp
=====================================
@@ -52,6 +52,7 @@
(define-type-class intersection)
(define-type-class alien)
(define-type-class cons)
+(define-type-class standard-char)
;;; The Args-Type structure is used both to represent Values types and
;;; and Function types.
@@ -363,6 +364,16 @@
*empty-type*
(%make-cons-type car-type cdr-type)))
+(defstruct (standard-char-type
+ (:include ctype
+ (class-info (type-class-or-lose 'standard-char))
+ (:enumerable t))
+ (:constructor %make-standard-char-type ())
+ (:copier nil)
+ (:print-function %print-type)))
+
+(defun make-standard-char-type ()
+ (%make-standard-char-type))
;;;
@@ -3293,6 +3304,121 @@
(cons-type-car-type type2))
cdr-int2)))))
+
+;;;; Standard-char type
+(def-type-translator standard-char ()
+ (make-standard-char-type))
+
+(define-type-method (standard-char :unparse) (type)
+ (declare (ignore type))
+ 'standard-char)
+
+(define-type-method (standard-char :simple-=) (type1 type2)
+ (declare (ignore type1 type2))
+ (values t t))
+
+(define-type-method (standard-char :simple-subtypep) (type1 type2)
+ (declare (ignore type1 type2))
+ (values t t))
+
+(defconstant +standard-chars+
+ '(#\NEWLINE #\SPACE #\! #\" #\# #\$ #\% #\& #\' #\( #\) #\* #\+ #\,
+ #\- #\. #\/ #\0 #\1 #\2 #\3 #\4 #\5 #\6 #\7 #\8 #\9 #\: #\; #\< #\=
+ #\> #\? #\@ #\A #\B #\C #\D #\E #\F #\G #\H #\I #\J #\K #\L #\M
+ #\N #\O #\P #\Q #\R #\S #\T #\U #\V #\W #\X #\Y #\Z #\[ #\\ #\]
+ #\^ #\_ #\` #\a #\b #\c #\d #\e #\f #\g #\h #\i #\j #\k #\l #\m
+ #\n #\o #\p #\q #\r #\s #\t #\u #\v #\w #\x #\y #\z #\{
+ #\| #\} #\~)
+ "The set of characters in the STANDARD-CHAR type")
+
+(define-type-method (standard-char :simple-union) (type1 type2)
+ (declare (ignore type2))
+ type1)
+
+(define-type-method (standard-char :simple-intersection) (type1 type2)
+ (declare (ignore type2))
+ type1)
+
+;; (subtype standard-char other)
+(define-type-method (standard-char :complex-subtypep-arg1) (type1 type2)
+ (declare (ignore type1))
+ (cond ((csubtypep (specifier-type 'character) type2)
+ ;; STANDARD-CHAR is a subtype of CHARACTER/BASE-CHAR
+ (values t t))
+ ((member-type-p type2)
+ ;; If TYPE2 is a member-type, check whether it contains all standard-chars
+ (values (let ((members (member-type-members type2)))
+ (every #'(lambda (c)
+ (member c members))
+ +standard-chars+))
+ t))
+ (t
+ (values nil t))))
+
+;; (subtypep other standard-char)
+(define-type-method (standard-char :complex-subtypep-arg2) (type1 type2)
+ (declare (ignore type2))
+ (cond ((member-type-p type1)
+ ;; If TYPE1 is a member-type, check whether it contains all
+ ;; standard-chars.
+ (values (every #'(lambda (c)
+ (member c +standard-chars+))
+ (member-type-members type1))
+ t))
+ (t
+ (values nil t))))
+
+(define-type-method (standard-char :complex-union) (type1 type2)
+ ;; The standard-char type could be in type1 or type2. Figure out
+ ;; which one is a standard-char.
+ (multiple-value-bind (sc other)
+ (if (standard-char-type-p type1)
+ (values type1 type2)
+ (values type2 type1))
+ (cond
+ ((csubtypep (specifier-type 'character) other)
+ other)
+ ((and (member-type-p other)
+ ;; Check to see every member of OTHER is a STANDARD-CHAR.
+ (every #'(lambda (c)
+ (member c +standard-chars+))
+ (member-type-members other)))
+ sc)
+ (t nil))))
+
+(define-type-method (standard-char :complex-intersection) (type1 type2)
+ ;; The standard-char type could be in type1 or type2. Figure out
+ ;; which one is a standard-char.
+ (multiple-value-bind (sc other)
+ (if (standard-char-type-p type1)
+ (values type1 type2)
+ (values type2 type1))
+ (cond
+ ((csubtypep (specifier-type 'character) other)
+ ;; STANDARD-CHAR intersect any super-type of CHARACTER is a
+ ;; STANDARD-CHAR.
+ sc)
+ (t
+ (block punt
+ ;; Look through OTHER and find OTHER contains any standard
+ ;; character. If so, collect them all. If there are, the
+ ;; intersection is a member-type of the collected characters.
+ (collect ((members))
+ (dolist (ch +standard-chars+)
+ (multiple-value-bind (val win)
+ (ctypep ch other)
+ (unless win
+ (return-from punt nil))
+ (when val
+ (members ch))))
+ (cond ((null (members))
+ c::*empty-type*)
+ ((= (length (members))
+ (length +standard-chars+))
+ sc)
+ (t
+ (make-member-type :members (members))))))))))
+
;;; TYPE-DIFFERENCE -- Interface
;;;
@@ -3379,7 +3505,8 @@
(declare (type ctype type))
(etypecase type
((or numeric-type named-type member-type array-type
- kernel::built-in-class cons-type)
+ kernel::built-in-class cons-type
+ standard-char-type)
(values (%typep obj type) t))
(class
(if (if (csubtypep type (specifier-type 'funcallable-instance))
@@ -3520,16 +3647,6 @@
"Type of characters that aren't base-char's. None in CMU CL."
'(and character (not base-char)))
-(deftype standard-char ()
- "Type corresponding to the charaters required by the standard."
- '(member #\NEWLINE #\SPACE #\! #\" #\# #\$ #\% #\& #\' #\( #\) #\* #\+ #\,
- #\- #\. #\/ #\0 #\1 #\2 #\3 #\4 #\5 #\6 #\7 #\8 #\9 #\: #\; #\< #\=
- #\> #\? #\@ #\A #\B #\C #\D #\E #\F #\G #\H #\I #\J #\K #\L #\M
- #\N #\O #\P #\Q #\R #\S #\T #\U #\V #\W #\X #\Y #\Z #\[ #\\ #\]
- #\^ #\_ #\` #\a #\b #\c #\d #\e #\f #\g #\h #\i #\j #\k #\l #\m
- #\n #\o #\p #\q #\r #\s #\t #\u #\v #\w #\x #\y #\z #\{
- #\| #\} #\~))
-
(deftype keyword ()
"Type for any keyword symbol."
'(and symbol (satisfies keywordp)))
=====================================
src/i18n/locale/cmucl.pot
=====================================
@@ -1225,11 +1225,11 @@ msgid "Array dimensions is not a list, integer or *:~% ~S"
msgstr ""
#: src/code/type.lisp
-msgid "Type of characters that aren't base-char's. None in CMU CL."
+msgid "The set of characters in the STANDARD-CHAR type"
msgstr ""
#: src/code/type.lisp
-msgid "Type corresponding to the charaters required by the standard."
+msgid "Type of characters that aren't base-char's. None in CMU CL."
msgstr ""
#: src/code/type.lisp
=====================================
tests/standard-char.lisp
=====================================
@@ -0,0 +1,367 @@
+;;; Tests for standard-char
+
+(defpackage :standard-char-tests
+ (:use :cl :lisp-unit))
+
+(in-package "STANDARD-CHAR-TESTS")
+
+;; For the following tests, we generally want to use
+;; kernel::type-intersection and kernel::type-union directly to make
+;; sure we test the intersection and union methods for standard-char.
+
+(define-test standard-char.typep
+ (:tag :issues)
+ (assert-true (typep #\a 'standard-char))
+ (assert-false (typep #\tab 'standard-char))
+ (assert-true (typep #\Z 'standard-char))
+ (assert-true (typep #\Space 'standard-char))
+ (assert-true (typep #\Newline 'standard-char))
+ (assert-false (typep #\Rubout 'standard-char))
+ (assert-false (typep 5 'standard-char))
+ (assert-false (typep "hello" 'standard-char))
+ (assert-false (typep nil 'standard-char))
+ (assert-false (typep t 'standard-char))
+
+ (assert-equal (values t t)
+ (subtypep 'standard-char 'character))
+ (assert-equal (values nil t)
+ (subtypep 'character 'standard-char))
+ (assert-equal (values t t)
+ (subtypep 'standard-char 'base-char))
+ (assert-equal (values nil t)
+ (subtypep 'base-char 'standard-char)))
+
+(define-test standard-char.etypecase-15
+ (:tag :issues)
+ (assert-equal (values t t)
+ (c::type=
+ (c::specifier-type
+ '(not (or pathname boolean standard-char standard-object character file-error)))
+ (c::specifier-type
+ '(not (or file-error character standard-object standard-char boolean pathname))))))
+
+(define-test standard-char.identity
+ (:tag :issues)
+ (let ((a (c::specifier-type 'standard-char))
+ (b (c::specifier-type 'standard-char)))
+ ;; Should be EQ due to internal caching.
+ (assert-eq a b)))
+
+(define-test standard-char.parsing
+ (:tag :issues)
+ (assert-eq 'standard-char
+ (c::type-specifier (c::specifier-type 'standard-char))))
+
+(define-test standard-char.predicate
+ (:tag :issues)
+ (assert-true (c::standard-char-type-p (c::specifier-type 'standard-char))))
+
+(define-test standard-char.simple-subtypep
+ (:tag :issues)
+ (assert-equal (values t t)
+ (c::type= (c::specifier-type 'standard-char)
+ (c::specifier-type 'standard-char)))
+ (assert-equal (values t t)
+ (subtypep 'standard-char 'standard-char)))
+
+(define-test standard-char.complex-subtype-arg1
+ (:tag :issues)
+ ;; STANDARD-CHAR is a subtype of CHARACTER and T.
+ (assert-equal (values t t)
+ (subtypep 'standard-char 'character))
+ (assert-equal (values t t)
+ (subtypep 'standard-char t))
+
+ ;; Not a subtype of disjoint types.
+ (assert-equal (values nil t)
+ (subtypep 'standard-char 'integer))
+ (assert-equal (values nil t)
+ (subtypep 'standard-char 'symbol))
+ (assert-equal (values nil t)
+ (subtypep 'standard-char 'pathname))
+
+ ;; Subtype of a member-type that contains all standard chars.
+ (assert-equal (values t t)
+ (subtypep 'standard-char
+ `(member ,@kernel::+standard-chars+)))
+ ;; Not a subtype of a member-type missing even one standard char.
+ (assert-equal (values nil t)
+ (subtypep 'standard-char '(member #\a))))
+
+(define-test standard-char.complex-subtypep-arg
+ (:tag :issues)
+ ;; All standard chars: subtype.
+ (assert-equal (values t t)
+ (subtypep '(member #\a) 'standard-char))
+ (assert-equal (values t t)
+ (subtypep '(member #\Space #\Newline) 'standard-char))
+
+ ;; Mixed — character but not standard.
+ (assert-equal (values nil t)
+ (subtypep '(member #\Tab) 'standard-char))
+ (assert-equal (values nil t)
+ (subtypep '(member #\Rubout) 'standard-char))
+
+ ;; Mixed — non-character members. This was the crash case.
+ (assert-equal (values nil t)
+ (subtypep '(member t) 'standard-char))
+ (assert-equal (values nil t)
+ (subtypep '(member t nil) 'standard-char))
+
+ ;; Mixed — some standard, some not.
+ (assert-equal (values nil t)
+ (subtypep '(member #\a #\Tab) 'standard-char))
+ (assert-equal (values nil t)
+ (subtypep '(member #\a t) 'standard-char))
+
+ ;; CHARACTER is not a subtype of STANDARD-CHAR (non-standard chars exist).
+ (assert-equal (values nil t)
+ (subtypep 'character 'standard-char)))
+
+(define-test standard-char.complex-union
+ (:tag :issues)
+ ;; Absorbed by supertype.
+ (assert-equal (values t t)
+ (c::type= (c::type-union (c::specifier-type 'standard-char)
+ (c::specifier-type 'character))
+ (c::specifier-type 'character)))
+
+ (assert-equal (values t t)
+ (c::type= (c::type-union (c::specifier-type 'standard-char)
+ (c::specifier-type 't))
+ (c::specifier-type 't)))
+
+ ;; All-standard-chars member-type absorbed back into STANDARD-CHAR.
+ (assert-equal (values t t)
+ (c::type= (c::type-union (c::specifier-type 'standard-char)
+ (c::specifier-type '(member #\a #\b)))
+ (c::specifier-type 'standard-char)))
+
+ ;; Disjoint type stays as a union (the bug-fix case).
+ ;; The result should NOT be a single member-type containing
+ ;; T, NIL, and 96 standard chars.
+ (let ((result (c::specifier-type '(or boolean standard-char))))
+ (assert-true (c::union-type-p result))
+ (assert-equal 2 (length (c::union-type-types result)))
+ (assert-true (notany (lambda (m)
+ (and (c::member-type-p m)
+ (some #'characterp (c::member-type-members m))
+ (some (complement #'characterp)
+ (c::member-type-members m))))
+ (c::union-type-types result))))
+
+
+ ;; Permutation invariance — the original etypecase.15 trigger.
+ (assert-equal (values t t)
+ (c::type= (c::specifier-type '(or boolean standard-char))
+ (c::specifier-type '(or standard-char boolean))))
+
+ (assert-equal (values t t)
+ (c::type= (c::specifier-type
+ '(not (or pathname boolean standard-char standard-object character file-error)))
+ (c::specifier-type
+ '(not (or file-error character standard-object standard-char boolean pathname)))))
+
+ ;; Member-type with non-standard chars — kept symbolically separate.
+ (let ((result (c::type-union (c::specifier-type 'standard-char)
+ (c::specifier-type '(member #\Tab)))))
+ ;; Should not collapse into a 97-element MEMBER.
+ (assert-false (c::member-type-p result))
+ (assert-true (c::union-type-p result))))
+
+(define-test standard-char.complex-intersection
+ (:tag :issues)
+ ;; Intersection with supertype is STANDARD-CHAR.
+ (assert-equal (values t t)
+ (c::type= (c::type-intersection (c::specifier-type 'standard-char)
+ (c::specifier-type 'character))
+ (c::specifier-type 'standard-char)))
+
+ (assert-equal (values t t)
+ (c::type= (c::type-intersection (c::specifier-type 'standard-char)
+ (c::specifier-type 't))
+ (c::specifier-type 'standard-char)))
+
+ ;; Intersection with disjoint type is empty.
+ (assert-equal (values t t)
+ (c::type= (c::type-intersection (c::specifier-type 'standard-char)
+ (c::specifier-type 'integer))
+ c::*empty-type*))
+
+ (assert-equal (values t t)
+ (c::type= (c::type-intersection (c::specifier-type 'standard-char)
+ (c::specifier-type 'symbol))
+ c::*empty-type*))
+
+ ;; Intersection with member-type — filtered to standard chars.
+ (assert-equal (values t t)
+ (c::type= (c::type-intersection (c::specifier-type 'standard-char)
+ (c::specifier-type '(member #\a #\Tab #\b)))
+ (c::specifier-type '(member #\a #\b))))
+
+ ;; All-non-standard members → empty.
+ (assert-equal (values t t)
+ (c::type= (c::type-intersection (c::specifier-type 'standard-char)
+ (c::specifier-type '(member #\Tab #\Rubout)))
+ c::*empty-type*))
+
+ ;; All-standard members → that member-type unchanged.
+ (assert-equal (values t t)
+ (c::type= (c::type-intersection (c::specifier-type 'standard-char)
+ (c::specifier-type '(member #\a)))
+ (c::specifier-type '(member #\a)))))
+
+
+
+(define-test standard-char.negation
+ (:tag :issues)
+ ;; NOT STANDARD-CHAR catches non-standard characters.
+ (assert-true (typep #\Tab '(not standard-char)))
+ (assert-false (typep #\a '(not standard-char)))
+
+ ;; AND CHARACTER (NOT STANDARD-CHAR) is the non-standard chars.
+ (assert-true (typep #\Tab '(and character (not standard-char))))
+ (assert-false (typep #\a '(and character (not standard-char))))
+ (assert-false (typep 5 '(and character (not standard-char))))
+
+ ;; Permutation invariance with negation, multiple types.
+ (assert-equal (values t t)
+ (c::type= (c::specifier-type '(and standard-char (not (member #\a))))
+ (c::specifier-type '(and (not (member #\a)) standard-char)))))
+
+(define-test standard-char.etypecase
+ (:tag :issues)
+ ;; Test that etypecase works using ASCII characters which will cover
+ ;; standard-char values and other characters.
+ (dotimes (k 128)
+ (let* ((ch (code-char k))
+ (expected (if (standard-char-p ch)
+ :is-standard :is-other))
+ (actual (handler-case
+ (etypecase ch
+ (standard-char :is-standard)
+ (character :is-other))
+ (error ()
+ :error))))
+ (assert-eql expected actual ch))))
+
+(define-test standard-char.intersection-character-both-orderings
+ (:tag :issues)
+ ;; Standard-char intersect character = standard-char, regardless of argument order.
+ (assert-equal (values t t)
+ (kernel::type=
+ (kernel::specifier-type 'standard-char)
+ (kernel::type-intersection (kernel::specifier-type 'standard-char)
+ (kernel::specifier-type 'character))))
+ (assert-equal (values t t)
+ (kernel::type=
+ (kernel::specifier-type 'standard-char)
+ (kernel::type-intersection (kernel::specifier-type 'character)
+ (kernel::specifier-type 'standard-char)))))
+
+(define-test standard-char.intersection-disjoint-both-orderings
+ (:tag :issues)
+ (assert-equal (values t t)
+ (kernel::type=
+ kernel::*empty-type*
+ (kernel::type-intersection (kernel::specifier-type 'standard-char)
+ (kernel::specifier-type 'integer))))
+ (assert-equal (values t t)
+ (kernel::type=
+ kernel::*empty-type*
+ (kernel::type-intersection (kernel::specifier-type 'integer)
+ (kernel::specifier-type 'standard-char)))))
+
+(define-test standard-char.intersection-member-both-orderings
+ (:tag :issues)
+ ;; Filter member-type to standard chars only.
+ (assert-equal (values t t)
+ (kernel::type=
+ (kernel::specifier-type '(member #\a #\b))
+ (kernel::type-intersection (kernel::specifier-type 'standard-char)
+ (kernel::specifier-type '(member #\a #\Tab #\b)))))
+ (assert-equal (values t t)
+ (kernel::type=
+ (kernel::specifier-type '(member #\a #\b))
+ (kernel::type-intersection (kernel::specifier-type '(member #\a #\Tab #\b))
+ (kernel::specifier-type 'standard-char)))))
+
+(define-test standard-char.union-character-both-orderings
+ (:tag :issues)
+ ;; Standard-char union character = character.
+ (assert-equal (values t t)
+ (kernel::type=
+ (kernel::specifier-type 'character)
+ (kernel::type-union (kernel::specifier-type 'standard-char)
+ (kernel::specifier-type 'character))))
+ (assert-equal (values t t)
+ (kernel::type=
+ (kernel::specifier-type 'character)
+ (kernel::type-union (kernel::specifier-type 'character)
+ (kernel::specifier-type 'standard-char)))))
+
+(define-test standard-char.union-member-of-standard-both-orderings
+ (:tag :issues)
+ ;; Standard-char absorbs all-standard member-type.
+ (assert-equal (values t t)
+ (kernel::type=
+ (kernel::specifier-type 'standard-char)
+ (kernel::type-union (kernel::specifier-type 'standard-char)
+ (kernel::specifier-type '(member #\a #\b)))))
+ (assert-equal (values t t)
+ (kernel::type=
+ (kernel::specifier-type 'standard-char)
+ (kernel::type-union (kernel::specifier-type '(member #\a #\b))
+ (kernel::specifier-type 'standard-char)))))
+
+(define-test standard-char.union-disjoint-stays-symbolic-both-orderings
+ (:tag :issues)
+ ;; (or boolean standard-char) and reverse — both should stay symbolic
+ ;; rather than collapsing into a giant member-type.
+ (let ((r1 (kernel::specifier-type '(or boolean standard-char)))
+ (r2 (kernel::specifier-type '(or standard-char boolean))))
+ (assert-true (kernel::union-type-p r1))
+ (assert-true (kernel::union-type-p r2))
+ (assert-equal (values t t)
+ (kernel::type= r1 r2))
+ ;; Neither should contain a member-type with both characters
+ ;; and non-characters.
+ (dolist (m (kernel::union-type-types r1))
+ (assert-false (and (kernel::member-type-p m)
+ (some #'characterp (kernel::member-type-members m))
+ (some (complement #'characterp)
+ (kernel::member-type-members m)))))))
+
+(defun assert-commutative-union (type-a-spec type-b-spec)
+ "Assert that union(A, B) and union(B, A) produce type= results."
+ (assert-equal (values t t)
+ (kernel::type=
+ (kernel::type-union (kernel::specifier-type type-a-spec)
+ (kernel::specifier-type type-b-spec))
+ (kernel::type-union (kernel::specifier-type type-b-spec)
+ (kernel::specifier-type type-a-spec)))))
+
+(defun assert-commutative-intersection (type-a-spec type-b-spec)
+ (assert-equal (values t t)
+ (kernel::type=
+ (kernel::type-intersection (kernel::specifier-type type-a-spec)
+ (kernel::specifier-type type-b-spec))
+ (kernel::type-intersection (kernel::specifier-type type-b-spec)
+ (kernel::specifier-type type-a-spec)))))
+
+(define-test standard-char.commutativity
+ (:tag :issues)
+ (assert-commutative-union 'standard-char 'character)
+ (assert-commutative-union 'standard-char 'integer)
+ (assert-commutative-union 'standard-char '(member #\a #\b))
+ (assert-commutative-union 'standard-char '(member #\Tab))
+ (assert-commutative-union 'standard-char 'boolean)
+ (assert-commutative-union 'standard-char '(not character))
+ (assert-commutative-union 'standard-char 't)
+ (assert-commutative-intersection 'standard-char 'character)
+ (assert-commutative-intersection 'standard-char 'integer)
+ (assert-commutative-intersection 'standard-char '(member #\a #\b))
+ (assert-commutative-intersection 'standard-char '(member #\Tab))
+ (assert-commutative-intersection 'standard-char 'boolean)
+ (assert-commutative-intersection 'standard-char '(not character))
+ (assert-commutative-intersection 'standard-char 't))
View it on GitLab: https://gitlab.common-lisp.net/cmucl/cmucl/-/compare/d051b51a054ffd8eb58db9…
--
View it on GitLab: https://gitlab.common-lisp.net/cmucl/cmucl/-/compare/d051b51a054ffd8eb58db9…
You're receiving this email because of your account on gitlab.common-lisp.net. Manage all notifications: https://gitlab.common-lisp.net/-/profile/notifications | Help: https://gitlab.common-lisp.net/help
1
0
[Git][cmucl/cmucl][issue-504-read-denormals-with-rounding] 8 commits: Fix #508: Save ansi-test log even if it fails CI
by Raymond Toy (@rtoy) 26 Jun '26
by Raymond Toy (@rtoy) 26 Jun '26
26 Jun '26
Raymond Toy pushed to branch issue-504-read-denormals-with-rounding at cmucl / cmucl
Commits:
f373711b by Raymond Toy at 2026-05-31T17:57:04-07:00
Fix #508: Save ansi-test log even if it fails CI
- - - - -
1b0a461c by Raymond Toy at 2026-05-31T17:57:04-07:00
Merge branch 'issue-508-ansi-test-log-artifact-on-failure' into 'master'
Fix #508: Save ansi-test log even if it fails CI
Closes #508
See merge request cmucl/cmucl!382
- - - - -
fad6dd9e by Raymond Toy at 2026-06-02T17:03:21-07:00
Fix #512: CI retries select stages
- - - - -
47971b15 by Raymond Toy at 2026-06-02T17:03:21-07:00
Merge branch 'issue-512-ci-retries-select-stages' into 'master'
Fix #512: CI retries select stages
Closes #512
See merge request cmucl/cmucl!386
- - - - -
2aa2eff9 by Raymond Toy at 2026-06-26T09:56:12-07:00
Fix #318: Add standard-char type
- - - - -
06740b40 by Raymond Toy at 2026-06-26T09:56:13-07:00
Merge branch 'issue-318-add-concrete-standard-char-type' into 'master'
Fix #318: Add standard-char type
Closes #318
See merge request cmucl/cmucl!377
- - - - -
c8825b9f by Raymond Toy at 2026-06-26T12:31:14-07:00
Merge branch 'master' into issue-504-read-denormals-with-rounding
- - - - -
5896e64b by Raymond Toy at 2026-06-26T12:34:24-07:00
Get sign from integer-decode-float
Instead of calling `float-sign` to get the sign,
`integer-decode-float` already computes the sign (as -1 or +1), so use
that to determine the sign bit to use.
- - - - -
7 changed files:
- .gitlab-ci.yml
- src/code/exports.lisp
- src/code/float.lisp
- src/code/pred.lisp
- src/code/type.lisp
- src/i18n/locale/cmucl.pot
- + tests/standard-char.lisp
Changes:
=====================================
.gitlab-ci.yml
=====================================
@@ -29,6 +29,7 @@ workflow:
# for building.
.install:
stage: install
+ retry: 1
artifacts:
paths:
- snapshot/
@@ -85,15 +86,22 @@ workflow:
.ansi-test:
stage: ansi-test
artifacts:
+ # Always save artifacts even when the job fails.
+ when: always
paths:
- ansi-test.out
+ retry: 1
script:
- bin/run-ansi-tests.sh -l dist/bin/lisp
+ after_script:
+ # This after_script always runs to save the log file even if
+ # run-ansi-tests.sh fails.
- cp ../ansi-test/test.out ansi-test.out
# Default configuration for running unit tests.
.unit-test:
stage: test
+ retry: 1
artifacts:
paths:
- test.log
=====================================
src/code/exports.lisp
=====================================
@@ -2190,7 +2190,11 @@
"STANDARD-PPRINT-DISPATCH-TABLE-MODIFIED-ERROR"
"%IEEE754-REM-PI/2"
- "%SINCOS")
+ "%SINCOS"
+
+ "STANDARD-CHAR-TYPE"
+ "MAKE-STANDARD-CHAR-TYPE"
+ "STANDARD-CHAR-TYPE-P")
#+heap-overflow-check
(:export "DYNAMIC-SPACE-OVERFLOW-WARNING-HIT"
"DYNAMIC-SPACE-OVERFLOW-ERROR-HIT"
=====================================
src/code/float.lisp
=====================================
@@ -887,7 +887,7 @@
(defun scale-float-maybe-underflow (x exp)
(declare (type (or single-float double-float) x)
(fixnum exp))
- (multiple-value-bind (sig old-exp)
+ (multiple-value-bind (sig old-exp float-sign)
(integer-decode-float x)
(let* ((digits (float-digits x))
(1+digits (1+ digits))
@@ -895,7 +895,7 @@
(etypecase x
(single-float vm:single-float-bias)
(double-float vm:double-float-bias))))
- (sign (if (minusp (float-sign x)) 1 0)))
+ (sign (if (minusp float-sign) 1 0)))
(cond
((< new-exp
(etypecase x
=====================================
src/code/pred.lisp
=====================================
@@ -291,6 +291,9 @@
(and (consp object)
(%%typep (car object) (cons-type-car-type type))
(%%typep (cdr object) (cons-type-cdr-type type))))
+ (kernel::standard-char-type
+ (and (characterp object)
+ (standard-char-p object)))
(unknown-type
;; Parse it again to make sure it's really undefined.
(let ((reparse (specifier-type (unknown-type-specifier type))))
=====================================
src/code/type.lisp
=====================================
@@ -52,6 +52,7 @@
(define-type-class intersection)
(define-type-class alien)
(define-type-class cons)
+(define-type-class standard-char)
;;; The Args-Type structure is used both to represent Values types and
;;; and Function types.
@@ -363,6 +364,16 @@
*empty-type*
(%make-cons-type car-type cdr-type)))
+(defstruct (standard-char-type
+ (:include ctype
+ (class-info (type-class-or-lose 'standard-char))
+ (:enumerable t))
+ (:constructor %make-standard-char-type ())
+ (:copier nil)
+ (:print-function %print-type)))
+
+(defun make-standard-char-type ()
+ (%make-standard-char-type))
;;;
@@ -3293,6 +3304,121 @@
(cons-type-car-type type2))
cdr-int2)))))
+
+;;;; Standard-char type
+(def-type-translator standard-char ()
+ (make-standard-char-type))
+
+(define-type-method (standard-char :unparse) (type)
+ (declare (ignore type))
+ 'standard-char)
+
+(define-type-method (standard-char :simple-=) (type1 type2)
+ (declare (ignore type1 type2))
+ (values t t))
+
+(define-type-method (standard-char :simple-subtypep) (type1 type2)
+ (declare (ignore type1 type2))
+ (values t t))
+
+(defconstant +standard-chars+
+ '(#\NEWLINE #\SPACE #\! #\" #\# #\$ #\% #\& #\' #\( #\) #\* #\+ #\,
+ #\- #\. #\/ #\0 #\1 #\2 #\3 #\4 #\5 #\6 #\7 #\8 #\9 #\: #\; #\< #\=
+ #\> #\? #\@ #\A #\B #\C #\D #\E #\F #\G #\H #\I #\J #\K #\L #\M
+ #\N #\O #\P #\Q #\R #\S #\T #\U #\V #\W #\X #\Y #\Z #\[ #\\ #\]
+ #\^ #\_ #\` #\a #\b #\c #\d #\e #\f #\g #\h #\i #\j #\k #\l #\m
+ #\n #\o #\p #\q #\r #\s #\t #\u #\v #\w #\x #\y #\z #\{
+ #\| #\} #\~)
+ "The set of characters in the STANDARD-CHAR type")
+
+(define-type-method (standard-char :simple-union) (type1 type2)
+ (declare (ignore type2))
+ type1)
+
+(define-type-method (standard-char :simple-intersection) (type1 type2)
+ (declare (ignore type2))
+ type1)
+
+;; (subtype standard-char other)
+(define-type-method (standard-char :complex-subtypep-arg1) (type1 type2)
+ (declare (ignore type1))
+ (cond ((csubtypep (specifier-type 'character) type2)
+ ;; STANDARD-CHAR is a subtype of CHARACTER/BASE-CHAR
+ (values t t))
+ ((member-type-p type2)
+ ;; If TYPE2 is a member-type, check whether it contains all standard-chars
+ (values (let ((members (member-type-members type2)))
+ (every #'(lambda (c)
+ (member c members))
+ +standard-chars+))
+ t))
+ (t
+ (values nil t))))
+
+;; (subtypep other standard-char)
+(define-type-method (standard-char :complex-subtypep-arg2) (type1 type2)
+ (declare (ignore type2))
+ (cond ((member-type-p type1)
+ ;; If TYPE1 is a member-type, check whether it contains all
+ ;; standard-chars.
+ (values (every #'(lambda (c)
+ (member c +standard-chars+))
+ (member-type-members type1))
+ t))
+ (t
+ (values nil t))))
+
+(define-type-method (standard-char :complex-union) (type1 type2)
+ ;; The standard-char type could be in type1 or type2. Figure out
+ ;; which one is a standard-char.
+ (multiple-value-bind (sc other)
+ (if (standard-char-type-p type1)
+ (values type1 type2)
+ (values type2 type1))
+ (cond
+ ((csubtypep (specifier-type 'character) other)
+ other)
+ ((and (member-type-p other)
+ ;; Check to see every member of OTHER is a STANDARD-CHAR.
+ (every #'(lambda (c)
+ (member c +standard-chars+))
+ (member-type-members other)))
+ sc)
+ (t nil))))
+
+(define-type-method (standard-char :complex-intersection) (type1 type2)
+ ;; The standard-char type could be in type1 or type2. Figure out
+ ;; which one is a standard-char.
+ (multiple-value-bind (sc other)
+ (if (standard-char-type-p type1)
+ (values type1 type2)
+ (values type2 type1))
+ (cond
+ ((csubtypep (specifier-type 'character) other)
+ ;; STANDARD-CHAR intersect any super-type of CHARACTER is a
+ ;; STANDARD-CHAR.
+ sc)
+ (t
+ (block punt
+ ;; Look through OTHER and find OTHER contains any standard
+ ;; character. If so, collect them all. If there are, the
+ ;; intersection is a member-type of the collected characters.
+ (collect ((members))
+ (dolist (ch +standard-chars+)
+ (multiple-value-bind (val win)
+ (ctypep ch other)
+ (unless win
+ (return-from punt nil))
+ (when val
+ (members ch))))
+ (cond ((null (members))
+ c::*empty-type*)
+ ((= (length (members))
+ (length +standard-chars+))
+ sc)
+ (t
+ (make-member-type :members (members))))))))))
+
;;; TYPE-DIFFERENCE -- Interface
;;;
@@ -3379,7 +3505,8 @@
(declare (type ctype type))
(etypecase type
((or numeric-type named-type member-type array-type
- kernel::built-in-class cons-type)
+ kernel::built-in-class cons-type
+ standard-char-type)
(values (%typep obj type) t))
(class
(if (if (csubtypep type (specifier-type 'funcallable-instance))
@@ -3520,16 +3647,6 @@
"Type of characters that aren't base-char's. None in CMU CL."
'(and character (not base-char)))
-(deftype standard-char ()
- "Type corresponding to the charaters required by the standard."
- '(member #\NEWLINE #\SPACE #\! #\" #\# #\$ #\% #\& #\' #\( #\) #\* #\+ #\,
- #\- #\. #\/ #\0 #\1 #\2 #\3 #\4 #\5 #\6 #\7 #\8 #\9 #\: #\; #\< #\=
- #\> #\? #\@ #\A #\B #\C #\D #\E #\F #\G #\H #\I #\J #\K #\L #\M
- #\N #\O #\P #\Q #\R #\S #\T #\U #\V #\W #\X #\Y #\Z #\[ #\\ #\]
- #\^ #\_ #\` #\a #\b #\c #\d #\e #\f #\g #\h #\i #\j #\k #\l #\m
- #\n #\o #\p #\q #\r #\s #\t #\u #\v #\w #\x #\y #\z #\{
- #\| #\} #\~))
-
(deftype keyword ()
"Type for any keyword symbol."
'(and symbol (satisfies keywordp)))
=====================================
src/i18n/locale/cmucl.pot
=====================================
@@ -1225,11 +1225,11 @@ msgid "Array dimensions is not a list, integer or *:~% ~S"
msgstr ""
#: src/code/type.lisp
-msgid "Type of characters that aren't base-char's. None in CMU CL."
+msgid "The set of characters in the STANDARD-CHAR type"
msgstr ""
#: src/code/type.lisp
-msgid "Type corresponding to the charaters required by the standard."
+msgid "Type of characters that aren't base-char's. None in CMU CL."
msgstr ""
#: src/code/type.lisp
=====================================
tests/standard-char.lisp
=====================================
@@ -0,0 +1,367 @@
+;;; Tests for standard-char
+
+(defpackage :standard-char-tests
+ (:use :cl :lisp-unit))
+
+(in-package "STANDARD-CHAR-TESTS")
+
+;; For the following tests, we generally want to use
+;; kernel::type-intersection and kernel::type-union directly to make
+;; sure we test the intersection and union methods for standard-char.
+
+(define-test standard-char.typep
+ (:tag :issues)
+ (assert-true (typep #\a 'standard-char))
+ (assert-false (typep #\tab 'standard-char))
+ (assert-true (typep #\Z 'standard-char))
+ (assert-true (typep #\Space 'standard-char))
+ (assert-true (typep #\Newline 'standard-char))
+ (assert-false (typep #\Rubout 'standard-char))
+ (assert-false (typep 5 'standard-char))
+ (assert-false (typep "hello" 'standard-char))
+ (assert-false (typep nil 'standard-char))
+ (assert-false (typep t 'standard-char))
+
+ (assert-equal (values t t)
+ (subtypep 'standard-char 'character))
+ (assert-equal (values nil t)
+ (subtypep 'character 'standard-char))
+ (assert-equal (values t t)
+ (subtypep 'standard-char 'base-char))
+ (assert-equal (values nil t)
+ (subtypep 'base-char 'standard-char)))
+
+(define-test standard-char.etypecase-15
+ (:tag :issues)
+ (assert-equal (values t t)
+ (c::type=
+ (c::specifier-type
+ '(not (or pathname boolean standard-char standard-object character file-error)))
+ (c::specifier-type
+ '(not (or file-error character standard-object standard-char boolean pathname))))))
+
+(define-test standard-char.identity
+ (:tag :issues)
+ (let ((a (c::specifier-type 'standard-char))
+ (b (c::specifier-type 'standard-char)))
+ ;; Should be EQ due to internal caching.
+ (assert-eq a b)))
+
+(define-test standard-char.parsing
+ (:tag :issues)
+ (assert-eq 'standard-char
+ (c::type-specifier (c::specifier-type 'standard-char))))
+
+(define-test standard-char.predicate
+ (:tag :issues)
+ (assert-true (c::standard-char-type-p (c::specifier-type 'standard-char))))
+
+(define-test standard-char.simple-subtypep
+ (:tag :issues)
+ (assert-equal (values t t)
+ (c::type= (c::specifier-type 'standard-char)
+ (c::specifier-type 'standard-char)))
+ (assert-equal (values t t)
+ (subtypep 'standard-char 'standard-char)))
+
+(define-test standard-char.complex-subtype-arg1
+ (:tag :issues)
+ ;; STANDARD-CHAR is a subtype of CHARACTER and T.
+ (assert-equal (values t t)
+ (subtypep 'standard-char 'character))
+ (assert-equal (values t t)
+ (subtypep 'standard-char t))
+
+ ;; Not a subtype of disjoint types.
+ (assert-equal (values nil t)
+ (subtypep 'standard-char 'integer))
+ (assert-equal (values nil t)
+ (subtypep 'standard-char 'symbol))
+ (assert-equal (values nil t)
+ (subtypep 'standard-char 'pathname))
+
+ ;; Subtype of a member-type that contains all standard chars.
+ (assert-equal (values t t)
+ (subtypep 'standard-char
+ `(member ,@kernel::+standard-chars+)))
+ ;; Not a subtype of a member-type missing even one standard char.
+ (assert-equal (values nil t)
+ (subtypep 'standard-char '(member #\a))))
+
+(define-test standard-char.complex-subtypep-arg
+ (:tag :issues)
+ ;; All standard chars: subtype.
+ (assert-equal (values t t)
+ (subtypep '(member #\a) 'standard-char))
+ (assert-equal (values t t)
+ (subtypep '(member #\Space #\Newline) 'standard-char))
+
+ ;; Mixed — character but not standard.
+ (assert-equal (values nil t)
+ (subtypep '(member #\Tab) 'standard-char))
+ (assert-equal (values nil t)
+ (subtypep '(member #\Rubout) 'standard-char))
+
+ ;; Mixed — non-character members. This was the crash case.
+ (assert-equal (values nil t)
+ (subtypep '(member t) 'standard-char))
+ (assert-equal (values nil t)
+ (subtypep '(member t nil) 'standard-char))
+
+ ;; Mixed — some standard, some not.
+ (assert-equal (values nil t)
+ (subtypep '(member #\a #\Tab) 'standard-char))
+ (assert-equal (values nil t)
+ (subtypep '(member #\a t) 'standard-char))
+
+ ;; CHARACTER is not a subtype of STANDARD-CHAR (non-standard chars exist).
+ (assert-equal (values nil t)
+ (subtypep 'character 'standard-char)))
+
+(define-test standard-char.complex-union
+ (:tag :issues)
+ ;; Absorbed by supertype.
+ (assert-equal (values t t)
+ (c::type= (c::type-union (c::specifier-type 'standard-char)
+ (c::specifier-type 'character))
+ (c::specifier-type 'character)))
+
+ (assert-equal (values t t)
+ (c::type= (c::type-union (c::specifier-type 'standard-char)
+ (c::specifier-type 't))
+ (c::specifier-type 't)))
+
+ ;; All-standard-chars member-type absorbed back into STANDARD-CHAR.
+ (assert-equal (values t t)
+ (c::type= (c::type-union (c::specifier-type 'standard-char)
+ (c::specifier-type '(member #\a #\b)))
+ (c::specifier-type 'standard-char)))
+
+ ;; Disjoint type stays as a union (the bug-fix case).
+ ;; The result should NOT be a single member-type containing
+ ;; T, NIL, and 96 standard chars.
+ (let ((result (c::specifier-type '(or boolean standard-char))))
+ (assert-true (c::union-type-p result))
+ (assert-equal 2 (length (c::union-type-types result)))
+ (assert-true (notany (lambda (m)
+ (and (c::member-type-p m)
+ (some #'characterp (c::member-type-members m))
+ (some (complement #'characterp)
+ (c::member-type-members m))))
+ (c::union-type-types result))))
+
+
+ ;; Permutation invariance — the original etypecase.15 trigger.
+ (assert-equal (values t t)
+ (c::type= (c::specifier-type '(or boolean standard-char))
+ (c::specifier-type '(or standard-char boolean))))
+
+ (assert-equal (values t t)
+ (c::type= (c::specifier-type
+ '(not (or pathname boolean standard-char standard-object character file-error)))
+ (c::specifier-type
+ '(not (or file-error character standard-object standard-char boolean pathname)))))
+
+ ;; Member-type with non-standard chars — kept symbolically separate.
+ (let ((result (c::type-union (c::specifier-type 'standard-char)
+ (c::specifier-type '(member #\Tab)))))
+ ;; Should not collapse into a 97-element MEMBER.
+ (assert-false (c::member-type-p result))
+ (assert-true (c::union-type-p result))))
+
+(define-test standard-char.complex-intersection
+ (:tag :issues)
+ ;; Intersection with supertype is STANDARD-CHAR.
+ (assert-equal (values t t)
+ (c::type= (c::type-intersection (c::specifier-type 'standard-char)
+ (c::specifier-type 'character))
+ (c::specifier-type 'standard-char)))
+
+ (assert-equal (values t t)
+ (c::type= (c::type-intersection (c::specifier-type 'standard-char)
+ (c::specifier-type 't))
+ (c::specifier-type 'standard-char)))
+
+ ;; Intersection with disjoint type is empty.
+ (assert-equal (values t t)
+ (c::type= (c::type-intersection (c::specifier-type 'standard-char)
+ (c::specifier-type 'integer))
+ c::*empty-type*))
+
+ (assert-equal (values t t)
+ (c::type= (c::type-intersection (c::specifier-type 'standard-char)
+ (c::specifier-type 'symbol))
+ c::*empty-type*))
+
+ ;; Intersection with member-type — filtered to standard chars.
+ (assert-equal (values t t)
+ (c::type= (c::type-intersection (c::specifier-type 'standard-char)
+ (c::specifier-type '(member #\a #\Tab #\b)))
+ (c::specifier-type '(member #\a #\b))))
+
+ ;; All-non-standard members → empty.
+ (assert-equal (values t t)
+ (c::type= (c::type-intersection (c::specifier-type 'standard-char)
+ (c::specifier-type '(member #\Tab #\Rubout)))
+ c::*empty-type*))
+
+ ;; All-standard members → that member-type unchanged.
+ (assert-equal (values t t)
+ (c::type= (c::type-intersection (c::specifier-type 'standard-char)
+ (c::specifier-type '(member #\a)))
+ (c::specifier-type '(member #\a)))))
+
+
+
+(define-test standard-char.negation
+ (:tag :issues)
+ ;; NOT STANDARD-CHAR catches non-standard characters.
+ (assert-true (typep #\Tab '(not standard-char)))
+ (assert-false (typep #\a '(not standard-char)))
+
+ ;; AND CHARACTER (NOT STANDARD-CHAR) is the non-standard chars.
+ (assert-true (typep #\Tab '(and character (not standard-char))))
+ (assert-false (typep #\a '(and character (not standard-char))))
+ (assert-false (typep 5 '(and character (not standard-char))))
+
+ ;; Permutation invariance with negation, multiple types.
+ (assert-equal (values t t)
+ (c::type= (c::specifier-type '(and standard-char (not (member #\a))))
+ (c::specifier-type '(and (not (member #\a)) standard-char)))))
+
+(define-test standard-char.etypecase
+ (:tag :issues)
+ ;; Test that etypecase works using ASCII characters which will cover
+ ;; standard-char values and other characters.
+ (dotimes (k 128)
+ (let* ((ch (code-char k))
+ (expected (if (standard-char-p ch)
+ :is-standard :is-other))
+ (actual (handler-case
+ (etypecase ch
+ (standard-char :is-standard)
+ (character :is-other))
+ (error ()
+ :error))))
+ (assert-eql expected actual ch))))
+
+(define-test standard-char.intersection-character-both-orderings
+ (:tag :issues)
+ ;; Standard-char intersect character = standard-char, regardless of argument order.
+ (assert-equal (values t t)
+ (kernel::type=
+ (kernel::specifier-type 'standard-char)
+ (kernel::type-intersection (kernel::specifier-type 'standard-char)
+ (kernel::specifier-type 'character))))
+ (assert-equal (values t t)
+ (kernel::type=
+ (kernel::specifier-type 'standard-char)
+ (kernel::type-intersection (kernel::specifier-type 'character)
+ (kernel::specifier-type 'standard-char)))))
+
+(define-test standard-char.intersection-disjoint-both-orderings
+ (:tag :issues)
+ (assert-equal (values t t)
+ (kernel::type=
+ kernel::*empty-type*
+ (kernel::type-intersection (kernel::specifier-type 'standard-char)
+ (kernel::specifier-type 'integer))))
+ (assert-equal (values t t)
+ (kernel::type=
+ kernel::*empty-type*
+ (kernel::type-intersection (kernel::specifier-type 'integer)
+ (kernel::specifier-type 'standard-char)))))
+
+(define-test standard-char.intersection-member-both-orderings
+ (:tag :issues)
+ ;; Filter member-type to standard chars only.
+ (assert-equal (values t t)
+ (kernel::type=
+ (kernel::specifier-type '(member #\a #\b))
+ (kernel::type-intersection (kernel::specifier-type 'standard-char)
+ (kernel::specifier-type '(member #\a #\Tab #\b)))))
+ (assert-equal (values t t)
+ (kernel::type=
+ (kernel::specifier-type '(member #\a #\b))
+ (kernel::type-intersection (kernel::specifier-type '(member #\a #\Tab #\b))
+ (kernel::specifier-type 'standard-char)))))
+
+(define-test standard-char.union-character-both-orderings
+ (:tag :issues)
+ ;; Standard-char union character = character.
+ (assert-equal (values t t)
+ (kernel::type=
+ (kernel::specifier-type 'character)
+ (kernel::type-union (kernel::specifier-type 'standard-char)
+ (kernel::specifier-type 'character))))
+ (assert-equal (values t t)
+ (kernel::type=
+ (kernel::specifier-type 'character)
+ (kernel::type-union (kernel::specifier-type 'character)
+ (kernel::specifier-type 'standard-char)))))
+
+(define-test standard-char.union-member-of-standard-both-orderings
+ (:tag :issues)
+ ;; Standard-char absorbs all-standard member-type.
+ (assert-equal (values t t)
+ (kernel::type=
+ (kernel::specifier-type 'standard-char)
+ (kernel::type-union (kernel::specifier-type 'standard-char)
+ (kernel::specifier-type '(member #\a #\b)))))
+ (assert-equal (values t t)
+ (kernel::type=
+ (kernel::specifier-type 'standard-char)
+ (kernel::type-union (kernel::specifier-type '(member #\a #\b))
+ (kernel::specifier-type 'standard-char)))))
+
+(define-test standard-char.union-disjoint-stays-symbolic-both-orderings
+ (:tag :issues)
+ ;; (or boolean standard-char) and reverse — both should stay symbolic
+ ;; rather than collapsing into a giant member-type.
+ (let ((r1 (kernel::specifier-type '(or boolean standard-char)))
+ (r2 (kernel::specifier-type '(or standard-char boolean))))
+ (assert-true (kernel::union-type-p r1))
+ (assert-true (kernel::union-type-p r2))
+ (assert-equal (values t t)
+ (kernel::type= r1 r2))
+ ;; Neither should contain a member-type with both characters
+ ;; and non-characters.
+ (dolist (m (kernel::union-type-types r1))
+ (assert-false (and (kernel::member-type-p m)
+ (some #'characterp (kernel::member-type-members m))
+ (some (complement #'characterp)
+ (kernel::member-type-members m)))))))
+
+(defun assert-commutative-union (type-a-spec type-b-spec)
+ "Assert that union(A, B) and union(B, A) produce type= results."
+ (assert-equal (values t t)
+ (kernel::type=
+ (kernel::type-union (kernel::specifier-type type-a-spec)
+ (kernel::specifier-type type-b-spec))
+ (kernel::type-union (kernel::specifier-type type-b-spec)
+ (kernel::specifier-type type-a-spec)))))
+
+(defun assert-commutative-intersection (type-a-spec type-b-spec)
+ (assert-equal (values t t)
+ (kernel::type=
+ (kernel::type-intersection (kernel::specifier-type type-a-spec)
+ (kernel::specifier-type type-b-spec))
+ (kernel::type-intersection (kernel::specifier-type type-b-spec)
+ (kernel::specifier-type type-a-spec)))))
+
+(define-test standard-char.commutativity
+ (:tag :issues)
+ (assert-commutative-union 'standard-char 'character)
+ (assert-commutative-union 'standard-char 'integer)
+ (assert-commutative-union 'standard-char '(member #\a #\b))
+ (assert-commutative-union 'standard-char '(member #\Tab))
+ (assert-commutative-union 'standard-char 'boolean)
+ (assert-commutative-union 'standard-char '(not character))
+ (assert-commutative-union 'standard-char 't)
+ (assert-commutative-intersection 'standard-char 'character)
+ (assert-commutative-intersection 'standard-char 'integer)
+ (assert-commutative-intersection 'standard-char '(member #\a #\b))
+ (assert-commutative-intersection 'standard-char '(member #\Tab))
+ (assert-commutative-intersection 'standard-char 'boolean)
+ (assert-commutative-intersection 'standard-char '(not character))
+ (assert-commutative-intersection 'standard-char 't))
View it on GitLab: https://gitlab.common-lisp.net/cmucl/cmucl/-/compare/bc6c560e5d4632f897854f…
--
View it on GitLab: https://gitlab.common-lisp.net/cmucl/cmucl/-/compare/bc6c560e5d4632f897854f…
You're receiving this email because of your account on gitlab.common-lisp.net. Manage all notifications: https://gitlab.common-lisp.net/-/profile/notifications | Help: https://gitlab.common-lisp.net/help
1
0
[Git][cmucl/cmucl][master] 2 commits: Fix #318: Add standard-char type
by Raymond Toy (@rtoy) 26 Jun '26
by Raymond Toy (@rtoy) 26 Jun '26
26 Jun '26
Raymond Toy pushed to branch master at cmucl / cmucl
Commits:
2aa2eff9 by Raymond Toy at 2026-06-26T09:56:12-07:00
Fix #318: Add standard-char type
- - - - -
06740b40 by Raymond Toy at 2026-06-26T09:56:13-07:00
Merge branch 'issue-318-add-concrete-standard-char-type' into 'master'
Fix #318: Add standard-char type
Closes #318
See merge request cmucl/cmucl!377
- - - - -
5 changed files:
- src/code/exports.lisp
- src/code/pred.lisp
- src/code/type.lisp
- src/i18n/locale/cmucl.pot
- + tests/standard-char.lisp
Changes:
=====================================
src/code/exports.lisp
=====================================
@@ -2190,7 +2190,11 @@
"STANDARD-PPRINT-DISPATCH-TABLE-MODIFIED-ERROR"
"%IEEE754-REM-PI/2"
- "%SINCOS")
+ "%SINCOS"
+
+ "STANDARD-CHAR-TYPE"
+ "MAKE-STANDARD-CHAR-TYPE"
+ "STANDARD-CHAR-TYPE-P")
#+heap-overflow-check
(:export "DYNAMIC-SPACE-OVERFLOW-WARNING-HIT"
"DYNAMIC-SPACE-OVERFLOW-ERROR-HIT"
=====================================
src/code/pred.lisp
=====================================
@@ -291,6 +291,9 @@
(and (consp object)
(%%typep (car object) (cons-type-car-type type))
(%%typep (cdr object) (cons-type-cdr-type type))))
+ (kernel::standard-char-type
+ (and (characterp object)
+ (standard-char-p object)))
(unknown-type
;; Parse it again to make sure it's really undefined.
(let ((reparse (specifier-type (unknown-type-specifier type))))
=====================================
src/code/type.lisp
=====================================
@@ -52,6 +52,7 @@
(define-type-class intersection)
(define-type-class alien)
(define-type-class cons)
+(define-type-class standard-char)
;;; The Args-Type structure is used both to represent Values types and
;;; and Function types.
@@ -363,6 +364,16 @@
*empty-type*
(%make-cons-type car-type cdr-type)))
+(defstruct (standard-char-type
+ (:include ctype
+ (class-info (type-class-or-lose 'standard-char))
+ (:enumerable t))
+ (:constructor %make-standard-char-type ())
+ (:copier nil)
+ (:print-function %print-type)))
+
+(defun make-standard-char-type ()
+ (%make-standard-char-type))
;;;
@@ -3293,6 +3304,121 @@
(cons-type-car-type type2))
cdr-int2)))))
+
+;;;; Standard-char type
+(def-type-translator standard-char ()
+ (make-standard-char-type))
+
+(define-type-method (standard-char :unparse) (type)
+ (declare (ignore type))
+ 'standard-char)
+
+(define-type-method (standard-char :simple-=) (type1 type2)
+ (declare (ignore type1 type2))
+ (values t t))
+
+(define-type-method (standard-char :simple-subtypep) (type1 type2)
+ (declare (ignore type1 type2))
+ (values t t))
+
+(defconstant +standard-chars+
+ '(#\NEWLINE #\SPACE #\! #\" #\# #\$ #\% #\& #\' #\( #\) #\* #\+ #\,
+ #\- #\. #\/ #\0 #\1 #\2 #\3 #\4 #\5 #\6 #\7 #\8 #\9 #\: #\; #\< #\=
+ #\> #\? #\@ #\A #\B #\C #\D #\E #\F #\G #\H #\I #\J #\K #\L #\M
+ #\N #\O #\P #\Q #\R #\S #\T #\U #\V #\W #\X #\Y #\Z #\[ #\\ #\]
+ #\^ #\_ #\` #\a #\b #\c #\d #\e #\f #\g #\h #\i #\j #\k #\l #\m
+ #\n #\o #\p #\q #\r #\s #\t #\u #\v #\w #\x #\y #\z #\{
+ #\| #\} #\~)
+ "The set of characters in the STANDARD-CHAR type")
+
+(define-type-method (standard-char :simple-union) (type1 type2)
+ (declare (ignore type2))
+ type1)
+
+(define-type-method (standard-char :simple-intersection) (type1 type2)
+ (declare (ignore type2))
+ type1)
+
+;; (subtype standard-char other)
+(define-type-method (standard-char :complex-subtypep-arg1) (type1 type2)
+ (declare (ignore type1))
+ (cond ((csubtypep (specifier-type 'character) type2)
+ ;; STANDARD-CHAR is a subtype of CHARACTER/BASE-CHAR
+ (values t t))
+ ((member-type-p type2)
+ ;; If TYPE2 is a member-type, check whether it contains all standard-chars
+ (values (let ((members (member-type-members type2)))
+ (every #'(lambda (c)
+ (member c members))
+ +standard-chars+))
+ t))
+ (t
+ (values nil t))))
+
+;; (subtypep other standard-char)
+(define-type-method (standard-char :complex-subtypep-arg2) (type1 type2)
+ (declare (ignore type2))
+ (cond ((member-type-p type1)
+ ;; If TYPE1 is a member-type, check whether it contains all
+ ;; standard-chars.
+ (values (every #'(lambda (c)
+ (member c +standard-chars+))
+ (member-type-members type1))
+ t))
+ (t
+ (values nil t))))
+
+(define-type-method (standard-char :complex-union) (type1 type2)
+ ;; The standard-char type could be in type1 or type2. Figure out
+ ;; which one is a standard-char.
+ (multiple-value-bind (sc other)
+ (if (standard-char-type-p type1)
+ (values type1 type2)
+ (values type2 type1))
+ (cond
+ ((csubtypep (specifier-type 'character) other)
+ other)
+ ((and (member-type-p other)
+ ;; Check to see every member of OTHER is a STANDARD-CHAR.
+ (every #'(lambda (c)
+ (member c +standard-chars+))
+ (member-type-members other)))
+ sc)
+ (t nil))))
+
+(define-type-method (standard-char :complex-intersection) (type1 type2)
+ ;; The standard-char type could be in type1 or type2. Figure out
+ ;; which one is a standard-char.
+ (multiple-value-bind (sc other)
+ (if (standard-char-type-p type1)
+ (values type1 type2)
+ (values type2 type1))
+ (cond
+ ((csubtypep (specifier-type 'character) other)
+ ;; STANDARD-CHAR intersect any super-type of CHARACTER is a
+ ;; STANDARD-CHAR.
+ sc)
+ (t
+ (block punt
+ ;; Look through OTHER and find OTHER contains any standard
+ ;; character. If so, collect them all. If there are, the
+ ;; intersection is a member-type of the collected characters.
+ (collect ((members))
+ (dolist (ch +standard-chars+)
+ (multiple-value-bind (val win)
+ (ctypep ch other)
+ (unless win
+ (return-from punt nil))
+ (when val
+ (members ch))))
+ (cond ((null (members))
+ c::*empty-type*)
+ ((= (length (members))
+ (length +standard-chars+))
+ sc)
+ (t
+ (make-member-type :members (members))))))))))
+
;;; TYPE-DIFFERENCE -- Interface
;;;
@@ -3379,7 +3505,8 @@
(declare (type ctype type))
(etypecase type
((or numeric-type named-type member-type array-type
- kernel::built-in-class cons-type)
+ kernel::built-in-class cons-type
+ standard-char-type)
(values (%typep obj type) t))
(class
(if (if (csubtypep type (specifier-type 'funcallable-instance))
@@ -3520,16 +3647,6 @@
"Type of characters that aren't base-char's. None in CMU CL."
'(and character (not base-char)))
-(deftype standard-char ()
- "Type corresponding to the charaters required by the standard."
- '(member #\NEWLINE #\SPACE #\! #\" #\# #\$ #\% #\& #\' #\( #\) #\* #\+ #\,
- #\- #\. #\/ #\0 #\1 #\2 #\3 #\4 #\5 #\6 #\7 #\8 #\9 #\: #\; #\< #\=
- #\> #\? #\@ #\A #\B #\C #\D #\E #\F #\G #\H #\I #\J #\K #\L #\M
- #\N #\O #\P #\Q #\R #\S #\T #\U #\V #\W #\X #\Y #\Z #\[ #\\ #\]
- #\^ #\_ #\` #\a #\b #\c #\d #\e #\f #\g #\h #\i #\j #\k #\l #\m
- #\n #\o #\p #\q #\r #\s #\t #\u #\v #\w #\x #\y #\z #\{
- #\| #\} #\~))
-
(deftype keyword ()
"Type for any keyword symbol."
'(and symbol (satisfies keywordp)))
=====================================
src/i18n/locale/cmucl.pot
=====================================
@@ -1225,11 +1225,11 @@ msgid "Array dimensions is not a list, integer or *:~% ~S"
msgstr ""
#: src/code/type.lisp
-msgid "Type of characters that aren't base-char's. None in CMU CL."
+msgid "The set of characters in the STANDARD-CHAR type"
msgstr ""
#: src/code/type.lisp
-msgid "Type corresponding to the charaters required by the standard."
+msgid "Type of characters that aren't base-char's. None in CMU CL."
msgstr ""
#: src/code/type.lisp
=====================================
tests/standard-char.lisp
=====================================
@@ -0,0 +1,367 @@
+;;; Tests for standard-char
+
+(defpackage :standard-char-tests
+ (:use :cl :lisp-unit))
+
+(in-package "STANDARD-CHAR-TESTS")
+
+;; For the following tests, we generally want to use
+;; kernel::type-intersection and kernel::type-union directly to make
+;; sure we test the intersection and union methods for standard-char.
+
+(define-test standard-char.typep
+ (:tag :issues)
+ (assert-true (typep #\a 'standard-char))
+ (assert-false (typep #\tab 'standard-char))
+ (assert-true (typep #\Z 'standard-char))
+ (assert-true (typep #\Space 'standard-char))
+ (assert-true (typep #\Newline 'standard-char))
+ (assert-false (typep #\Rubout 'standard-char))
+ (assert-false (typep 5 'standard-char))
+ (assert-false (typep "hello" 'standard-char))
+ (assert-false (typep nil 'standard-char))
+ (assert-false (typep t 'standard-char))
+
+ (assert-equal (values t t)
+ (subtypep 'standard-char 'character))
+ (assert-equal (values nil t)
+ (subtypep 'character 'standard-char))
+ (assert-equal (values t t)
+ (subtypep 'standard-char 'base-char))
+ (assert-equal (values nil t)
+ (subtypep 'base-char 'standard-char)))
+
+(define-test standard-char.etypecase-15
+ (:tag :issues)
+ (assert-equal (values t t)
+ (c::type=
+ (c::specifier-type
+ '(not (or pathname boolean standard-char standard-object character file-error)))
+ (c::specifier-type
+ '(not (or file-error character standard-object standard-char boolean pathname))))))
+
+(define-test standard-char.identity
+ (:tag :issues)
+ (let ((a (c::specifier-type 'standard-char))
+ (b (c::specifier-type 'standard-char)))
+ ;; Should be EQ due to internal caching.
+ (assert-eq a b)))
+
+(define-test standard-char.parsing
+ (:tag :issues)
+ (assert-eq 'standard-char
+ (c::type-specifier (c::specifier-type 'standard-char))))
+
+(define-test standard-char.predicate
+ (:tag :issues)
+ (assert-true (c::standard-char-type-p (c::specifier-type 'standard-char))))
+
+(define-test standard-char.simple-subtypep
+ (:tag :issues)
+ (assert-equal (values t t)
+ (c::type= (c::specifier-type 'standard-char)
+ (c::specifier-type 'standard-char)))
+ (assert-equal (values t t)
+ (subtypep 'standard-char 'standard-char)))
+
+(define-test standard-char.complex-subtype-arg1
+ (:tag :issues)
+ ;; STANDARD-CHAR is a subtype of CHARACTER and T.
+ (assert-equal (values t t)
+ (subtypep 'standard-char 'character))
+ (assert-equal (values t t)
+ (subtypep 'standard-char t))
+
+ ;; Not a subtype of disjoint types.
+ (assert-equal (values nil t)
+ (subtypep 'standard-char 'integer))
+ (assert-equal (values nil t)
+ (subtypep 'standard-char 'symbol))
+ (assert-equal (values nil t)
+ (subtypep 'standard-char 'pathname))
+
+ ;; Subtype of a member-type that contains all standard chars.
+ (assert-equal (values t t)
+ (subtypep 'standard-char
+ `(member ,@kernel::+standard-chars+)))
+ ;; Not a subtype of a member-type missing even one standard char.
+ (assert-equal (values nil t)
+ (subtypep 'standard-char '(member #\a))))
+
+(define-test standard-char.complex-subtypep-arg
+ (:tag :issues)
+ ;; All standard chars: subtype.
+ (assert-equal (values t t)
+ (subtypep '(member #\a) 'standard-char))
+ (assert-equal (values t t)
+ (subtypep '(member #\Space #\Newline) 'standard-char))
+
+ ;; Mixed — character but not standard.
+ (assert-equal (values nil t)
+ (subtypep '(member #\Tab) 'standard-char))
+ (assert-equal (values nil t)
+ (subtypep '(member #\Rubout) 'standard-char))
+
+ ;; Mixed — non-character members. This was the crash case.
+ (assert-equal (values nil t)
+ (subtypep '(member t) 'standard-char))
+ (assert-equal (values nil t)
+ (subtypep '(member t nil) 'standard-char))
+
+ ;; Mixed — some standard, some not.
+ (assert-equal (values nil t)
+ (subtypep '(member #\a #\Tab) 'standard-char))
+ (assert-equal (values nil t)
+ (subtypep '(member #\a t) 'standard-char))
+
+ ;; CHARACTER is not a subtype of STANDARD-CHAR (non-standard chars exist).
+ (assert-equal (values nil t)
+ (subtypep 'character 'standard-char)))
+
+(define-test standard-char.complex-union
+ (:tag :issues)
+ ;; Absorbed by supertype.
+ (assert-equal (values t t)
+ (c::type= (c::type-union (c::specifier-type 'standard-char)
+ (c::specifier-type 'character))
+ (c::specifier-type 'character)))
+
+ (assert-equal (values t t)
+ (c::type= (c::type-union (c::specifier-type 'standard-char)
+ (c::specifier-type 't))
+ (c::specifier-type 't)))
+
+ ;; All-standard-chars member-type absorbed back into STANDARD-CHAR.
+ (assert-equal (values t t)
+ (c::type= (c::type-union (c::specifier-type 'standard-char)
+ (c::specifier-type '(member #\a #\b)))
+ (c::specifier-type 'standard-char)))
+
+ ;; Disjoint type stays as a union (the bug-fix case).
+ ;; The result should NOT be a single member-type containing
+ ;; T, NIL, and 96 standard chars.
+ (let ((result (c::specifier-type '(or boolean standard-char))))
+ (assert-true (c::union-type-p result))
+ (assert-equal 2 (length (c::union-type-types result)))
+ (assert-true (notany (lambda (m)
+ (and (c::member-type-p m)
+ (some #'characterp (c::member-type-members m))
+ (some (complement #'characterp)
+ (c::member-type-members m))))
+ (c::union-type-types result))))
+
+
+ ;; Permutation invariance — the original etypecase.15 trigger.
+ (assert-equal (values t t)
+ (c::type= (c::specifier-type '(or boolean standard-char))
+ (c::specifier-type '(or standard-char boolean))))
+
+ (assert-equal (values t t)
+ (c::type= (c::specifier-type
+ '(not (or pathname boolean standard-char standard-object character file-error)))
+ (c::specifier-type
+ '(not (or file-error character standard-object standard-char boolean pathname)))))
+
+ ;; Member-type with non-standard chars — kept symbolically separate.
+ (let ((result (c::type-union (c::specifier-type 'standard-char)
+ (c::specifier-type '(member #\Tab)))))
+ ;; Should not collapse into a 97-element MEMBER.
+ (assert-false (c::member-type-p result))
+ (assert-true (c::union-type-p result))))
+
+(define-test standard-char.complex-intersection
+ (:tag :issues)
+ ;; Intersection with supertype is STANDARD-CHAR.
+ (assert-equal (values t t)
+ (c::type= (c::type-intersection (c::specifier-type 'standard-char)
+ (c::specifier-type 'character))
+ (c::specifier-type 'standard-char)))
+
+ (assert-equal (values t t)
+ (c::type= (c::type-intersection (c::specifier-type 'standard-char)
+ (c::specifier-type 't))
+ (c::specifier-type 'standard-char)))
+
+ ;; Intersection with disjoint type is empty.
+ (assert-equal (values t t)
+ (c::type= (c::type-intersection (c::specifier-type 'standard-char)
+ (c::specifier-type 'integer))
+ c::*empty-type*))
+
+ (assert-equal (values t t)
+ (c::type= (c::type-intersection (c::specifier-type 'standard-char)
+ (c::specifier-type 'symbol))
+ c::*empty-type*))
+
+ ;; Intersection with member-type — filtered to standard chars.
+ (assert-equal (values t t)
+ (c::type= (c::type-intersection (c::specifier-type 'standard-char)
+ (c::specifier-type '(member #\a #\Tab #\b)))
+ (c::specifier-type '(member #\a #\b))))
+
+ ;; All-non-standard members → empty.
+ (assert-equal (values t t)
+ (c::type= (c::type-intersection (c::specifier-type 'standard-char)
+ (c::specifier-type '(member #\Tab #\Rubout)))
+ c::*empty-type*))
+
+ ;; All-standard members → that member-type unchanged.
+ (assert-equal (values t t)
+ (c::type= (c::type-intersection (c::specifier-type 'standard-char)
+ (c::specifier-type '(member #\a)))
+ (c::specifier-type '(member #\a)))))
+
+
+
+(define-test standard-char.negation
+ (:tag :issues)
+ ;; NOT STANDARD-CHAR catches non-standard characters.
+ (assert-true (typep #\Tab '(not standard-char)))
+ (assert-false (typep #\a '(not standard-char)))
+
+ ;; AND CHARACTER (NOT STANDARD-CHAR) is the non-standard chars.
+ (assert-true (typep #\Tab '(and character (not standard-char))))
+ (assert-false (typep #\a '(and character (not standard-char))))
+ (assert-false (typep 5 '(and character (not standard-char))))
+
+ ;; Permutation invariance with negation, multiple types.
+ (assert-equal (values t t)
+ (c::type= (c::specifier-type '(and standard-char (not (member #\a))))
+ (c::specifier-type '(and (not (member #\a)) standard-char)))))
+
+(define-test standard-char.etypecase
+ (:tag :issues)
+ ;; Test that etypecase works using ASCII characters which will cover
+ ;; standard-char values and other characters.
+ (dotimes (k 128)
+ (let* ((ch (code-char k))
+ (expected (if (standard-char-p ch)
+ :is-standard :is-other))
+ (actual (handler-case
+ (etypecase ch
+ (standard-char :is-standard)
+ (character :is-other))
+ (error ()
+ :error))))
+ (assert-eql expected actual ch))))
+
+(define-test standard-char.intersection-character-both-orderings
+ (:tag :issues)
+ ;; Standard-char intersect character = standard-char, regardless of argument order.
+ (assert-equal (values t t)
+ (kernel::type=
+ (kernel::specifier-type 'standard-char)
+ (kernel::type-intersection (kernel::specifier-type 'standard-char)
+ (kernel::specifier-type 'character))))
+ (assert-equal (values t t)
+ (kernel::type=
+ (kernel::specifier-type 'standard-char)
+ (kernel::type-intersection (kernel::specifier-type 'character)
+ (kernel::specifier-type 'standard-char)))))
+
+(define-test standard-char.intersection-disjoint-both-orderings
+ (:tag :issues)
+ (assert-equal (values t t)
+ (kernel::type=
+ kernel::*empty-type*
+ (kernel::type-intersection (kernel::specifier-type 'standard-char)
+ (kernel::specifier-type 'integer))))
+ (assert-equal (values t t)
+ (kernel::type=
+ kernel::*empty-type*
+ (kernel::type-intersection (kernel::specifier-type 'integer)
+ (kernel::specifier-type 'standard-char)))))
+
+(define-test standard-char.intersection-member-both-orderings
+ (:tag :issues)
+ ;; Filter member-type to standard chars only.
+ (assert-equal (values t t)
+ (kernel::type=
+ (kernel::specifier-type '(member #\a #\b))
+ (kernel::type-intersection (kernel::specifier-type 'standard-char)
+ (kernel::specifier-type '(member #\a #\Tab #\b)))))
+ (assert-equal (values t t)
+ (kernel::type=
+ (kernel::specifier-type '(member #\a #\b))
+ (kernel::type-intersection (kernel::specifier-type '(member #\a #\Tab #\b))
+ (kernel::specifier-type 'standard-char)))))
+
+(define-test standard-char.union-character-both-orderings
+ (:tag :issues)
+ ;; Standard-char union character = character.
+ (assert-equal (values t t)
+ (kernel::type=
+ (kernel::specifier-type 'character)
+ (kernel::type-union (kernel::specifier-type 'standard-char)
+ (kernel::specifier-type 'character))))
+ (assert-equal (values t t)
+ (kernel::type=
+ (kernel::specifier-type 'character)
+ (kernel::type-union (kernel::specifier-type 'character)
+ (kernel::specifier-type 'standard-char)))))
+
+(define-test standard-char.union-member-of-standard-both-orderings
+ (:tag :issues)
+ ;; Standard-char absorbs all-standard member-type.
+ (assert-equal (values t t)
+ (kernel::type=
+ (kernel::specifier-type 'standard-char)
+ (kernel::type-union (kernel::specifier-type 'standard-char)
+ (kernel::specifier-type '(member #\a #\b)))))
+ (assert-equal (values t t)
+ (kernel::type=
+ (kernel::specifier-type 'standard-char)
+ (kernel::type-union (kernel::specifier-type '(member #\a #\b))
+ (kernel::specifier-type 'standard-char)))))
+
+(define-test standard-char.union-disjoint-stays-symbolic-both-orderings
+ (:tag :issues)
+ ;; (or boolean standard-char) and reverse — both should stay symbolic
+ ;; rather than collapsing into a giant member-type.
+ (let ((r1 (kernel::specifier-type '(or boolean standard-char)))
+ (r2 (kernel::specifier-type '(or standard-char boolean))))
+ (assert-true (kernel::union-type-p r1))
+ (assert-true (kernel::union-type-p r2))
+ (assert-equal (values t t)
+ (kernel::type= r1 r2))
+ ;; Neither should contain a member-type with both characters
+ ;; and non-characters.
+ (dolist (m (kernel::union-type-types r1))
+ (assert-false (and (kernel::member-type-p m)
+ (some #'characterp (kernel::member-type-members m))
+ (some (complement #'characterp)
+ (kernel::member-type-members m)))))))
+
+(defun assert-commutative-union (type-a-spec type-b-spec)
+ "Assert that union(A, B) and union(B, A) produce type= results."
+ (assert-equal (values t t)
+ (kernel::type=
+ (kernel::type-union (kernel::specifier-type type-a-spec)
+ (kernel::specifier-type type-b-spec))
+ (kernel::type-union (kernel::specifier-type type-b-spec)
+ (kernel::specifier-type type-a-spec)))))
+
+(defun assert-commutative-intersection (type-a-spec type-b-spec)
+ (assert-equal (values t t)
+ (kernel::type=
+ (kernel::type-intersection (kernel::specifier-type type-a-spec)
+ (kernel::specifier-type type-b-spec))
+ (kernel::type-intersection (kernel::specifier-type type-b-spec)
+ (kernel::specifier-type type-a-spec)))))
+
+(define-test standard-char.commutativity
+ (:tag :issues)
+ (assert-commutative-union 'standard-char 'character)
+ (assert-commutative-union 'standard-char 'integer)
+ (assert-commutative-union 'standard-char '(member #\a #\b))
+ (assert-commutative-union 'standard-char '(member #\Tab))
+ (assert-commutative-union 'standard-char 'boolean)
+ (assert-commutative-union 'standard-char '(not character))
+ (assert-commutative-union 'standard-char 't)
+ (assert-commutative-intersection 'standard-char 'character)
+ (assert-commutative-intersection 'standard-char 'integer)
+ (assert-commutative-intersection 'standard-char '(member #\a #\b))
+ (assert-commutative-intersection 'standard-char '(member #\Tab))
+ (assert-commutative-intersection 'standard-char 'boolean)
+ (assert-commutative-intersection 'standard-char '(not character))
+ (assert-commutative-intersection 'standard-char 't))
View it on GitLab: https://gitlab.common-lisp.net/cmucl/cmucl/-/compare/47971b155ca22c2637a0d3…
--
View it on GitLab: https://gitlab.common-lisp.net/cmucl/cmucl/-/compare/47971b155ca22c2637a0d3…
You're receiving this email because of your account on gitlab.common-lisp.net. Manage all notifications: https://gitlab.common-lisp.net/-/profile/notifications | Help: https://gitlab.common-lisp.net/help
1
0
[Git][cmucl/cmucl][issue-318-add-concrete-standard-char-type] Don't need bootfile anymore
by Raymond Toy (@rtoy) 26 Jun '26
by Raymond Toy (@rtoy) 26 Jun '26
26 Jun '26
Raymond Toy pushed to branch issue-318-add-concrete-standard-char-type at cmucl / cmucl
Commits:
33c5bacb by Raymond Toy at 2026-06-26T08:54:51-07:00
Don't need bootfile anymore
- - - - -
1 changed file:
- .gitlab-ci.yml
Changes:
=====================================
.gitlab-ci.yml
=====================================
@@ -7,7 +7,7 @@ variables:
download_url: "https://common-lisp.net/project/cmucl/downloads/release/$release"
version: "$release-x86"
tar_ext: "xz"
- bootstrap: "-B boot-21f"
+ bootstrap: ""
workflow:
rules:
View it on GitLab: https://gitlab.common-lisp.net/cmucl/cmucl/-/commit/33c5bacb79cdc9cb65fdfa6…
--
View it on GitLab: https://gitlab.common-lisp.net/cmucl/cmucl/-/commit/33c5bacb79cdc9cb65fdfa6…
You're receiving this email because of your account on gitlab.common-lisp.net. Manage all notifications: https://gitlab.common-lisp.net/-/profile/notifications | Help: https://gitlab.common-lisp.net/help
1
0
[Git][cmucl/cmucl][issue-318-add-concrete-standard-char-type] 5 commits: Remove defypte of standard-char from code/type.lisp
by Raymond Toy (@rtoy) 26 Jun '26
by Raymond Toy (@rtoy) 26 Jun '26
26 Jun '26
Raymond Toy pushed to branch issue-318-add-concrete-standard-char-type at cmucl / cmucl
Commits:
3dfca652 by Raymond Toy at 2026-06-26T08:51:40-07:00
Remove defypte of standard-char from code/type.lisp
It's not needed now that we have a separate concrete standard-char
type.
- - - - -
7d0efaa6 by Raymond Toy at 2026-06-26T08:51:40-07:00
Bootfile is not needed to build this change
Not sure why we had this originally, but I just did a normal build
without specifying a -B option, and everything built fine and the
standard-char test ran fine.
- - - - -
2302c6bc by Raymond Toy at 2026-06-26T08:51:40-07:00
Address review comment
Reverse subtype args to verify that we're computing the correct
result.
- - - - -
fecdd641 by Raymond Toy at 2026-06-26T08:51:40-07:00
Address review comment
Replace `subsetp` with `every` to determine subtype relationship
between standard-char and member types. There were two places that
were updated.
- - - - -
a4842b2f by Raymond Toy at 2026-06-26T08:51:40-07:00
Address review comment
Forgot to replace one `subsetp` with `every`.
- - - - -
4 changed files:
- bin/build.sh
- − src/bootfiles/21f/boot-21f.lisp
- src/code/type.lisp
- tests/standard-char.lisp
Changes:
=====================================
bin/build.sh
=====================================
@@ -38,7 +38,7 @@ ENABLE2="yes"
ENABLE3="yes"
ENABLE4="yes"
-version=21f
+version=21e
SRCDIR=src
BINDIR=bin
TOOLDIR=$BINDIR
=====================================
src/bootfiles/21f/boot-21f.lisp deleted
=====================================
@@ -1,14 +0,0 @@
-;; For #318. Define new standard-char type.
-(in-package "KERNEL")
-(ext:without-package-locks
-(define-type-class standard-char)
-(defstruct (standard-char-type
- (:include ctype
- (class-info (type-class-or-lose 'standard-char))
- (:enumerable t))
- (:constructor %make-standard-char-type ())
- (:copier nil)))
-
-(defun make-standard-char-type ()
- (%make-standard-char-type))
-)
=====================================
src/code/type.lisp
=====================================
@@ -3347,7 +3347,10 @@
(values t t))
((member-type-p type2)
;; If TYPE2 is a member-type, check whether it contains all standard-chars
- (values (subsetp +standard-chars+ (member-type-members type2))
+ (values (let ((members (member-type-members type2)))
+ (every #'(lambda (c)
+ (member c members))
+ +standard-chars+))
t))
(t
(values nil t))))
@@ -3358,7 +3361,9 @@
(cond ((member-type-p type1)
;; If TYPE1 is a member-type, check whether it contains all
;; standard-chars.
- (values (subsetp (member-type-members type1) +standard-chars+)
+ (values (every #'(lambda (c)
+ (member c +standard-chars+))
+ (member-type-members type1))
t))
(t
(values nil t))))
@@ -3374,8 +3379,10 @@
((csubtypep (specifier-type 'character) other)
other)
((and (member-type-p other)
- (subsetp (member-type-members other)
- +standard-chars+))
+ ;; Check to see every member of OTHER is a STANDARD-CHAR.
+ (every #'(lambda (c)
+ (member c +standard-chars+))
+ (member-type-members other)))
sc)
(t nil))))
@@ -3640,16 +3647,6 @@
"Type of characters that aren't base-char's. None in CMU CL."
'(and character (not base-char)))
-#+nil
-(deftype standard-char ()
- "Type corresponding to the charaters required by the standard."
- '(member #\NEWLINE #\SPACE #\! #\" #\# #\$ #\% #\& #\' #\( #\) #\* #\+ #\,
- #\- #\. #\/ #\0 #\1 #\2 #\3 #\4 #\5 #\6 #\7 #\8 #\9 #\: #\; #\< #\=
- #\> #\? #\@ #\A #\B #\C #\D #\E #\F #\G #\H #\I #\J #\K #\L #\M
- #\N #\O #\P #\Q #\R #\S #\T #\U #\V #\W #\X #\Y #\Z #\[ #\\ #\]
- #\^ #\_ #\` #\a #\b #\c #\d #\e #\f #\g #\h #\i #\j #\k #\l #\m
- #\n #\o #\p #\q #\r #\s #\t #\u #\v #\w #\x #\y #\z #\{
- #\| #\} #\~))
(deftype keyword ()
"Type for any keyword symbol."
'(and symbol (satisfies keywordp)))
=====================================
tests/standard-char.lisp
=====================================
@@ -10,7 +10,7 @@
;; sure we test the intersection and union methods for standard-char.
(define-test standard-char.typep
- (:tag :issues)
+ (:tag :issues)
(assert-true (typep #\a 'standard-char))
(assert-false (typep #\tab 'standard-char))
(assert-true (typep #\Z 'standard-char))
@@ -24,8 +24,12 @@
(assert-equal (values t t)
(subtypep 'standard-char 'character))
+ (assert-equal (values nil t)
+ (subtypep 'character 'standard-char))
(assert-equal (values t t)
- (subtypep 'standard-char 'base-char)))
+ (subtypep 'standard-char 'base-char))
+ (assert-equal (values nil t)
+ (subtypep 'base-char 'standard-char)))
(define-test standard-char.etypecase-15
(:tag :issues)
View it on GitLab: https://gitlab.common-lisp.net/cmucl/cmucl/-/compare/95fb934cafa3ccd4a65de6…
--
View it on GitLab: https://gitlab.common-lisp.net/cmucl/cmucl/-/compare/95fb934cafa3ccd4a65de6…
You're receiving this email because of your account on gitlab.common-lisp.net. Manage all notifications: https://gitlab.common-lisp.net/-/profile/notifications | Help: https://gitlab.common-lisp.net/help
1
0
[Git][cmucl/cmucl][rtoy-unicode-collation-ducet] Add case-insensitive comparison functions
by Raymond Toy (@rtoy) 17 Jun '26
by Raymond Toy (@rtoy) 17 Jun '26
17 Jun '26
Raymond Toy pushed to branch rtoy-unicode-collation-ducet at cmucl / cmucl
Commits:
d555c487 by Raymond Toy at 2026-06-17T14:03:47-07:00
Add case-insensitive comparison functions
Add a STRENGTH parameter to the collation comparison path -- :PRIMARY,
:SECONDARY, :TERTIARY (the default), or :QUATERNARY -- bounding the
weight levels included in the sort key, and hence the distinctions the
comparison makes. A lower strength makes more strings compare equal:
:SECONDARY ignores case, :PRIMARY also ignores accents. COLLATION-SORT-KEY
and COLLATION-COMPARE take it, and it is threaded through the UNICODE
comparison functions.
Add the case-insensitive comparison functions, the Unicode analogs of
the COMMON-LISP -EQUAL / -LESSP family: STRING-EQUAL, STRING-NOT-EQUAL,
STRING-LESSP, STRING-GREATERP, STRING-NOT-GREATERP, and STRING-NOT-LESSP.
They default to :SECONDARY strength, so they ignore case (and other
tertiary distinctions) while remaining sensitive to base letters and
accents. Shadow and export the six in the UNICODE package.
Add tests for the public comparison functions: the boolean result of
each predicate, the four strength levels, the case-insensitive variants,
string-designator and START/END handling, and the variable-weighting
option.
- - - - -
3 changed files:
- src/code/exports.lisp
- src/code/unicode-collation.lisp
- tests/unicode-collation.lisp
Changes:
=====================================
src/code/exports.lisp
=====================================
@@ -2313,7 +2313,13 @@
"STRING="
"STRING/="
"STRING>"
- "STRING>=")
+ "STRING>="
+ "STRING-EQUAL"
+ "STRING-NOT-EQUAL"
+ "STRING-LESSP"
+ "STRING-GREATERP"
+ "STRING-NOT-GREATERP"
+ "STRING-NOT-LESSP")
(:import-from "LISP"
"CODEPOINT"
"SURROGATES"
@@ -2337,6 +2343,12 @@
"STRING="
"STRING/="
"STRING>"
- "STRING>="))
+ "STRING>="
+ "STRING-EQUAL"
+ "STRING-NOT-EQUAL"
+ "STRING-LESSP"
+ "STRING-GREATERP"
+ "STRING-NOT-GREATERP"
+ "STRING-NOT-LESSP"))
=====================================
src/code/unicode-collation.lisp
=====================================
@@ -467,18 +467,28 @@ is no fourth level. See COLLATION-WEIGHTS."
(unless (zerop te) (push te l3))))
(values (nreverse l1) (nreverse l2) (nreverse l3) nil)))
-(defun collation-sort-key (d string &optional (variable-weighting :shifted))
+(defun collation-sort-key (d string &optional (variable-weighting :shifted)
+ (strength :tertiary))
"Compute the UTS #10 sort key for STRING under DUCET D. Returns a
-(simple-array (unsigned-byte 16) (*)) holding the level-1 weights, a 0000
-separator, the level-2 weights, 0000, the level-3 weights, and -- for the
-:SHIFTED option -- a further 0000 separator and the level-4 weights.
-Binary comparison of two such keys yields the collation order of their
-strings. VARIABLE-WEIGHTING is as in COLLATION-WEIGHTS."
+(simple-array (unsigned-byte 16) (*)) holding the weight levels separated
+by 0000: level 1, level 2, level 3, and -- under the :SHIFTED option --
+level 4. Binary comparison of two such keys yields the collation order
+of their strings. VARIABLE-WEIGHTING is as in COLLATION-WEIGHTS.
+
+STRENGTH bounds the levels included in the key, and hence the
+distinctions the comparison makes: :PRIMARY (base letters only),
+:SECONDARY (also accents), :TERTIARY (also case; the default), or
+:QUATERNARY (also the level-4 weights, which exist only under the
+:SHIFTED option and otherwise add nothing). A lower strength makes
+more strings compare equal; for example :SECONDARY ignores case."
(multiple-value-bind (l1 l2 l3 l4)
(collation-weights d string variable-weighting)
- (let* ((weights (if (eq variable-weighting :non-ignorable)
- (append l1 (list 0) l2 (list 0) l3)
- (append l1 (list 0) l2 (list 0) l3 (list 0) l4)))
+ (let* ((weights (ecase strength
+ (:primary l1)
+ (:secondary (append l1 (list 0) l2))
+ (:tertiary (append l1 (list 0) l2 (list 0) l3))
+ (:quaternary
+ (append l1 (list 0) l2 (list 0) l3 (list 0) l4))))
(key (make-array (length weights)
:element-type '(unsigned-byte 16))))
(loop for w in weights
@@ -486,12 +496,13 @@ strings. VARIABLE-WEIGHTING is as in COLLATION-WEIGHTS."
do (setf (aref key k) w))
key)))
-(defun collation-compare (d s1 s2 &optional (variable-weighting :shifted))
+(defun collation-compare (d s1 s2 &optional (variable-weighting :shifted)
+ (strength :tertiary))
"Compare strings S1 and S2 under DUCET D. Returns -1, 0, or 1 like a
three-way comparison: negative if S1 sorts before S2, zero if equal, 1
-if after. VARIABLE-WEIGHTING is as in COLLATION-WEIGHTS."
- (let ((k1 (collation-sort-key d s1 variable-weighting))
- (k2 (collation-sort-key d s2 variable-weighting)))
+if after. VARIABLE-WEIGHTING and STRENGTH are as in COLLATION-SORT-KEY."
+ (let ((k1 (collation-sort-key d s1 variable-weighting strength))
+ (k2 (collation-sort-key d s2 variable-weighting strength)))
(let ((n (min (length k1) (length k2))))
(dotimes (i n)
(let ((a (aref k1 i)) (b (aref k2 i)))
@@ -615,7 +626,7 @@ collation section of unidata.bin on first use."
(setf *collation-table* (lisp::unidata-ducet))))
(defun %collation-compare (string1 string2 start1 end1 start2 end2
- variable-weighting)
+ variable-weighting strength)
"Three-way collation comparison of the designated substrings of
STRING1 and STRING2: returns a negative integer, zero, or a positive
integer as the first sorts before, equal to, or after the second."
@@ -625,47 +636,87 @@ integer as the first sorts before, equal to, or after the second."
(setf s1 (subseq s1 start1 end1)))
(when (or (/= start2 0) end2)
(setf s2 (subseq s2 start2 end2)))
- (lisp::collation-compare (collation-table) s1 s2 variable-weighting)))
+ (lisp::collation-compare (collation-table) s1 s2
+ variable-weighting strength)))
-(defmacro %def-collation-predicate (name test docstring)
+(defmacro %def-collation-predicate (name test default-strength docstring)
"Define a collation comparison predicate NAME whose result is (TEST c)
-where c is the three-way comparison of the two string arguments."
+where c is the three-way comparison of the two string arguments.
+DEFAULT-STRENGTH is the default value of the STRENGTH keyword."
`(defun ,name (string1 string2 &key (start1 0) end1 (start2 0) end2
- (variable-weighting :shifted))
+ (variable-weighting :shifted)
+ (strength ,default-strength))
,docstring
(let ((c (%collation-compare string1 string2
start1 end1 start2 end2
- variable-weighting)))
+ variable-weighting strength)))
(,test c))))
-(%def-collation-predicate string= zerop
+(%def-collation-predicate string= zerop :tertiary
"Return true if STRING1 and STRING2 collate as equal under the Unicode
Collation Algorithm. Note that this is collation equality, not
code-point identity: canonically equivalent strings, and strings that
differ only in collation-ignorable ways, compare equal. START1, END1,
-START2 and END2 bound the substrings compared; VARIABLE-WEIGHTING is
-:SHIFTED (the default) or :NON-IGNORABLE.")
+START2 and END2 bound the substrings compared. VARIABLE-WEIGHTING is
+:SHIFTED (the default) or :NON-IGNORABLE. STRENGTH is :PRIMARY,
+:SECONDARY, :TERTIARY (the default), or :QUATERNARY, as in
+LISP::COLLATION-SORT-KEY; a lower strength makes more strings compare
+equal -- :SECONDARY, for instance, ignores case.")
-(%def-collation-predicate string/= (lambda (c) (not (zerop c)))
+(%def-collation-predicate string/= (lambda (c) (not (zerop c))) :tertiary
"Return true if STRING1 and STRING2 do not collate as equal. See
UNICODE:STRING= for the meaning of the keyword arguments.")
-(%def-collation-predicate string< minusp
+(%def-collation-predicate string< minusp :tertiary
"Return true if STRING1 collates before STRING2 under the Unicode
Collation Algorithm. See UNICODE:STRING= for the meaning of the keyword
arguments.")
-(%def-collation-predicate string> plusp
+(%def-collation-predicate string> plusp :tertiary
"Return true if STRING1 collates after STRING2 under the Unicode
Collation Algorithm. See UNICODE:STRING= for the meaning of the keyword
arguments.")
-(%def-collation-predicate string<= (lambda (c) (not (plusp c)))
+(%def-collation-predicate string<= (lambda (c) (not (plusp c))) :tertiary
"Return true if STRING1 collates before or equal to STRING2 under the
Unicode Collation Algorithm. See UNICODE:STRING= for the meaning of the
keyword arguments.")
-(%def-collation-predicate string>= (lambda (c) (not (minusp c)))
+(%def-collation-predicate string>= (lambda (c) (not (minusp c))) :tertiary
"Return true if STRING1 collates after or equal to STRING2 under the
Unicode Collation Algorithm. See UNICODE:STRING= for the meaning of the
keyword arguments.")
+
+;;; The case-insensitive comparison functions, the Unicode analogs of
+;;; the COMMON-LISP -EQUAL/-LESSP/... family. They default to :SECONDARY
+;;; strength, which drops the tertiary level where case is encoded, so
+;;; they ignore case (and other tertiary distinctions, such as width)
+;;; while remaining sensitive to base letters and accents. This is the
+;;; closest collation analog of case-folded comparison; the Unicode
+;;; Collation Algorithm has no operation that folds case alone.
+
+(%def-collation-predicate string-equal zerop :secondary
+ "Return true if STRING1 and STRING2 collate as equal ignoring case,
+under the Unicode Collation Algorithm. Like UNICODE:STRING= but
+defaulting to :SECONDARY strength; see it for the keyword arguments.")
+
+(%def-collation-predicate string-not-equal (lambda (c) (not (zerop c))) :secondary
+ "Return true if STRING1 and STRING2 do not collate as equal ignoring
+case. See UNICODE:STRING-EQUAL for the keyword arguments.")
+
+(%def-collation-predicate string-lessp minusp :secondary
+ "Return true if STRING1 collates before STRING2 ignoring case, under
+the Unicode Collation Algorithm. See UNICODE:STRING-EQUAL for the
+keyword arguments.")
+
+(%def-collation-predicate string-greaterp plusp :secondary
+ "Return true if STRING1 collates after STRING2 ignoring case. See
+UNICODE:STRING-EQUAL for the keyword arguments.")
+
+(%def-collation-predicate string-not-greaterp (lambda (c) (not (plusp c))) :secondary
+ "Return true if STRING1 collates before or equal to STRING2 ignoring
+case. See UNICODE:STRING-EQUAL for the keyword arguments.")
+
+(%def-collation-predicate string-not-lessp (lambda (c) (not (minusp c))) :secondary
+ "Return true if STRING1 collates after or equal to STRING2 ignoring
+case. See UNICODE:STRING-EQUAL for the keyword arguments.")
=====================================
tests/unicode-collation.lisp
=====================================
@@ -145,6 +145,83 @@ must match the expected key in the line's comment."
(run-collation-conformance (ducet) *collation-non-ignorable-test*
:non-ignorable))
+;;; Tests for the public UNICODE comparison functions. The conformance
+;;; tests above already validate the collation weights themselves; these
+;;; check the thin wrappers -- that each predicate maps the comparison to
+;;; the right boolean, and that the STRENGTH and VARIABLE-WEIGHTING
+;;; options and the case-insensitive variants behave as documented.
+
+(define-test unicode.string-predicates
+ "The six case-sensitive comparison predicates map the collation order
+of a known pair to the correct boolean, including the reflexive case."
+ (:tag :unicode)
+ (assert-true (unicode:string< "a" "b"))
+ (assert-false (unicode:string< "b" "a"))
+ (assert-false (unicode:string< "a" "a"))
+ (assert-true (unicode:string> "b" "a"))
+ (assert-false (unicode:string> "a" "b"))
+ (assert-true (unicode:string<= "a" "b"))
+ (assert-true (unicode:string<= "a" "a"))
+ (assert-false (unicode:string<= "b" "a"))
+ (assert-true (unicode:string>= "b" "a"))
+ (assert-true (unicode:string>= "a" "a"))
+ (assert-false (unicode:string>= "a" "b"))
+ (assert-true (unicode:string= "a" "a"))
+ (assert-false (unicode:string= "a" "b"))
+ (assert-true (unicode:string/= "a" "b"))
+ (assert-false (unicode:string/= "a" "a")))
+
+(define-test unicode.string-designators-and-bounds
+ "Comparison accepts string designators (characters, symbols) like the
+COMMON-LISP functions, and honors START/END substring bounds."
+ (:tag :unicode)
+ (assert-true (unicode:string= #\a #\a))
+ (assert-true (unicode:string= 'abc "ABC"))
+ (assert-true (unicode:string= "xab" "yab" :start1 1 :start2 1))
+ (assert-false (unicode:string= "xab" "yzb" :start1 1 :start2 1)))
+
+(define-test unicode.string-strength
+ "STRENGTH bounds the distinctions made: case is a tertiary difference,
+an accent a secondary one."
+ (:tag :unicode)
+ ;; Default (tertiary): case matters. Lowercase sorts before uppercase.
+ (assert-true (unicode:string< "abc" "ABC"))
+ (assert-false (unicode:string< "ABC" "abc"))
+ (assert-false (unicode:string= "ABC" "abc"))
+ ;; Secondary: case ignored, but accents still distinguish.
+ (assert-true (unicode:string= "ABC" "abc" :strength :secondary))
+ (assert-false (unicode:string= "café" "cafe" :strength :secondary))
+ ;; Primary: accents ignored too.
+ (assert-true (unicode:string= "café" "cafe" :strength :primary))
+ ;; Quaternary: under the default :SHIFTED option the level-4 weights
+ ;; make variable elements (here the hyphen) break an otherwise equal
+ ;; comparison, where :TERTIARY ignores them.
+ (assert-true (unicode:string= "co-op" "coop"))
+ (assert-false (unicode:string= "co-op" "coop" :strength :quaternary)))
+
+(define-test unicode.string-case-insensitive
+ "The case-insensitive family ignores case (secondary strength) while
+remaining sensitive to accents."
+ (:tag :unicode)
+ (assert-true (unicode:string-equal "ABC" "abc"))
+ (assert-false (unicode:string-not-equal "ABC" "abc"))
+ (assert-false (unicode:string-lessp "ABC" "abc"))
+ (assert-false (unicode:string-greaterp "ABC" "abc"))
+ (assert-true (unicode:string-not-greaterp "ABC" "abc"))
+ (assert-true (unicode:string-not-lessp "ABC" "abc"))
+ ;; An accent is a secondary difference, so it still distinguishes.
+ (assert-false (unicode:string-equal "café" "cafe")))
+
+(define-test unicode.string-variable-weighting
+ "Under the default :SHIFTED option variable elements (punctuation) are
+ignored; under :NON-IGNORABLE they are significant."
+ (:tag :unicode)
+ ;; Shifted (default): the hyphen is ignored, so the strings collate equal.
+ (assert-true (unicode:string= "co-op" "coop"))
+ ;; Non-ignorable: the hyphen counts, so they are not equal.
+ (assert-false (unicode:string= "co-op" "coop"
+ :variable-weighting :non-ignorable)))
+
;; A DEFINE-TEST body is stored as source and run interpreted, and the
;; test runner (tests/run-tests.lisp) loads this file as source, so its
;; functions would otherwise run interpreted. The per-line parsing and
View it on GitLab: https://gitlab.common-lisp.net/cmucl/cmucl/-/commit/d555c487cbc28742ca2967d…
--
View it on GitLab: https://gitlab.common-lisp.net/cmucl/cmucl/-/commit/d555c487cbc28742ca2967d…
You're receiving this email because of your account on gitlab.common-lisp.net. Manage all notifications: https://gitlab.common-lisp.net/-/profile/notifications | Help: https://gitlab.common-lisp.net/help
1
0
[Git][cmucl/cmucl][rtoy-unicode-collation-ducet] Serialize the DUCET into unidata.bin
by Raymond Toy (@rtoy) 17 Jun '26
by Raymond Toy (@rtoy) 17 Jun '26
17 Jun '26
Raymond Toy pushed to branch rtoy-unicode-collation-ducet at cmucl / cmucl
Commits:
6a425efa by Raymond Toy at 2026-06-17T13:30:01-07:00
Serialize the DUCET into unidata.bin
The collation table was loaded at runtime by parsing allkeys.txt. Store
it in unidata.bin instead, as a new section (index 19), and bump the
file format version to 2.
The section holds the collation elements in three parallel arrays
(primary u16, secondary u16, tertiary u8 with the variable flag in bit
7), a single-codepoint index as an ntrie32 mapping a codepoint to a
packed (offset << 6) | count into those arrays, a contraction table
(four 32-bit words per entry), and the @implicitweights ranges.
build-unidata.lisp reads allkeys.txt and builds the section; unidata.lisp
reads it back (loader for section 19, plus the COLLATION struct and slot).
UNIDATA-DUCET in unicode-collation.lisp builds the runtime DUCET from the
loaded section -- structurally identical to one from LOAD-DUCET, so the
sort-key construction code is unchanged -- and COLLATION-TABLE now uses
it. LOAD-DUCET is kept for regenerating data and cross-checking.
The collation conformance tests build the DUCET from unidata.bin and
still pass all 437930 assertions.
unidata.bin updated with the new collation data.
- - - - -
4 changed files:
- src/code/unicode-collation.lisp
- src/code/unidata.lisp
- src/i18n/unidata.bin
- tests/unicode-collation.lisp
Changes:
=====================================
src/code/unicode-collation.lisp
=====================================
@@ -518,6 +518,85 @@ if after. VARIABLE-WEIGHTING is as in COLLATION-WEIGHTS."
;;; no meaningful character index of the first difference to return.
;;; -------------------------------------------------------------------
+
+;;; -------------------------------------------------------------------
+;;; Building the runtime DUCET from the collation section of
+;;; unidata.bin. The resulting table is structurally identical to one
+;;; built by LOAD-DUCET from allkeys.txt -- the same MAP / SINGLE /
+;;; STARTERS hashes and implicit ranges -- so the sort-key construction
+;;; code uses it unchanged. This replaces the runtime use of LOAD-DUCET
+;;; (which is kept for regenerating data and for cross-checking).
+;;; -------------------------------------------------------------------
+
+(defun unidata-ducet ()
+ "Build a DUCET from the collation section of unidata.bin, loading the
+section first if necessary."
+ (unless (unidata-collation *unicode-data*)
+ (load-collation))
+ (let* ((c (unidata-collation *unicode-data*))
+ (primv (collation-primv c))
+ (secv (collation-secv c))
+ (terv (collation-terv c))
+ (contractions (collation-contractions c))
+ (ranges (collation-ranges c))
+ (d (make-ducet :version (format nil "~D.~D.~D"
+ +unicode-major-version+
+ +unicode-minor-version+
+ +unicode-update-version+)))
+ (maxvar 0)
+ (maxkey 1))
+ (flet ((ces-at (packed)
+ ;; Slice the parallel arrays into a simple-vector of
+ ;; collation-elements for the packed (offset << 6) | count.
+ (let* ((off (ash packed -6))
+ (n (logand packed #x3f))
+ (v (make-array n)))
+ (dotimes (i n)
+ (let* ((j (+ off i))
+ (te (aref terv j))
+ (var (logbitp 7 te))
+ (p (aref primv j)))
+ (when (and var (> p maxvar))
+ (setf maxvar p))
+ (setf (aref v i)
+ (make-ce p (aref secv j) (logand te #x7f) var))))
+ v)))
+ ;; Single-codepoint entries: walk the codepoint space and pull the
+ ;; non-zero values out of the index trie. (Many keys are astral,
+ ;; so the walk must cover the full range, not just the BMP.)
+ (dotimes (cp #x110000)
+ (let ((packed (qref32 c cp)))
+ (unless (zerop packed)
+ (let ((ces (ces-at packed)))
+ (setf (gethash cp (ducet-single d)) ces)
+ (setf (gethash (make-array 1 :initial-element cp) (ducet-map d))
+ ces)))))
+ ;; Contractions: four 32-bit words each.
+ (loop for i from 0 below (length contractions) by 4 do
+ (let* ((cp1 (aref contractions i))
+ (cp2 (aref contractions (+ i 1)))
+ (cp3 (aref contractions (+ i 2)))
+ (packed (aref contractions (+ i 3)))
+ (key (if (= cp3 #xFFFFFFFF)
+ (make-array 2 :initial-contents (list cp1 cp2))
+ (make-array 3 :initial-contents (list cp1 cp2 cp3)))))
+ (setf (gethash key (ducet-map d)) (ces-at packed))
+ (setf (gethash cp1 (ducet-starters d)) t)
+ (setf maxkey (max maxkey (length key)))))
+ ;; Implicit-weight ranges: four 32-bit words each (start, end,
+ ;; base, base-origin).
+ (let ((rl nil))
+ (loop for i from 0 below (length ranges) by 4 do
+ (let ((r (make-implicit-range (aref ranges i)
+ (aref ranges (+ i 1))
+ (aref ranges (+ i 2)))))
+ (setf (implicit-range-base-origin r) (aref ranges (+ i 3)))
+ (push r rl)))
+ (setf (ducet-implicit-ranges d) (nreverse rl)))
+ (setf (ducet-max-key-length d) maxkey
+ (ducet-max-variable-primary d) maxvar)
+ d)))
+
(in-package "UNICODE")
(defvar *collation-table-path* "ext-formats:allkeys.txt"
@@ -530,10 +609,10 @@ loaded. Loaded lazily from *COLLATION-TABLE-PATH* the first time a
collation function needs it. Set to NIL to force a reload.")
(defun collation-table ()
- "Return the default Unicode collation table, loading it from
-*COLLATION-TABLE-PATH* on first use."
+ "Return the default Unicode collation table, building it from the
+collation section of unidata.bin on first use."
(or *collation-table*
- (setf *collation-table* (lisp::load-ducet *collation-table-path*))))
+ (setf *collation-table* (lisp::unidata-ducet))))
(defun %collation-compare (string1 string2 start1 end1 start2 end2
variable-weighting)
=====================================
src/code/unidata.lisp
=====================================
@@ -56,6 +56,7 @@
case-fold-simple
case-fold-full
word-break
+ collation
)
(defvar *unicode-data* (make-unidata))
@@ -65,7 +66,7 @@
(defconstant +unicode-magic-number+ #x2A554344)
;; The format version for the unidata.bin file.
-(defconstant +unicode-format-version+ 1)
+(defconstant +unicode-format-version+ 2)
;; The expected Unicode version. This needs to be synced with
;; build-unidata.lisp.
@@ -292,6 +293,27 @@
(defstruct (case-fold-full (:include decomp)))
+(defstruct (collation (:include ntrie32))
+ ;; Parallel collation-element arrays shared by the single-codepoint
+ ;; index (whose LVEC packs (offset << 6) | count into these) and the
+ ;; contraction table. TERV holds the tertiary weight in its low 7
+ ;; bits and the variable flag in bit 7.
+ (primv (ext:required-argument) :read-only t
+ :type (simple-array (unsigned-byte 16) (*)))
+ (secv (ext:required-argument) :read-only t
+ :type (simple-array (unsigned-byte 16) (*)))
+ (terv (ext:required-argument) :read-only t
+ :type (simple-array (unsigned-byte 8) (*)))
+ ;; Contraction table: four 32-bit words per entry -- cp1, cp2, cp3
+ ;; (or #xFFFFFFFF when the key has only two codepoints), and the
+ ;; packed (offset << 6) | count into the collation-element arrays.
+ (contractions (ext:required-argument) :read-only t
+ :type (simple-array (unsigned-byte 32) (*)))
+ ;; @implicitweights ranges: four 32-bit words per entry -- start,
+ ;; end, base, and base-origin (smallest start sharing the base).
+ (ranges (ext:required-argument) :read-only t
+ :type (simple-array (unsigned-byte 32) (*))))
+
(defstruct (bidi (:include ntrie16))
(tabl (ext:required-argument) :read-only t
:type (simple-array (unsigned-byte 16) (*))))
@@ -718,6 +740,29 @@
(read-ntrie 4 stm)
(setf (unidata-word-break *unicode-data*)
(make-ntrie4 :split split :hvec hvec :mvec mvec :lvec lvec))))
+(defloader load-collation (stm 19)
+ (multiple-value-bind (split hvec mvec lvec)
+ (read-ntrie 32 stm)
+ (let* ((nce (read32 stm))
+ (primv (make-array nce :element-type '(unsigned-byte 16)))
+ (secv (make-array nce :element-type '(unsigned-byte 16)))
+ (terv (make-array nce :element-type '(unsigned-byte 8))))
+ (read-vector primv stm :endian-swap :network-order)
+ (read-vector secv stm :endian-swap :network-order)
+ (read-vector terv stm :endian-swap :network-order)
+ (let* ((ncontr (read32 stm))
+ (contractions (make-array (* 4 ncontr)
+ :element-type '(unsigned-byte 32))))
+ (read-vector contractions stm :endian-swap :network-order)
+ (let* ((nrange (read-byte stm))
+ (ranges (make-array (* 4 nrange)
+ :element-type '(unsigned-byte 32))))
+ (read-vector ranges stm :endian-swap :network-order)
+ (setf (unidata-collation *unicode-data*)
+ (make-collation :split split :hvec hvec :mvec mvec :lvec lvec
+ :primv primv :secv secv :terv terv
+ :contractions contractions
+ :ranges ranges)))))))
;;; Accessor functions.
@@ -1657,4 +1702,5 @@ unidata.bin."
(unidata-case-fold-simple *unicode-data*)
(unidata-case-fold-full *unicode-data*)
(unidata-word-break *unicode-data*)
+ (unidata-collation *unicode-data*)
t))
=====================================
src/i18n/unidata.bin
=====================================
Binary files a/src/i18n/unidata.bin and b/src/i18n/unidata.bin differ
=====================================
tests/unicode-collation.lisp
=====================================
@@ -19,9 +19,10 @@
"The Default Unicode Collation Element Table, loaded on first use.")
(defun ducet ()
- "Return the DUCET, loading it from *COLLATION-ALLKEYS* the first time."
+ "Return the DUCET, built from the collation section of unidata.bin on
+first use."
(or *ducet*
- (setf *ducet* (lisp::load-ducet *collation-allkeys*))))
+ (setf *ducet* (lisp::unidata-ducet))))
(defun collation-hex-list (string)
"Parse all space-separated hexadecimal numbers in STRING into a list of
View it on GitLab: https://gitlab.common-lisp.net/cmucl/cmucl/-/commit/6a425efa239f6911e1b1fc0…
--
View it on GitLab: https://gitlab.common-lisp.net/cmucl/cmucl/-/commit/6a425efa239f6911e1b1fc0…
You're receiving this email because of your account on gitlab.common-lisp.net. Manage all notifications: https://gitlab.common-lisp.net/-/profile/notifications | Help: https://gitlab.common-lisp.net/help
1
0
[Git][cmucl/cmucl][rtoy-unicode-collation-ducet] Update build-unidata to build the collation table
by Raymond Toy (@rtoy) 17 Jun '26
by Raymond Toy (@rtoy) 17 Jun '26
17 Jun '26
Raymond Toy pushed to branch rtoy-unicode-collation-ducet at cmucl / cmucl
Commits:
b021b2a8 by Raymond Toy at 2026-06-17T13:04:12-07:00
Update build-unidata to build the collation table
This reads allkeys.txt and builds the collation table as part of
unidata.bin.
- - - - -
1 changed file:
- src/tools/build-unidata.lisp
Changes:
=====================================
src/tools/build-unidata.lisp
=====================================
@@ -44,6 +44,7 @@
case-fold-full
case-fold-simple
word-break
+ collation
)
(defvar *unicode-data* (make-unidata))
@@ -146,6 +147,27 @@
(tabl (ext:required-argument) :read-only t
:type (simple-array (unsigned-byte 16) (*))))
+(defstruct (collation (:include ntrie32))
+ ;; Parallel collation-element arrays shared by the single-codepoint
+ ;; index (whose LVEC packs (offset << 6) | count into these) and the
+ ;; contraction table. TERV holds the tertiary weight in its low 7
+ ;; bits and the variable flag in bit 7.
+ (primv (ext:required-argument) :read-only t
+ :type (simple-array (unsigned-byte 16) (*)))
+ (secv (ext:required-argument) :read-only t
+ :type (simple-array (unsigned-byte 16) (*)))
+ (terv (ext:required-argument) :read-only t
+ :type (simple-array (unsigned-byte 8) (*)))
+ ;; Contraction table: four 32-bit words per entry -- cp1, cp2, cp3
+ ;; (or #xFFFFFFFF when the key has only two codepoints), and the
+ ;; packed (offset << 6) | count into the collation-element arrays.
+ (contractions (ext:required-argument) :read-only t
+ :type (simple-array (unsigned-byte 32) (*)))
+ ;; @implicitweights ranges: four 32-bit words per entry -- start,
+ ;; end, base, and base-origin (smallest start sharing the base).
+ (ranges (ext:required-argument) :read-only t
+ :type (simple-array (unsigned-byte 32) (*))))
+
(defstruct (bidi (:include ntrie16))
(tabl (ext:required-argument) :read-only t
:type (simple-array (unsigned-byte 16) (*))))
@@ -535,11 +557,11 @@
:element-type '(unsigned-byte 8))
;; The length of the index array is the number of sections to be
;; written. See below for each section.
- (let ((index (make-array 19 :fill-pointer 0)))
+ (let ((index (make-array 20 :fill-pointer 0)))
;; File header
(write32 +unicode-magic-number+ stm) ; identification "magic"
- ;; File format version (1: dictionary nextv de-packed, keypv added)
- (write-byte 1 stm)
+ ;; File format version (2: collation/DUCET section added)
+ (write-byte 2 stm)
;; Unicode version
(write-byte +unicode-major-version+ stm)
(write-byte +unicode-minor-version+ stm)
@@ -638,6 +660,18 @@
(let ((data (unidata-word-break *unicode-data*)))
(update-index (file-position stm) index)
(write-ntrie4 data stm))
+ ;; 19. Collation (DUCET)
+ (let ((data (unidata-collation *unicode-data*)))
+ (update-index (file-position stm) index)
+ (write-ntrie32 data stm)
+ (write32 (length (collation-primv data)) stm)
+ (write-vector (collation-primv data) stm :endian-swap :network-order)
+ (write-vector (collation-secv data) stm :endian-swap :network-order)
+ (write-vector (collation-terv data) stm :endian-swap :network-order)
+ (write32 (truncate (length (collation-contractions data)) 4) stm)
+ (write-vector (collation-contractions data) stm :endian-swap :network-order)
+ (write-byte (truncate (length (collation-ranges data)) 4) stm)
+ (write-vector (collation-ranges data) stm :endian-swap :network-order))
;; All components saved. Patch up index table now.
(file-position stm 8)
(dotimes (i (length index))
@@ -1016,6 +1050,140 @@
;; ucd-directory should be the directory where UnicodeData.txt is
;; located.
+(defun parse-collation-key (string)
+ "Parse the space-separated hexadecimal codepoints in STRING (the part
+of an allkeys.txt line before the semicolon) into a list of integers."
+ (let ((result nil) (i 0) (n (length string)))
+ (loop
+ (loop while (and (< i n) (not (digit-char-p (char string i) 16)))
+ do (incf i))
+ (when (>= i n) (return))
+ (let ((j i))
+ (loop while (and (< j n) (digit-char-p (char string j) 16))
+ do (incf j))
+ (push (parse-integer string :start i :end j :radix 16) result)
+ (setf i j)))
+ (nreverse result)))
+
+(defun parse-collation-elements (string)
+ "Parse the collation elements [.pppp.ssss.tttt] (or [*pppp...] for a
+variable element) from STRING into a list of (primary secondary tertiary
+variablep) lists."
+ (let ((result nil) (i 0))
+ (loop
+ (let ((open (position #\[ string :start i)))
+ (unless open (return))
+ (let* ((var (char= (char string (1+ open)) #\*))
+ (close (position #\] string :start open))
+ (body (subseq string (+ open 2) close))
+ (d1 (position #\. body))
+ (d2 (position #\. body :start (1+ d1))))
+ (push (list (parse-integer body :end d1 :radix 16)
+ (parse-integer body :start (1+ d1) :end d2 :radix 16)
+ (parse-integer body :start (1+ d2) :radix 16)
+ var)
+ result)
+ (setf i (1+ close)))))
+ (nreverse result)))
+
+(defun build-collation (ucd range ucd-directory)
+ "Read allkeys.txt (the DUCET) from UCD-DIRECTORY and build the
+collation section: the parallel collation-element arrays, the
+single-codepoint index (an ntrie32 mapping a codepoint to a packed
+(offset << 6) | count into those arrays), the contraction table, and the
+@implicitweights ranges."
+ (let ((path (make-pathname :name "allkeys" :type "txt" :defaults ucd-directory))
+ (primv (make-array 65536 :element-type '(unsigned-byte 16)
+ :adjustable t :fill-pointer 0))
+ (secv (make-array 65536 :element-type '(unsigned-byte 16)
+ :adjustable t :fill-pointer 0))
+ (terv (make-array 65536 :element-type '(unsigned-byte 8)
+ :adjustable t :fill-pointer 0))
+ (single (make-hash-table))
+ (contractions nil)
+ (raw-ranges nil))
+ (flet ((emit (ces)
+ ;; Append CES to the parallel arrays; return the packed
+ ;; (offset << 6) | count referring to them.
+ (let ((offset (fill-pointer primv))
+ (count (length ces)))
+ (dolist (ce ces)
+ (destructuring-bind (p s te var) ce
+ (vector-push-extend p primv)
+ (vector-push-extend s secv)
+ (vector-push-extend (logior te (if var #x80 0)) terv)))
+ (logior (ash offset 6) count))))
+ (with-open-file (s path :direction :input :external-format :utf-8)
+ (loop for line = (read-line s nil) while line do
+ (cond
+ ((zerop (length line)))
+ ((char= (char line 0) #\#))
+ ((eql 0 (search "@implicitweights" line))
+ (let* ((semi (position #\; line))
+ (dd (search ".." line))
+ (start (parse-integer line :start (length "@implicitweights")
+ :end dd :radix 16 :junk-allowed t))
+ (end (parse-integer line :start (+ dd 2) :end semi
+ :radix 16 :junk-allowed t))
+ (base (parse-integer line :start (1+ semi)
+ :radix 16 :junk-allowed t)))
+ (push (list start end base) raw-ranges)))
+ ((char= (char line 0) #\@))
+ (t
+ (let ((semi (position #\; line)))
+ (when semi
+ (let* ((hash (position #\# line))
+ (key (parse-collation-key (subseq line 0 semi)))
+ (ces (parse-collation-elements
+ (subseq line (1+ semi) hash)))
+ (packed (emit ces)))
+ (if (= (length key) 1)
+ (setf (gethash (first key) single) packed)
+ (push (list (first key) (second key) (third key) packed)
+ contractions))))))))))
+ ;; base-origin: smallest start among ranges sharing a base.
+ (let ((origin (make-hash-table)))
+ (dolist (r raw-ranges)
+ (destructuring-bind (start end base) r
+ (declare (ignore end))
+ (when (or (null (gethash base origin))
+ (< start (gethash base origin)))
+ (setf (gethash base origin) start))))
+ (let* ((rl (nreverse raw-ranges))
+ (rvec (make-array (* 4 (length rl)) :element-type '(unsigned-byte 32)))
+ (cl (nreverse contractions))
+ (cvec (make-array (* 4 (length cl)) :element-type '(unsigned-byte 32)))
+ (i 0))
+ (dolist (r rl)
+ (destructuring-bind (start end base) r
+ (setf (aref rvec i) start
+ (aref rvec (+ i 1)) end
+ (aref rvec (+ i 2)) base
+ (aref rvec (+ i 3)) (gethash base origin))
+ (incf i 4)))
+ (setf i 0)
+ (dolist (c cl)
+ (destructuring-bind (cp1 cp2 cp3 packed) c
+ (setf (aref cvec i) cp1
+ (aref cvec (+ i 1)) cp2
+ (aref cvec (+ i 2)) (or cp3 #xFFFFFFFF)
+ (aref cvec (+ i 3)) packed)
+ (incf i 4)))
+ (multiple-value-bind (hvec mvec lvec)
+ (pack ucd range
+ (lambda (ent) (gethash (ucdent-code ent) single 0))
+ 0 32 #x54)
+ (make-collation
+ :split #x54 :hvec hvec :mvec mvec :lvec lvec
+ :primv (make-array (length primv) :element-type '(unsigned-byte 16)
+ :initial-contents primv)
+ :secv (make-array (length secv) :element-type '(unsigned-byte 16)
+ :initial-contents secv)
+ :terv (make-array (length terv) :element-type '(unsigned-byte 8)
+ :initial-contents terv)
+ :contractions cvec
+ :ranges rvec))))))
+
(defun build-unidata (&optional (ucd-directory "target:i18n/"))
(format t "~&Reading data from ~S~%" (probe-file ucd-directory))
(force-output)
@@ -1216,4 +1384,9 @@
0 4 split)
(setf (unidata-word-break *unicode-data*)
(make-ntrie4 :split split :hvec hvec :mvec mvec :lvec lvec))))
+
+ (format t "~&Building collation table~%")
+ (force-output)
+ (setf (unidata-collation *unicode-data*)
+ (build-collation ucd range ucd-directory))
nil))
View it on GitLab: https://gitlab.common-lisp.net/cmucl/cmucl/-/commit/b021b2a8b48c87ee5203741…
--
View it on GitLab: https://gitlab.common-lisp.net/cmucl/cmucl/-/commit/b021b2a8b48c87ee5203741…
You're receiving this email because of your account on gitlab.common-lisp.net. Manage all notifications: https://gitlab.common-lisp.net/-/profile/notifications | Help: https://gitlab.common-lisp.net/help
1
0
[Git][cmucl/cmucl][rtoy-unicode-collation-ducet] Add UNICODE string comparison functions (case-sensitive)
by Raymond Toy (@rtoy) 17 Jun '26
by Raymond Toy (@rtoy) 17 Jun '26
17 Jun '26
Raymond Toy pushed to branch rtoy-unicode-collation-ducet at cmucl / cmucl
Commits:
f05b7f0e by Raymond Toy at 2026-06-17T07:11:52-07:00
Add UNICODE string comparison functions (case-sensitive)
Add Unicode-aware equivalents of the COMMON-LISP string comparison
functions in the UNICODE package: STRING=, STRING/=, STRING<, STRING>,
STRING<=, and STRING>=. These compare by the Unicode Collation
Algorithm rather than code-unit order, using the collation code in
unicode-collation.lisp. They take the usual START1/END1/START2/END2
keywords plus a VARIABLE-WEIGHTING keyword (:SHIFTED by default), and
return a generalized boolean rather than a mismatch index, since the
comparison is on sort keys derived from the whole normalized string.
The Default Unicode Collation Element Table is loaded lazily from
allkeys.txt (ext-formats: search list) on first use, pending its
serialization into unidata.bin.
Shadow and export the six comparison symbols in the UNICODE package.
- - - - -
2 changed files:
- src/code/exports.lisp
- src/code/unicode-collation.lisp
Changes:
=====================================
src/code/exports.lisp
=====================================
@@ -2307,7 +2307,13 @@
(:use "COMMON-LISP")
(:shadow "STRING-CAPITALIZE"
"STRING-DOWNCASE"
- "STRING-UPCASE")
+ "STRING-UPCASE"
+ "STRING<"
+ "STRING<="
+ "STRING="
+ "STRING/="
+ "STRING>"
+ "STRING>=")
(:import-from "LISP"
"CODEPOINT"
"SURROGATES"
@@ -2325,6 +2331,12 @@
(:export "STRING-CAPITALIZE"
"STRING-DOWNCASE"
"STRING-UPCASE"
- "STRING-NEXT-WORD-BREAK"))
+ "STRING-NEXT-WORD-BREAK"
+ "STRING<"
+ "STRING<="
+ "STRING="
+ "STRING/="
+ "STRING>"
+ "STRING>="))
=====================================
src/code/unicode-collation.lisp
=====================================
@@ -500,3 +500,93 @@ if after. VARIABLE-WEIGHTING is as in COLLATION-WEIGHTS."
(cond ((< (length k1) (length k2)) -1)
((> (length k1) (length k2)) 1)
(t 0)))))
+
+
+;;; -------------------------------------------------------------------
+;;; Public collation API (UNICODE package): the Unicode-aware
+;;; equivalents of the COMMON-LISP string comparison functions.
+;;;
+;;; These compare strings by the Unicode Collation Algorithm rather than
+;;; by code-point order, so the result reflects linguistic sort order
+;;; (after NFD normalization, with contractions, expansions and the
+;;; chosen variable-weighting option). The Default Unicode Collation
+;;; Element Table is loaded lazily on first use.
+;;;
+;;; Unlike the COMMON-LISP functions, these return a generalized boolean
+;;; (T or NIL) rather than a mismatch index: the comparison is performed
+;;; on sort keys derived from the whole normalized string, so there is
+;;; no meaningful character index of the first difference to return.
+;;; -------------------------------------------------------------------
+
+(in-package "UNICODE")
+
+(defvar *collation-table-path* "ext-formats:allkeys.txt"
+ "Pathname of the DUCET data file (allkeys.txt) from which the default
+collation table is loaded on first use.")
+
+(defvar *collation-table* nil
+ "The default Unicode collation table, or NIL if it has not yet been
+loaded. Loaded lazily from *COLLATION-TABLE-PATH* the first time a
+collation function needs it. Set to NIL to force a reload.")
+
+(defun collation-table ()
+ "Return the default Unicode collation table, loading it from
+*COLLATION-TABLE-PATH* on first use."
+ (or *collation-table*
+ (setf *collation-table* (lisp::load-ducet *collation-table-path*))))
+
+(defun %collation-compare (string1 string2 start1 end1 start2 end2
+ variable-weighting)
+ "Three-way collation comparison of the designated substrings of
+STRING1 and STRING2: returns a negative integer, zero, or a positive
+integer as the first sorts before, equal to, or after the second."
+ (let ((s1 (string string1))
+ (s2 (string string2)))
+ (when (or (/= start1 0) end1)
+ (setf s1 (subseq s1 start1 end1)))
+ (when (or (/= start2 0) end2)
+ (setf s2 (subseq s2 start2 end2)))
+ (lisp::collation-compare (collation-table) s1 s2 variable-weighting)))
+
+(defmacro %def-collation-predicate (name test docstring)
+ "Define a collation comparison predicate NAME whose result is (TEST c)
+where c is the three-way comparison of the two string arguments."
+ `(defun ,name (string1 string2 &key (start1 0) end1 (start2 0) end2
+ (variable-weighting :shifted))
+ ,docstring
+ (let ((c (%collation-compare string1 string2
+ start1 end1 start2 end2
+ variable-weighting)))
+ (,test c))))
+
+(%def-collation-predicate string= zerop
+ "Return true if STRING1 and STRING2 collate as equal under the Unicode
+Collation Algorithm. Note that this is collation equality, not
+code-point identity: canonically equivalent strings, and strings that
+differ only in collation-ignorable ways, compare equal. START1, END1,
+START2 and END2 bound the substrings compared; VARIABLE-WEIGHTING is
+:SHIFTED (the default) or :NON-IGNORABLE.")
+
+(%def-collation-predicate string/= (lambda (c) (not (zerop c)))
+ "Return true if STRING1 and STRING2 do not collate as equal. See
+UNICODE:STRING= for the meaning of the keyword arguments.")
+
+(%def-collation-predicate string< minusp
+ "Return true if STRING1 collates before STRING2 under the Unicode
+Collation Algorithm. See UNICODE:STRING= for the meaning of the keyword
+arguments.")
+
+(%def-collation-predicate string> plusp
+ "Return true if STRING1 collates after STRING2 under the Unicode
+Collation Algorithm. See UNICODE:STRING= for the meaning of the keyword
+arguments.")
+
+(%def-collation-predicate string<= (lambda (c) (not (plusp c)))
+ "Return true if STRING1 collates before or equal to STRING2 under the
+Unicode Collation Algorithm. See UNICODE:STRING= for the meaning of the
+keyword arguments.")
+
+(%def-collation-predicate string>= (lambda (c) (not (minusp c)))
+ "Return true if STRING1 collates after or equal to STRING2 under the
+Unicode Collation Algorithm. See UNICODE:STRING= for the meaning of the
+keyword arguments.")
View it on GitLab: https://gitlab.common-lisp.net/cmucl/cmucl/-/commit/f05b7f0e709186eefd49b5d…
--
View it on GitLab: https://gitlab.common-lisp.net/cmucl/cmucl/-/commit/f05b7f0e709186eefd49b5d…
You're receiving this email because of your account on gitlab.common-lisp.net. Manage all notifications: https://gitlab.common-lisp.net/-/profile/notifications | Help: https://gitlab.common-lisp.net/help
1
0