cmucl-cvs
March 2013
- 1 participant
- 16 discussions
[cmucl-cvs] [git] CMU Common Lisp branch master updated. snapshot-2013-03-a-6-gc94b32f
by Raymond Toy 25 Mar '13
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "CMU Common Lisp".
The branch, master has been updated
via c94b32f927061d6e7b7ea1ebf92ccdb4c3b1a842 (commit)
from 0232d2242e5acf9d1654cd0e4d7883de9e9f4705 (commit)
Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.
- Log -----------------------------------------------------------------
commit c94b32f927061d6e7b7ea1ebf92ccdb4c3b1a842
Author: Raymond Toy <toy.raymond(a)gmail.com>
Date: Sun Mar 24 20:13:56 2013 -0700
Fix ticket:77 correctly, using the supplied patch link.
diff --git a/src/code/exports.lisp b/src/code/exports.lisp
index 2c3e0ff..6c7bbed 100644
--- a/src/code/exports.lisp
+++ b/src/code/exports.lisp
@@ -2388,6 +2388,7 @@
"ATOMIC-PUSH" "CURRENT-PROCESS" "DESTROY-PROCESS" "DISABLE-PROCESS"
"ENABLE-PROCESS" "INIT-STACK-GROUPS" "LOCK" "MAKE-STACK-GROUP"
"MAKE-LOCK" "MAKE-PROCESS" "PROCESS-ACTIVE-P"
+ "PROCESS-JOIN"
"PROCESS-ADD-ARREST-REASON" "PROCESS-ADD-RUN-REASON"
"PROCESS-ALIVE-P" "PROCESS-ARREST-REASONS"
"PROCESS-IDLE-TIME" "PROCESS-INTERRUPT" "PROCESS-NAME"
diff --git a/src/code/multi-proc.lisp b/src/code/multi-proc.lisp
index b0ce883..e478a95 100644
--- a/src/code/multi-proc.lisp
+++ b/src/code/multi-proc.lisp
@@ -298,6 +298,7 @@
(%real-time 0d0 :type double-float)
(%run-time 0d0 :type double-float)
(property-list nil :type list)
+ (%return-values nil :type list)
(initial-bindings nil :type list))
@@ -956,9 +957,11 @@
(with-simple-restart
(destroy "Destroy the process")
(setf *inhibit-scheduling* nil)
- (apply-with-bindings function
- nil
- initial-bindings))
+ (setf (process-%return-values *current-process*)
+ (multiple-value-list
+ (apply-with-bindings function
+ nil
+ initial-bindings))))
;; Normal exit.
(throw '%end-of-the-process nil))))
(setf *inhibit-scheduling* t)
@@ -1973,19 +1976,7 @@
#-x86 (when (eq (lock-process ,lock) *current-process*)
(setf (lock-process ,lock) nil)))))))
-(defun %make-thread (function name)
- (mp:make-process (lambda ()
- (let ((return-values
- (multiple-value-list (funcall function))))
- (setf (getf (mp:process-property-list mp:*current-process*)
- 'return-values)
- return-values)
- (values-list return-values)))
- :name name))
-
-(defun join-thread (thread)
+(defun process-join (process)
(mp:process-wait (format nil "Waiting for thread ~A to complete" thread)
(lambda () (not (mp:process-alive-p thread))))
- (let ((return-values
- (getf (mp:process-property-list thread) 'return-values)))
- (values-list return-values)))
+ (values-list (process-%return-values process)))
diff --git a/src/general-info/release-20e.txt b/src/general-info/release-20e.txt
index 06c2fc6..d38c14c 100644
--- a/src/general-info/release-20e.txt
+++ b/src/general-info/release-20e.txt
@@ -42,7 +42,7 @@ New in this release:
* :I486 and :PENTIUM (Always assume we're running on at least a
Pentium.)
* Update unicode to support Unicode 6.2.
- * Add MP:JOIN-THREAD, as given in ticket #77.
+ * Add MP:PROCESS-JOIN, as given in ticket #77.
* ANSI compliance fixes:
* Attempts to modify the standard readtable or the standard pprint
-----------------------------------------------------------------------
Summary of changes:
src/code/exports.lisp | 1 +
src/code/multi-proc.lisp | 25 ++++++++-----------------
src/general-info/release-20e.txt | 2 +-
3 files changed, 10 insertions(+), 18 deletions(-)
hooks/post-receive
--
CMU Common Lisp
[cmucl-cvs] [git] CMU Common Lisp branch master updated. snapshot-2013-03-a-5-g0232d22
by Raymond Toy 24 Mar '13
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "CMU Common Lisp".
The branch, master has been updated
via 0232d2242e5acf9d1654cd0e4d7883de9e9f4705 (commit)
from 50b13399f16ca49f7b54a0b5bc427ad0a67b9579 (commit)
Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.
- Log -----------------------------------------------------------------
commit 0232d2242e5acf9d1654cd0e4d7883de9e9f4705
Author: Raymond Toy <toy.raymond(a)gmail.com>
Date: Sun Mar 24 09:43:33 2013 -0700
Fix ticket:77 by adding the code given in the ticket.
diff --git a/src/code/multi-proc.lisp b/src/code/multi-proc.lisp
index 8b2df04..b0ce883 100644
--- a/src/code/multi-proc.lisp
+++ b/src/code/multi-proc.lisp
@@ -1972,3 +1972,20 @@
,lock 2 *current-process* nil)
#-x86 (when (eq (lock-process ,lock) *current-process*)
(setf (lock-process ,lock) nil)))))))
+
+(defun %make-thread (function name)
+ (mp:make-process (lambda ()
+ (let ((return-values
+ (multiple-value-list (funcall function))))
+ (setf (getf (mp:process-property-list mp:*current-process*)
+ 'return-values)
+ return-values)
+ (values-list return-values)))
+ :name name))
+
+(defun join-thread (thread)
+ (mp:process-wait (format nil "Waiting for thread ~A to complete" thread)
+ (lambda () (not (mp:process-alive-p thread))))
+ (let ((return-values
+ (getf (mp:process-property-list thread) 'return-values)))
+ (values-list return-values)))
diff --git a/src/general-info/release-20e.txt b/src/general-info/release-20e.txt
index 5ec6b03..06c2fc6 100644
--- a/src/general-info/release-20e.txt
+++ b/src/general-info/release-20e.txt
@@ -42,6 +42,7 @@ New in this release:
* :I486 and :PENTIUM (Always assume we're running on at least a
Pentium.)
* Update unicode to support Unicode 6.2.
+ * Add MP:JOIN-THREAD, as given in ticket #77.
* ANSI compliance fixes:
* Attempts to modify the standard readtable or the standard pprint
@@ -70,6 +71,7 @@ New in this release:
* Ticket #74 fixed.
* Ticket #76 fixed.
* Ticket #79 fixed.
+ * Ticket #77 fixed.
* Other changes:
* -8 option for build-all.sh is deprecated since we don't
-----------------------------------------------------------------------
Summary of changes:
src/code/multi-proc.lisp | 17 +++++++++++++++++
src/general-info/release-20e.txt | 2 ++
2 files changed, 19 insertions(+), 0 deletions(-)
hooks/post-receive
--
CMU Common Lisp
[cmucl-cvs] [git] CMU Common Lisp branch master updated. snapshot-2013-03-a-4-g50b1339
by Raymond Toy 23 Mar '13
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "CMU Common Lisp".
The branch, master has been updated
via 50b13399f16ca49f7b54a0b5bc427ad0a67b9579 (commit)
from a1c04fe77fbea96b5c038547f64a6fff0089ed77 (commit)
Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.
- Log -----------------------------------------------------------------
commit 50b13399f16ca49f7b54a0b5bc427ad0a67b9579
Author: Raymond Toy <toy.raymond(a)gmail.com>
Date: Sat Mar 23 10:38:29 2013 -0700
Update from logs.
diff --git a/src/general-info/release-20e.txt b/src/general-info/release-20e.txt
index 231c85d..5ec6b03 100644
--- a/src/general-info/release-20e.txt
+++ b/src/general-info/release-20e.txt
@@ -53,6 +53,8 @@ New in this release:
supported length can be handled. (See ticket #66 and #68.)
* A serious error in FILE-POSITION on streams using an encoding
other than latin1 has been fixed. See ticket #74.
+ * Fix startup crashes on some Debian Linux versions. This was
+ caused by the release string not having a patch version.
* Trac Tickets:
* Ticket #52 reopened.
@@ -67,6 +69,7 @@ New in this release:
* Ticket #73 fixed.
* Ticket #74 fixed.
* Ticket #76 fixed.
+ * Ticket #79 fixed.
* Other changes:
* -8 option for build-all.sh is deprecated since we don't
-----------------------------------------------------------------------
Summary of changes:
src/general-info/release-20e.txt | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)
hooks/post-receive
--
CMU Common Lisp
[cmucl-cvs] [git] CMU Common Lisp branch master updated. snapshot-2013-03-a-3-ga1c04fe
by Raymond Toy 23 Mar '13
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "CMU Common Lisp".
The branch, master has been updated
via a1c04fe77fbea96b5c038547f64a6fff0089ed77 (commit)
from f51ee9dc1f66b02f7a9a0826b70550f3bc9fb222 (commit)
Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.
- Log -----------------------------------------------------------------
commit a1c04fe77fbea96b5c038547f64a6fff0089ed77
Author: Raymond Toy <toy.raymond(a)gmail.com>
Date: Sat Mar 23 10:27:43 2013 -0700
Fix ticket:79
* Initialize in-length to in-buffer-length, not 0.
* Added a few more debugging prints.
diff --git a/src/code/fd-stream.lisp b/src/code/fd-stream.lisp
index 7fcd0bc..80e2941 100644
--- a/src/code/fd-stream.lisp
+++ b/src/code/fd-stream.lisp
@@ -260,7 +260,7 @@
;; in-buffer-length, but could be less if we reached the
;; end-of-file.
#+unicode
- (in-length 0 :type index)
+ (in-length in-buffer-length :type index)
;;
;; Indicates how to handle errors when converting octets to
;; characters. If NIL, then the external format should handle it
@@ -1697,6 +1697,8 @@
(posn errno)
(unix:unix-lseek (fd-stream-fd stream) 0 unix:l_incr)
(declare (type (or (integer 0) null) posn))
+ #+nil
+ (format t "lseek returns ~D ~D~%" posn errno)
(cond (posn
;; Adjust for buffered output:
;; If there is any output buffered, the *real* file position
@@ -1716,6 +1718,8 @@
(decf posn (- (fd-stream-ibuf-tail stream)
(fd-stream-ibuf-head stream)))
+ #+nil
+ (format t "Updated posn = ~D~%" posn)
#+unicode
(when (fd-stream-string-buffer stream)
;; The string buffer contains Lisp characters,
@@ -1742,6 +1746,7 @@
(progn
(format t "new posn = ~D~%" posn)
(format t "in-buffer-length = ~D~%" in-buffer-length)
+ (format t "in-length = ~D~%" (fd-stream-in-length stream))
(format t "fd-stream-in-index = ~D~%" (fd-stream-in-index stream))))))
(when (fd-stream-in-buffer stream)
;; When we have an in-buffer (whether we have a
-----------------------------------------------------------------------
Summary of changes:
src/code/fd-stream.lisp | 7 ++++++-
1 files changed, 6 insertions(+), 1 deletions(-)
hooks/post-receive
--
CMU Common Lisp
[cmucl-cvs] [git] CMU Common Lisp branch master updated. snapshot-2013-03-a-2-gf51ee9d
by Raymond Toy 23 Mar '13
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "CMU Common Lisp".
The branch, master has been updated
via f51ee9dc1f66b02f7a9a0826b70550f3bc9fb222 (commit)
from 1e8b06be53f874e64d4f687247188349388fb1b4 (commit)
Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.
- Log -----------------------------------------------------------------
commit f51ee9dc1f66b02f7a9a0826b70550f3bc9fb222
Author: Raymond Toy <toy.raymond(a)gmail.com>
Date: Fri Mar 22 20:10:16 2013 -0700
Try to be careful about extracting the linux version from the (uname)
release. Some Debian versions have a release name like "3.7-trunk",
which is missing the patch version.
diff --git a/src/lisp/Linux-os.c b/src/lisp/Linux-os.c
index 296ff2a..cbd25fd 100644
--- a/src/lisp/Linux-os.c
+++ b/src/lisp/Linux-os.c
@@ -76,12 +76,26 @@ check_personality(struct utsname *name, char *const *argv, char *const *envp)
#if defined(__i386) || defined(__x86_64)
int major_version, minor_version, patch_version;
char *p;
+
p = name->release;
major_version = atoi(p);
- p = strchr(p,'.')+1;
- minor_version = atoi(p);
- p = strchr(p,'.')+1;
- patch_version = atoi(p);
+
+ /*
+ * Try to extract the minor and patch version, but if we can't
+ * just set it to zero. In particular, some Debian systems have a
+ * release like "3.7-trunk-686-pae" which is missing the patch
+ * version.
+ */
+
+ p = strchr(p,'.');
+ if (p) {
+ minor_version = atoi(p + 1);
+ p = strchr(p + 1,'.');
+ patch_version = p ? atoi(p + 1) : 0;
+ } else {
+ minor_version = 0;
+ patch_version = 0;
+ }
if ((major_version == 2
/* Some old kernels will apparently lose unsupported personality flags
-----------------------------------------------------------------------
Summary of changes:
src/lisp/Linux-os.c | 22 ++++++++++++++++++----
1 files changed, 18 insertions(+), 4 deletions(-)
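
For readers following the fix above: the old code added 1 to the result of strchr() without checking it, so a release string with fewer than two dots (such as "3.7-trunk-686-pae") handed an invalid pointer to atoi(). Below is a minimal standalone sketch of the same defensive parsing. The names kernel_version and parse_release are hypothetical and chosen only for illustration; they are not part of the CMUCL sources.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the parsed pieces; not a CMUCL type. */
struct kernel_version {
    int major;
    int minor;
    int patch;
};

/*
 * Hypothetical helper: split a uname-style release string into
 * major/minor/patch.  Any component that is missing is left at 0
 * instead of dereferencing the NULL that strchr() returns when no
 * '.' is found.
 */
static struct kernel_version
parse_release(const char *release)
{
    struct kernel_version v = { 0, 0, 0 };
    const char *p = release;

    v.major = atoi(p);

    p = strchr(p, '.');
    if (p != NULL) {
        /* atoi() stops at the first non-digit, so "7-trunk" gives 7. */
        v.minor = atoi(p + 1);

        p = strchr(p + 1, '.');
        if (p != NULL) {
            v.patch = atoi(p + 1);
        }
    }
    return v;
}
```

With this sketch, a release of "3.7-trunk-686-pae" would parse as major 3, minor 7, patch 0, which matches the behaviour the commit comment describes.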
hooks/post-receive
--
CMU Common Lisp
[cmucl-cvs] [git] CMU Common Lisp branch master updated. snapshot-2013-03-a-1-g1e8b06b
by Raymond Toy 10 Mar '13
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "CMU Common Lisp".
The branch, master has been updated
via 1e8b06be53f874e64d4f687247188349388fb1b4 (commit)
from 90b155a2a8cbf269e022f191f9b8566da8ace0da (commit)
Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.
- Log -----------------------------------------------------------------
commit 1e8b06be53f874e64d4f687247188349388fb1b4
Author: Raymond Toy <toy.raymond(a)gmail.com>
Date: Sat Mar 9 21:24:25 2013 -0800
Support ppc.
diff --git a/bin/build-all.sh b/bin/build-all.sh
index 7cf7c88..d5823af 100755
--- a/bin/build-all.sh
+++ b/bin/build-all.sh
@@ -60,8 +60,11 @@ done
# If -b not given, try to derive one instead of just using "build".
if [ -z "$BASE" ]; then
case `uname -s` in
- Darwin) # We only support darwin-x86 now. No ppc available anymore.
- BASE=darwin ;;
+ Darwin)
+ case `uname -p` in
+ powerpc) BASE=ppc ;;
+ i386) BASE=darwin ;;
+ esac ;;
SunOS)
case `uname -m` in
sun4u) BASE=sparc ;;
@@ -118,6 +121,8 @@ buildsun4 ()
case `uname -m` in
i386*|x86*|i86pc) buildx86 ;;
- sun*) buildsun4 ;;
+ sun*|"Power Mac*")
+ # buildsun4 works for sparc and ppc.
+ buildsun4 ;;
*) echo "Unsupported architecture: `uname -m`" ;;
esac
-----------------------------------------------------------------------
Summary of changes:
bin/build-all.sh | 11 ++++++++---
1 files changed, 8 insertions(+), 3 deletions(-)
hooks/post-receive
--
CMU Common Lisp
[cmucl-cvs] [git] CMU Common Lisp annotated tag snapshot-2013-03-a created. snapshot-2013-03-a
by Raymond Toy 08 Mar '13
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "CMU Common Lisp".
The annotated tag, snapshot-2013-03-a has been created
at 415e27b9c0f1c1cf45733384ffc1c3926cfcff36 (tag)
tagging 90b155a2a8cbf269e022f191f9b8566da8ace0da (commit)
replaces snapshot-2013-03
tagged by Raymond Toy
on Thu Mar 7 21:09:51 2013 -0800
- Log -----------------------------------------------------------------
Snapshot 2013-03-a fixing a serious issue in REPLACE, caused by
a bug in DO-UNARY-BYTE-BASH.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (Darwin)
iEYEABECAAYFAlE5csAACgkQJ5IjUmgZO7KCjACdE7QJR5iHCwutBlhj8Wyk4xjN
qaoAoNK0u7HfwhovgKWDpIk8xmp7sqq5
=UvC0
-----END PGP SIGNATURE-----
Raymond Toy (12):
Update to Unicode 6.2.
Oops. Remove debugging echo.
Note ticket #74 fixed, and move a Change item to a Bugfix item.
Update to ASDF 2.32.
Update from logs.
Fix PARSE-WORD-BREAK-LINE to handle codepoints outside the BMP. The
Implement Rule WB13c for regional indicators.
Fix ticket:76
Update.
Merge branch 'master' into rtoy-unicode-6.2
Reindent STRING-NEXT-WORD-BREAK neatly.
Update.
-----------------------------------------------------------------------
hooks/post-receive
--
CMU Common Lisp
[cmucl-cvs] [git] CMU Common Lisp branch rtoy-unicode-6.2 created. snapshot-2013-03-12-g90b155a
by Raymond Toy 07 Mar '13
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "CMU Common Lisp".
The branch, rtoy-unicode-6.2 has been created
at 90b155a2a8cbf269e022f191f9b8566da8ace0da (commit)
- Log -----------------------------------------------------------------
-----------------------------------------------------------------------
hooks/post-receive
--
CMU Common Lisp
[cmucl-cvs] [git] CMU Common Lisp branch master updated. snapshot-2013-03-12-g90b155a
by Raymond Toy 07 Mar '13
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "CMU Common Lisp".
The branch, master has been updated
via 90b155a2a8cbf269e022f191f9b8566da8ace0da (commit)
via e129c45a44b1dc1bd8806f19caf5782ca5f60f78 (commit)
via 58b88ebd2133e51ad084ae6835dc65137138cfb3 (commit)
via cae10dd1d8688fdbcd1e4c3a16d0130b8e8cdb41 (commit)
via b735224c492e7ff7a2dcd4fe1804a950401e8a65 (commit)
via 424edfe8570cd4eb38086d6bdbaa8cd7b0030772 (commit)
from 10ebd126e43b344377d384c55c1c611a82e9f4ae (commit)
Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.
- Log -----------------------------------------------------------------
commit 90b155a2a8cbf269e022f191f9b8566da8ace0da
Author: Raymond Toy <toy.raymond(a)gmail.com>
Date: Wed Mar 6 19:18:04 2013 -0800
Update.
diff --git a/src/general-info/release-20e.txt b/src/general-info/release-20e.txt
index a438863..231c85d 100644
--- a/src/general-info/release-20e.txt
+++ b/src/general-info/release-20e.txt
@@ -41,6 +41,8 @@ New in this release:
derivation.)
* :I486 and :PENTIUM (Always assume we're running on at least a
Pentium.)
+ * Update unicode to support Unicode 6.2.
+
* ANSI compliance fixes:
* Attempts to modify the standard readtable or the standard pprint
dispatch table will now signal a continuable error.
commit e129c45a44b1dc1bd8806f19caf5782ca5f60f78
Author: Raymond Toy <toy.raymond(a)gmail.com>
Date: Wed Mar 6 00:37:50 2013 -0800
Reindent STRING-NEXT-WORD-BREAK neatly.
diff --git a/src/code/string.lisp b/src/code/string.lisp
index 7c6e3c1..4b90930 100644
--- a/src/code/string.lisp
+++ b/src/code/string.lisp
@@ -1586,80 +1586,80 @@
2
1))
(cat (char-word-break-category c)))
- (case cat
- ((:extend-or-format)
- (case context
- ((:cr :sep) j)
- (otherwise (lookup (+ j next-j) context))))
- (otherwise
- (case context
- ((:cr)
- (if (= c (char-code #\linefeed))
- ;; Rule WB3: Don't break CRLF, continue looking
- (lookup (+ j next-j) cat)
- j))
- ((:aletter)
- (case cat
- ((:aletter :numeric :extendnumlet)
- ;; Rules WB5, WB9, ?
- (lookup (+ j next-j) cat))
- ((:midletter :midnumlet)
- ;; Rule WB6, need to keep looking
- (lookup (+ j next-j) :aletter-midletter))
- (otherwise j)))
- ((:aletter-midletter)
- (case cat
- ((:aletter)
- ;; Rule WB7
- (lookup (+ j next-j) cat))
- (otherwise
- ;; Rule WB6 and WB7 were extended, but the
- ;; region didn't end with :aletter. So
- ;; backup and break at that point.
- (let ((j2 (index-of-previous-non-ignored j)))
- (if (< i j2) j2 j)))))
- ((:numeric)
- (case cat
- ((:numeric :aletter :extendnumlet)
- ;; Rules WB8, WB10, ?
- (lookup (+ j next-j) cat))
- ((:midnum :midnumlet)
- ;; Rules WB11, need to keep looking
- (lookup (+ j next-j) :numeric-midnum))
- (otherwise j)))
- ((:numeric-midnum)
- (case cat
- ((:numeric)
- ;; Rule WB11, keep looking
- (lookup (+ j next-j) cat))
- (otherwise
- ;; Rule WB11, WB12 were extended, but the
- ;; region didn't end with :numeric, so
- ;; backup and break at that point.
- (let ((j2 (index-of-previous-non-ignored j)))
- (if (< i j2) j2 j)))))
- ((:midletter :midnum :midnumlet)
- ;; Rule WB14
- j)
- ((:katakana)
- (case cat
- ((:katakana :extendnumlet)
- ;; Rule WB13, WB13a
- (lookup (+ j next-j) cat))
- (otherwise j)))
- ((:extendnumlet)
- (case cat
- ((:extendnumlet :aletter :numeric :katakana)
- ;; Rule WB13a, WB13b
- (lookup (+ j next-j) cat))
- (otherwise j)))
- ((:regional_indicator)
- (case cat
- ((:regional_indicator)
- ;; Rule WB13c
- (lookup (+ j next-j) cat))
- (otherwise j)))
- (otherwise j)))))))))
+ (case cat
+ ((:extend-or-format)
+ (case context
+ ((:cr :sep) j)
+ (otherwise (lookup (+ j next-j) context))))
+ (otherwise
+ (case context
+ ((:cr)
+ (if (= c (char-code #\linefeed))
+ ;; Rule WB3: Don't break CRLF, continue looking
+ (lookup (+ j next-j) cat)
+ j))
+ ((:aletter)
+ (case cat
+ ((:aletter :numeric :extendnumlet)
+ ;; Rules WB5, WB9, ?
+ (lookup (+ j next-j) cat))
+ ((:midletter :midnumlet)
+ ;; Rule WB6, need to keep looking
+ (lookup (+ j next-j) :aletter-midletter))
+ (otherwise j)))
+ ((:aletter-midletter)
+ (case cat
+ ((:aletter)
+ ;; Rule WB7
+ (lookup (+ j next-j) cat))
+ (otherwise
+ ;; Rule WB6 and WB7 were extended, but the
+ ;; region didn't end with :aletter. So
+ ;; backup and break at that point.
+ (let ((j2 (index-of-previous-non-ignored j)))
+ (if (< i j2) j2 j)))))
+ ((:numeric)
+ (case cat
+ ((:numeric :aletter :extendnumlet)
+ ;; Rules WB8, WB10, ?
+ (lookup (+ j next-j) cat))
+ ((:midnum :midnumlet)
+ ;; Rules WB11, need to keep looking
+ (lookup (+ j next-j) :numeric-midnum))
+ (otherwise j)))
+ ((:numeric-midnum)
+ (case cat
+ ((:numeric)
+ ;; Rule WB11, keep looking
+ (lookup (+ j next-j) cat))
+ (otherwise
+ ;; Rule WB11, WB12 were extended, but the
+ ;; region didn't end with :numeric, so
+ ;; backup and break at that point.
+ (let ((j2 (index-of-previous-non-ignored j)))
+ (if (< i j2) j2 j)))))
+ ((:midletter :midnum :midnumlet)
+ ;; Rule WB14
+ j)
+ ((:katakana)
+ (case cat
+ ((:katakana :extendnumlet)
+ ;; Rule WB13, WB13a
+ (lookup (+ j next-j) cat))
+ (otherwise j)))
+ ((:extendnumlet)
+ (case cat
+ ((:extendnumlet :aletter :numeric :katakana)
+ ;; Rule WB13a, WB13b
+ (lookup (+ j next-j) cat))
+ (otherwise j)))
+ ((:regional_indicator)
+ (case cat
+ ((:regional_indicator)
+ ;; Rule WB13c
+ (lookup (+ j next-j) cat))
+ (otherwise j)))
+ (otherwise j)))))))))
(declare (notinline lookup left-context))
(cond ((< i 0)
;; Rule WB1
commit 58b88ebd2133e51ad084ae6835dc65137138cfb3
Merge: cae10dd 10ebd12
Author: Raymond Toy <toy.raymond(a)gmail.com>
Date: Wed Mar 6 00:30:11 2013 -0800
Merge branch 'master' into rtoy-unicode-6.2
commit cae10dd1d8688fdbcd1e4c3a16d0130b8e8cdb41
Author: Raymond Toy <toy.raymond(a)gmail.com>
Date: Wed Mar 6 00:13:22 2013 -0800
Implement Rule WB13c for regional indicators.
diff --git a/src/code/string.lisp b/src/code/string.lisp
index 5a1a814..7c6e3c1 100644
--- a/src/code/string.lisp
+++ b/src/code/string.lisp
@@ -1653,6 +1653,12 @@
;; Rule WB13a, WB13b
(lookup (+ j next-j) cat))
(otherwise j)))
+ ((:regional_indicator)
+ (case cat
+ ((:regional_indicator)
+ ;; Rule WB13c
+ (lookup (+ j next-j) cat))
+ (otherwise j)))
(otherwise j)))))))))
(declare (notinline lookup left-context))
(cond ((< i 0)
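
As an aside on the rule implemented above: in Unicode 6.2, WB13c says not to break between two Regional_Indicator characters, which occupy U+1F1E6 through U+1F1FF. The sketch below states just that rule in C on raw code points. The function names are hypothetical, and the sketch deliberately ignores the Extend/Format skipping that the Lisp code handles through its context argument.

```c
#include <stdbool.h>
#include <stdint.h>

/* Regional indicator symbols are the 26 code points U+1F1E6..U+1F1FF. */
static bool
is_regional_indicator(uint32_t cp)
{
    return cp >= 0x1F1E6 && cp <= 0x1F1FF;
}

/*
 * Hypothetical predicate: returns true when rule WB13c forbids a word
 * break between two adjacent code points, i.e. when both of them are
 * regional indicators.  Other word-break rules are out of scope here.
 */
static bool
wb13c_forbids_break(uint32_t prev, uint32_t next)
{
    return is_regional_indicator(prev) && is_regional_indicator(next);
}
```

For example, the pair U+1F1FA U+1F1F8 stays together under this predicate, while U+1F1FA followed by an ASCII letter does not fall under WB13c at all.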
commit b735224c492e7ff7a2dcd4fe1804a950401e8a65
Author: Raymond Toy <toy.raymond(a)gmail.com>
Date: Tue Mar 5 22:15:43 2013 -0800
Fix PARSE-WORD-BREAK-LINE to handle codepoints outside the BMP. The
count needs to be incremented one to adjust for the UTF-16 encoding of
strings that we use.
diff --git a/src/i18n/tests/word-break-test.lisp b/src/i18n/tests/word-break-test.lisp
index 2fefec7..899b5a2 100644
--- a/src/i18n/tests/word-break-test.lisp
+++ b/src/i18n/tests/word-break-test.lisp
@@ -33,8 +33,12 @@
(let ((c (read s nil nil)))
(unless c
(return))
+ ;; Handle codepoints outside the BMP carefully.
(if (> c #xffff)
(let ((s (lisp::codepoints-string (list c))))
+ ;; Need to increment the count because of our
+ ;; UTF-16 encoding of strings.
+ (incf count)
(vector-push-extend (aref s 0) string)
(vector-push-extend (aref s 1) string))
(vector-push-extend (code-char c) string))
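
The extra increment in the commit above follows from how UTF-16 represents code points beyond the BMP: each one takes two 16-bit code units (a surrogate pair), so a string index advances by two rather than one. A small C sketch of the standard encoding is shown here for reference. The helper name utf16_encode is hypothetical, and the sketch assumes its input is a valid code point no larger than U+10FFFF and not itself a surrogate.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: encode one code point as UTF-16.
 * Writes the code units into out[] and returns how many were
 * written: 1 for a BMP code point, 2 for anything above U+FFFF.
 * Assumes cp is a valid, non-surrogate code point <= 0x10FFFF.
 */
static size_t
utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp <= 0xFFFF) {
        out[0] = (uint16_t) cp;
        return 1;
    }

    /* Split the 20 bits above 0x10000 into two 10-bit halves. */
    cp -= 0x10000;
    out[0] = (uint16_t) (0xD800 + (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t) (0xDC00 + (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

The return value is the amount by which a position counter over a UTF-16 string would need to advance, which is the adjustment the test parser makes with its additional (incf count).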
commit 424edfe8570cd4eb38086d6bdbaa8cd7b0030772
Author: Raymond Toy <toy.raymond(a)gmail.com>
Date: Mon Mar 4 21:54:28 2013 -0800
Update to Unicode 6.2.
Still needs work because the word-break tests fail.
diff --git a/src/code/unidata.lisp b/src/code/unidata.lisp
index 37134cc..55e3a28 100644
--- a/src/code/unidata.lisp
+++ b/src/code/unidata.lisp
@@ -22,7 +22,7 @@
(defvar *unidata-path* #p"ext-formats:unidata.bin")
-(defvar *unidata-version* "$Revision: 1.28 $")
+(defvar *unidata-version* "$Revision: 1.29 $")
(defstruct unidata
range
@@ -61,7 +61,7 @@
;; The expected Unicode version. This needs to be synced with
;; build-unidata.lisp.
(defconstant +unicode-major-version+ 6)
-(defconstant +unicode-minor-version+ 1)
+(defconstant +unicode-minor-version+ 2)
(defconstant +unicode-update-version+ 0)
;;; These need to be synched with tools/build-unidata.lisp
@@ -1163,7 +1163,7 @@
;; pack-word-break in tools/build-unidata.lisp!
(aref #(:other :cr :lf :newline :extend :format
:katakana :aletter :midnumlet :midletter :midnum
- :numeric :extendnumlet)
+ :numeric :extendnumlet :regional_indicator)
(unicode-word-break-code code)))
;; Support for character name completion for slime.
diff --git a/src/i18n/BidiMirroring.txt b/src/i18n/BidiMirroring.txt
index 2e719bc..ec41b76 100644
--- a/src/i18n/BidiMirroring.txt
+++ b/src/i18n/BidiMirroring.txt
@@ -1,19 +1,19 @@
-# BidiMirroring-6.1.0.txt
-# Date: 2011-12-20, 19:31:00 GMT [KW, LI]
+# BidiMirroring-6.2.0.txt
+# Date: 2012-05-15, 24:19:00 GMT [KW, LI]
#
# Bidi_Mirroring_Glyph Property
#
# This file is an informative contributory data file in the
# Unicode Character Database.
#
-# Copyright (c) 1991-2011 Unicode, Inc.
+# Copyright (c) 1991-2012 Unicode, Inc.
# For terms of use, see http://www.unicode.org/terms_of_use.html
#
# This data file lists characters that have the Bidi_Mirrored=Yes property
# value, for which there is another Unicode character that typically has a glyph
# that is the mirror image of the original character's glyph.
#
-# The repertoire covered by the file is Unicode 6.1.0.
+# The repertoire covered by the file is Unicode 6.2.0.
#
# The file contains a list of lines with mappings from one code point
# to another one for character-based mirroring.
@@ -30,16 +30,8 @@
# characters exist with mirrored glyphs, are
# listed as comments at the end of the file.
#
-# Note: (2011-12-19) There is an inconsistency between the
-# following statement about the default value
-# of the Bidi_Mirroring_Glyph property and the
-# value of the @missing line for Bidi_Mirroring_Glyph in
-# PropertyValueAliases.txt. This inconsistency was discovered too
-# late in the release process to be resolved by
-# the UTC. The inconsistency will be resolved in a future revision.
-#
# Formally, the default value of the Bidi_Mirroring_Glyph property
-# for each code point is the code point itself, unless a mapping to
+# for each code point is <none>, unless a mapping to
# some other character is specified in this data file. When a code
# point has the default value for the Bidi_Mirroring_Glyph property,
# that means that no other character exists whose glyph is suitable
@@ -50,12 +42,13 @@
#
# This file was originally created by Markus Scherer.
# Extended for Unicode 3.2, 4.0, 4.1, 5.0, 5.1, 5.2, and 6.0 by Ken Whistler,
-# and for Unicode 6.1 by Ken Whistler and Laurentiu Iancu.
+# and for Unicode 6.1 and 6.2 by Ken Whistler and Laurentiu Iancu.
#
# ############################################################
#
# Property: Bidi_Mirroring_Glyph
#
+# @missing: 0000..10FFFF; <none>
0028; 0029 # LEFT PARENTHESIS
0029; 0028 # RIGHT PARENTHESIS
diff --git a/src/i18n/CaseFolding.txt b/src/i18n/CaseFolding.txt
index 0d9a409..df1813d 100644
--- a/src/i18n/CaseFolding.txt
+++ b/src/i18n/CaseFolding.txt
@@ -1,8 +1,8 @@
-# CaseFolding-6.1.0.txt
-# Date: 2011-07-25, 21:21:56 GMT [MD]
+# CaseFolding-6.2.0.txt
+# Date: 2012-08-14, 17:54:49 GMT [MD]
#
# Unicode Character Database
-# Copyright (c) 1991-2011 Unicode, Inc.
+# Copyright (c) 1991-2012 Unicode, Inc.
# For terms of use, see http://www.unicode.org/terms_of_use.html
# For documentation, see http://www.unicode.org/reports/tr44/
#
@@ -1222,3 +1222,5 @@ FF3A; C; FF5A; # FULLWIDTH LATIN CAPITAL LETTER Z
10425; C; 1044D; # DESERET CAPITAL LETTER ENG
10426; C; 1044E; # DESERET CAPITAL LETTER OI
10427; C; 1044F; # DESERET CAPITAL LETTER EW
+#
+# EOF
diff --git a/src/i18n/CompositionExclusions.txt b/src/i18n/CompositionExclusions.txt
index f12f7d6..cd19f42 100644
--- a/src/i18n/CompositionExclusions.txt
+++ b/src/i18n/CompositionExclusions.txt
@@ -1,5 +1,5 @@
-# CompositionExclusions-6.1.0.txt
-# Date: 2011-07-12, 00:13:00 GMT [KW, LI]
+# CompositionExclusions-6.2.0.txt
+# Date: 2012-05-15, 22:21:00 GMT [KW, LI]
#
# This file lists the characters for the Composition Exclusion Table
# defined in UAX #15, Unicode Normalization Forms.
@@ -7,7 +7,7 @@
# This file is a normative contributory data file in the
# Unicode Character Database.
#
-# Copyright (c) 1991-2011 Unicode, Inc.
+# Copyright (c) 1991-2012 Unicode, Inc.
# For terms of use, see http://www.unicode.org/terms_of_use.html
#
# For more information, see
@@ -203,3 +203,4 @@ FB4E # HEBREW LETTER PE WITH RAFE
# Total code points: 4
+# EOF
diff --git a/src/i18n/DerivedNormalizationProps.txt b/src/i18n/DerivedNormalizationProps.txt
index 2d71747..2ecd8e2 100644
--- a/src/i18n/DerivedNormalizationProps.txt
+++ b/src/i18n/DerivedNormalizationProps.txt
@@ -1,8 +1,8 @@
-# DerivedNormalizationProps-6.1.0.txt
-# Date: 2011-07-26, 04:18:07 GMT [MD]
+# DerivedNormalizationProps-6.2.0.txt
+# Date: 2012-05-23, 20:34:48 GMT [MD]
#
# Unicode Character Database
-# Copyright (c) 1991-2011 Unicode, Inc.
+# Copyright (c) 1991-2012 Unicode, Inc.
# For terms of use, see http://www.unicode.org/terms_of_use.html
# For documentation, see http://www.unicode.org/reports/tr44/
diff --git a/src/i18n/NameAliases.txt b/src/i18n/NameAliases.txt
index 3992620..482fb92 100644
--- a/src/i18n/NameAliases.txt
+++ b/src/i18n/NameAliases.txt
@@ -1,5 +1,5 @@
-# NameAliases-6.1.0.txt
-# Date: 2012-01-03, 21:52:00 GMT [KW]
+# NameAliases-6.2.0.txt
+# Date: 2012-05-15, 18:44:00 GMT [KW]
#
# This file is a normative contributory data file in the
# Unicode Character Database.
@@ -216,6 +216,7 @@
01A2;LATIN CAPITAL LETTER GHA;correction
01A3;LATIN SMALL LETTER GHA;correction
034F;CGJ;abbreviation
+0709;SYRIAC SUBLINEAR COLON SKEWED LEFT;correction
0CDE;KANNADA LETTER LLLA;correction
0E9D;LAO LETTER FO FON;correction
0E9F;LAO LETTER FO FAY;correction
diff --git a/src/i18n/NormalizationCorrections.txt b/src/i18n/NormalizationCorrections.txt
index 61800b8..b53bb40 100644
--- a/src/i18n/NormalizationCorrections.txt
+++ b/src/i18n/NormalizationCorrections.txt
@@ -1,10 +1,10 @@
-# NormalizationCorrections-6.1.0.txt
-# Date: 2011-06-23, 00:46:00 GMT [KW, LI]
+# NormalizationCorrections-6.2.0.txt
+# Date: 2012-05-15, 22:25:00 GMT [KW, LI]
#
# This file is a normative contributory data file in the
# Unicode Character Database.
#
-# Copyright (c) 1991-2011 Unicode, Inc.
+# Copyright (c) 1991-2012 Unicode, Inc.
# For terms of use, see http://www.unicode.org/terms_of_use.html
#
# The normalization stability policy of the Unicode Consortium
@@ -46,3 +46,5 @@ F951;96FB;964B;3.2.0 # Corrigendum 3
2F91F;43AB;243AB;4.0.0 # Corrigendum 4
2F95F;7AAE;7AEE;4.0.0 # Corrigendum 4
2F9BF;4D57;45D7;4.0.0 # Corrigendum 4
+
+# EOF
diff --git a/src/i18n/SpecialCasing.txt b/src/i18n/SpecialCasing.txt
index d650b6d..994043f 100644
--- a/src/i18n/SpecialCasing.txt
+++ b/src/i18n/SpecialCasing.txt
@@ -1,8 +1,8 @@
-# SpecialCasing-6.1.0.txt
-# Date: 2011-11-27, 05:10:51 GMT [MD]
+# SpecialCasing-6.2.0.txt
+# Date: 2012-05-23, 20:35:15 GMT [MD]
#
# Unicode Character Database
-# Copyright (c) 1991-2011 Unicode, Inc.
+# Copyright (c) 1991-2012 Unicode, Inc.
# For terms of use, see http://www.unicode.org/terms_of_use.html
# For documentation, see http://www.unicode.org/reports/tr44/
#
diff --git a/src/i18n/UnicodeData.txt b/src/i18n/UnicodeData.txt
index 9f20405..086379e 100644
--- a/src/i18n/UnicodeData.txt
+++ b/src/i18n/UnicodeData.txt
@@ -7190,6 +7190,7 @@
20B7;SPESMILO SIGN;Sc;0;ET;;;;;N;;;;;
20B8;TENGE SIGN;Sc;0;ET;;;;;N;;;;;
20B9;INDIAN RUPEE SIGN;Sc;0;ET;;;;;N;;;;;
+20BA;TURKISH LIRA SIGN;Sc;0;ET;;;;;N;;;;;
20D0;COMBINING LEFT HARPOON ABOVE;Mn;230;NSM;;;;;N;NON-SPACING LEFT HARPOON ABOVE;;;;
20D1;COMBINING RIGHT HARPOON ABOVE;Mn;230;NSM;;;;;N;NON-SPACING RIGHT HARPOON ABOVE;;;;
20D2;COMBINING LONG VERTICAL LINE OVERLAY;Mn;1;NSM;;;;;N;NON-SPACING LONG VERTICAL BAR OVERLAY;;;;
@@ -18703,8 +18704,8 @@ FFFD;REPLACEMENT CHARACTER;So;0;ON;;;;;N;;;;;
1242F;CUNEIFORM NUMERIC SIGN THREE SHARU VARIANT FORM;Nl;0;L;;;;3;N;;;;;
12430;CUNEIFORM NUMERIC SIGN FOUR SHARU;Nl;0;L;;;;4;N;;;;;
12431;CUNEIFORM NUMERIC SIGN FIVE SHARU;Nl;0;L;;;;5;N;;;;;
-12432;CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS DISH;Nl;0;L;;;;;N;;;;;
-12433;CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS MIN;Nl;0;L;;;;;N;;;;;
+12432;CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS DISH;Nl;0;L;;;;216000;N;;;;;
+12433;CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS MIN;Nl;0;L;;;;432000;N;;;;;
12434;CUNEIFORM NUMERIC SIGN ONE BURU;Nl;0;L;;;;1;N;;;;;
12435;CUNEIFORM NUMERIC SIGN TWO BURU;Nl;0;L;;;;2;N;;;;;
12436;CUNEIFORM NUMERIC SIGN THREE BURU;Nl;0;L;;;;3;N;;;;;
@@ -18739,8 +18740,8 @@ FFFD;REPLACEMENT CHARACTER;So;0;ON;;;;;N;;;;;
12453;CUNEIFORM NUMERIC SIGN FOUR BAN2 VARIANT FORM;Nl;0;L;;;;4;N;;;;;
12454;CUNEIFORM NUMERIC SIGN FIVE BAN2;Nl;0;L;;;;5;N;;;;;
12455;CUNEIFORM NUMERIC SIGN FIVE BAN2 VARIANT FORM;Nl;0;L;;;;5;N;;;;;
-12456;CUNEIFORM NUMERIC SIGN NIGIDAMIN;Nl;0;L;;;;;N;;;;;
-12457;CUNEIFORM NUMERIC SIGN NIGIDAESH;Nl;0;L;;;;;N;;;;;
+12456;CUNEIFORM NUMERIC SIGN NIGIDAMIN;Nl;0;L;;;;-1;N;;;;;
+12457;CUNEIFORM NUMERIC SIGN NIGIDAESH;Nl;0;L;;;;-1;N;;;;;
12458;CUNEIFORM NUMERIC SIGN ONE ESHE3;Nl;0;L;;;;1;N;;;;;
12459;CUNEIFORM NUMERIC SIGN TWO ESHE3;Nl;0;L;;;;2;N;;;;;
1245A;CUNEIFORM NUMERIC SIGN ONE THIRD DISH;Nl;0;L;;;;1/3;N;;;;;
diff --git a/src/i18n/WordBreakProperty.txt b/src/i18n/WordBreakProperty.txt
index 7f3225c..2caa16b 100644
--- a/src/i18n/WordBreakProperty.txt
+++ b/src/i18n/WordBreakProperty.txt
@@ -1,8 +1,8 @@
-# WordBreakProperty-6.1.0.txt
-# Date: 2011-11-27, 05:10:51 GMT [MD]
+# WordBreakProperty-6.2.0.txt
+# Date: 2012-08-13, 19:12:09 GMT [MD]
#
# Unicode Character Database
-# Copyright (c) 1991-2011 Unicode, Inc.
+# Copyright (c) 1991-2012 Unicode, Inc.
# For terms of use, see http://www.unicode.org/terms_of_use.html
# For documentation, see http://www.unicode.org/reports/tr44/
@@ -395,6 +395,12 @@ E0100..E01EF ; Extend # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256
# ================================================
+1F1E6..1F1FF ; Regional_Indicator # So [26] REGIONAL INDICATOR SYMBOL LETTER A..REGIONAL INDICATOR SYMBOL LETTER Z
+
+# Total code points: 26
+
+# ================================================
+
00AD ; Format # Cf SOFT HYPHEN
0600..0604 ; Format # Cf [5] ARABIC NUMBER SIGN..ARABIC SIGN SAMVAT
06DD ; Format # Cf ARABIC END OF AYAH
diff --git a/src/i18n/tests/NormalizationTest.txt b/src/i18n/tests/NormalizationTest.txt
index 68e5f07..806021a 100644
--- a/src/i18n/tests/NormalizationTest.txt
+++ b/src/i18n/tests/NormalizationTest.txt
@@ -1,8 +1,8 @@
-# NormalizationTest-6.1.0.txt
-# Date: 2011-11-27, 05:10:33 GMT [MD]
+# NormalizationTest-6.2.0.txt
+# Date: 2012-08-14, 17:54:58 GMT [MD]
#
# Unicode Character Database
-# Copyright (c) 1991-2011 Unicode, Inc.
+# Copyright (c) 1991-2012 Unicode, Inc.
# For terms of use, see http://www.unicode.org/terms_of_use.html
# For documentation, see http://www.unicode.org/reports/tr44/
#
@@ -18428,4 +18428,4 @@ D750 0334 11B5;D750 0334 11B5;1112 1173 0334 11B5;D750 0334 11B5;1112 1173 0334
11131 0334 11127;11131 0334 11127;11131 0334 11127;11131 0334 11127;11131 0334 11127; # (◌𑄱◌̴◌𑄧; ◌𑄱◌̴◌𑄧; ◌𑄱◌̴◌𑄧; ◌𑄱◌̴◌𑄧; ◌𑄱◌̴◌𑄧; ) CHAKMA O MARK, COMBINING TILDE OVERLAY, CHAKMA VOWEL SIGN A
11132 0334 11127;11132 0334 11127;11132 0334 11127;11132 0334 11127;11132 0334 11127; # (◌𑄲◌̴◌𑄧; ◌𑄲◌̴◌𑄧; ◌𑄲◌̴◌𑄧; ◌𑄲◌̴◌𑄧; ◌𑄲◌̴◌𑄧; ) CHAKMA AU MARK, COMBINING TILDE OVERLAY, CHAKMA VOWEL SIGN A
#
-# END OF FILE
+# EOF
diff --git a/src/i18n/tests/WordBreakTest.txt b/src/i18n/tests/WordBreakTest.txt
index 7957ea3..864dbce 100644
--- a/src/i18n/tests/WordBreakTest.txt
+++ b/src/i18n/tests/WordBreakTest.txt
@@ -1,8 +1,8 @@
-# WordBreakTest-6.1.0.txt
-# Date: 2011-12-07, 23:28:40 GMT [MD]
+# WordBreakTest-6.2.0.txt
+# Date: 2012-08-22, 12:41:18 GMT [MD]
#
# Unicode Character Database
-# Copyright (c) 1991-2011 Unicode, Inc.
+# Copyright (c) 1991-2012 Unicode, Inc.
# For terms of use, see http://www.unicode.org/terms_of_use.html
# For documentation, see http://www.unicode.org/reports/tr44/
#
@@ -15,7 +15,7 @@
# × wherever there is not.
# <comment> the format can change, but currently it shows:
# - the sample character name
-# - (x) the Word_Break property* for the sample character
+# - (x) the Word_Break property value for the sample character
# - [x] the rule that determines whether there is a break or not
#
# These samples may be extended or changed in the future.
@@ -42,6 +42,8 @@
÷ 0001 × 0308 ÷ 0030 ÷ # ÷ [0.2] <START OF HEADING> (Other) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] DIGIT ZERO (Numeric) ÷ [0.3]
÷ 0001 ÷ 005F ÷ # ÷ [0.2] <START OF HEADING> (Other) ÷ [999.0] LOW LINE (ExtendNumLet) ÷ [0.3]
÷ 0001 × 0308 ÷ 005F ÷ # ÷ [0.2] <START OF HEADING> (Other) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] LOW LINE (ExtendNumLet) ÷ [0.3]
+÷ 0001 ÷ 1F1E6 ÷ # ÷ [0.2] <START OF HEADING> (Other) ÷ [999.0] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [0.3]
+÷ 0001 × 0308 ÷ 1F1E6 ÷ # ÷ [0.2] <START OF HEADING> (Other) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [0.3]
÷ 0001 × 00AD ÷ # ÷ [0.2] <START OF HEADING> (Other) × [4.0] SOFT HYPHEN (Format_FE) ÷ [0.3]
÷ 0001 × 0308 × 00AD ÷ # ÷ [0.2] <START OF HEADING> (Other) × [4.0] COMBINING DIAERESIS (Extend_FE) × [4.0] SOFT HYPHEN (Format_FE) ÷ [0.3]
÷ 0001 × 0300 ÷ # ÷ [0.2] <START OF HEADING> (Other) × [4.0] COMBINING GRAVE ACCENT (Extend_FE) ÷ [0.3]
@@ -86,6 +88,8 @@
÷ 000D ÷ 0308 ÷ 0030 ÷ # ÷ [0.2] <CARRIAGE RETURN (CR)> (CR) ÷ [3.1] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] DIGIT ZERO (Numeric) ÷ [0.3]
÷ 000D ÷ 005F ÷ # ÷ [0.2] <CARRIAGE RETURN (CR)> (CR) ÷ [3.1] LOW LINE (ExtendNumLet) ÷ [0.3]
÷ 000D ÷ 0308 ÷ 005F ÷ # ÷ [0.2] <CARRIAGE RETURN (CR)> (CR) ÷ [3.1] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] LOW LINE (ExtendNumLet) ÷ [0.3]
+÷ 000D ÷ 1F1E6 ÷ # ÷ [0.2] <CARRIAGE RETURN (CR)> (CR) ÷ [3.1] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [0.3]
+÷ 000D ÷ 0308 ÷ 1F1E6 ÷ # ÷ [0.2] <CARRIAGE RETURN (CR)> (CR) ÷ [3.1] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [0.3]
÷ 000D ÷ 00AD ÷ # ÷ [0.2] <CARRIAGE RETURN (CR)> (CR) ÷ [3.1] SOFT HYPHEN (Format_FE) ÷ [0.3]
÷ 000D ÷ 0308 × 00AD ÷ # ÷ [0.2] <CARRIAGE RETURN (CR)> (CR) ÷ [3.1] COMBINING DIAERESIS (Extend_FE) × [4.0] SOFT HYPHEN (Format_FE) ÷ [0.3]
÷ 000D ÷ 0300 ÷ # ÷ [0.2] <CARRIAGE RETURN (CR)> (CR) ÷ [3.1] COMBINING GRAVE ACCENT (Extend_FE) ÷ [0.3]
@@ -130,6 +134,8 @@
÷ 000A ÷ 0308 ÷ 0030 ÷ # ÷ [0.2] <LINE FEED (LF)> (LF) ÷ [3.1] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] DIGIT ZERO (Numeric) ÷ [0.3]
÷ 000A ÷ 005F ÷ # ÷ [0.2] <LINE FEED (LF)> (LF) ÷ [3.1] LOW LINE (ExtendNumLet) ÷ [0.3]
÷ 000A ÷ 0308 ÷ 005F ÷ # ÷ [0.2] <LINE FEED (LF)> (LF) ÷ [3.1] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] LOW LINE (ExtendNumLet) ÷ [0.3]
+÷ 000A ÷ 1F1E6 ÷ # ÷ [0.2] <LINE FEED (LF)> (LF) ÷ [3.1] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [0.3]
+÷ 000A ÷ 0308 ÷ 1F1E6 ÷ # ÷ [0.2] <LINE FEED (LF)> (LF) ÷ [3.1] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [0.3]
÷ 000A ÷ 00AD ÷ # ÷ [0.2] <LINE FEED (LF)> (LF) ÷ [3.1] SOFT HYPHEN (Format_FE) ÷ [0.3]
÷ 000A ÷ 0308 × 00AD ÷ # ÷ [0.2] <LINE FEED (LF)> (LF) ÷ [3.1] COMBINING DIAERESIS (Extend_FE) × [4.0] SOFT HYPHEN (Format_FE) ÷ [0.3]
÷ 000A ÷ 0300 ÷ # ÷ [0.2] <LINE FEED (LF)> (LF) ÷ [3.1] COMBINING GRAVE ACCENT (Extend_FE) ÷ [0.3]
@@ -174,6 +180,8 @@
÷ 000B ÷ 0308 ÷ 0030 ÷ # ÷ [0.2] <LINE TABULATION> (Newline) ÷ [3.1] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] DIGIT ZERO (Numeric) ÷ [0.3]
÷ 000B ÷ 005F ÷ # ÷ [0.2] <LINE TABULATION> (Newline) ÷ [3.1] LOW LINE (ExtendNumLet) ÷ [0.3]
÷ 000B ÷ 0308 ÷ 005F ÷ # ÷ [0.2] <LINE TABULATION> (Newline) ÷ [3.1] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] LOW LINE (ExtendNumLet) ÷ [0.3]
+÷ 000B ÷ 1F1E6 ÷ # ÷ [0.2] <LINE TABULATION> (Newline) ÷ [3.1] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [0.3]
+÷ 000B ÷ 0308 ÷ 1F1E6 ÷ # ÷ [0.2] <LINE TABULATION> (Newline) ÷ [3.1] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [0.3]
÷ 000B ÷ 00AD ÷ # ÷ [0.2] <LINE TABULATION> (Newline) ÷ [3.1] SOFT HYPHEN (Format_FE) ÷ [0.3]
÷ 000B ÷ 0308 × 00AD ÷ # ÷ [0.2] <LINE TABULATION> (Newline) ÷ [3.1] COMBINING DIAERESIS (Extend_FE) × [4.0] SOFT HYPHEN (Format_FE) ÷ [0.3]
÷ 000B ÷ 0300 ÷ # ÷ [0.2] <LINE TABULATION> (Newline) ÷ [3.1] COMBINING GRAVE ACCENT (Extend_FE) ÷ [0.3]
@@ -218,6 +226,8 @@
÷ 3031 × 0308 ÷ 0030 ÷ # ÷ [0.2] VERTICAL KANA REPEAT MARK (Katakana) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] DIGIT ZERO (Numeric) ÷ [0.3]
÷ 3031 × 005F ÷ # ÷ [0.2] VERTICAL KANA REPEAT MARK (Katakana) × [13.1] LOW LINE (ExtendNumLet) ÷ [0.3]
÷ 3031 × 0308 × 005F ÷ # ÷ [0.2] VERTICAL KANA REPEAT MARK (Katakana) × [4.0] COMBINING DIAERESIS (Extend_FE) × [13.1] LOW LINE (ExtendNumLet) ÷ [0.3]
+÷ 3031 ÷ 1F1E6 ÷ # ÷ [0.2] VERTICAL KANA REPEAT MARK (Katakana) ÷ [999.0] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [0.3]
+÷ 3031 × 0308 ÷ 1F1E6 ÷ # ÷ [0.2] VERTICAL KANA REPEAT MARK (Katakana) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [0.3]
÷ 3031 × 00AD ÷ # ÷ [0.2] VERTICAL KANA REPEAT MARK (Katakana) × [4.0] SOFT HYPHEN (Format_FE) ÷ [0.3]
÷ 3031 × 0308 × 00AD ÷ # ÷ [0.2] VERTICAL KANA REPEAT MARK (Katakana) × [4.0] COMBINING DIAERESIS (Extend_FE) × [4.0] SOFT HYPHEN (Format_FE) ÷ [0.3]
÷ 3031 × 0300 ÷ # ÷ [0.2] VERTICAL KANA REPEAT MARK (Katakana) × [4.0] COMBINING GRAVE ACCENT (Extend_FE) ÷ [0.3]
@@ -262,6 +272,8 @@
÷ 0041 × 0308 × 0030 ÷ # ÷ [0.2] LATIN CAPITAL LETTER A (ALetter) × [4.0] COMBINING DIAERESIS (Extend_FE) × [9.0] DIGIT ZERO (Numeric) ÷ [0.3]
÷ 0041 × 005F ÷ # ÷ [0.2] LATIN CAPITAL LETTER A (ALetter) × [13.1] LOW LINE (ExtendNumLet) ÷ [0.3]
÷ 0041 × 0308 × 005F ÷ # ÷ [0.2] LATIN CAPITAL LETTER A (ALetter) × [4.0] COMBINING DIAERESIS (Extend_FE) × [13.1] LOW LINE (ExtendNumLet) ÷ [0.3]
+÷ 0041 ÷ 1F1E6 ÷ # ÷ [0.2] LATIN CAPITAL LETTER A (ALetter) ÷ [999.0] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [0.3]
+÷ 0041 × 0308 ÷ 1F1E6 ÷ # ÷ [0.2] LATIN CAPITAL LETTER A (ALetter) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [0.3]
÷ 0041 × 00AD ÷ # ÷ [0.2] LATIN CAPITAL LETTER A (ALetter) × [4.0] SOFT HYPHEN (Format_FE) ÷ [0.3]
÷ 0041 × 0308 × 00AD ÷ # ÷ [0.2] LATIN CAPITAL LETTER A (ALetter) × [4.0] COMBINING DIAERESIS (Extend_FE) × [4.0] SOFT HYPHEN (Format_FE) ÷ [0.3]
÷ 0041 × 0300 ÷ # ÷ [0.2] LATIN CAPITAL LETTER A (ALetter) × [4.0] COMBINING GRAVE ACCENT (Extend_FE) ÷ [0.3]
@@ -306,6 +318,8 @@
÷ 003A × 0308 ÷ 0030 ÷ # ÷ [0.2] COLON (MidLetter) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] DIGIT ZERO (Numeric) ÷ [0.3]
÷ 003A ÷ 005F ÷ # ÷ [0.2] COLON (MidLetter) ÷ [999.0] LOW LINE (ExtendNumLet) ÷ [0.3]
÷ 003A × 0308 ÷ 005F ÷ # ÷ [0.2] COLON (MidLetter) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] LOW LINE (ExtendNumLet) ÷ [0.3]
+÷ 003A ÷ 1F1E6 ÷ # ÷ [0.2] COLON (MidLetter) ÷ [999.0] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [0.3]
+÷ 003A × 0308 ÷ 1F1E6 ÷ # ÷ [0.2] COLON (MidLetter) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [0.3]
÷ 003A × 00AD ÷ # ÷ [0.2] COLON (MidLetter) × [4.0] SOFT HYPHEN (Format_FE) ÷ [0.3]
÷ 003A × 0308 × 00AD ÷ # ÷ [0.2] COLON (MidLetter) × [4.0] COMBINING DIAERESIS (Extend_FE) × [4.0] SOFT HYPHEN (Format_FE) ÷ [0.3]
÷ 003A × 0300 ÷ # ÷ [0.2] COLON (MidLetter) × [4.0] COMBINING GRAVE ACCENT (Extend_FE) ÷ [0.3]
@@ -350,6 +364,8 @@
÷ 002C × 0308 ÷ 0030 ÷ # ÷ [0.2] COMMA (MidNum) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] DIGIT ZERO (Numeric) ÷ [0.3]
÷ 002C ÷ 005F ÷ # ÷ [0.2] COMMA (MidNum) ÷ [999.0] LOW LINE (ExtendNumLet) ÷ [0.3]
÷ 002C × 0308 ÷ 005F ÷ # ÷ [0.2] COMMA (MidNum) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] LOW LINE (ExtendNumLet) ÷ [0.3]
+÷ 002C ÷ 1F1E6 ÷ # ÷ [0.2] COMMA (MidNum) ÷ [999.0] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [0.3]
+÷ 002C × 0308 ÷ 1F1E6 ÷ # ÷ [0.2] COMMA (MidNum) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [0.3]
÷ 002C × 00AD ÷ # ÷ [0.2] COMMA (MidNum) × [4.0] SOFT HYPHEN (Format_FE) ÷ [0.3]
÷ 002C × 0308 × 00AD ÷ # ÷ [0.2] COMMA (MidNum) × [4.0] COMBINING DIAERESIS (Extend_FE) × [4.0] SOFT HYPHEN (Format_FE) ÷ [0.3]
÷ 002C × 0300 ÷ # ÷ [0.2] COMMA (MidNum) × [4.0] COMBINING GRAVE ACCENT (Extend_FE) ÷ [0.3]
@@ -394,6 +410,8 @@
÷ 0027 × 0308 ÷ 0030 ÷ # ÷ [0.2] APOSTROPHE (MidNumLet) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] DIGIT ZERO (Numeric) ÷ [0.3]
÷ 0027 ÷ 005F ÷ # ÷ [0.2] APOSTROPHE (MidNumLet) ÷ [999.0] LOW LINE (ExtendNumLet) ÷ [0.3]
÷ 0027 × 0308 ÷ 005F ÷ # ÷ [0.2] APOSTROPHE (MidNumLet) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] LOW LINE (ExtendNumLet) ÷ [0.3]
+÷ 0027 ÷ 1F1E6 ÷ # ÷ [0.2] APOSTROPHE (MidNumLet) ÷ [999.0] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [0.3]
+÷ 0027 × 0308 ÷ 1F1E6 ÷ # ÷ [0.2] APOSTROPHE (MidNumLet) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [0.3]
÷ 0027 × 00AD ÷ # ÷ [0.2] APOSTROPHE (MidNumLet) × [4.0] SOFT HYPHEN (Format_FE) ÷ [0.3]
÷ 0027 × 0308 × 00AD ÷ # ÷ [0.2] APOSTROPHE (MidNumLet) × [4.0] COMBINING DIAERESIS (Extend_FE) × [4.0] SOFT HYPHEN (Format_FE) ÷ [0.3]
÷ 0027 × 0300 ÷ # ÷ [0.2] APOSTROPHE (MidNumLet) × [4.0] COMBINING GRAVE ACCENT (Extend_FE) ÷ [0.3]
@@ -438,6 +456,8 @@
÷ 0030 × 0308 × 0030 ÷ # ÷ [0.2] DIGIT ZERO (Numeric) × [4.0] COMBINING DIAERESIS (Extend_FE) × [8.0] DIGIT ZERO (Numeric) ÷ [0.3]
÷ 0030 × 005F ÷ # ÷ [0.2] DIGIT ZERO (Numeric) × [13.1] LOW LINE (ExtendNumLet) ÷ [0.3]
÷ 0030 × 0308 × 005F ÷ # ÷ [0.2] DIGIT ZERO (Numeric) × [4.0] COMBINING DIAERESIS (Extend_FE) × [13.1] LOW LINE (ExtendNumLet) ÷ [0.3]
+÷ 0030 ÷ 1F1E6 ÷ # ÷ [0.2] DIGIT ZERO (Numeric) ÷ [999.0] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [0.3]
+÷ 0030 × 0308 ÷ 1F1E6 ÷ # ÷ [0.2] DIGIT ZERO (Numeric) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [0.3]
÷ 0030 × 00AD ÷ # ÷ [0.2] DIGIT ZERO (Numeric) × [4.0] SOFT HYPHEN (Format_FE) ÷ [0.3]
÷ 0030 × 0308 × 00AD ÷ # ÷ [0.2] DIGIT ZERO (Numeric) × [4.0] COMBINING DIAERESIS (Extend_FE) × [4.0] SOFT HYPHEN (Format_FE) ÷ [0.3]
÷ 0030 × 0300 ÷ # ÷ [0.2] DIGIT ZERO (Numeric) × [4.0] COMBINING GRAVE ACCENT (Extend_FE) ÷ [0.3]
@@ -482,6 +502,8 @@
÷ 005F × 0308 × 0030 ÷ # ÷ [0.2] LOW LINE (ExtendNumLet) × [4.0] COMBINING DIAERESIS (Extend_FE) × [13.2] DIGIT ZERO (Numeric) ÷ [0.3]
÷ 005F × 005F ÷ # ÷ [0.2] LOW LINE (ExtendNumLet) × [13.1] LOW LINE (ExtendNumLet) ÷ [0.3]
÷ 005F × 0308 × 005F ÷ # ÷ [0.2] LOW LINE (ExtendNumLet) × [4.0] COMBINING DIAERESIS (Extend_FE) × [13.1] LOW LINE (ExtendNumLet) ÷ [0.3]
+÷ 005F ÷ 1F1E6 ÷ # ÷ [0.2] LOW LINE (ExtendNumLet) ÷ [999.0] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [0.3]
+÷ 005F × 0308 ÷ 1F1E6 ÷ # ÷ [0.2] LOW LINE (ExtendNumLet) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [0.3]
÷ 005F × 00AD ÷ # ÷ [0.2] LOW LINE (ExtendNumLet) × [4.0] SOFT HYPHEN (Format_FE) ÷ [0.3]
÷ 005F × 0308 × 00AD ÷ # ÷ [0.2] LOW LINE (ExtendNumLet) × [4.0] COMBINING DIAERESIS (Extend_FE) × [4.0] SOFT HYPHEN (Format_FE) ÷ [0.3]
÷ 005F × 0300 ÷ # ÷ [0.2] LOW LINE (ExtendNumLet) × [4.0] COMBINING GRAVE ACCENT (Extend_FE) ÷ [0.3]
@@ -504,6 +526,52 @@
÷ 005F × 0308 × 0031 ÷ 002C ÷ # ÷ [0.2] LOW LINE (ExtendNumLet) × [4.0] COMBINING DIAERESIS (Extend_FE) × [13.2] DIGIT ONE (Numeric) ÷ [999.0] COMMA (MidNum) ÷ [0.3]
÷ 005F × 0031 ÷ 002E × 2060 ÷ # ÷ [0.2] LOW LINE (ExtendNumLet) × [13.2] DIGIT ONE (Numeric) ÷ [999.0] FULL STOP (MidNumLet) × [4.0] WORD JOINER (Format_FE) ÷ [0.3]
÷ 005F × 0308 × 0031 ÷ 002E × 2060 ÷ # ÷ [0.2] LOW LINE (ExtendNumLet) × [4.0] COMBINING DIAERESIS (Extend_FE) × [13.2] DIGIT ONE (Numeric) ÷ [999.0] FULL STOP (MidNumLet) × [4.0] WORD JOINER (Format_FE) ÷ [0.3]
+÷ 1F1E6 ÷ 0001 ÷ # ÷ [0.2] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [999.0] <START OF HEADING> (Other) ÷ [0.3]
+÷ 1F1E6 × 0308 ÷ 0001 ÷ # ÷ [0.2] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] <START OF HEADING> (Other) ÷ [0.3]
+÷ 1F1E6 ÷ 000D ÷ # ÷ [0.2] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [3.2] <CARRIAGE RETURN (CR)> (CR) ÷ [0.3]
+÷ 1F1E6 × 0308 ÷ 000D ÷ # ÷ [0.2] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [3.2] <CARRIAGE RETURN (CR)> (CR) ÷ [0.3]
+÷ 1F1E6 ÷ 000A ÷ # ÷ [0.2] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [3.2] <LINE FEED (LF)> (LF) ÷ [0.3]
+÷ 1F1E6 × 0308 ÷ 000A ÷ # ÷ [0.2] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [3.2] <LINE FEED (LF)> (LF) ÷ [0.3]
+÷ 1F1E6 ÷ 000B ÷ # ÷ [0.2] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [3.2] <LINE TABULATION> (Newline) ÷ [0.3]
+÷ 1F1E6 × 0308 ÷ 000B ÷ # ÷ [0.2] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [3.2] <LINE TABULATION> (Newline) ÷ [0.3]
+÷ 1F1E6 ÷ 3031 ÷ # ÷ [0.2] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [999.0] VERTICAL KANA REPEAT MARK (Katakana) ÷ [0.3]
+÷ 1F1E6 × 0308 ÷ 3031 ÷ # ÷ [0.2] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] VERTICAL KANA REPEAT MARK (Katakana) ÷ [0.3]
+÷ 1F1E6 ÷ 0041 ÷ # ÷ [0.2] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [999.0] LATIN CAPITAL LETTER A (ALetter) ÷ [0.3]
+÷ 1F1E6 × 0308 ÷ 0041 ÷ # ÷ [0.2] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] LATIN CAPITAL LETTER A (ALetter) ÷ [0.3]
+÷ 1F1E6 ÷ 003A ÷ # ÷ [0.2] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [999.0] COLON (MidLetter) ÷ [0.3]
+÷ 1F1E6 × 0308 ÷ 003A ÷ # ÷ [0.2] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] COLON (MidLetter) ÷ [0.3]
+÷ 1F1E6 ÷ 002C ÷ # ÷ [0.2] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [999.0] COMMA (MidNum) ÷ [0.3]
+÷ 1F1E6 × 0308 ÷ 002C ÷ # ÷ [0.2] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] COMMA (MidNum) ÷ [0.3]
+÷ 1F1E6 ÷ 0027 ÷ # ÷ [0.2] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [999.0] APOSTROPHE (MidNumLet) ÷ [0.3]
+÷ 1F1E6 × 0308 ÷ 0027 ÷ # ÷ [0.2] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] APOSTROPHE (MidNumLet) ÷ [0.3]
+÷ 1F1E6 ÷ 0030 ÷ # ÷ [0.2] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [999.0] DIGIT ZERO (Numeric) ÷ [0.3]
+÷ 1F1E6 × 0308 ÷ 0030 ÷ # ÷ [0.2] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] DIGIT ZERO (Numeric) ÷ [0.3]
+÷ 1F1E6 ÷ 005F ÷ # ÷ [0.2] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [999.0] LOW LINE (ExtendNumLet) ÷ [0.3]
+÷ 1F1E6 × 0308 ÷ 005F ÷ # ÷ [0.2] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] LOW LINE (ExtendNumLet) ÷ [0.3]
+÷ 1F1E6 × 1F1E6 ÷ # ÷ [0.2] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) × [13.3] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [0.3]
+÷ 1F1E6 × 0308 × 1F1E6 ÷ # ÷ [0.2] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) × [4.0] COMBINING DIAERESIS (Extend_FE) × [13.3] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [0.3]
+÷ 1F1E6 × 00AD ÷ # ÷ [0.2] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) × [4.0] SOFT HYPHEN (Format_FE) ÷ [0.3]
+÷ 1F1E6 × 0308 × 00AD ÷ # ÷ [0.2] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) × [4.0] COMBINING DIAERESIS (Extend_FE) × [4.0] SOFT HYPHEN (Format_FE) ÷ [0.3]
+÷ 1F1E6 × 0300 ÷ # ÷ [0.2] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) × [4.0] COMBINING GRAVE ACCENT (Extend_FE) ÷ [0.3]
+÷ 1F1E6 × 0308 × 0300 ÷ # ÷ [0.2] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) × [4.0] COMBINING DIAERESIS (Extend_FE) × [4.0] COMBINING GRAVE ACCENT (Extend_FE) ÷ [0.3]
+÷ 1F1E6 ÷ 0061 × 2060 ÷ # ÷ [0.2] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [999.0] LATIN SMALL LETTER A (ALetter) × [4.0] WORD JOINER (Format_FE) ÷ [0.3]
+÷ 1F1E6 × 0308 ÷ 0061 × 2060 ÷ # ÷ [0.2] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] LATIN SMALL LETTER A (ALetter) × [4.0] WORD JOINER (Format_FE) ÷ [0.3]
+÷ 1F1E6 ÷ 0061 ÷ 003A ÷ # ÷ [0.2] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [999.0] LATIN SMALL LETTER A (ALetter) ÷ [999.0] COLON (MidLetter) ÷ [0.3]
+÷ 1F1E6 × 0308 ÷ 0061 ÷ 003A ÷ # ÷ [0.2] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] LATIN SMALL LETTER A (ALetter) ÷ [999.0] COLON (MidLetter) ÷ [0.3]
+÷ 1F1E6 ÷ 0061 ÷ 0027 ÷ # ÷ [0.2] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [999.0] LATIN SMALL LETTER A (ALetter) ÷ [999.0] APOSTROPHE (MidNumLet) ÷ [0.3]
+÷ 1F1E6 × 0308 ÷ 0061 ÷ 0027 ÷ # ÷ [0.2] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] LATIN SMALL LETTER A (ALetter) ÷ [999.0] APOSTROPHE (MidNumLet) ÷ [0.3]
+÷ 1F1E6 ÷ 0061 ÷ 0027 × 2060 ÷ # ÷ [0.2] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [999.0] LATIN SMALL LETTER A (ALetter) ÷ [999.0] APOSTROPHE (MidNumLet) × [4.0] WORD JOINER (Format_FE) ÷ [0.3]
+÷ 1F1E6 × 0308 ÷ 0061 ÷ 0027 × 2060 ÷ # ÷ [0.2] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] LATIN SMALL LETTER A (ALetter) ÷ [999.0] APOSTROPHE (MidNumLet) × [4.0] WORD JOINER (Format_FE) ÷ [0.3]
+÷ 1F1E6 ÷ 0061 ÷ 002C ÷ # ÷ [0.2] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [999.0] LATIN SMALL LETTER A (ALetter) ÷ [999.0] COMMA (MidNum) ÷ [0.3]
+÷ 1F1E6 × 0308 ÷ 0061 ÷ 002C ÷ # ÷ [0.2] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] LATIN SMALL LETTER A (ALetter) ÷ [999.0] COMMA (MidNum) ÷ [0.3]
+÷ 1F1E6 ÷ 0031 ÷ 003A ÷ # ÷ [0.2] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [999.0] DIGIT ONE (Numeric) ÷ [999.0] COLON (MidLetter) ÷ [0.3]
+÷ 1F1E6 × 0308 ÷ 0031 ÷ 003A ÷ # ÷ [0.2] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] DIGIT ONE (Numeric) ÷ [999.0] COLON (MidLetter) ÷ [0.3]
+÷ 1F1E6 ÷ 0031 ÷ 0027 ÷ # ÷ [0.2] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [999.0] DIGIT ONE (Numeric) ÷ [999.0] APOSTROPHE (MidNumLet) ÷ [0.3]
+÷ 1F1E6 × 0308 ÷ 0031 ÷ 0027 ÷ # ÷ [0.2] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] DIGIT ONE (Numeric) ÷ [999.0] APOSTROPHE (MidNumLet) ÷ [0.3]
+÷ 1F1E6 ÷ 0031 ÷ 002C ÷ # ÷ [0.2] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [999.0] DIGIT ONE (Numeric) ÷ [999.0] COMMA (MidNum) ÷ [0.3]
+÷ 1F1E6 × 0308 ÷ 0031 ÷ 002C ÷ # ÷ [0.2] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] DIGIT ONE (Numeric) ÷ [999.0] COMMA (MidNum) ÷ [0.3]
+÷ 1F1E6 ÷ 0031 ÷ 002E × 2060 ÷ # ÷ [0.2] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [999.0] DIGIT ONE (Numeric) ÷ [999.0] FULL STOP (MidNumLet) × [4.0] WORD JOINER (Format_FE) ÷ [0.3]
+÷ 1F1E6 × 0308 ÷ 0031 ÷ 002E × 2060 ÷ # ÷ [0.2] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] DIGIT ONE (Numeric) ÷ [999.0] FULL STOP (MidNumLet) × [4.0] WORD JOINER (Format_FE) ÷ [0.3]
÷ 00AD ÷ 0001 ÷ # ÷ [0.2] SOFT HYPHEN (Format_FE) ÷ [999.0] <START OF HEADING> (Other) ÷ [0.3]
÷ 00AD × 0308 ÷ 0001 ÷ # ÷ [0.2] SOFT HYPHEN (Format_FE) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] <START OF HEADING> (Other) ÷ [0.3]
÷ 00AD ÷ 000D ÷ # ÷ [0.2] SOFT HYPHEN (Format_FE) ÷ [3.2] <CARRIAGE RETURN (CR)> (CR) ÷ [0.3]
@@ -526,6 +594,8 @@
÷ 00AD × 0308 ÷ 0030 ÷ # ÷ [0.2] SOFT HYPHEN (Format_FE) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] DIGIT ZERO (Numeric) ÷ [0.3]
÷ 00AD ÷ 005F ÷ # ÷ [0.2] SOFT HYPHEN (Format_FE) ÷ [999.0] LOW LINE (ExtendNumLet) ÷ [0.3]
÷ 00AD × 0308 ÷ 005F ÷ # ÷ [0.2] SOFT HYPHEN (Format_FE) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] LOW LINE (ExtendNumLet) ÷ [0.3]
+÷ 00AD ÷ 1F1E6 ÷ # ÷ [0.2] SOFT HYPHEN (Format_FE) ÷ [999.0] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [0.3]
+÷ 00AD × 0308 ÷ 1F1E6 ÷ # ÷ [0.2] SOFT HYPHEN (Format_FE) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [0.3]
÷ 00AD × 00AD ÷ # ÷ [0.2] SOFT HYPHEN (Format_FE) × [4.0] SOFT HYPHEN (Format_FE) ÷ [0.3]
÷ 00AD × 0308 × 00AD ÷ # ÷ [0.2] SOFT HYPHEN (Format_FE) × [4.0] COMBINING DIAERESIS (Extend_FE) × [4.0] SOFT HYPHEN (Format_FE) ÷ [0.3]
÷ 00AD × 0300 ÷ # ÷ [0.2] SOFT HYPHEN (Format_FE) × [4.0] COMBINING GRAVE ACCENT (Extend_FE) ÷ [0.3]
@@ -570,6 +640,8 @@
÷ 0300 × 0308 ÷ 0030 ÷ # ÷ [0.2] COMBINING GRAVE ACCENT (Extend_FE) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] DIGIT ZERO (Numeric) ÷ [0.3]
÷ 0300 ÷ 005F ÷ # ÷ [0.2] COMBINING GRAVE ACCENT (Extend_FE) ÷ [999.0] LOW LINE (ExtendNumLet) ÷ [0.3]
÷ 0300 × 0308 ÷ 005F ÷ # ÷ [0.2] COMBINING GRAVE ACCENT (Extend_FE) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] LOW LINE (ExtendNumLet) ÷ [0.3]
+÷ 0300 ÷ 1F1E6 ÷ # ÷ [0.2] COMBINING GRAVE ACCENT (Extend_FE) ÷ [999.0] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [0.3]
+÷ 0300 × 0308 ÷ 1F1E6 ÷ # ÷ [0.2] COMBINING GRAVE ACCENT (Extend_FE) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [0.3]
÷ 0300 × 00AD ÷ # ÷ [0.2] COMBINING GRAVE ACCENT (Extend_FE) × [4.0] SOFT HYPHEN (Format_FE) ÷ [0.3]
÷ 0300 × 0308 × 00AD ÷ # ÷ [0.2] COMBINING GRAVE ACCENT (Extend_FE) × [4.0] COMBINING DIAERESIS (Extend_FE) × [4.0] SOFT HYPHEN (Format_FE) ÷ [0.3]
÷ 0300 × 0300 ÷ # ÷ [0.2] COMBINING GRAVE ACCENT (Extend_FE) × [4.0] COMBINING GRAVE ACCENT (Extend_FE) ÷ [0.3]
@@ -614,6 +686,8 @@
÷ 0061 × 2060 × 0308 × 0030 ÷ # ÷ [0.2] LATIN SMALL LETTER A (ALetter) × [4.0] WORD JOINER (Format_FE) × [4.0] COMBINING DIAERESIS (Extend_FE) × [9.0] DIGIT ZERO (Numeric) ÷ [0.3]
÷ 0061 × 2060 × 005F ÷ # ÷ [0.2] LATIN SMALL LETTER A (ALetter) × [4.0] WORD JOINER (Format_FE) × [13.1] LOW LINE (ExtendNumLet) ÷ [0.3]
÷ 0061 × 2060 × 0308 × 005F ÷ # ÷ [0.2] LATIN SMALL LETTER A (ALetter) × [4.0] WORD JOINER (Format_FE) × [4.0] COMBINING DIAERESIS (Extend_FE) × [13.1] LOW LINE (ExtendNumLet) ÷ [0.3]
+÷ 0061 × 2060 ÷ 1F1E6 ÷ # ÷ [0.2] LATIN SMALL LETTER A (ALetter) × [4.0] WORD JOINER (Format_FE) ÷ [999.0] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [0.3]
+÷ 0061 × 2060 × 0308 ÷ 1F1E6 ÷ # ÷ [0.2] LATIN SMALL LETTER A (ALetter) × [4.0] WORD JOINER (Format_FE) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [0.3]
÷ 0061 × 2060 × 00AD ÷ # ÷ [0.2] LATIN SMALL LETTER A (ALetter) × [4.0] WORD JOINER (Format_FE) × [4.0] SOFT HYPHEN (Format_FE) ÷ [0.3]
÷ 0061 × 2060 × 0308 × 00AD ÷ # ÷ [0.2] LATIN SMALL LETTER A (ALetter) × [4.0] WORD JOINER (Format_FE) × [4.0] COMBINING DIAERESIS (Extend_FE) × [4.0] SOFT HYPHEN (Format_FE) ÷ [0.3]
÷ 0061 × 2060 × 0300 ÷ # ÷ [0.2] LATIN SMALL LETTER A (ALetter) × [4.0] WORD JOINER (Format_FE) × [4.0] COMBINING GRAVE ACCENT (Extend_FE) ÷ [0.3]
@@ -658,6 +732,8 @@
÷ 0061 ÷ 003A × 0308 ÷ 0030 ÷ # ÷ [0.2] LATIN SMALL LETTER A (ALetter) ÷ [999.0] COLON (MidLetter) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] DIGIT ZERO (Numeric) ÷ [0.3]
÷ 0061 ÷ 003A ÷ 005F ÷ # ÷ [0.2] LATIN SMALL LETTER A (ALetter) ÷ [999.0] COLON (MidLetter) ÷ [999.0] LOW LINE (ExtendNumLet) ÷ [0.3]
÷ 0061 ÷ 003A × 0308 ÷ 005F ÷ # ÷ [0.2] LATIN SMALL LETTER A (ALetter) ÷ [999.0] COLON (MidLetter) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] LOW LINE (ExtendNumLet) ÷ [0.3]
+÷ 0061 ÷ 003A ÷ 1F1E6 ÷ # ÷ [0.2] LATIN SMALL LETTER A (ALetter) ÷ [999.0] COLON (MidLetter) ÷ [999.0] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [0.3]
+÷ 0061 ÷ 003A × 0308 ÷ 1F1E6 ÷ # ÷ [0.2] LATIN SMALL LETTER A (ALetter) ÷ [999.0] COLON (MidLetter) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [0.3]
÷ 0061 ÷ 003A × 00AD ÷ # ÷ [0.2] LATIN SMALL LETTER A (ALetter) ÷ [999.0] COLON (MidLetter) × [4.0] SOFT HYPHEN (Format_FE) ÷ [0.3]
÷ 0061 ÷ 003A × 0308 × 00AD ÷ # ÷ [0.2] LATIN SMALL LETTER A (ALetter) ÷ [999.0] COLON (MidLetter) × [4.0] COMBINING DIAERESIS (Extend_FE) × [4.0] SOFT HYPHEN (Format_FE) ÷ [0.3]
÷ 0061 ÷ 003A × 0300 ÷ # ÷ [0.2] LATIN SMALL LETTER A (ALetter) ÷ [999.0] COLON (MidLetter) × [4.0] COMBINING GRAVE ACCENT (Extend_FE) ÷ [0.3]
@@ -702,6 +778,8 @@
÷ 0061 ÷ 0027 × 0308 ÷ 0030 ÷ # ÷ [0.2] LATIN SMALL LETTER A (ALetter) ÷ [999.0] APOSTROPHE (MidNumLet) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] DIGIT ZERO (Numeric) ÷ [0.3]
÷ 0061 ÷ 0027 ÷ 005F ÷ # ÷ [0.2] LATIN SMALL LETTER A (ALetter) ÷ [999.0] APOSTROPHE (MidNumLet) ÷ [999.0] LOW LINE (ExtendNumLet) ÷ [0.3]
÷ 0061 ÷ 0027 × 0308 ÷ 005F ÷ # ÷ [0.2] LATIN SMALL LETTER A (ALetter) ÷ [999.0] APOSTROPHE (MidNumLet) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] LOW LINE (ExtendNumLet) ÷ [0.3]
+÷ 0061 ÷ 0027 ÷ 1F1E6 ÷ # ÷ [0.2] LATIN SMALL LETTER A (ALetter) ÷ [999.0] APOSTROPHE (MidNumLet) ÷ [999.0] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [0.3]
+÷ 0061 ÷ 0027 × 0308 ÷ 1F1E6 ÷ # ÷ [0.2] LATIN SMALL LETTER A (ALetter) ÷ [999.0] APOSTROPHE (MidNumLet) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [0.3]
÷ 0061 ÷ 0027 × 00AD ÷ # ÷ [0.2] LATIN SMALL LETTER A (ALetter) ÷ [999.0] APOSTROPHE (MidNumLet) × [4.0] SOFT HYPHEN (Format_FE) ÷ [0.3]
÷ 0061 ÷ 0027 × 0308 × 00AD ÷ # ÷ [0.2] LATIN SMALL LETTER A (ALetter) ÷ [999.0] APOSTROPHE (MidNumLet) × [4.0] COMBINING DIAERESIS (Extend_FE) × [4.0] SOFT HYPHEN (Format_FE) ÷ [0.3]
÷ 0061 ÷ 0027 × 0300 ÷ # ÷ [0.2] LATIN SMALL LETTER A (ALetter) ÷ [999.0] APOSTROPHE (MidNumLet) × [4.0] COMBINING GRAVE ACCENT (Extend_FE) ÷ [0.3]
@@ -746,6 +824,8 @@
÷ 0061 ÷ 0027 × 2060 × 0308 ÷ 0030 ÷ # ÷ [0.2] LATIN SMALL LETTER A (ALetter) ÷ [999.0] APOSTROPHE (MidNumLet) × [4.0] WORD JOINER (Format_FE) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] DIGIT ZERO (Numeric) ÷ [0.3]
÷ 0061 ÷ 0027 × 2060 ÷ 005F ÷ # ÷ [0.2] LATIN SMALL LETTER A (ALetter) ÷ [999.0] APOSTROPHE (MidNumLet) × [4.0] WORD JOINER (Format_FE) ÷ [999.0] LOW LINE (ExtendNumLet) ÷ [0.3]
÷ 0061 ÷ 0027 × 2060 × 0308 ÷ 005F ÷ # ÷ [0.2] LATIN SMALL LETTER A (ALetter) ÷ [999.0] APOSTROPHE (MidNumLet) × [4.0] WORD JOINER (Format_FE) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] LOW LINE (ExtendNumLet) ÷ [0.3]
+÷ 0061 ÷ 0027 × 2060 ÷ 1F1E6 ÷ # ÷ [0.2] LATIN SMALL LETTER A (ALetter) ÷ [999.0] APOSTROPHE (MidNumLet) × [4.0] WORD JOINER (Format_FE) ÷ [999.0] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [0.3]
+÷ 0061 ÷ 0027 × 2060 × 0308 ÷ 1F1E6 ÷ # ÷ [0.2] LATIN SMALL LETTER A (ALetter) ÷ [999.0] APOSTROPHE (MidNumLet) × [4.0] WORD JOINER (Format_FE) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [0.3]
÷ 0061 ÷ 0027 × 2060 × 00AD ÷ # ÷ [0.2] LATIN SMALL LETTER A (ALetter) ÷ [999.0] APOSTROPHE (MidNumLet) × [4.0] WORD JOINER (Format_FE) × [4.0] SOFT HYPHEN (Format_FE) ÷ [0.3]
÷ 0061 ÷ 0027 × 2060 × 0308 × 00AD ÷ # ÷ [0.2] LATIN SMALL LETTER A (ALetter) ÷ [999.0] APOSTROPHE (MidNumLet) × [4.0] WORD JOINER (Format_FE) × [4.0] COMBINING DIAERESIS (Extend_FE) × [4.0] SOFT HYPHEN (Format_FE) ÷ [0.3]
÷ 0061 ÷ 0027 × 2060 × 0300 ÷ # ÷ [0.2] LATIN SMALL LETTER A (ALetter) ÷ [999.0] APOSTROPHE (MidNumLet) × [4.0] WORD JOINER (Format_FE) × [4.0] COMBINING GRAVE ACCENT (Extend_FE) ÷ [0.3]
@@ -790,6 +870,8 @@
÷ 0061 ÷ 002C × 0308 ÷ 0030 ÷ # ÷ [0.2] LATIN SMALL LETTER A (ALetter) ÷ [999.0] COMMA (MidNum) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] DIGIT ZERO (Numeric) ÷ [0.3]
÷ 0061 ÷ 002C ÷ 005F ÷ # ÷ [0.2] LATIN SMALL LETTER A (ALetter) ÷ [999.0] COMMA (MidNum) ÷ [999.0] LOW LINE (ExtendNumLet) ÷ [0.3]
÷ 0061 ÷ 002C × 0308 ÷ 005F ÷ # ÷ [0.2] LATIN SMALL LETTER A (ALetter) ÷ [999.0] COMMA (MidNum) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] LOW LINE (ExtendNumLet) ÷ [0.3]
+÷ 0061 ÷ 002C ÷ 1F1E6 ÷ # ÷ [0.2] LATIN SMALL LETTER A (ALetter) ÷ [999.0] COMMA (MidNum) ÷ [999.0] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [0.3]
+÷ 0061 ÷ 002C × 0308 ÷ 1F1E6 ÷ # ÷ [0.2] LATIN SMALL LETTER A (ALetter) ÷ [999.0] COMMA (MidNum) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [0.3]
÷ 0061 ÷ 002C × 00AD ÷ # ÷ [0.2] LATIN SMALL LETTER A (ALetter) ÷ [999.0] COMMA (MidNum) × [4.0] SOFT HYPHEN (Format_FE) ÷ [0.3]
÷ 0061 ÷ 002C × 0308 × 00AD ÷ # ÷ [0.2] LATIN SMALL LETTER A (ALetter) ÷ [999.0] COMMA (MidNum) × [4.0] COMBINING DIAERESIS (Extend_FE) × [4.0] SOFT HYPHEN (Format_FE) ÷ [0.3]
÷ 0061 ÷ 002C × 0300 ÷ # ÷ [0.2] LATIN SMALL LETTER A (ALetter) ÷ [999.0] COMMA (MidNum) × [4.0] COMBINING GRAVE ACCENT (Extend_FE) ÷ [0.3]
@@ -834,6 +916,8 @@
÷ 0031 ÷ 003A × 0308 ÷ 0030 ÷ # ÷ [0.2] DIGIT ONE (Numeric) ÷ [999.0] COLON (MidLetter) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] DIGIT ZERO (Numeric) ÷ [0.3]
÷ 0031 ÷ 003A ÷ 005F ÷ # ÷ [0.2] DIGIT ONE (Numeric) ÷ [999.0] COLON (MidLetter) ÷ [999.0] LOW LINE (ExtendNumLet) ÷ [0.3]
÷ 0031 ÷ 003A × 0308 ÷ 005F ÷ # ÷ [0.2] DIGIT ONE (Numeric) ÷ [999.0] COLON (MidLetter) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] LOW LINE (ExtendNumLet) ÷ [0.3]
+÷ 0031 ÷ 003A ÷ 1F1E6 ÷ # ÷ [0.2] DIGIT ONE (Numeric) ÷ [999.0] COLON (MidLetter) ÷ [999.0] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [0.3]
+÷ 0031 ÷ 003A × 0308 ÷ 1F1E6 ÷ # ÷ [0.2] DIGIT ONE (Numeric) ÷ [999.0] COLON (MidLetter) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [0.3]
÷ 0031 ÷ 003A × 00AD ÷ # ÷ [0.2] DIGIT ONE (Numeric) ÷ [999.0] COLON (MidLetter) × [4.0] SOFT HYPHEN (Format_FE) ÷ [0.3]
÷ 0031 ÷ 003A × 0308 × 00AD ÷ # ÷ [0.2] DIGIT ONE (Numeric) ÷ [999.0] COLON (MidLetter) × [4.0] COMBINING DIAERESIS (Extend_FE) × [4.0] SOFT HYPHEN (Format_FE) ÷ [0.3]
÷ 0031 ÷ 003A × 0300 ÷ # ÷ [0.2] DIGIT ONE (Numeric) ÷ [999.0] COLON (MidLetter) × [4.0] COMBINING GRAVE ACCENT (Extend_FE) ÷ [0.3]
@@ -878,6 +962,8 @@
÷ 0031 × 0027 × 0308 × 0030 ÷ # ÷ [0.2] DIGIT ONE (Numeric) × [12.0] APOSTROPHE (MidNumLet) × [4.0] COMBINING DIAERESIS (Extend_FE) × [11.0] DIGIT ZERO (Numeric) ÷ [0.3]
÷ 0031 ÷ 0027 ÷ 005F ÷ # ÷ [0.2] DIGIT ONE (Numeric) ÷ [999.0] APOSTROPHE (MidNumLet) ÷ [999.0] LOW LINE (ExtendNumLet) ÷ [0.3]
÷ 0031 ÷ 0027 × 0308 ÷ 005F ÷ # ÷ [0.2] DIGIT ONE (Numeric) ÷ [999.0] APOSTROPHE (MidNumLet) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] LOW LINE (ExtendNumLet) ÷ [0.3]
+÷ 0031 ÷ 0027 ÷ 1F1E6 ÷ # ÷ [0.2] DIGIT ONE (Numeric) ÷ [999.0] APOSTROPHE (MidNumLet) ÷ [999.0] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [0.3]
+÷ 0031 ÷ 0027 × 0308 ÷ 1F1E6 ÷ # ÷ [0.2] DIGIT ONE (Numeric) ÷ [999.0] APOSTROPHE (MidNumLet) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [0.3]
÷ 0031 ÷ 0027 × 00AD ÷ # ÷ [0.2] DIGIT ONE (Numeric) ÷ [999.0] APOSTROPHE (MidNumLet) × [4.0] SOFT HYPHEN (Format_FE) ÷ [0.3]
÷ 0031 ÷ 0027 × 0308 × 00AD ÷ # ÷ [0.2] DIGIT ONE (Numeric) ÷ [999.0] APOSTROPHE (MidNumLet) × [4.0] COMBINING DIAERESIS (Extend_FE) × [4.0] SOFT HYPHEN (Format_FE) ÷ [0.3]
÷ 0031 ÷ 0027 × 0300 ÷ # ÷ [0.2] DIGIT ONE (Numeric) ÷ [999.0] APOSTROPHE (MidNumLet) × [4.0] COMBINING GRAVE ACCENT (Extend_FE) ÷ [0.3]
@@ -922,6 +1008,8 @@
÷ 0031 × 002C × 0308 × 0030 ÷ # ÷ [0.2] DIGIT ONE (Numeric) × [12.0] COMMA (MidNum) × [4.0] COMBINING DIAERESIS (Extend_FE) × [11.0] DIGIT ZERO (Numeric) ÷ [0.3]
÷ 0031 ÷ 002C ÷ 005F ÷ # ÷ [0.2] DIGIT ONE (Numeric) ÷ [999.0] COMMA (MidNum) ÷ [999.0] LOW LINE (ExtendNumLet) ÷ [0.3]
÷ 0031 ÷ 002C × 0308 ÷ 005F ÷ # ÷ [0.2] DIGIT ONE (Numeric) ÷ [999.0] COMMA (MidNum) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] LOW LINE (ExtendNumLet) ÷ [0.3]
+÷ 0031 ÷ 002C ÷ 1F1E6 ÷ # ÷ [0.2] DIGIT ONE (Numeric) ÷ [999.0] COMMA (MidNum) ÷ [999.0] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [0.3]
+÷ 0031 ÷ 002C × 0308 ÷ 1F1E6 ÷ # ÷ [0.2] DIGIT ONE (Numeric) ÷ [999.0] COMMA (MidNum) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [0.3]
÷ 0031 ÷ 002C × 00AD ÷ # ÷ [0.2] DIGIT ONE (Numeric) ÷ [999.0] COMMA (MidNum) × [4.0] SOFT HYPHEN (Format_FE) ÷ [0.3]
÷ 0031 ÷ 002C × 0308 × 00AD ÷ # ÷ [0.2] DIGIT ONE (Numeric) ÷ [999.0] COMMA (MidNum) × [4.0] COMBINING DIAERESIS (Extend_FE) × [4.0] SOFT HYPHEN (Format_FE) ÷ [0.3]
÷ 0031 ÷ 002C × 0300 ÷ # ÷ [0.2] DIGIT ONE (Numeric) ÷ [999.0] COMMA (MidNum) × [4.0] COMBINING GRAVE ACCENT (Extend_FE) ÷ [0.3]
@@ -966,6 +1054,8 @@
÷ 0031 × 002E × 2060 × 0308 × 0030 ÷ # ÷ [0.2] DIGIT ONE (Numeric) × [12.0] FULL STOP (MidNumLet) × [4.0] WORD JOINER (Format_FE) × [4.0] COMBINING DIAERESIS (Extend_FE) × [11.0] DIGIT ZERO (Numeric) ÷ [0.3]
÷ 0031 ÷ 002E × 2060 ÷ 005F ÷ # ÷ [0.2] DIGIT ONE (Numeric) ÷ [999.0] FULL STOP (MidNumLet) × [4.0] WORD JOINER (Format_FE) ÷ [999.0] LOW LINE (ExtendNumLet) ÷ [0.3]
÷ 0031 ÷ 002E × 2060 × 0308 ÷ 005F ÷ # ÷ [0.2] DIGIT ONE (Numeric) ÷ [999.0] FULL STOP (MidNumLet) × [4.0] WORD JOINER (Format_FE) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] LOW LINE (ExtendNumLet) ÷ [0.3]
+÷ 0031 ÷ 002E × 2060 ÷ 1F1E6 ÷ # ÷ [0.2] DIGIT ONE (Numeric) ÷ [999.0] FULL STOP (MidNumLet) × [4.0] WORD JOINER (Format_FE) ÷ [999.0] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [0.3]
+÷ 0031 ÷ 002E × 2060 × 0308 ÷ 1F1E6 ÷ # ÷ [0.2] DIGIT ONE (Numeric) ÷ [999.0] FULL STOP (MidNumLet) × [4.0] WORD JOINER (Format_FE) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [0.3]
÷ 0031 ÷ 002E × 2060 × 00AD ÷ # ÷ [0.2] DIGIT ONE (Numeric) ÷ [999.0] FULL STOP (MidNumLet) × [4.0] WORD JOINER (Format_FE) × [4.0] SOFT HYPHEN (Format_FE) ÷ [0.3]
÷ 0031 ÷ 002E × 2060 × 0308 × 00AD ÷ # ÷ [0.2] DIGIT ONE (Numeric) ÷ [999.0] FULL STOP (MidNumLet) × [4.0] WORD JOINER (Format_FE) × [4.0] COMBINING DIAERESIS (Extend_FE) × [4.0] SOFT HYPHEN (Format_FE) ÷ [0.3]
÷ 0031 ÷ 002E × 2060 × 0300 ÷ # ÷ [0.2] DIGIT ONE (Numeric) ÷ [999.0] FULL STOP (MidNumLet) × [4.0] WORD JOINER (Format_FE) × [4.0] COMBINING GRAVE ACCENT (Extend_FE) ÷ [0.3]
@@ -998,4 +1088,17 @@
÷ 2060 ÷ 0061 × 2060 × 0062 × 2060 × 00AD × 2060 × 0062 × 2060 × 0079 × 2060 × 2060 ÷ # ÷ [0.2] WORD JOINER (Format_FE) ÷ [999.0] LATIN SMALL LETTER A (ALetter) × [4.0] WORD JOINER (Format_FE) × [5.0] LATIN SMALL LETTER B (ALetter) × [4.0] WORD JOINER (Format_FE) × [4.0] SOFT HYPHEN (Format_FE) × [4.0] WORD JOINER (Format_FE) × [5.0] LATIN SMALL LETTER B (ALetter) × [4.0] WORD JOINER (Format_FE) × [5.0] LATIN SMALL LETTER Y (ALetter) × [4.0] WORD JOINER (Format_FE) × [4.0] WORD JOINER (Format_FE) ÷ [0.3]
÷ 2060 ÷ 0061 × 2060 ÷ 0024 × 2060 ÷ 002D × 2060 ÷ 0033 × 2060 × 0034 × 2060 × 002C × 2060 × 0035 × 2060 × 0036 × 2060 × 0037 × 2060 × 002E × 2060 × 0031 × 2060 × 0034 × 2060 ÷ 0025 × 2060 ÷ 0062 × 2060 × 2060 ÷ # ÷ [0.2] WORD JOINER (Format_FE) ÷ [999.0] LATIN SMALL LETTER A (ALetter) × [4.0] WORD JOINER (Format_FE) ÷ [999.0] DOLLAR SIGN (Other) × [4.0] WORD JOINER (Format_FE) ÷ [999.0] HYPHEN-MINUS (Other) × [4.0] WORD JOINER (Format_FE) ÷ [999.0] DIGIT THREE (Numeric) × [4.0] WORD JOINER (Format_FE) × [8.0] DIGIT FOUR (Numeric) × [4.0] WORD JOINER (Format_FE) × [12.0] COMMA (MidNum) × [4.0] WORD JOINER (Format_FE) × [11.0] DIGIT FIVE (Numeric) × [4.0] WORD JOINER (Format_FE) × [8.0] DIGIT SIX (Numeric) × [4.0] WORD JOINER (Format_FE) × [8.0] DIGIT SEVEN (Numeric) × [4.0] WORD JOINER (Format_FE) × [12.0] FULL STOP (MidNumLet) × [4.0] WORD JOINER (Format_FE) × [11.0] DIGIT ONE (Numeric) × [4.0] WORD JOINER (Format_FE) × [8.0] DIGIT FOUR (Numeric) × [4.0] WORD JOINER (Format_FE) ÷ [999.0] PERCENT SIGN (Other) × [4.0] WORD JOINER (Format_FE) ÷ [999.0] LATIN SMALL LETTER B (ALetter) × [4.0] WORD JOINER (Format_FE) × [4.0] WORD JOINER (Format_FE) ÷ [0.3]
÷ 2060 ÷ 0033 × 2060 × 0061 × 2060 × 2060 ÷ # ÷ [0.2] WORD JOINER (Format_FE) ÷ [999.0] DIGIT THREE (Numeric) × [4.0] WORD JOINER (Format_FE) × [10.0] LATIN SMALL LETTER A (ALetter) × [4.0] WORD JOINER (Format_FE) × [4.0] WORD JOINER (Format_FE) ÷ [0.3]
-# Lines: 978
+÷ 0061 ÷ 1F1E6 ÷ 0062 ÷ # ÷ [0.2] LATIN SMALL LETTER A (ALetter) ÷ [999.0] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) ÷ [999.0] LATIN SMALL LETTER B (ALetter) ÷ [0.3]
+÷ 1F1F7 × 1F1FA ÷ # ÷ [0.2] REGIONAL INDICATOR SYMBOL LETTER R (Regional_Indicator) × [13.3] REGIONAL INDICATOR SYMBOL LETTER U (Regional_Indicator) ÷ [0.3]
+÷ 1F1F7 × 1F1FA × 1F1F8 ÷ # ÷ [0.2] REGIONAL INDICATOR SYMBOL LETTER R (Regional_Indicator) × [13.3] REGIONAL INDICATOR SYMBOL LETTER U (Regional_Indicator) × [13.3] REGIONAL INDICATOR SYMBOL LETTER S (Regional_Indicator) ÷ [0.3]
+÷ 1F1F7 × 1F1FA × 1F1F8 × 1F1EA ÷ # ÷ [0.2] REGIONAL INDICATOR SYMBOL LETTER R (Regional_Indicator) × [13.3] REGIONAL INDICATOR SYMBOL LETTER U (Regional_Indicator) × [13.3] REGIONAL INDICATOR SYMBOL LETTER S (Regional_Indicator) × [13.3] REGIONAL INDICATOR SYMBOL LETTER E (Regional_Indicator) ÷ [0.3]
+÷ 1F1F7 × 1F1FA ÷ 200B ÷ 1F1F8 × 1F1EA ÷ # ÷ [0.2] REGIONAL INDICATOR SYMBOL LETTER R (Regional_Indicator) × [13.3] REGIONAL INDICATOR SYMBOL LETTER U (Regional_Indicator) ÷ [999.0] ZERO WIDTH SPACE (Other) ÷ [999.0] REGIONAL INDICATOR SYMBOL LETTER S (Regional_Indicator) × [13.3] REGIONAL INDICATOR SYMBOL LETTER E (Regional_Indicator) ÷ [0.3]
+÷ 1F1E6 × 1F1E7 × 1F1E8 ÷ # ÷ [0.2] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) × [13.3] REGIONAL INDICATOR SYMBOL LETTER B (Regional_Indicator) × [13.3] REGIONAL INDICATOR SYMBOL LETTER C (Regional_Indicator) ÷ [0.3]
+÷ 1F1E6 × 200D × 1F1E7 × 1F1E8 ÷ # ÷ [0.2] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) × [4.0] ZERO WIDTH JOINER (Extend_FE) × [13.3] REGIONAL INDICATOR SYMBOL LETTER B (Regional_Indicator) × [13.3] REGIONAL INDICATOR SYMBOL LETTER C (Regional_Indicator) ÷ [0.3]
+÷ 1F1E6 × 1F1E7 × 200D × 1F1E8 ÷ # ÷ [0.2] REGIONAL INDICATOR SYMBOL LETTER A (Regional_Indicator) × [13.3] REGIONAL INDICATOR SYMBOL LETTER B (Regional_Indicator) × [4.0] ZERO WIDTH JOINER (Extend_FE) × [13.3] REGIONAL INDICATOR SYMBOL LETTER C (Regional_Indicator) ÷ [0.3]
+÷ 0020 × 200D ÷ 0646 ÷ # ÷ [0.2] SPACE (Other) × [4.0] ZERO WIDTH JOINER (Extend_FE) ÷ [999.0] ARABIC LETTER NOON (ALetter) ÷ [0.3]
+÷ 0646 × 200D ÷ 0020 ÷ # ÷ [0.2] ARABIC LETTER NOON (ALetter) × [4.0] ZERO WIDTH JOINER (Extend_FE) ÷ [999.0] SPACE (Other) ÷ [0.3]
+#
+# Lines: 1078
+#
+# EOF
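
Each test line above lists hexadecimal code points separated by ÷ (a break is allowed at that position) or × (no break), followed by a # comment. A minimal sketch of reading one such line is given here; the name parse-break-line is hypothetical and this is not the reader used in word-break-test.lisp.

```lisp
(defun parse-break-line (line)
  "Return two values: the code points on LINE, and one flag per boundary
position (T for a DIVISION SIGN, NIL for a MULTIPLICATION SIGN)."
  (let ((data (subseq line 0 (position #\# line))) ; drop the trailing comment
        (div (code-char #xF7))                     ; DIVISION SIGN
        (mul (code-char #xD7))                     ; MULTIPLICATION SIGN
        (codes '())
        (breaks '())
        (start 0))
    (flet ((blank-p (c) (member c '(#\Space #\Tab))))
      (loop
        (let ((tok-start (position-if-not #'blank-p data :start start)))
          (unless tok-start (return))
          (let* ((tok-end (or (position-if #'blank-p data :start tok-start)
                              (length data)))
                 (token (subseq data tok-start tok-end)))
            (cond ((char= (char token 0) div) (push t breaks))
                  ((char= (char token 0) mul) (push nil breaks))
                  (t (push (parse-integer token :radix 16) codes)))
            (setf start tok-end)))))
    (values (nreverse codes) (nreverse breaks))))
```

For a line with two regional indicators joined by ×, this yields two code points and three boundary flags, the middle one being NIL.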
diff --git a/src/i18n/tests/word-break-test.lisp b/src/i18n/tests/word-break-test.lisp
index 42d961a..2fefec7 100644
--- a/src/i18n/tests/word-break-test.lisp
+++ b/src/i18n/tests/word-break-test.lisp
@@ -33,7 +33,11 @@
(let ((c (read s nil nil)))
(unless c
(return))
- (vector-push-extend (code-char c) string)
+ (if (> c #xffff)
+ (let ((s (lisp::codepoints-string (list c))))
+ (vector-push-extend (aref s 0) string)
+ (vector-push-extend (aref s 1) string))
+ (vector-push-extend (code-char c) string))
(let ((c (read s)))
(handle-break c))
(incf count)))))
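
The hunk above pushes two string elements for any code point above #xffff, which suggests the test string holds UTF-16 code units. As a portable illustration only (this is not lisp::codepoints-string, and the name surrogate-pair is hypothetical), the two code units can be computed arithmetically:

```lisp
(defun surrogate-pair (code)
  "Return the UTF-16 high and low surrogate code units for CODE,
which must lie in the range #x10000 to #x10FFFF."
  (let ((offset (- code #x10000)))
    (values (+ #xD800 (ash offset -10))        ; top 10 bits of the offset
            (+ #xDC00 (logand offset #x3FF))))) ; low 10 bits of the offset

;; REGIONAL INDICATOR SYMBOL LETTER A, used throughout the new test lines.
(surrogate-pair #x1F1E6)   ; => #xD83C, #xDDE6
```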
diff --git a/src/i18n/unidata.bin b/src/i18n/unidata.bin
index 0ee2dd9..30816cf 100644
Binary files a/src/i18n/unidata.bin and b/src/i18n/unidata.bin differ
diff --git a/src/tools/build-unidata.lisp b/src/tools/build-unidata.lisp
index 363d095..0f5bc42 100644
--- a/src/tools/build-unidata.lisp
+++ b/src/tools/build-unidata.lisp
@@ -54,7 +54,7 @@
;; The expected Unicode version
(defconstant +unicode-major-version+ 6)
-(defconstant +unicode-minor-version+ 1)
+(defconstant +unicode-minor-version+ 2)
(defconstant +unicode-update-version+ 0)
;;; These need to be synched with code/unidata.lisp
@@ -281,11 +281,14 @@
(cdr x))))
(mapc (lambda (x) (pass2 (cdr x))) (rest trie))))
(format t "~& Initializing...~%")
+ (force-output)
(let ((trie (cons nil nil)))
(loop for (name . code) in entries do (add-to-trie trie name code))
(format t "~& Pass 1...~%")
+ (force-output)
(pass1 trie 0)
(format t "~& Sorting...~%")
+ (force-output)
(dolist (key (sort (loop for k being the hash-keys of khash
collect k)
#'> :key #'length))
@@ -316,8 +319,10 @@
vec2 (make-array top :element-type '(unsigned-byte 32))
vec3 (make-array top :element-type '(unsigned-byte 32)))
(format t "~& Pass 2...~%")
+ (force-output)
(pass2 trie)
(format t "~& Finalizing~%")
+ (force-output)
(dotimes (i top)
(let ((xxx (aref vec2 i)))
(dotimes (j (aref keyl (ash xxx -18)))
@@ -614,9 +619,10 @@
;; ucd-directory should be the directory where UnicodeData.txt is
;; located.
(defun foreach-ucd (name ucd-directory fn)
- (format t "~& ~A~%" name)
+ (format t "~& ~A~%" name)
(with-open-file (s (make-pathname :name name :type "txt"
:defaults ucd-directory))
+ (format t "file = ~s~%" s)
(cond
((string= name "Unihan")
(loop for line = (read-line s nil) while line do
@@ -811,6 +817,7 @@
ucd-directory
(lambda (min max prop)
(let ((code (intern (string-upcase prop) "KEYWORD")))
+ (format t "~X-~X code = ~S~%" min max code)
(loop for i from min to max
as ent = (find-ucd i) do
(when ent
@@ -941,16 +948,18 @@
(or (position (ucdent-word-break ucdent)
'(:other :cr :lf :newline :extend :format
:katakana :aletter :midnumlet :midletter :midnum
- :numeric :extendnumlet))
+ :numeric :extendnumlet :regional_indicator))
0))
;; ucd-directory should be the directory where UnicodeData.txt is
;; located.
(defun build-unidata (&optional (ucd-directory "target:i18n/"))
(format t "~&Reading data from ~S~%" (probe-file ucd-directory))
+ (force-output)
(multiple-value-bind (ucd range) (read-data ucd-directory)
(setf (unidata-range *unicode-data*) range)
(format t "~&Building character name tables~%")
+ (force-output)
(let* ((data (loop for ent across ucd
when (char/= (char (ucdent-name ent) 0) #\<)
collect (cons (ucdent-name ent) (ucdent-code ent))
@@ -965,6 +974,7 @@
(make-ntrie32 :split #x54 :hvec hvec :mvec mvec :lvec lvec))))
(format t "~&Building Unicode 1.0 character name tables~%")
+ (force-output)
(let* ((data (loop for ent across ucd
when (plusp (length (ucdent-name1 ent)))
collect (cons (ucdent-name1 ent) (ucdent-code ent))))
@@ -976,12 +986,14 @@
(make-ntrie32 :split #x54 :hvec hvec :mvec mvec :lvec lvec))))
(format t "~&Building general category table~%")
+ (force-output)
(multiple-value-bind (hvec mvec lvec)
(pack ucd range #'ucdent-cat 0 8 #x53)
(setf (unidata-category *unicode-data*)
(make-ntrie8 :split #x53 :hvec hvec :mvec mvec :lvec lvec)))
(format t "~&Building simple case-conversion table~%")
+ (force-output)
(let ((svec (make-array 100 :element-type '(unsigned-byte 16)
:fill-pointer 0 :adjustable t)))
(vector-push-extend 0 svec)
@@ -993,12 +1005,14 @@
:svec (copy-seq svec)))))
(format t "~&Building numeric-values table~%")
+ (force-output)
(multiple-value-bind (hvec mvec lvec)
(pack ucd range #'pack-numeric 0 32 #x63)
(setf (unidata-numeric *unicode-data*)
(make-ntrie32 :split #x63 :hvec hvec :mvec mvec :lvec lvec)))
(format t "~&Building decomposition table~%")
+ (force-output)
(let ((tabl (make-array 6000 :element-type '(unsigned-byte 16)
:fill-pointer 0 :adjustable t)))
(multiple-value-bind (hvec mvec lvec)
@@ -1009,12 +1023,14 @@
:tabl (copy-seq tabl)))))
(format t "~&Building combining-class table~%")
+ (force-output)
(multiple-value-bind (hvec mvec lvec)
(pack ucd range #'ucdent-comb 0 8 #x64)
(setf (unidata-combining *unicode-data*)
(make-ntrie8 :split #x64 :hvec hvec :mvec mvec :lvec lvec)))
(format t "~&Building bidi information table~%")
+ (force-output)
(let ((tabl (make-array 10 :element-type '(unsigned-byte 16)
:fill-pointer 0 :adjustable t)))
(multiple-value-bind (hvec mvec lvec)
@@ -1025,6 +1041,7 @@
:tabl (copy-seq tabl)))))
(format t "~&Building normalization quick-check tables~%")
+ (force-output)
(progn
(multiple-value-bind (hvec mvec lvec)
(pack ucd range (lambda (x)
@@ -1056,6 +1073,7 @@
(make-ntrie2 :split #x55 :hvec hvec :mvec mvec :lvec lvec))))
(format t "~&Building composition exclusion table~%")
+ (force-output)
(let ((exclusions (make-array 1 :element-type '(unsigned-byte 32)
:adjustable t
:fill-pointer 0)))
@@ -1065,8 +1083,10 @@
(setf (unidata-comp-exclusions *unicode-data*) (copy-seq exclusions)))
(format t "~&Building full case mapping tables~%")
+ (force-output)
(progn
(format t "~& Lower...~%")
+ (force-output)
(let ((tabl (make-array 100 :element-type '(unsigned-byte 16)
:fill-pointer 0 :adjustable t))
(split #x65))
@@ -1077,6 +1097,7 @@
(make-full-case :split split :hvec hvec :mvec mvec :lvec lvec
:tabl (copy-seq tabl)))))
(format t "~& Title...~%")
+ (force-output)
(let ((tabl (make-array 100 :element-type '(unsigned-byte 16)
:fill-pointer 0 :adjustable t))
(split #x65))
@@ -1087,6 +1108,7 @@
(make-full-case :split split :hvec hvec :mvec mvec :lvec lvec
:tabl (copy-seq tabl)))))
(format t "~& Upper...~%")
+ (force-output)
(let ((tabl (make-array 100 :element-type '(unsigned-byte 16)
:fill-pointer 0 :adjustable t))
(split #x65))
@@ -1098,8 +1120,10 @@
:tabl (copy-seq tabl))))))
(format t "~&Building case-folding tables~%")
+ (force-output)
(progn
(format t "~& Simple...~%")
+ (force-output)
(let ((split #x54))
(multiple-value-bind (hvec mvec lvec)
(pack ucd range (lambda (x) (pack-case-folding-simple x))
@@ -1107,6 +1131,7 @@
(setf (unidata-case-fold-simple *unicode-data*)
(make-ntrie32 :split split :hvec hvec :mvec mvec :lvec lvec))))
(format t "~& Full...~%")
+ (force-output)
(let ((tabl (make-array 100 :element-type '(unsigned-byte 16)
:fill-pointer 0 :adjustable t))
(split #x65))
@@ -1118,6 +1143,7 @@
:tabl (copy-seq tabl))))))
(format t "~&Building word-break table~%")
+ (force-output)
(let ((split #x66))
(multiple-value-bind (hvec mvec lvec)
(pack ucd range (lambda (x) (pack-word-break x))
-----------------------------------------------------------------------
Summary of changes:
src/code/string.lisp | 142 +++++++++++++++++---------------
src/code/unidata.lisp | 6 +-
src/general-info/release-20e.txt | 2 +
src/i18n/BidiMirroring.txt | 21 ++---
src/i18n/CaseFolding.txt | 8 +-
src/i18n/CompositionExclusions.txt | 7 +-
src/i18n/DerivedNormalizationProps.txt | 6 +-
src/i18n/NameAliases.txt | 5 +-
src/i18n/NormalizationCorrections.txt | 8 +-
src/i18n/SpecialCasing.txt | 6 +-
src/i18n/UnicodeData.txt | 9 +-
src/i18n/WordBreakProperty.txt | 12 ++-
src/i18n/tests/NormalizationTest.txt | 8 +-
src/i18n/tests/WordBreakTest.txt | 113 ++++++++++++++++++++++++-
src/i18n/tests/word-break-test.lisp | 10 ++-
src/i18n/unidata.bin | Bin 1490793 -> 1490993 bytes
src/tools/build-unidata.lisp | 32 +++++++-
17 files changed, 273 insertions(+), 122 deletions(-)
hooks/post-receive
--
CMU Common Lisp
[cmucl-cvs] [git] CMU Common Lisp branch master updated. snapshot-2013-03-6-g10ebd12
by Raymond Toy 06 Mar '13
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "CMU Common Lisp".
The branch, master has been updated
via 10ebd126e43b344377d384c55c1c611a82e9f4ae (commit)
via 45fabc8f3fad8876627831708ec3c997d46ce4f8 (commit)
from 9c3da08b86682b804da381ebf7d6e2f6843a4394 (commit)
Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.
- Log -----------------------------------------------------------------
commit 10ebd126e43b344377d384c55c1c611a82e9f4ae
Author: Raymond Toy <toy.raymond(a)gmail.com>
Date: Wed Mar 6 00:29:09 2013 -0800
Update.
diff --git a/src/general-info/release-20e.txt b/src/general-info/release-20e.txt
index d16df6b..a438863 100644
--- a/src/general-info/release-20e.txt
+++ b/src/general-info/release-20e.txt
@@ -52,7 +52,6 @@ New in this release:
* A serious error in FILE-POSITION on streams using an encoding
other than latin1 has been fixed. See ticket #74.
-
* Trac Tickets:
* Ticket #52 reopened.
* Ticket #66 fixed.
@@ -65,6 +64,7 @@ New in this release:
* Ticket #72 fixed.
* Ticket #73 fixed.
* Ticket #74 fixed.
+ * Ticket #76 fixed.
* Other changes:
* -8 option for build-all.sh is deprecated since we don't
commit 45fabc8f3fad8876627831708ec3c997d46ce4f8
Author: Raymond Toy <toy.raymond(a)gmail.com>
Date: Wed Mar 6 00:27:44 2013 -0800
Fix ticket:76
Missed one place in DO-UNARY-BYTE-BASH to adjust the call to END-MASK
to use a bit offset instead of a byte offset. This affects anything
that was using DO-UNARY-BYTE-BASH, including REPLACE.
diff --git a/src/code/bit-bash.lisp b/src/code/bit-bash.lisp
index a9bb688..ba2706a 100644
--- a/src/code/bit-bash.lisp
+++ b/src/code/bit-bash.lisp
@@ -562,7 +562,7 @@
(unless (zerop dst-byte-offset)
;; We are only writing part of the first word, so mask off the
;; bits we want to preserve.
- (let ((mask (end-mask (- dst-byte-offset)))
+ (let ((mask (end-mask (* vm:byte-bits (- dst-byte-offset))))
(orig (funcall dst-ref-fn dst dst-word-offset))
(value (funcall src-ref-fn src src-word-offset)))
(declare (type unit mask orig value))
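
The fix above converts a byte offset into a bit offset before building the mask. The sketch below shows only why the unit matters; low-bits-mask and *byte-bits* are hypothetical stand-ins, and it does not reproduce END-MASK itself, whose definition and handling of negative counts are not shown in this message.

```lisp
;; Assumed stand-in for vm:byte-bits; 8 bits per byte is an assumption here.
(defparameter *byte-bits* 8)

(defun low-bits-mask (bit-count)
  "Return an integer with the low BIT-COUNT bits set."
  (1- (ash 1 bit-count)))

;; Passing a byte count where a bit count is expected covers too few bits.
(low-bits-mask 3)                   ; 3 bits:  #b111
(low-bits-mask (* *byte-bits* 3))   ; 3 bytes: #xFFFFFF
```

With the wrong unit, the mask preserves or overwrites far fewer bits than intended, which is consistent with the commit message's note that anything built on DO-UNARY-BYTE-BASH was affected.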
-----------------------------------------------------------------------
Summary of changes:
src/code/bit-bash.lisp | 2 +-
src/general-info/release-20e.txt | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
hooks/post-receive
--
CMU Common Lisp