Tue Jun 24 17:28:35 CEST 2008  attila.lendvai@gmail.com
  * optimize string-to-octets for simple-base-string
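
The idea behind this patch can be sketched in self-contained Common Lisp. ETYPECASE gives a SIMPLE-BASE-STRING its own compiled copy of the encoding loop. The output buffer is sized by counting octets over [START, END), not from (LENGTH STRING). This is a minimal sketch under stated assumptions, not Babel's implementation. SKETCH-UTF-8-ENCODE, ENCODE-UTF-8-INTO and UTF-8-LENGTH are hypothetical names. The sketch handles UTF-8 only, has no BOM support, and does not reject surrogate code points.

```lisp
;;; Hypothetical helpers, not part of Babel: UTF-8 only, no surrogate checks.

(defun utf-8-length (code)
  "Number of octets UTF-8 needs for the code point CODE."
  (cond ((< code #x80) 1)
        ((< code #x800) 2)
        ((< code #x10000) 3)
        (t 4)))

(defun encode-utf-8-into (code octets index)
  "Write CODE as UTF-8 into OCTETS starting at INDEX; return the next index."
  (let ((n (utf-8-length code)))
    (if (= n 1)
        (setf (aref octets index) code)
        (let ((lead (ecase n (2 #xC0) (3 #xE0) (4 #xF0))))
          ;; Fill continuation octets from the last one backwards,
          ;; six payload bits at a time.
          (loop for k from (1- n) downto 1
                do (setf (aref octets (+ index k))
                         (logior #x80 (ldb (byte 6 0) code))
                         code (ash code -6)))
          ;; The remaining high bits go into the lead octet.
          (setf (aref octets index) (logior lead code))))
    (+ index n)))

(defun sketch-utf-8-encode (string &key (start 0) end)
  "Encode STRING between START and END as UTF-8 into a fresh octet vector."
  (let ((end (or end (length string))))
    (assert (<= 0 start end (length string)))
    ;; Expand the same two-pass body (count, then encode) once per accessor,
    ;; so each ETYPECASE branch is compiled against a known string type.
    (macrolet ((encode-range (accessor)
                 `(let* ((size (loop for i from start below end
                                     sum (utf-8-length
                                          (char-code (,accessor string i)))))
                         (octets (make-array size
                                             :element-type '(unsigned-byte 8)))
                         (index 0))
                    (loop for i from start below end
                          do (setf index (encode-utf-8-into
                                          (char-code (,accessor string i))
                                          octets index)))
                    octets)))
      (etypecase string
        ;; Fast path: SCHAR on a SIMPLE-BASE-STRING.
        (simple-base-string (encode-range schar))
        ;; Any other string, including non-simple ones.
        (string (encode-range char))))))
```

For example, (sketch-utf-8-encode (coerce "hello" 'simple-base-string) :start 1 :end 3) takes the fast path and returns a two-element vector. The general STRING branch of string-to-octets already follows the same count, allocate, encode shape through the mapping's octet counter and encoder.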

New patches:

[optimize string-to-octets for simple-base-string
attila.lendvai@gmail.com**20080624152835] {
hunk ./src/strings.lisp 120
-(defun lookup-string-vector-mapping (encoding)
-  (lookup-mapping *string-vector-mappings* encoding))
+(defparameter *simple-base-string-vector-mappings*
+  (instantiate-concrete-mappings
+   ;; :optimize ((speed 3) (safety 0) (debug 0) (compilation-speed 0))
+   :octet-seq-setter ub-set
+   :octet-seq-getter ub-get
+   :octet-seq-type (simple-array (unsigned-byte 8) (*))
+   :code-point-seq-setter string-set
+   :code-point-seq-getter string-get
+   :code-point-seq-type simple-base-string))
hunk ./src/strings.lisp 281
-          (mapping (lookup-string-vector-mapping encoding)))
+          (mapping (lookup-mapping *string-vector-mappings* encoding)))
hunk ./src/strings.lisp 300
-;;; FIXME: we shouldn't really need that coercion to UNICODE-STRING
-;;; but we kind of because it's declared all over.  To avoid that,
-;;; we'd need different types for input and output strings.  Or maybe
-;;; this is not a problem; figure that out.
hunk ./src/strings.lisp 303
-  (check-type string string)
-  (with-checked-simple-vector ((string (coerce string 'unicode-string))
-                               (start start) (end end))
-    (declare (type simple-unicode-string string))
-    (let* ((*suppress-character-coding-errors* (not errorp))
-           (mapping (lookup-string-vector-mapping encoding))
-           (bom (bom-vector encoding use-bom))
-           (vector (make-array (the array-index
-                                 (+ (funcall (octet-counter mapping)
-                                             string start end -1)
-                                    (length bom)))
-                               :element-type '(unsigned-byte 8))))
-      (replace vector bom)
-      (funcall (encoder mapping) string start end vector (length bom))
-      vector)))
+  (declare (optimize (speed 3) (safety 2)))
+  (let ((*suppress-character-coding-errors* (not errorp)))
+    (etypecase string
+      (simple-base-string
+       (unless end
+         (setf end (length string)))
+       (check-vector-bounds string start end)
+       (let* ((mapping (lookup-mapping *simple-base-string-vector-mappings*
+                                       encoding))
+              (bom (bom-vector encoding use-bom))
+              (bom-length (length bom))
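+              ;; Size the buffer with the octet counter: only [START, END)
+              ;; is encoded, and a character may need several octets.
+              (result (make-array (+ (the array-index
+                                       (funcall (the function (octet-counter mapping))
+                                                string start end -1))
+                                     bom-length)
+                                  :element-type '(unsigned-byte 8))))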
+         (replace result bom)
+         (funcall (the function (encoder mapping))
+                  string start end result bom-length)
+         result))
+      (string
+       ;; FIXME: we shouldn't really need that coercion to UNICODE-STRING
+       ;; but we kind of do because it's declared all over.  To avoid that,
+       ;; we'd need different types for input and output strings.  Or maybe
+       ;; this is not a problem; figure that out.
+       (with-checked-simple-vector ((string (coerce string 'unicode-string))
+                                    (start start) (end end))
+         (declare (type simple-unicode-string string))
+         (let* ((mapping (lookup-mapping *string-vector-mappings* encoding))
+                (bom (bom-vector encoding use-bom))
+                (bom-length (length bom))
+                (result (make-array (+ (the array-index
+                                         (funcall (the function (octet-counter mapping))
+                                                  string start end -1))
+                                       bom-length)
+                                    :element-type '(unsigned-byte 8))))
+           (replace result bom)
+           (funcall (the function (encoder mapping))
+                    string start end result bom-length)
+           result))))))
hunk ./src/strings.lisp 374
-    (let ((mapping (lookup-string-vector-mapping encoding))
+    (let ((mapping (lookup-mapping *string-vector-mappings* encoding))
hunk ./src/strings.lisp 385
-    (let ((mapping (lookup-string-vector-mapping encoding))
+    (let ((mapping (lookup-mapping *string-vector-mappings* encoding))
}

Context:

[support external-format as the encoding parameter in bom-vector (needed for the stream code).
attila.lendvai@gmail.com**20080827220838] 
[TAG 0.3.0
Luis Oliveira <loliveira@common-lisp.net>**20080729032141] 
[update babel.asd for version 0.3.0
Luis Oliveira <loliveira@common-lisp.net>**20080729032137] 
[misc cleanups
Luis Oliveira <loliveira@common-lisp.net>**20080729032101] 
[Assume big-endianness by default in the UTF-32 decoder
Luis Oliveira <loliveira@common-lisp.net>**20080728223819
 
 - added regression test.
] 
[implement :USE-BOM in STRING-TO-OCTETS
Luis Oliveira <loliveira@common-lisp.net>**20080624151807] 
[80-column freak changes
Luis Oliveira <loliveira@common-lisp.net>**20080624151323] 
[Port babel-tests to Stefil
Luis Oliveira <loliveira@common-lisp.net>**20080624151044
 
 - Uses DEFSTEST a lot. Should probably get rid of that.
] 
[make lookup-mapping inlined and change it not to error when called with a concrete-mapping
attila.lendvai@gmail.com**20080624150857] 
[enable the sharp-backslash-syntax in the tests to make test-op work
attila.lendvai@gmail.com**20080624140030] 
[TAG 0.2.0
Luis Oliveira <loliveira@common-lisp.net>**20080609012936] 
[update babel.asd for version 0.2.0
Luis Oliveira <loliveira@common-lisp.net>**20080609012929] 
[added release.sh script
Luis Oliveira <loliveira@common-lisp.net>**20080609012843] 
[Add a test idea to NOTES
Luis Oliveira <loliveira@common-lisp.net>**20080505201319] 
[bah, fix my previous initial-buffer-size patch for streams
attila.lendvai@gmail.com**20080525104208] 
[added a :initial-buffer-size to with-output-to-sequence and co.
attila.lendvai@gmail.com**20080525081351] 
[stream's :element-type works with deftype'd (unsigned-byte 8)
attila.lendvai@gmail.com**20080521170815] 
[#\u reader changes, don't enable it when loaded, only by explicit user request
attila.lendvai@gmail.com**20080508112255] 
[make sure the external-format slot of the in-memory stream is an external-format instance.
attila.lendvai@gmail.com**20080506175901
 
 also fix make-in-memory-output-stream accepting '(unsigned-byte 8).
] 
[drop :as-list and add a :return-as keyword arg to get-output-stream-sequence and with-output-to-sequence
attila.lendvai@gmail.com**20080506175852] 
[optimize in-memory stream, make it accept plain vectors, too, not only ub8 vectors
attila.lendvai@gmail.com**20080506173131] 
[with-output-to-sequence accepts an :external-format and properly parses its body for declarations
attila.lendvai@gmail.com**20080506164113] 
[added a THE to fire up the make-array transform in string-to-octets
attila.lendvai@gmail.com**20080505195644] 
[added :external-format to make-in-memory-output-stream
attila.lendvai@gmail.com**20080505181439] 
[TAG 0.1.0
Luis Oliveira <loliveira@common-lisp.net>**20080422142948] 
[small notes in streams.lisp about status/todo
attila.lendvai@gmail.com**20080422180246] 
[factored out lookup-string-vector-mapping
attila.lendvai@gmail.com**20080418142943] 
[first scratch of stream support.
attila.lendvai@gmail.com**20080418141448
 
 in-memory bivalent vector-buffered streams with file-position and with-output-to-sequence (hope)fully works.
] 
[ECL doesn't like nor need fix-sharp-backslash.lisp
Luis Oliveira <loliveira@common-lisp.net>**20080406124100] 
[babel-tests.asd fixups
Luis Oliveira <loliveira@common-lisp.net>**20080406124010] 
[Move LOOKUP-MAPPING to external-formats.lisp and make it accept EXTERNAL-FORMATs too.
Stelian Ionescu <sionescu@common-lisp.net>**20080224170631] 
[Fix concatenate-string-to-octets.
Luis Oliveira <loliveira@common-lisp.net>**20080224163710
 
 - Was calling a nonexistent function.
 - Added a note about endianness to its unit test.
] 
[Fix run-babel-tests function.
Luis Oliveira <loliveira@common-lisp.net>**20080224163632] 
[Indent previous patch to 80 columns. :-)
Luis Oliveira <loliveira@common-lisp.net>**20080224131420] 
[Added concatenate-strings-to-octets
attila.lendvai@gmail.com**20080223223635] 
[fix the return values of the encoders, the upcoming concatenate-strings-to-octets patch/test relies on it
attila.lendvai@gmail.com**20080223223500] 
[Added declaims to make it possible to locally inline string-to-octets and friends upon explicit request
attila.lendvai@gmail.com**20080223220659] 
[Define EOL-STYLE type and enforce it.
Luis Oliveira <loliveira@common-lisp.net>**20071212215534
 
 - MAKE-EXTERNAL-FORMAT now takes EOL-STYLE as a keyword argument
   instead of an optional one.
] 
[Add some documentation
Luis Oliveira <loliveira@common-lisp.net>**20071212215223] 
[Addressing minor style obsessions.
Luis Oliveira <loliveira@common-lisp.net>**20071210175315] 
[Export EXTERNAL-FORMAT from the Babel package.
Luis Oliveira <loliveira@common-lisp.net>**20071210175250] 
[Fix/add TEST-OP
Luis Oliveira <loliveira@common-lisp.net>**20071210173842] 
[Fixed function test-file(used by ICONV-TEST): always take the truename of a .asd file.
Stelian Ionescu <sionescu@common-lisp.net>**20071204005231] 
[Fixed ISO-8859-11 table in *iso-8859-tables*, iso-8859-decode-check now passes.
Stelian Ionescu <sionescu@common-lisp.net>**20071204004713] 
[Fix :iso-8859-11 encoding
Stelian Ionescu <sionescu@common-lisp.net>**20071204003442] 
[Replace FIXNUM declarations with UB8 in ISO-8859 encodings.
Stelian Ionescu <sionescu@common-lisp.net>**20071203224806] 
[Misc. cleanups in the ISO-8859 encodings
Stelian Ionescu <sionescu@common-lisp.net>**20071203223901] 
[Fix :iso-8859-10 encoder
Stelian Ionescu <sionescu@common-lisp.net>**20071203182557] 
[Fix :iso-8859-5 encoder
Stelian Ionescu <sionescu@common-lisp.net>**20071203180914] 
[Fix :iso-8859-4 encoder
Stelian Ionescu <sionescu@common-lisp.net>**20071203175531] 
[Fix test-8bit-roundtrip to ignore undefined code points
Stelian Ionescu <sionescu@common-lisp.net>**20071203173924] 
[ENSURE-EXTERNAL-FORMAT can now handle arguments of the form '(<external-format> :eol-style <eol-style>)
Stelian Ionescu <sionescu@common-lisp.net>**20071023203450] 
[Fix docstring typo in src/string.lisp
Luis Oliveira <loliveira@common-lisp.net>**20070823120715] 
[New function: EXTERNAL-FORMAT-EQUAL
Luis Oliveira <loliveira@common-lisp.net>**20070823120655] 
[Add note to NOTES.
Luis Oliveira <loliveira@common-lisp.net>**20070823120635] 
[Fix ` placement in WITH-SIMPLE-VECTOR
Luis Oliveira <loliveira@common-lisp.net>**20070823120529] 
[Initialize errorp from *suppress-character-coding-errors* to keep the global policy when not overridden
attila.lendvai@gmail.com**20070823021323] 
[Add :RE endianness to UB-GET.
Luis Oliveira <loliveira@common-lisp.net>**20070813224944] 
[Handle reverse endian and character out-of-range in UTF-32.
Luis Oliveira <loliveira@common-lisp.net>**20070813211647
 
 - Add regression tests.
] 
[New tests
Luis Oliveira <loliveira@common-lisp.net>**20070813194552
 
 - Added tests adapted from SBCL's tests/octets.pure.lisp
   (which discovered all of those UTF-8 bugs)
 - Added tests against flexi-streams' ISO-8859-* tables.
] 
[Add debugging helper: RECOMPILE-MAPPINGS.
Luis Oliveira <loliveira@common-lisp.net>**20070813194527] 
[Fix handling of leading BOM in UTF-32.
Luis Oliveira <loliveira@common-lisp.net>**20070813194456] 
[Fix silly ISO-8859-11 bug.
Luis Oliveira <loliveira@common-lisp.net>**20070813194405
 
 - Decoder code was defined as encoder and vice-versa.
] 
[Handle overlong UTF-8 sequences properly.
Luis Oliveira <loliveira@common-lisp.net>**20070813175830
 
 - Also, defined more specific decoding error conditions:
   CHARACTER-OUT-OF-RANGE, INVALID-UTF8-START-BYTE,
   INVALID-UTF8-CONTINUATION-BYTE and OVERLONG-UTF8-SEQUENCE.
] 
[Handle erroneous UTF-8 continuation bytes properly.
Luis Oliveira <loliveira@common-lisp.net>**20070812202955] 
[Document UNICODE-CHAR and [SIMPLE-]UNICODE-STRING types.
Luis Oliveira <loliveira@common-lisp.net>**20070812202547] 
[New constant: UNICODE-CHAR-CODE-LIMIT
Luis Oliveira <loliveira@common-lisp.net>**20070812202152
 
 - An alias for CHAR-CODE-LIMIT (so, might not be #x110000).
] 
[Fix bug in UTF-8 encoder.
Luis Oliveira <loliveira@common-lisp.net>**20070812201758
 
 - Wasn't encoding the fourth octet in 4-octet sequences properly.
] 
[Detect and signal errors on overlong UTF-8 sequences.
Luis Oliveira <loliveira@common-lisp.net>**20070812201629] 
[Fix bug in the iso-8859-9 decoder.
Luis Oliveira <loliveira@common-lisp.net>**20070812201438] 
[split LITERAL-CHAR-CODE-LIMIT into {DE,EN}CODE-LITERAL-CODE-UNIT-LIMIT
Luis Oliveira <loliveira@common-lisp.net>**20070812201200
 
 - Retained LITERAL-CHAR-CODE-LIMIT for the common case when the limit
   is the same in either direction.
] 
[More WITH-SIMPLE-VECTOR improvements.
Luis Oliveira <loliveira@common-lisp.net>**20070811052048
 
 - Fix OPENMCL and ALLEGRO versions.
 - Make pure CL version non-copying for non-adjustable arrays.
] 
[Fix bug in slow implementation of WITH-SIMPLE-VECTOR.
Luis Oliveira <loliveira@common-lisp.net>**20070811041306] 
[Minor test fix and comment about UNICODE-STRING.
Luis Oliveira <loliveira@common-lisp.net>**20070811040040] 
[Fix last known :UTF-8B bug.
Luis Oliveira <loliveira@common-lisp.net>**20070811035954
 
 (passes all tests now)
] 
[Define, use and export three new types.
Luis Oliveira <loliveira@common-lisp.net>**20070811033803
 
 UNICODE-CHAR, UNICODE-STRING and SIMPLE-UNICODE-STRING.
] 
[Fix major bug in :UTF-8B decoder
Luis Oliveira <loliveira@common-lisp.net>**20070811033654
 
 - There's still one more known bug left.  One of the tests still fails.
] 
[Get rid of SBCL warnings in :UTF-8B code-point-counter
Luis Oliveira <loliveira@common-lisp.net>**20070811033541] 
[Added (buggy) implementation of the UTF-8B encoding.
Luis Oliveira <loliveira@common-lisp.net>**20070811021739
 
 - Added new tests, some of which fail.
] 
[SIMPLE-BASE-STRING vs SIMPLE-STRING
Luis Oliveira <loliveira@common-lisp.net>**20070811021337
 
 - Be more careful about the string type to expect.
   (added some comments)
 - Compile with (SAFETY 3) at least until this is figured out.
 - Added a DEBUG-MAPPINGS function to help in debugging decoders,
   encoders and counters.
] 
[Check for #+windows instead of #+(or win32 mswindows)
Luis Oliveira <loliveira@common-lisp.net>**20070811021043] 
[Fix missing branch in UTF-8 decoder.
Luis Oliveira <loliveira@common-lisp.net>**20070811021005] 
[Add ISO-8859-* character encodings.
Luis Oliveira <loliveira@common-lisp.net>**20070807064757
 
 - Simplified DEFINE-UNIBYTE-ENCODER/DECODER.
 - Added some important licensing notes.
] 
[Add license.texinfo as well.
Luis Oliveira <loliveira@common-lisp.net>**20070807035547] 
[Added ALEXANDRIA as dependency.
Luis Oliveira <loliveira@common-lisp.net>**20070807034852
 
 Patch courtesy of Stelian Ionescu.
] 
[Use texinfo docstrings for documentation.
Luis Oliveira <loliveira@common-lisp.net>**20070807032939
 
 (That doesn't mean there's real documentation yet though.)
] 
[Minor fixes to fix-sharp-backslash.lisp
Luis Oliveira <loliveira@common-lisp.net>**20070807032805] 
[Fix package brokenness
Luis Oliveira <loliveira@common-lisp.net>**20070728185707] 
[Fix copying version of WITH-SIMPLE-VECTOR
Luis Oliveira <loliveira@common-lisp.net>**20070727045535
 
 - Added regression test: STRING-TO-OCTETS.1
] 
[Fix sharp-backslash for allegro
Luis Oliveira <loliveira@common-lisp.net>**20070727030259] 
[Fix sharp-backslash reader macro
Luis Oliveira <loliveira@common-lisp.net>**20070727035543
 
 - SHARP-BACKSLASH was calling UNREAD-CHAR after PEEK-CHAR and CLHS says
   the consequences of doing that are undefined.
 - Added two new tests SHARP-BACKSLASH.1 and SHARP-BACKSLASH.2.
] 
[get-character-encoding: accept encoding objects
Luis Oliveira <loliveira@common-lisp.net>**20070726204432] 
[lookup-mapping: accept encoding objects
Luis Oliveira <loliveira@common-lisp.net>**20070726170531
 
 - Added regression test as well.
] 
[Add #\ reader macro that understands unicode code points
Luis Oliveira <loliveira@common-lisp.net>**20070726170421
 
 - Disabled for OpenMCL whose #\ already works as expected.
 - Updated tests.
] 
[Use trivial-features
Luis Oliveira <loliveira@common-lisp.net>**20070726170300] 
[external-format changes
Luis Oliveira <loliveira@common-lisp.net>**20070717030523
 
 - Move *default-character-encoding* to babel-encodings so that it
   can be used with external-formats which now accept :DEFAULT as
   a valid character encoding.
 - New function: ENSURE-EXTERNAL-FORMAT.
] 
[Bugfixes
Luis Oliveira <loliveira@common-lisp.net>**20070717025715
 
 - Split accessors into getters and setters.  The setf expanders
   didn't work on several Lisps.
 - Rewrite the UTF-8 octet counter to make CLISP happy.
 - Fix coding-error :buffer initarg.
 - Fix missing ONCE-ONLY macro in strings.lisp.
 - Fix tests for Lispworks which doesn't understand #\uXXXX.
] 
[Add max parameters to octet-counters and code-point-counters
Luis Oliveira <loliveira@common-lisp.net>**20070708034134
 
 These parameters make them count up to X characters or octets,
 and return a new END index, if necessary.
] 
[Wrap defconstants around eval-when in enc-unicode.lisp
Luis Oliveira <loliveira@common-lisp.net>**20070708034059] 
[Preliminary interface for external formats.
Luis Oliveira <loliveira@common-lisp.net>**20070708033940
 
 An external-format is just a combination of encoding and EOL style.
] 
[Set *suppress-character-coding-errors* to NIL
Luis Oliveira <loliveira@common-lisp.net>**20070628150908
 
 Also add some notes about Lisps that don't support unicode.
] 
[Add a few more notes.
Luis Oliveira <loliveira@common-lisp.net>**20070607011417] 
[Don't use colorize-lisp-examples.lisp for now in gendocs.sh
Luis Oliveira <loliveira@common-lisp.net>**20070605224236] 
[First patch.
Luis Oliveira <loliveira@common-lisp.net>**20070605211316
 
 Supported encodings: EBCDIC-US, ASCII, ISO-8859-1, UTF-8, UTF-16 and UTF-32.
] 
Patch bundle hash:
3901497fea8fb774ba6fee9d0038b17eaf6db7e5
