i have an implementation that reports char-code-limit lower than what it can actually handle -- it's ABCL (running on top of Java), where only 256 codes are officially supported, but since it uses Java strings there's no problem handling Unicode strings -- so i set *regex-char-code-limit* to some 10000 (thanks, Edi!). however, there are characters like 0xFEFF (the BOM), so strictly i should set *regex-char-code-limit* to 65536 to cover the whole 16-bit range. i think that's overkill -- i see ppcre creates an array of that size to do matching.
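for reference, here's roughly what my setup looks like (just a sketch; 10000 is an arbitrary bound i picked):

  ;; scanners built after this only consider char codes below the limit,
  ;; so Cyrillic (up to #x4FF) is covered, but #xFEFF is not
  (setf cl-ppcre:*regex-char-code-limit* 10000)
  (cl-ppcre:scan "[а-я]+" "пример")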
how do people cope with this on unicode-enabled lisps? (afaik Steel Bank uses UCS-4 char codes, so there's definitely no sane char-code-limit there)
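just to put a number on why that worries me -- SBCL reports a char-code-limit of 1114112 (#x110000), so even a per-scanner bit-vector indexed by char code would be sizable (a sketch, nothing ppcre-specific):

  ;; ~136 KB per bit-vector at SBCL's char-code-limit; an array of
  ;; anything wider than bits would be far larger still
  (let ((table (make-array char-code-limit :element-type 'bit)))
    (length table))  ; => 1114112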
does ppcre create that array for each scanner? if there's one global array, that's ok, but an array per scanner is too much..
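to be concrete, this is the usage pattern i'm worried about (a sketch; the patterns are made up):

  ;; if every scanner carries its own limit-sized table,
  ;; this would allocate a thousand of them
  (let ((cl-ppcre:*regex-char-code-limit* 65536))
    (loop for i below 1000
          collect (cl-ppcre:create-scanner (format nil "literal-~d" i))))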
does *use-bmh-matchers* affect the use of these arrays? if so, would matching be much slower if i disable it?
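if it comes to that, i'd measure the difference with something like this (a crude sketch; the pattern and haystack are made up, and afaiu *use-bmh-matchers* is consulted when the scanner is built, so it has to be bound around create-scanner):

  (defun bench (pattern target n)
    ;; compile PATTERN once, then run SCAN over TARGET N times
    (let ((scanner (cl-ppcre:create-scanner pattern)))
      (time (loop repeat n do (cl-ppcre:scan scanner target)))))

  (defparameter *haystack*
    (concatenate 'string (make-string 10000 :initial-element #\x) "needle"))

  ;; compare with and without BMH matchers:
  (let ((cl-ppcre:*use-bmh-matchers* t))
    (bench "needle" *haystack* 1000))
  (let ((cl-ppcre:*use-bmh-matchers* nil))
    (bench "needle" *haystack* 1000))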