Re: [cl-ppcre-devel] behavior of \w

12 Mar 2012


If they insist on using "\w", there's no portable way to change this
except for patching the code.  Otherwise, they could of course use a
character class or add their own property resolver.
Cheers,
Edi.
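
The two alternatives mentioned in the reply could look roughly like the
following. This is only a sketch, assuming CL-PPCRE 2.x is already loaded.
CREATE-SCANNER, SCAN and *PROPERTY-RESOLVER* are CL-PPCRE's own names. The
property name "AsciiWord" and the names *ascii-word-scanner*,
*ascii-property-scanner*, *non-ascii-word*, ascii-word-char-p and
ascii-word-resolver are made up for illustration.

```lisp
;; Assumes CL-PPCRE 2.x is already loaded, e.g. (ql:quickload :cl-ppcre).

;; Alternative 1: spell out the ASCII word characters as a character
;; class instead of relying on \w (and thus on ALPHANUMERICP).
(defparameter *ascii-word-scanner*
  (cl-ppcre:create-scanner "^[a-zA-Z0-9_]+$"))

;; Alternative 2: a custom property resolver.
(defun ascii-word-char-p (char)
  "True for a-z, A-Z, 0-9 and underscore only."
  (or (char<= #\a char #\z)
      (char<= #\A char #\Z)
      (char<= #\0 char #\9)
      (char= char #\_)))

(defun ascii-word-resolver (name)
  "Map the hypothetical property name \"AsciiWord\" to its test function."
  (if (string= name "AsciiWord")
      #'ascii-word-char-p
      (error "Unknown property ~S" name)))

;; The resolver is consulted when the regex string is parsed, so it has
;; to be bound around the call to CREATE-SCANNER.
(defparameter *ascii-property-scanner*
  (let ((cl-ppcre:*property-resolver* #'ascii-word-resolver))
    (cl-ppcre:create-scanner "^\\p{AsciiWord}+$")))

;; "caf" followed by LATIN SMALL LETTER E WITH ACUTE, built with CODE-CHAR
;; so the example does not depend on the source file encoding.
(defparameter *non-ascii-word* (format nil "caf~C" (code-char 233)))

(cl-ppcre:scan *ascii-word-scanner* "foo_bar1")           ; matches
(cl-ppcre:scan *ascii-word-scanner* *non-ascii-word*)     ; NIL
(cl-ppcre:scan *ascii-property-scanner* "foo_bar1")       ; matches
(cl-ppcre:scan *ascii-property-scanner* *non-ascii-word*) ; NIL
```

Both scanners behave the same regardless of how the host Lisp defines
ALPHANUMERICP. The cost is that every regex has to say "[a-zA-Z0-9_]" or
"\p{AsciiWord}" where it would otherwise say "\w".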
On Mon, Mar 12, 2012 at 4:10 PM, Robert Brown robert.brown@gmail.com wrote:
...
Some folks I work with are using cl-ppcre.  They've run into an
incompatibility between cl-ppcre and the PCRE library that boils
down to cl-ppcre's handling of \w.  The behavior is documented in
cl-ppcre's manual:

    CL-PPCRE uses ALPHANUMERICP to decide whether a character
    matches Perl's "\w", so depending on your CL implementation you
    might encounter differences between Perl and CL-PPCRE when
    matching non-ASCII characters.

This reliance on ALPHANUMERICP may be a misfeature.  It means
that cl-ppcre behaves differently depending on the Lisp
implementation it's running on.
My co-workers desire compatibility between cl-ppcre on SBCL
(where ALPHANUMERICP follows Unicode) and PCRE for matching
Latin-1 encoded strings.  They patched the cl-ppcre code to make
\w match a-z, A-Z, 0-9, and underscore.  Is there a better
workaround for them?
bob
