The "|m" was just in there for the example. I'm actually trying to match HTML tags with small contents inside, like: <tag> V </tag> <tag> I </tag> <tag> A </tag> <tag> G </tag> <tag> R </tag> <tag> A </tag>
with an expression similar to "/(>.{1,4}<[\S\s]*){4}/", and I noticed the behaviour when I changed the ".{1,4}" to "[.^<]{1,4}". I did notice that "\s" and "\S" are matched inside a character class, which is what I think led me to assume other metacharacters would be too. I've only been using regex for a short while, so I am stumbling along. Thanks for the pointer to the man page; I'm reading it now. Any other advice you could give for this expression would be great.
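A minimal sketch of the difference between the two classes, using Python's re module as a stand-in for the Perl-style syntax (an assumption; the two agree on these cases as far as I know). The sample string and the tighter pattern at the end are hypothetical illustrations, not the actual SpamAssassin rule:

```python
import re

# Hypothetical sample data modelled on the tags described above.
html = "<tag> V </tag> <tag> I </tag> <tag> A </tag> <tag> G </tag>"

# '[.^<]' is a class of three literal characters: '.', '^' and '<'.
# Nothing in the sample has one of those between '>' and '<'.
literal_class = re.findall(r">[.^<]{1,4}<", html)

# '[^<]' (caret first) is a negated class: any character except '<'.
# Note that it also matches the whitespace between '</tag>' and '<tag>'.
negated_class = re.findall(r">[^<]{1,4}<", html)

# Hypothetical tighter pattern: 1-4 non-space, non-'<' characters
# between an opening tag and its matching closing tag.
short_tags = re.findall(r"<(\w+)>\s*([^<\s]{1,4})\s*</\1>", html)
letters = "".join(content for _name, content in short_tags)

print(literal_class)
print(negated_class)
print(letters)
```

The negated class is probably what was intended with "[.^<]{1,4}"; anchoring on the tag names, as in the last pattern, is one way to avoid counting the gaps between tags as contents.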
Thanks, Bryn
----- Original Message -----
From: Edi Weitz edi@agharta.de
To: sites@brynmosher.com
Cc: regex-coach@common-lisp.net
Date: Wed, 26 Apr 2006 23:17:25 +0200
Subject: Re: [regex-coach] problem with dot '.' inside brackets
On Wed, 26 Apr 2006 12:44:30 -0700, sites@brynmosher.com wrote:
> I've been using Regex-Coach 0.8.4 on Windows to test some SpamAssassin rules and noticed something odd:
> Placing the following expression: [.|m]
> to match the following data: bleh.com
> matches the '.' in bleh.com and not the first non-linefeed character, as the '.' character in the expression should. It's almost as if I had escaped the '.' like '\.'. Using the expression '[.]' yields the same result. I've also noticed that the non-match character '^' doesn't work inside brackets either.
> Is this an error or am I crazy?
Well, at least it's not an error... :)
Most characters that have a special meaning in regular expressions (like the dot or the pipe symbol, for example) are treated like normal characters within character classes, i.e. within square brackets.
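To illustrate the point, here is a small sketch using Python's re module, which is assumed here to treat these characters the same way Perl does inside and outside square brackets. The variable names are only for illustration:

```python
import re

data = "bleh.com"

# Inside brackets, '.' and '|' are ordinary characters, so this class
# matches a literal '.', a literal '|' or an 'm'.
in_class = re.findall(r"[.|m]", data)

# Outside brackets, '.' matches any character except a newline,
# so the first match is the first character of the data.
bare_dot = re.search(r".", data).group()

# '^' negates a class only when it is the first character in it.
negated = re.findall(r"[^.]", "a.b")

# Anywhere else in the class, '^' is just a literal caret.
caret_literal = re.findall(r"[.^]", "a.b^")

print(in_class, bare_dot, negated, caret_literal)
```

Shorthand classes such as "\s" and "\S" are the main exception: they keep their meaning inside brackets, which is why "[\S\s]" matches any character at all.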
See 'man perlre' for details.
BTW, it seems that your understanding of character classes as a whole is wrong. If the dot /would/ match every non-linefeed character, then "[.|m]" would be equivalent to "[.]".
Cheers, Edi.