Hello,
This is my first post to this list. I have searched the archives (for "greedy" and some other terms) but didn't find anything that seems to relate to my problem, so I'm writing to ask whether anyone else on the list has encountered something like this. There is something I don't understand about the "." operator, especially its "non-greedy" form, and in particular about its behaviour when it is followed by an optional parenthesized term.
Below are the pattern, a sample target string, and my comments on the results I get from Regex Coach. I ran the pattern with the "i" (case-insensitive) option checked.
Pattern: ^\s*An appeal.+?(Joined )?Cases? ?t ?[-] ?\d{1,3}/ ?\d{2}.+?(between)?
Target string: An appeal against the judgment delivered on 15 January 2003 by the Second Chamber (Extended Composition) of the Court of First Instance of the European Communities in joined cases T-377/00 (1), T-379/00 (2), T-380/00 (2), T-260/01 (3) and T-272/01 (4) between Philip Morris International, Inc., R.J. Reynolds Tobacco Holdings, Inc., RJR Acquisition Corp., R.J. Reynolds Tobacco Company, R.J. Reynolds Tobacco International Inc., and Japan Tobacco, Inc., and Commission of the European Communities, supported by European Parliament, Kingdom of Spain, French Republic, Italian Republic, Portuguese Republic, Republic of Finland, Federal Republic of Germany, Hellenic Republic, Kingdom of the Netherlands, was brought before the Court of Justice of the European Communities on 25 March 2003 by R.J. Reynolds Tobacco Holdings, Inc., established in Winston-Salem, North Carolina (United States), RJR Acquisition Corp., established in Wilmington, Delaware (United States), R.J. Reynolds Tobacco Company, established in Winston-Salem, North Carolina (United States), R.J. Reynolds Tobacco International Inc., established in Winston-Salem, North Carolina (United States) and Japan Tobacco, Inc., established in Tokyo (Japan), represented by O.W. Brouwer, lawyer, and P. Lomas, solicitor.
============
What I want it to do is match the string from the beginning through "between", and when there is no instance of "between", I want it to match the entire string.
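To make the goal concrete, here is a rough sketch of the result I am after, written with plain string operations instead of a regex. The function name `wanted_span` is just something I made up for illustration, and the sketch ignores the "An appeal ... case number" prefix that my real pattern also checks.

```python
def wanted_span(text):
    """Return the part of `text` that should be matched.

    That is everything up to and including the first "between"
    (compared case-insensitively), or the whole text when
    "between" does not occur at all.
    """
    # Assumes plain ASCII text, so lower() does not change string length.
    pos = text.lower().find("between")
    if pos == -1:
        return text  # no "between": take the entire string
    return text[:pos + len("between")]


if __name__ == "__main__":
    print(wanted_span("An appeal in case T-1/00 between A and B"))
    print(wanted_span("An appeal in case T-1/00 brought by A"))
```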
I would expect the example above to give me a match on 0-259 (i.e. through "between"). Instead I get a match only on 0-189 (through the first case number). This makes no sense to me. I would consider it a bug, but Regex Coach and Perl v5.8.3 on FreeBSD give me the same results.
^\s*An appeal.+?(Joined )?Cases? ?t ?[-] ?\d{1,3}/ ?\d{2}.+(between)? gives me a match on 0-1279 (the whole string). Why doesn't it stop when it finds "between"?
^\s*An appeal.+?(Joined )?Cases? ?t ?[-] ?\d{1,3}/ ?\d{2}.+?(between) gives me the match I expect, 0-259.
^\s*An appeal.+?(Joined )?Cases? ?t ?[-] ?\d{1,3}/ ?\d{2}.+(between) also gives me the match I expect, 0-259.
But if I make the "(between)" optional, by putting a "?" after it,
- the regex engine doesn't stop there when the ".+" is greedy, and
- the regex engine doesn't find "between" when the ".+" is non-greedy, i.e. ".+?"
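In case it helps anyone reproduce this outside Regex Coach, here is a minimal sketch in Python. It rests on the assumption that Python's `re` module treats these constructs the same way Perl does. The shortened target string and the names `TEXT`, `PREFIX`, `VARIANTS` and `match_ends` are mine, introduced only for this illustration, so the offsets it prints differ from the ones quoted in this post.

```python
import re

# Shortened stand-in for the full target string (an assumption, for brevity).
TEXT = ("An appeal against the judgment in joined cases T-377/00 (1), "
        "T-379/00 (2) between Philip Morris and the Commission "
        "was brought on 25 March 2003.")

# The part of the pattern that all four variants share.
PREFIX = r"^\s*An appeal.+?(Joined )?Cases? ?t ?[-] ?\d{1,3}/ ?\d{2}"

# The four endings tried in this post.
VARIANTS = {
    "lazy, optional": PREFIX + r".+?(between)?",
    "greedy, optional": PREFIX + r".+(between)?",
    "lazy, required": PREFIX + r".+?(between)",
    "greedy, required": PREFIX + r".+(between)",
}


def match_ends(text=TEXT):
    """Return {variant name: end offset of the match, or None if no match}."""
    ends = {}
    for name, pattern in VARIANTS.items():
        m = re.search(pattern, text, re.IGNORECASE)  # "i" option checked
        ends[name] = m.end() if m else None
    return ends


if __name__ == "__main__":
    for name, end in match_ends().items():
        print(f"{name:17} -> 0-{end}")
```

On this shortened string the two "required" variants end right after "between", the greedy optional variant runs to the end of the string, and the lazy optional variant ends one character after the first case number, which is the same pattern of results I described above.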
Can anyone enlighten me?
Many thanks, John Clements