Re: [regex-coach] Two Problems

24 Nov 2004

      On Wed, 24 Nov 2004 19:05:39 +0000, "Snow Squall" <snow_squall@hotmail.com> wrote:
...
I have two issues that I can't seem to track down....
These are really off-topic for this mailing list, which is about the
program Regex Coach, not about learning regular expression basics.
...
1.  I'm looking for a rule that will eliminate the following. I'm
looking for all of my web pages that have some snippet of code
before the <!DOCTYPE... My <!DOCTYPE should start the HTML on my web
pages... I've seen individuals sneak the following code in:
<!-- saved from -->
<!DOCTYPE HTML Public ...etc...
So is there a regex construct that will fail if any characters are
found before <!DOCTYPE?
^<!DOCTYPE

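This assumes that you've read the whole HTML page into one string and
that ^ matches only at the start of that string, i.e. you're not in
multi-line mode, where ^ also matches right after every newline.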
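A minimal Python sketch of the first answer, for illustration only (Python and the helper names `doctype_is_first` and `check_file` are our own choices, not part of Regex Coach). It uses the same `^<!DOCTYPE` pattern and shows why the single-string, non-multi-line assumption matters: with `re.MULTILINE` the pattern would also match the `<!DOCTYPE` on the second line of a page that starts with a comment.

```python
import re

# "^" anchors at the very start of the string. Without re.MULTILINE it
# does NOT also match after each newline, which is what we want here.
DOCTYPE_FIRST = re.compile(r"^<!DOCTYPE")

def doctype_is_first(html: str) -> bool:
    """True only if the page text begins with <!DOCTYPE (nothing before it)."""
    return DOCTYPE_FIRST.search(html) is not None

def check_file(path: str) -> bool:
    """Hypothetical helper: read the whole page into one string, then check it."""
    # The UTF-8 encoding is an assumption; adjust it to match your pages.
    with open(path, encoding="utf-8", errors="replace") as f:
        return doctype_is_first(f.read())
```

Any leading character at all, including a space or a newline, makes the check fail, which is exactly the "fail if any characters are found before" behaviour asked for.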
...
2. Secondly, I'm looking to find ONLY the .PDFs inside the test.com
domain. I wish to match the pattern
http://www.test.com/snow/squall/index.pdf . I know to start my
regex as http://www\.test\.com, but how do I ignore all the directory
stuff and key in on the .pdf extension?
http://www\.test\.com/.*\.pdf
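A minimal Python sketch of the second answer, assuming the URLs appear in ordinary text or HTML (the helper names and the stricter variant are illustrative additions, not from the reply). Because `.*` is greedy, when two URLs sit on one line the original pattern matches a single span from the first `http://www.test.com/` to the last `.pdf`; the variant stops at whitespace, quotes and angle brackets instead. Both patterns are case-sensitive, so an upper-case `.PDF` as in the question would need `re.IGNORECASE`.

```python
import re

# The pattern from the reply, unchanged: literal host, anything, ".pdf".
PDF_URL = re.compile(r"http://www\.test\.com/.*\.pdf")

# Illustrative variant (an assumption, not from the reply): the path may not
# contain whitespace, quotes or angle brackets, so several URLs on one line
# are matched separately instead of being swallowed by the greedy ".*".
PDF_URL_STRICT = re.compile(r"http://www\.test\.com/[^\s\"'<>]*\.pdf")

def is_test_com_pdf(url: str) -> bool:
    """True if the whole string is a .pdf URL on www.test.com."""
    return PDF_URL.fullmatch(url) is not None

def find_test_com_pdfs(text: str) -> list[str]:
    """Return every www.test.com .pdf URL found in a block of text."""
    return PDF_URL_STRICT.findall(text)
```

For checking one URL at a time the original pattern is enough; the stricter variant only matters when scanning whole pages.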

Cheers,
Edi.

Edi Weitz