[regex-coach] Two Problems

Hello, first-time poster. I have two issues that I can't seem to track down.

1. I'm looking for a rule that will eliminate the following: web pages of mine that have some snippet of code before the <!DOCTYPE. The <!DOCTYPE should start the HTML on my web pages, but I've seen individuals sneak the following code in:

<!-- saved from --> <!DOCTYPE HTML Public ...etc...

So is there a regex construct that will fail if any characters are found before <!DOCTYPE?

2. Secondly, I'm looking to find ONLY the PDFs inside a test.com domain. I wish to match the pattern http://www.test.com/snow/squall/index.pdf. I know to start my regex as http://www\.test\.com, but how do I ignore all the directory stuff and key in on the .pdf extension?

Thanks

On Wed, 24 Nov 2004 19:05:39 +0000, "Snow Squall" <snow_squall@hotmail.com> wrote:
> I have two issues that I can't seem to track down....
These are really not on topic for this mailing list because it's about the program Regex Coach, not about learning regular expression basics.
> 1. I'm looking for a rule that will eliminate the following: web pages of mine that have some snippet of code before the <!DOCTYPE. The <!DOCTYPE should start the HTML on my web pages, but I've seen individuals sneak the following code in:
> <!-- saved from --> <!DOCTYPE HTML Public ...etc...
> So is there a regex construct that will fail if any characters are found before <!DOCTYPE?
^<!DOCTYPE
This assumes that you've read the whole HTML page into one string.
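As a minimal sketch of this anchored check, here is how it might look in Python. The language, the helper name `starts_with_doctype`, and the sample strings are assumptions for illustration and not part of the original reply. The sketch uses `\A` rather than `^` so that the anchor still means "start of the string" even if multi-line mode is switched on, and it ignores case on the assumption that lowercase doctypes should also count.

```python
import re

# \A matches only at the very start of the string, even in multi-line mode.
# re.IGNORECASE is an assumption: it lets "<!doctype html>" pass as well.
DOCTYPE_AT_START = re.compile(r"\A<!DOCTYPE", re.IGNORECASE)


def starts_with_doctype(html: str) -> bool:
    """Return True if the page (read into one string) begins with <!DOCTYPE."""
    return DOCTYPE_AT_START.match(html) is not None


# Hypothetical sample pages, each read into a single string.
clean = "<!DOCTYPE html>\n<html></html>"
dirty = "<!-- saved from --> <!DOCTYPE html>\n<html></html>"

print(starts_with_doctype(clean))  # True
print(starts_with_doctype(dirty))  # False
```

Anything before the doctype, including a comment or leading whitespace, makes the match fail, which is the behaviour asked for in the question.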
> 2. Secondly, I'm looking to find ONLY the PDFs inside a test.com domain. I wish to match the pattern http://www.test.com/snow/squall/index.pdf. I know to start my regex as http://www\.test\.com, but how do I ignore all the directory stuff and key in on the .pdf extension?
http://www\.test\.com/.*\.pdf

Cheers,
Edi.
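For reference, a minimal sketch of the PDF pattern above in Python. The language, the helper name `is_test_com_pdf`, and the sample URLs are assumptions for illustration, not part of the reply. The sketch uses `fullmatch` so that `.pdf` has to be the end of the URL, and it ignores case on the assumption that `.PDF` links should be found too.

```python
import re

# The pattern from the reply: escaped dots in the host, any path, then ".pdf".
# re.IGNORECASE is an assumption so that "INDEX.PDF" is also accepted.
PDF_URL = re.compile(r"http://www\.test\.com/.*\.pdf", re.IGNORECASE)


def is_test_com_pdf(url: str) -> bool:
    """Return True if the whole URL is a .pdf link under www.test.com."""
    # fullmatch requires the pattern to cover the entire string,
    # so "index.pdf.html" is rejected.
    return PDF_URL.fullmatch(url) is not None


print(is_test_com_pdf("http://www.test.com/snow/squall/index.pdf"))   # True
print(is_test_com_pdf("http://www.test.com/snow/squall/index.html"))  # False
print(is_test_com_pdf("http://www.other.com/index.pdf"))              # False
```

When the pattern is used to search inside a larger page instead of checking one URL at a time, the greedy `.*` can run past the end of the link, so a narrower class such as `[^\s"]*` may be a safer choice there.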
participants (2): Edi Weitz, Snow Squall