On Wed, 24 Nov 2004 19:05:39 +0000, "Snow Squall" snow_squall@hotmail.com wrote:
I have two issues that I can't seem to track down....
These are really not on topic for this mailing list because it's about the program Regex Coach, not about learning regular expression basics.
- I'm looking for a rule that will eliminate the following. I'm looking for all of my web pages that have some snippet of code before the <!DOCTYPE... My <!DOCTYPE should start the HTML on my web pages. I've seen individuals sneak the following code in:
<!-- saved from -->
<!DOCTYPE HTML Public ...etc...
So is there a regex construct that will fail if any characters are found before <!DOCTYPE ???
^<!DOCTYPE
This assumes that you've read the whole HTML page into one string and that multi-line mode is off, so that ^ matches only at the very start of that string.
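As a rough illustration of that check outside Regex Coach, here is a minimal sketch in Python (my choice of language for the example, not something the original question mentions). The function name starts_with_doctype is made up for illustration. It relies on Python's re module treating ^ as start-of-string when re.MULTILINE is not set.

```python
import re

# Without re.MULTILINE, ^ anchors only at the very start of the string,
# so anything before <!DOCTYPE (a comment, whitespace, ...) makes the match fail.
DOCTYPE_AT_START = re.compile(r"^<!DOCTYPE")


def starts_with_doctype(html: str) -> bool:
    """Return True if the page text begins with <!DOCTYPE (hypothetical helper)."""
    return DOCTYPE_AT_START.search(html) is not None


if __name__ == "__main__":
    good = "<!DOCTYPE HTML>\n<html></html>"
    bad = "<!-- saved from -->\n<!DOCTYPE HTML>\n<html></html>"
    print(starts_with_doctype(good))
    print(starts_with_doctype(bad))
```

To list the offending pages you would read each file into one string and report those for which the function returns False.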
- Secondly, I'm looking to find ONLY the .PDF's inside a test.com domain. I wish to match the pattern http://www.test.com/snow/squall/index.pdf . I know to start my regex as http://www\.test\.com but how do I ignore all the directory stuff and key in on the .pdf extension?
http://www\.test\.com/.*\.pdf
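A small sketch of how that pattern behaves, again in Python as an assumed example language. The helper name is_test_com_pdf is hypothetical, and I am assuming each URL is tested on its own as a complete string. Note that .* is greedy, so on a longer text containing several URLs it could run from the first link to the last ".pdf". The pattern is also case-sensitive as written.

```python
import re

# The dots in the host name and before "pdf" are escaped so they match a
# literal "."; the ".*" in between swallows whatever directories are present.
PDF_URL = re.compile(r"http://www\.test\.com/.*\.pdf")


def is_test_com_pdf(url: str) -> bool:
    """Return True if the whole URL is a .pdf under www.test.com (hypothetical helper)."""
    return PDF_URL.fullmatch(url) is not None


if __name__ == "__main__":
    for candidate in (
        "http://www.test.com/snow/squall/index.pdf",
        "http://www.test.com/snow/squall/index.html",
        "http://www.example.com/snow/squall/index.pdf",
    ):
        print(candidate, is_test_com_pdf(candidate))
```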
Cheers, Edi.