Hello, first time poster...
I have two issues that I can't seem to track down...
1. I'm looking for a rule that will catch the following: any of my web pages that have some snippet of code before the <!DOCTYPE. The <!DOCTYPE should be the very first thing in the HTML on my web pages. I've seen individuals sneak in the following code:
<!-- saved from --> <!DOCTYPE HTML Public ...etc...
So is there a regex construct that will fail if any characters are found before <!DOCTYPE?
2. Secondly, I'm looking to find ONLY the PDFs inside the test.com domain. I want to match a pattern like http://www.test.com/snow/squall/index.pdf. I know to start my regex with http://www\.test\.com, but how do I ignore all the directory stuff and key in on the .pdf extension?
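To make both questions concrete, here is a rough sketch of the kind of thing being asked for, not a definitive answer. It assumes Python's re module (the tool actually in use may have different syntax), assumes that even leading whitespace before <!DOCTYPE should count as a failure, and assumes that .pdf should match in any letter case and that URLs with a query string or fragment should not match. The function names are made up for illustration.

```python
import re

# Question 1: \A anchors at the very start of the text, so any character
# before <!DOCTYPE (a comment, whitespace, anything) makes the match fail.
DOCTYPE_FIRST = re.compile(r"\A<!DOCTYPE", re.IGNORECASE)

# Question 2: [^\s?#]* skips over any directory path after the host, and
# \.pdf\Z requires the URL to end with the .pdf extension.
# Assumption: case-insensitive, no query string or fragment allowed.
PDF_URL = re.compile(r"\Ahttp://www\.test\.com/[^\s?#]*\.pdf\Z", re.IGNORECASE)


def starts_with_doctype(html: str) -> bool:
    """Return True only if the page text begins with <!DOCTYPE."""
    return DOCTYPE_FIRST.match(html) is not None


def is_test_com_pdf(url: str) -> bool:
    """Return True only for PDF URLs under http://www.test.com/."""
    return PDF_URL.match(url) is not None
```

With these, a page starting with <!-- saved from --> would fail the first check, and http://www.test.com/snow/squall/index.pdf would pass the second while an .html URL on the same host would not.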
Thanks