Stuart Sierra writes:
Pascal Bourguignon wrote:
Really guys, these are not CAPTCHAs. Have a look at the STUDENT program in PAIP (norvig.com). It could easily solve these kinds of problems, with very little or no added rules and code.
Okay, but what would you suggest? By definition, a CAPTCHA has to be something that can be solved by an algorithm. The best one can do, it seems, is add enough "noise" to the question to make it difficult to parse.
Not at all. A CAPTCHA mustn't be easily solvable by an algorithm. A good CAPTCHA needs at least 90 IQ points.
http://www.captcha.net/ http://en.wikipedia.org/wiki/Captcha
Mori et al. published a paper in IEEE CVPR'03 detailing a method for defeating one of the most popular CAPTCHAs, EZ-Gimpy, which was tested as being 92% accurate. The same method was also shown to defeat the more complex and less-widely deployed Gimpy program with an accuracy of 33%. However, the existence of implementations of their algorithm in actual use is indeterminate at this time.
For example, a better captcha would be:
You've got three apples, you eat one. How many oranges are you left with?
If you get 2 as the answer, you know you've got a program, or at least someone who cannot distinguish an apple from an orange, and who in any case couldn't answer: "What oranges?"
But even there, a program like SHRDLU would be able to ask "What oranges?".
So the captcha would have to give a Turing Test, asking _several_ questions of different kinds, to be able to ascertain the essence of the "user".
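To make the apples-and-oranges trap concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: naive_solver is a hypothetical keyword-matching bot (not the STUDENT program or any real solver), and looks_like_program is a hypothetical check that simply flags the "trap" answer.

```python
import re

# Hypothetical vocabulary for this sketch; a real bot would know more words.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}

QUESTION = ("You've got three apples, you eat one. "
            "How many oranges are you left with?")


def naive_solver(question: str) -> str:
    """Hypothetical bot: subtract the second number word from the first,
    paying no attention to which nouns the numbers refer to."""
    words = re.findall(r"[a-z]+", question.lower())
    numbers = [NUMBER_WORDS[w] for w in words if w in NUMBER_WORDS]
    if len(numbers) >= 2:
        return str(numbers[0] - numbers[1])
    return "?"


def looks_like_program(answer: str, trap_answer: str) -> bool:
    """Flag a respondent who gives the answer only noun-blind arithmetic yields."""
    return answer.strip() == trap_answer


if __name__ == "__main__":
    bot_answer = naive_solver(QUESTION)
    print(bot_answer, looks_like_program(bot_answer, "2"))
    print(looks_like_program("What oranges?", "2"))
```

A single trap like this only catches noun-blind solvers; a program that tracks which objects were mentioned would answer "What oranges?" and pass, which is why several questions of different kinds are needed.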
Let me see. If I ask you this:
"Hiroshi Nohara was driving on Route 66. He always felt uneasy when driving abroad. Can you explain to me why?"
Well, the information needed to give an explanation is available on the Internet. Only it's not present in OpenCyc, and browsing the web to gather it, and making the right inferences, would take more time than a human would need to answer.
As a human, can you answer that question?
What I'm trying to do here is to build questions that are easy to answer given common knowledge that is obvious to humans, but that is not so well formalized or accessible in machine-readable form.