Tuesday, March 24, 2009

Worst CAPTCHA ever?

For those of you who don't know, a CAPTCHA is a program that can tell whether its user is a human or another computer. They're those little images of distorted text that you transcribe when you sign up for Gmail or leave a comment on someone's blog. Their purpose is to make sure that someone doesn't use a computer to sign up for millions of online accounts automatically, or to spam blogs randomly with cries for help from Nigerian princes. They make sure an actual human is filling out the form. They work on the principle that no computer program can (currently) read distorted text as well as a human can.

I have a confession to make. I fail this scaled-down version of the Turing test on a regular basis. There's probably software out there that can outperform my current success rate of just about 2/3. Then there are those challenges that are impossible for both human and computer. Just today I was presented with the following CAPTCHA, presumably containing two words.

The image was presented by reCAPTCHA, probably the best anti-bot service in widespread use on the internet. reCAPTCHA works by taking scanned words from books that Optical Character Recognition (OCR) software finds difficult or impossible to read and presenting them in CAPTCHAs deployed all over the internet, in order to get humans to read the words that the OCR software can't. The goal is to get a bunch of scanned books efficiently converted into digital text. It's a really clever and elegant way to solve two problems at once.
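To make the two-word idea concrete, here is a minimal Python sketch of how a challenge like this could be graded. It is not reCAPTCHA's actual code. The function name `grade`, the `control_position` argument, and the `votes` counter are all hypothetical names made up for illustration. The one rule it relies on, that only the already-known word is graded while the book word is not, comes from Ben Maurer's comment below.

```python
from collections import Counter


def normalize(word: str) -> str:
    """Compare words case-insensitively and ignore stray whitespace."""
    return word.strip().lower()


def grade(answer: str, control_word: str, control_position: int, votes: Counter) -> bool:
    """Grade a two-word answer (hypothetical scheme, for illustration only).

    control_word     -- the word whose correct reading is already known
    control_position -- 0 if the control word is on the left, 1 if on the right
    votes            -- tally of human guesses for the unknown (book) word
    """
    parts = answer.split()
    if len(parts) != 2:
        return False  # the challenge shows exactly two words

    control_guess = parts[control_position]
    unknown_guess = parts[1 - control_position]

    # Only the control word decides whether the user passes.
    if normalize(control_guess) != normalize(control_word):
        return False

    # The user looks human, so record their reading of the book word.
    votes[normalize(unknown_guess)] += 1
    return True


votes = Counter()
print(grade("upon morning", "morning", 1, votes))   # control word typed correctly
print(grade("upon mourning", "morning", 1, votes))  # control word typed incorrectly
print(votes.most_common(1))                         # most popular reading of the book word
```

In a scheme like this, an unreadable book word can't cause a failure on its own, because whatever you type for it is only counted as a vote.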

Fortunately for enterprising young programmers, the CAPTCHA half of this problem is far from being solved. The line between what a human can read easily and what a software application can read is blurred and constantly changing. New ideas in the battle against Nigerian princes are always welcome.


UPDATE: After receiving a comment from Ben Maurer, the chief engineer at reCAPTCHA (and after Googling him to make sure he really is) it seems I have to eat a little crow. After spending enough time on the reCAPTCHA site for a statistically significant sample, I've determined that the image above is an outlier. My guess is that the word on the left is a scanned image or drop cap. Thanks, Ben, for straightening me out on this.

5 comments:

Ben Maurer said...

Hi,

I'm the chief engineer on reCAPTCHA. The word on the left is one that comes from books and that we're trying to OCR. As such, it is not graded.

Over 96% of reCAPTCHAs are solved correctly -- try submitting your best guess for each CAPTCHA on our site, and I think you'll find your success rate is extremely high.

Bill the Lizard said...

Ben,
Thanks for the reply. After spending a little bit of time on your site I found that you're absolutely right. I got 30 out of 30 correct.

André said...

I absolutely agree that reCAPTCHAs are getting harder. That's why I found your (closed) SO question and eventually this blog.
I tried out that page and out of 16 attempts I got only 8 correct.

Caden said...

Since it's 6 months later, I thought I'd give it a try as well. I got 7 of 10 correct.

As I am responsible for bringing accessibility to web applications at my organization, I hate CAPTCHAs. It seems to be nearly universally admitted that they are difficult for people with normal vision and impossible for those with low or no vision, and that they are hackable. Yet, when I try to protect the interests of disabled people coming to our site by disallowing them, all I hear is "Well, that's the best we can do." and "Everyone does it, so it must be ok." Really? I wish we had an automated Turing test which was easy for those with low vision, no vision, and/or poor or no hearing.

Is it ironic that I have to fill out another CAPTCHA just to make this comment?

hikingmike said...

I just got 3 wrong out of 20 here - http://www.google.com/recaptcha/learnmore

I definitely don't get 96% and I thought I was pretty good at this. Not only that but I find myself taking longer to try to figure it out so it's more frustrating even if I do get it right. We don't want to give people more stress if we want them to fill out our form.

I found a captcha here that looks a little easier - http://code.google.com/p/cool-php-captcha/

By the way, the audio on your captcha is hilarious. I didn't even try to enter what it said. And I failed your text captcha the first time. I had to basically guess at the picture number. Thankfully I didn't have to enter anything again.