Tuesday, August 26, 2008

CAPTCHA and reCAPTCHA

Many readers have seen a CAPTCHA without knowing what it is called. CAPTCHA stands for "Completely Automated Public Turing test to tell Computers and Humans Apart." A CAPTCHA consists of a small image of letters or words that you must type in to convince a system that you are in fact a human and not a spambot. The letters are generally somewhat distorted or have stray lines added. Such exercises take advantage of the fact that humans can do difficult symbol recognition that computers so far cannot.

One more recent innovation is reCAPTCHA. reCAPTCHA helps to solve an interesting problem. There are long-term efforts to engage in mass digitization of old books and records. Simply saving images of the pages would take up a lot of space and would make the pages unsearchable. Thus, books are being scanned and computer programs are being used to figure out what the words are. However, when words are scratched, poorly written, water-damaged or otherwise degraded, the computers can be unable to recognize the original wording. This is the same issue that allows CAPTCHA to work. It is impractical to have humans comb through these many words to identify them all. Enter reCAPTCHA.

reCAPTCHA is just like CAPTCHA, but the words to be deciphered come from old books and are ones that computers are having trouble recognizing. Each word needing to be recaptured is presented to multiple different users. If the users agree, then the digitizers can be pretty sure that the humans successfully recognized the word. They can then digitize that word and also use it as a known word in future challenges.
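To make the agreement step concrete, here is a minimal sketch in Python. It is not reCAPTCHA's actual code. The function name `consensus` and the thresholds `min_votes` and `min_share` are my own assumptions for illustration, since the real system's rules for how many users must agree are not described here.

```python
from collections import Counter


def consensus(answers, min_votes=3, min_share=0.75):
    """Return the agreed transcription, or None if users do not agree yet.

    min_votes and min_share are illustrative assumptions, not real
    reCAPTCHA parameters.
    """
    # Normalize so that "Morning " and "morning" count as the same answer.
    normalized = [a.strip().lower() for a in answers if a.strip()]
    if len(normalized) < min_votes:
        return None  # too few answers to trust any of them
    word, count = Counter(normalized).most_common(1)[0]
    if count / len(normalized) >= min_share:
        return word  # enough users typed the same thing
    return None  # users disagree, keep showing the word to more people


agreed = consensus(["morning", "Morning", "morning"])
disputed = consensus(["morning", "mourning", "moming"])
```

Here `agreed` comes back as the accepted word, while `disputed` comes back as `None`, so that word would keep being shown to new users.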

More specifically, each reCAPTCHA challenge consists of two words, one of which has a known answer and one of which does not. The person challenged does not know which word is in which category and thus must answer both.
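The two-word scheme can be sketched roughly as follows. This is only an illustration under my own assumptions: the names `Challenge`, `make_challenge` and `grade`, the random display order, and the `votes` dictionary are all hypothetical and not taken from reCAPTCHA itself. The idea it shows is that the known word decides whether the user passes, and the guess for the unknown word is recorded only when the known word was typed correctly.

```python
import random
from dataclasses import dataclass


@dataclass
class Challenge:
    control_answer: str  # word whose transcription is already known
    unknown_id: str      # identifier of the scanned word nobody has read yet
    control_first: bool  # display order, never revealed to the user


def make_challenge(control_answer, unknown_id, rng=random):
    # Shuffle the order so the user cannot tell which word is the known one.
    return Challenge(control_answer, unknown_id, rng.random() < 0.5)


def grade(challenge, typed_words, votes):
    """Return True if the user passes; record their guess for the unknown word."""
    if len(typed_words) != 2:
        return False
    first, second = (w.strip().lower() for w in typed_words)
    if challenge.control_first:
        control, guess = first, second
    else:
        control, guess = second, first
    if control != challenge.control_answer.lower():
        return False  # failed the known word, so the guess is discarded too
    # The user looks human, so count the guess as one vote for the unknown word.
    votes.setdefault(challenge.unknown_id, []).append(guess)
    return True


votes = {}
challenge = Challenge("upon", "scan-17", control_first=True)
passed = grade(challenge, ["Upon", "morning"], votes)
```

After this call `passed` is true and `votes` holds one guess for the hypothetical word `scan-17`. Since the user cannot tell the two words apart, a bot that fails the known word contributes nothing to the digitization.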

This procedure is a brilliant way of harvesting otherwise lost processing power.

Now, this is all well and good, but what am I expected to do when the reCAPTCHA challenge is:

19 comments:

Anonymous said...

I once read about a method that spambots would use to get around CAPTCHAs. The bot would send the CAPTCHA image to a special site. Visitors to this special site would then type in the CAPTCHA message. If what the visitor types in allows the bot to beat the CAPTCHA, the site rewards the visitor with porn.

Perhaps reCAPTCHA should take note.

Unfortunately, I don't have any sources to share.

Joshua said...

Yes, this is discussed, for example, here:

http://boingboing.net/2004/01/27/solving-and-creating.html

There's not much one can do about this attack. However, even then the CAPTCHAs still get solved, so reCAPTCHA is happy. Are you suggesting that reCAPTCHA should run a separate website offering porn for reCAPTCHA work?

Anonymous said...

That reminds me! Have you seen this video? Or any of the games it talks about?

I wonder which came first, reCAPTCHA or this? (Well actually I guess I could look that up...)

Anonymous said...

Also, I think it says "Ruheleben" and "you".

Joshua said...

Harry, that's a very interesting video. Thanks very much. I had known about this sort of thing before but had not realized how far it had progressed. (I strongly urge any interested reader to go see the video linked to by Harry.)

Harry, I agree with the first word and strongly suspect that it is the known word. I'm not convinced the second one is correct. Indeed, I'm not convinced the second one is in the Roman alphabet.

Also, one last remark. One person sent me an email asking what "reCAPTCHA" stood for. Sorry if this was not clear. reCAPTCHA is a pun on "CAPTCHA" and "recapture" since the lost words are being "recaptured."

Gabreille Ehrlich said...

Hey Josh. I believe I mentioned that Josh (my brother) spent the summer working for a company called Laserfiche. One of his friends at the company was working on a program to transform scanned documents into text documents by processing the image, so I suspect once they get that working it could be used to digitize old books as well. I don't know too much about it though, as the conversation was a while ago and my memory is bad.

Anonymous said...

Oh, something I noticed that I don't think is mentioned in the video: The "asymmetric verification game" is actually very similar to a popular formula for party games. So I guess it's not surprising that people find it fun.

Unknown said...
This comment has been removed by a blog administrator.
Anitha Kishore said...

Its really wonderful reCAPTCHA for this... Very Excellent Blogger....


Bypass captcha

Joshua said...

Ok. A spam comment about CAPTCHA. I think I'll leave that one up for sheer amusement value.

Articlewriterjobs said...

hi grt.................the information about captcha and recaptcha were so informative and innovative..............


bypass captcha

Anonymous said...

hu

Anonymous said...

wonderful!excellent.its very informative.thanks for sharing this info.keep blogging.

captcha solver

Hyde said...

Woow!!! super excellent captchas... Really useful to us..


captcha solver

death by captcha said...

Really great Blog...Very Excellent information for using captcha... i look like that block... thanks for sharing..


deathbycaptcha




Unknown said...

Nice Blog!!! Excellent review for Captcha reCaptchas.. Thanks for Sharing...


captcha solving services

Unknown said...

Hai this is very wonderful explanation for Decaptcha Verification... Captcha verification using orumola, and that installed your Guidance, its very useful to us, I cognitive content the themes you placed along beryllium same absorbing.

Decaptcha

nimbuzzcon said...

Hi, This Article is very Superb!! Nice for the Decaptcha verification, Like your Decaptcha more informations Very great informative blog, using Decaptcha verify then your step by step instructions, it's a great work..

Thanks for sharing...

Decaptcha

Unknown said...

hi... this is great informations...
the Decaptcha verifications is the great,... The Decaptcha information for reach easy to all others.. that's a very interesting i reallyb excellent for this Decaptcha and recaptcha information..

Thanks for sharing recaptcha verification..

visit : Decaptcha