Why We Use reCAPTCHA to Protect Our Web Applications

One of the greatest evils of the Internet era is unquestionably spam.


Spam is not just the indiscriminate sending of junk mail. It is any indiscriminate online marketing of products or services using the tools the Internet provides: e-mail, but also forums, newsgroups, chats, and so on.

 

Many web applications (forums, blogs, etc.) build in protection techniques to prevent this kind of behaviour. Their main goal is to separate human interactions (presumed to be mostly legitimate) from automated ones (carried out by robots whose purpose is to fill the web with promotional content). This kind of tool is called a CAPTCHA (an acronym for Completely Automated Public Turing test to tell Computers and Humans Apart).

 

As many of you already know, a CAPTCHA is a simple verification system: an image shows a word (often drawn with distorted, irregular characters) that the user must type before the form can be submitted. If the user is a human, the form is processed; if the user is a robot, its inability to read the contents of the image prevents the form from being submitted correctly.
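The check described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code of any particular CAPTCHA product: the word list, the `new_challenge` and `verify` names, and the in-memory store are all assumptions, and rendering the word as a distorted image is left out.

```python
import hmac
import secrets

# Hypothetical word list; a real CAPTCHA would draw from a much larger pool.
WORDS = ["harbour", "lantern", "orchard", "compass", "meadow"]

# Maps a one-time challenge ID to the word the server expects back.
_pending = {}


def new_challenge():
    """Pick a word and return (challenge_id, word).

    The word would be rendered into a distorted image for the user;
    only the challenge ID travels with the form as a hidden field.
    """
    challenge_id = secrets.token_hex(16)
    word = secrets.choice(WORDS)
    _pending[challenge_id] = word
    return challenge_id, word


def verify(challenge_id, typed):
    """Return True only if the typed text matches the expected word."""
    expected = _pending.pop(challenge_id, None)  # each challenge is single-use
    if expected is None:
        return False
    return hmac.compare_digest(expected.encode(), typed.strip().lower().encode())
```

The expected word never leaves the server, and each challenge can be answered only once, so a robot cannot simply replay an earlier correct submission.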

 

The reasoning is sound, but spammers keep getting better (the volume of spam is high enough to justify the effort). They have built intelligent robots that can recognise the content of CAPTCHAs using OCR (Optical Character Recognition) technology, effectively evading the security check.

 

ReCAPTCHA is an online service that was developed at Carnegie Mellon University and later acquired by Google. It lets you integrate a CAPTCHA that meets the security requirements of popular web applications and offers innovative functionality (such as audio support for blind users).

 

ReCAPTCHA’s main goal is to prevent spam, but it also serves an even higher, more noble purpose: it helps with the enormous task of digitising thousands of old books so that their priceless contents can be preserved for future generations.

 

Ancient books frequently cannot be digitised automatically with OCR, because factors such as deteriorated paper prevent accurate character reading. Human intervention is needed instead, which requires a significant investment of time and resources that the cultural sector does not always have.

This is an illustration of the aforementioned OCR scanning issues:

To solve this problem, the developers of reCAPTCHA created an extremely clever system that combats spam and, at the same time, builds a network of (more or less aware) “operators” who help transcribe ancient texts that the digital eye of OCR cannot read.

 

The inspiration came from an intriguing fact: according to a recent estimate, Internet users solve about 60,000,000 CAPTCHAs every day. Multiplying this figure by the average time needed to read, understand, and type each CAPTCHA shows that users spend about 150,000 hours on them every day!
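The article does not state the average time per CAPTCHA, but it can be worked out from the two figures quoted above. The short calculation below does just that; the constant names are ours.

```python
CAPTCHAS_PER_DAY = 60_000_000   # figure quoted in the article
TOTAL_HOURS_PER_DAY = 150_000   # figure quoted in the article

# Average time per CAPTCHA implied by the two figures above.
seconds_per_captcha = TOTAL_HOURS_PER_DAY * 3600 / CAPTCHAS_PER_DAY

print(f"Implied average: {seconds_per_captcha:.0f} seconds per CAPTCHA")
# prints "Implied average: 9 seconds per CAPTCHA"
```

In other words, the estimate assumes that each CAPTCHA costs a user roughly nine seconds.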

 

The team at Carnegie Mellon University decided to “recover” this massive quantity of work and put it to use in the transcription and digitisation efforts for the Internet Archive.

 

The system operates as follows: reCAPTCHA presents the user with a CAPTCHA made up of two words: one that is already known, and one that is “unknown” (it is, in fact, a word taken from an ancient book that OCR could not recognise).

Both words are then distorted with noise and background lines.

 

If the user types the known word correctly, the system assumes that the other word is probably correct as well. This is how:

 

After the CAPTCHA verification succeeds (and the form is processed), reCAPTCHA stores the human reading of the “unknown” word as a possible correct transcription of it. To obtain “statistically” reliable answers about the word from the old text that could not be scanned with OCR, this process is of course repeated with thousands of different users.
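The two-word mechanism can be illustrated with a small Python sketch. It is not reCAPTCHA's actual implementation: the function names, the scan IDs, and both thresholds (`MIN_VOTES`, `MIN_AGREEMENT`) are hypothetical values chosen only to show the idea of checking the known word and collecting votes on the unknown one.

```python
from collections import Counter, defaultdict

MIN_VOTES = 3         # hypothetical: answers needed before deciding
MIN_AGREEMENT = 0.75  # hypothetical: share of votes the top answer needs

# Votes collected so far for each unknown word, keyed by a scan ID.
votes = defaultdict(Counter)


def submit(control_word, typed_control, unknown_id, typed_unknown):
    """Check the known word; if it is right, record the other reading as a vote."""
    passed = typed_control.strip().lower() == control_word.lower()
    if passed:
        votes[unknown_id][typed_unknown.strip().lower()] += 1
    return passed


def consensus(unknown_id):
    """Return the agreed transcription, or None if there is no consensus yet."""
    counts = votes[unknown_id]
    total = sum(counts.values())
    if total < MIN_VOTES:
        return None
    word, top = counts.most_common(1)[0]
    return word if top / total >= MIN_AGREEMENT else None
```

Only the known word decides whether the form is accepted. The reading of the unknown word is merely stored, and it counts as a transcription once enough independent users agree on it.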

 

ReCAPTCHA also offers the developer a number of benefits from a purely technical standpoint:

 

Because it operates remotely, it places no load on the local server, and there is no need to install a library or buy software to manage the CAPTCHA;

ReCAPTCHA is continuously updated to combat the most recent spam threats;

 

It provides a high level of security by incorporating, among other things, an IP verification system that can automatically identify and block addresses deemed to be spam sources;

 

Last but not least, installing reCAPTCHA on the most popular CMSs (such as WordPress) is quite straightforward and can be done with a variety of plug-ins.
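Because the check runs on the remote service, the application's own share of the work is small: it forwards the user's answer and reads the reply. The sketch below assumes Google's current `siteverify` endpoint rather than the original two-word service described in this article; the helper names are ours, and the secret key and answer token are placeholders you would obtain from your own reCAPTCHA registration and form.

```python
import json
import urllib.parse
import urllib.request

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def build_verify_request(secret_key, user_response, remote_ip=None):
    """Build the POST request that asks the remote service to check an answer."""
    fields = {"secret": secret_key, "response": user_response}
    if remote_ip:
        fields["remoteip"] = remote_ip  # optional: the visitor's IP address
    data = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(VERIFY_URL, data=data, method="POST")


def is_human(response_body):
    """Interpret the JSON reply; anything other than success=true is a failure."""
    result = json.loads(response_body)
    return result.get("success") is True


def verify(secret_key, user_response, remote_ip=None):
    """Send the request and return True if the service accepted the answer."""
    request = build_verify_request(secret_key, user_response, remote_ip)
    with urllib.request.urlopen(request, timeout=5) as reply:
        return is_human(reply.read().decode("utf-8"))
```

Only the standard library is used here, which matches the point that no dedicated CAPTCHA library has to be installed on the local server.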

 

We’ll look at how to integrate this resource into our web apps in later articles.
