“”I am not a Robot” reCAPTCHA? Ptfff of course it is not innocent”


– said my friend matter-of-factly. Once again, I failed to impress her with my random fun fact. She is not surprise of how reCAPTCHAs might be insidious. In fact, no one is.


While digging up old memes, I came across this collection of posts complaining and ridicule the Google reCAPTCHA image labeling tests. And I am sure anyone who uses the Internet would, more than once, have their business intervened and time wasted in the similar way. The topic seems to be less relevant nowadays, I still notice we do not encounter it as frequently as we did anymore. And if we do, it gets harder. Therefore, apart from being a nosy test that interrupts the user’s flow, it gets me to wonder: How does it work and why it is getting rare and harder to solve? The answers are quite interesting, at least to me. And yes, this is another story about CAPTCHA, more specifically, the evolution of reCAPTCHA.

CAPTCHA (stands for Completely Automated Public Turing test to tell Computers and Humans Apart) is a system that allows web hosts to distinguish between humans, bots, and automated access to websites. The idea was to combat spambots and frauds online as the Internet expands. Users are asked to decipher distorted pieces of text or image which are randomly generated by the computer. In 2009, Google acquired the system and developed into a free service called reCAPTCHA for web hosts.

reCAPTCHA over the years, from v1 to v3

Since the acquisitions, reCAPTCHA has been through three major evolutions, claimed by Google to move towards frictionless user experience and enhanced security. Considering the disruptive nature of the classic system and the escalated complications of bots programs, reCAPTCHA v2 incorporates text and image recognition with other general user behavior assessment called “no CAPTCHA reCAPTCHA.” It includes the infamous “I am not robot” checkbox. While the user clicks on the box, the software tracks their mouse movement and speed for clues of any automated signs. And we are talking about such subtle gestures in the slight seconds before that click. The latest version introduced last year, uses an advanced risk analysis engine and adaptive challenges to tracks users’ online activity, analyze the interactions on the site, and return a score from bad to good. Instead of infuriating challenges, there is nothing but a notification “protected by reCAPTCHA.”. On its mechanisms, Google explained, “reCAPTCHA’s API sends hardware and software information, including device and application data, back to Google for analysis.”. It is clear to most cynical researchers that not only most of the user’s digital information, it is also cookies that the system is making uses. While cookies based data collection is almost a universal activity on the Internet, consternation rises as third-party cookies can allow companies to track users’ traffic across vast webpages. The company in response guaranteed that they use the signals strictly to fight spams and abuse.

“The implication is that Google isn’t just looking to identify whether you’re a human with its No CAPTCHA, but potentially exactly which human you are.”

Lara O’Reilly, Business Insider 

In 2011, Google used the CAPTCHA technology to digitalized the archives of the New York Times and books from Google Books. Actual words from archival texts that the recognition software cannot decipher replaced the random ones in the original model. These are feed into reCAPTCHA and have human does the job. The technology has become a powerful source for machine learning. The company reported that their machine learning algorithm could solve the most challenging CAPTCHA image and text puzzles with 99% accuracy, in contrast to humans, with only 33%. The classic CAPTCHAs thus becomes unreliable as the machine becomes smarter. Meanwhile, these images and texts become more and more distorted to keep up with the machine capacity. Likewise, millions of pictures that humans assess every day through the puzzles are likewise also used to train high-level auto robots and self-driving cars. Google then benefits from the free labor their users provided for the technology whose availability is still under question. Despite this being the natural motivation for Google to launch the invisible reCAPTCHA, the reasons for its harder challenges and rare sights seem to imbue with conspicuous motives.

Whether it was a situational improvement base on good intentions from Google or a masked scheme of mass data collection and ad targeting, it is still contentious. However, on estimation, 4.5 million websites are now using reCAPTCHA. One thing for sure, the acquisition of CAPTCHA does reinforce Google’s hold over the Internet. And that is no new news. With the slogan changed to “Do the right thing.”, in Google we hope the promise made by the Internet Society almost 30 years ago, of “open development, evolution and use of the Internet for the benefit of all people throughout the world” still stay true. Ultimately, a trivial system like CAPTCHAs elicits the transformation of technology into sophisticated and latent realms. Nowadays, with the withdrawal of reCAPTCHA into the “backstage”, complaints and concerns die down. Yet questions still lurk around. Who knows if it is another form of control and surveillance that has evolved beyond our visible and physical grasps. Is it true that we essentially prefer convenience and safety over freedom?