Emsisoft Knowledgebase: The Truth About CAPTCHA Cracking
CAPTCHAs, or Completely Automated Public Turing tests to tell Computers and Humans Apart, are those distorted word-images you find sometimes when you fill out forms on the web. They are utilized by high traffic websites and freemail providers like Google and Yahoo to discourage spammers from taking advantage of their comment sections or services.
CAPTCHAs work because computers can’t read their distorted text like humans can. In this way, CAPTCHAs are actually the opposite of a Turing test because they reveal how computers are not like humans (i.e., they don’t have eyes!).
Before CAPTCHAs, spammers could write programs that could register for 1000s of freemail accounts at once by automatically filling out forms with bogus information. They could then use these 1000s of accounts to send spam to 1000s of legitimate users. Before CAPTCHAs, spammers could also write programs that could automatically spam the comment section of a high traffic blog or a discussion forum.
Today, CAPTCHAs have made such programs obsolete, but that doesn’t mean CAPTCHAs can’t be cracked. Quite the contrary, actually.
Monetized CAPTCHA Cracking
Today, there quite literally exists a not-so-underground CAPTCHA cracking economy. CAPTCHAs are everywhere, and the demand to bypass them efficiently has created a hyper-competitive supply of service providers.
In 2010, researchers at the University of California, San Diego published an in-depth economic analysis of this phenomenon, called Re: CAPTCHAs — Understanding CAPTCHA-Solving Services in an Economic Context. This paper is a great read for anyone who wants to understand the driving forces behind any computer security arms race.
What the UCSD researchers found was that although hackers can create highly advanced Optical Character Recognition (OCR) technologies to “read” CAPTCHAs as the human eye would, the favored approach to CAPTCHA cracking is to employ third-world workers at sweatshop wages to solve CAPTCHAs en masse, by hand. Although by no means ethical, the latter approach is much more cost-effective.
Why OCR Doesn’t (Quite) Work
Creating software that can read a CAPTCHA like the human eye is a titanic endeavor. Not only does it require a highly skilled programmer, but it also requires that the programmer be willing to invest a large portion of his or her time in creating a tool meant solely to spread spam. People like this do exist, but they are a lot harder to come by than impoverished citizens of third-world nations willing to work for a few dollars a day.
What complicates the OCR CAPTCHA crack even further is that it’s like hitting a moving target. This is a theme prevalent throughout all of computer security: As soon as a problem is solved, a new and unpredicted one emerges. With malware, it’s usually the bad guys who have the upper hand, as new threats call for new means of protection. (This is why Emsisoft created the Behavior Blocker.) With CAPTCHAs, however, the roles are reversed. As soon as an effective CAPTCHA cracking OCR is made, the companies that create CAPTCHAs begin to notice its efficacy. In response, these companies simply change the way they create their CAPTCHAs, and render the new OCR useless.
By far the largest CAPTCHA creator today is reCAPTCHA, which was acquired by Google in 2009. reCAPTCHA creates CAPTCHAs by scanning printed text and distorting the resulting imagery in a number of random ways. The end result is a one- to two-word phrase that can (usually) only be read by the human eye.
Throughout the years, a number of OCRs have claimed to be able to crack reCAPTCHA’s CAPTCHAs. One of the most popular ones in use today comes with an SEO booster program called XRumer. In October 2013, there were also rumors of an OCR developed by AI company Vicarious that could solve 90% of reCAPTCHA’s CAPTCHAs. While the former is a money-making tool used with moderate success by black hat spammers and the latter a legitimate endeavor in AI research, neither works as well as the real thing.
The Alternative: 1000 CAPTCHAs for 1 dollar
Today, a number of companies exist solely to crack CAPTCHAs for aspiring spammers. They rely on armies of third-world workers: impoverished people who willingly sit at computers and solve CAPTCHAs manually for 8 hours a day.
Here is a list of some of the biggest players in the industry:
There are also companies that exist solely to recruit workers for the sites listed above. One of the largest is Russia-based KolotiBablo.com.
While the ethics surrounding this development are murky at best, the economics are quite clear. Why pay for or invest in an expensive technology when you can have better results at a fraction of the cost? Such are the ways of the world, and many of the CAPTCHA sweatshops listed above are quick to defend themselves against criticism. The general defense is that $2-3/day is more than enough to feed most CAPTCHA crackers, and even their families. While this may be true, it completely sidesteps the downright robotic, thankless, and largely negative nature of the CAPTCHA cracking task: making first-world spammers rich.
A Third Way: CAPTCHA Bots
CAPTCHAs can also be cracked using botnets, but, like OCR technology, CAPTCHA botnets don’t really pay off. The idea behind CAPTCHA botnets is to use zombie computers to solve CAPTCHAs supplied by a C&C server. This technique appeared briefly in Koobface, a worm that propagated through social media sites back in 2009. Koobface spread itself by posting malicious links, in messages and on walls, to websites where it could be downloaded. To do this effectively, it needed fake social accounts. And to create fake social accounts, it needed to solve CAPTCHAs.
Rather than outsourcing to Captchabot or Antigate, the makers of Koobface decided to keep their CAPTCHA cracking in-house. They did this by integrating CAPTCHA cracking into the botnet. During the course of Koobface infection, zombie computers would be forced to repeatedly poll the C&C server for CAPTCHAs to solve. In response, the server would return CAPTCHAs disguised as Windows Security requests, with a countdown to shutdown. Users would be forced to provide solutions, and solutions would be used to create new social accounts and spread the malicious worm.
Are CAPTCHAs Even Effective?
CAPTCHAs were initially created to provide a reverse Turing test. By this measure, they are incredibly effective – as evinced by the fact that career spammers would rather use humans than computers to solve them!
As an anti-spam security solution, CAPTCHAs are only marginally effective. Today, CAPTCHAs are more of a financial deterrent than they are a foolproof means of prevention. In 2010’s demand-driven CAPTCHA cracking market, the UCSD researchers reported that anyone with $1000 could have about 1 million CAPTCHAs cracked in as little as 6.75 hours. That’s 41 CAPTCHA cracks per second! In 2014, prices are probably even cheaper and return rates even faster; and yet, perhaps like OCR software, manual methods may have economic limits of their own.
In any event, to the career spammer CAPTCHAs represent little more than an operational expense. At the same time – and on the other end of the spectrum – CAPTCHAs represent a livelihood for impoverished people in today’s digital sweatshops. Between these extremes lie the rest of us: everyday Internet users, to whom those annoying, distorted word chunks mean little more than speed bumps as we browse.
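For readers who want to check the math, here is a minimal sketch of the arithmetic behind those figures. It only uses the three numbers quoted above ($1000, 1 million CAPTCHAs, 6.75 hours); the function names are our own and purely illustrative.

```python
# Back-of-the-envelope check of the 2010 figures quoted above.
# The three constants come from the text; everything else is illustrative.
BUDGET_USD = 1000
CAPTCHAS_SOLVED = 1_000_000
HOURS = 6.75


def cost_per_thousand(budget_usd, solved):
    """Price paid for each batch of 1000 solved CAPTCHAs."""
    return budget_usd / solved * 1000


def solves_per_second(solved, hours):
    """Average solving rate over the whole job."""
    return solved / (hours * 3600)


print(f"Cost per 1000 CAPTCHAs: ${cost_per_thousand(BUDGET_USD, CAPTCHAS_SOLVED):.2f}")
print(f"Throughput: {solves_per_second(CAPTCHAS_SOLVED, HOURS):.1f} CAPTCHAs per second")
```

This works out to $1.00 per 1000 CAPTCHAs and a little over 41 solves per second, which matches the numbers above.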
Have a Great (Malware-Free) Day!