Information security often takes the form of an arms race, as attackers develop novel ways to use or abuse services on the web to their own benefit, and defenders scramble to adapt to and block these new techniques. Few technologies better exemplify this arms race than the web element known as CAPTCHA.[1] This component is designed to identify and block the bots that attackers use to automate and scale up activities such as credential abuse, web scraping, or, in the case of sneaker bots, quickly buying up limited supplies of commodities like fashionable sneakers.[2] CAPTCHAs weed out bots by presenting puzzles within the browser’s response that ostensibly only humans can solve. In the beginning, these puzzles were mostly visual, and usually required users to parse distorted text and type it in. Over time, CAPTCHAs have come to include different types of puzzles, including identifying specific objects within a complex image, transcribing short audio files, or solving logical puzzles, such as turning an image right-side up. The latest version of Google’s reCAPTCHA tool, version 3, transparently analyzes user behavior in the browser instead of requiring specific human input.[3]
For more than a decade, however, attackers have had the ability to circumvent CAPTCHAs at scale and speed, not through advances in computer vision or artificial intelligence, but by identifying and farming out the puzzles to networks of human workers (known in this industry as solvers) in developing economies, and then returning the correct responses so that bots can continue on their assigned task. These services cost attackers (that is, the customers of the solver service) roughly USD $1-$3 per 1,000 correct solutions, depending on the service and type of puzzle. The solver networks, however, only pay workers roughly USD $0.40 for 1,000 correct solutions. Depending on the solver’s speed, that puts their pay anywhere from $2 to $5 per day.
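To make those figures concrete, the short calculation below works backward from the rates quoted above. It is a back-of-the-envelope sketch only: the constants come from the text, while the function names are hypothetical and the results are derived arithmetic rather than reported data.

```python
# Figures quoted in the text above; everything else is derived arithmetic.
CUSTOMER_PRICE_PER_1000 = (1.00, 3.00)  # USD charged to the solver service's customers
WORKER_PAY_PER_1000 = 0.40              # USD paid to a solver per 1,000 correct solutions
DAILY_PAY_RANGE = (2.00, 5.00)          # USD a solver earns per day


def solves_per_day(daily_pay, pay_per_1000=WORKER_PAY_PER_1000):
    """Number of correct solutions needed to earn `daily_pay` in one day."""
    return round(daily_pay / pay_per_1000 * 1000)


def worker_share(customer_price_per_1000, pay_per_1000=WORKER_PAY_PER_1000):
    """Fraction of the customer's price that reaches the worker."""
    return pay_per_1000 / customer_price_per_1000


if __name__ == "__main__":
    for pay in DAILY_PAY_RANGE:
        print(f"${pay:.2f}/day implies {solves_per_day(pay):,} solutions per day")
    for price in CUSTOMER_PRICE_PER_1000:
        print(f"At ${price:.2f} per 1,000, the worker keeps {worker_share(price):.0%}")
```

In other words, the quoted daily pay implies somewhere between 5,000 and 12,500 correct solutions per worker per day, and the worker receives between roughly 13 and 40 percent of what the customer pays.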
To be clear, as of 2020, the risk that human CAPTCHA solvers present is more or less manageable (if not solved), for several reasons. For one, it’s an old practice with which security practitioners are widely familiar, and several security vendors have devised ways to detect human CAPTCHA solver networks. Some security vendors have also developed bot mitigation or anti-fraud solutions to replace CAPTCHAs. Meanwhile, advances in artificial intelligence-based CAPTCHA solvers threaten to make the human solver networks obsolete.
So why dive into these human solver networks now? Despite the low risk, this practice deserves a thorough examination because of the simple nature of the hack. Instead of trying to compete with defenders to develop sophisticated artificial intelligence, CAPTCHA solvers simply use low-cost, globally distributed human labor as a front-end for a botnet. This kind of problem-solving illuminates a fundamental aspect of the battle between attackers and defenders in information security. Just as the CAPTCHA element exemplifies the arms race, this method of circumventing security controls reveals that security practitioners often misjudge the nature of the attacker’s advantage. Unpacking that misjudgment can provide clues to general guidelines for designing future controls that could transcend this arms race and stand the test of time. Before we get to all that, however, let’s dig into the basics of solver networks and how they function.
How CAPTCHA Solver Networks Work
Human CAPTCHA solver networks involve three entities: attackers who want to circumvent CAPTCHAs so that they can scale out attacks using bots; solver networks, such as Anti-Captcha, 2Captcha, and DeathbyCaptcha; and the solver labor force. The details of the solver networks’ architecture vary slightly from network to network. Generally speaking, however, the life cycle of the service for a single CAPTCHA is as follows: when an attacker identifies a CAPTCHA at a target site, they send the puzzle content, usually encoded in base64, to the solver network API. The solver network’s software returns a task identification number to the attacker and farms out the puzzle to one of its workers, who solves it and returns a solution (see Figure 1). Meanwhile, the attacker sends a request with the task number to the network API to check for a posted solution, repeating over short intervals. When the solver network software detects a solution with the same task number, it responds to the attacker’s request with the solution, the attacker places the solution in the text field, and the attack can continue.
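The life cycle described above (submit, receive a task number, poll until a worker posts a solution) can be modeled as a small in-memory simulation. This is a toy sketch for illustration only: it makes no network requests, and every class, method, and parameter name is hypothetical rather than taken from any real solver network's API.

```python
import base64
import itertools
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Task:
    task_id: int
    puzzle_b64: str                 # puzzle content, base64-encoded as in the text
    solution: Optional[str] = None  # filled in once a worker answers


class SolverNetworkModel:
    """Toy in-memory stand-in for a solver network's task queue (hypothetical)."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._tasks: Dict[int, Task] = {}

    def submit(self, puzzle_b64: str) -> int:
        """Customer side: hand over a puzzle, get back a task number."""
        task_id = next(self._ids)
        self._tasks[task_id] = Task(task_id, puzzle_b64)
        return task_id

    def next_unsolved(self) -> Optional[Task]:
        """Worker side: fetch the oldest puzzle that has no solution yet."""
        for task in self._tasks.values():
            if task.solution is None:
                return task
        return None

    def post_solution(self, task_id: int, solution: str) -> None:
        """Worker side: attach a solution to the task."""
        self._tasks[task_id].solution = solution

    def poll(self, task_id: int) -> Optional[str]:
        """Customer side: returns None until a solution has been posted."""
        return self._tasks[task_id].solution


def run_lifecycle(puzzle_bytes: bytes, worker_answer: str,
                  polls_before_solved: int = 2) -> List[Tuple[str, object]]:
    """Walk one puzzle through submit -> poll -> solve -> poll."""
    network = SolverNetworkModel()
    task_id = network.submit(base64.b64encode(puzzle_bytes).decode("ascii"))
    events: List[Tuple[str, object]] = [("submitted", task_id)]

    # The customer polls at short intervals; nothing has been posted yet.
    for _ in range(polls_before_solved):
        events.append(("poll", network.poll(task_id)))

    # A worker picks up the puzzle and posts an answer.
    task = network.next_unsolved()
    assert task is not None
    network.post_solution(task.task_id, worker_answer)

    # The next poll with the same task number returns the solution.
    events.append(("poll", network.poll(task_id)))
    return events


if __name__ == "__main__":
    for event in run_lifecycle(b"placeholder-image-bytes", "x7Kq9"):
        print(event)
```

The model keeps the customer and worker sides decoupled through the task number, which is the property that lets a real network distribute many puzzles to many workers while each customer request only ever checks on its own task.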
While the process described in Figure 1 is for a single CAPTCHA event, the real point of this service is to enable scalability through automation. In an actual attack, an attacker would use scripts or off-the-shelf attack software to send hundreds or thousands of these requests per minute, which the solver network software would distribute to a large number of workers around the world.
One common method to accomplish this scaling uses a script to coordinate API traffic with the network in conjunction with a webdriver like Selenium to identify CAPTCHAs and place the returned solution back into the form. Figure 2 shows the Selenium/script approach from the attacker’s point of view.