Good Bots, Bad Bots, and What You Can Do About Both

It’s hard to get through any news cycle today without bots coming up. Those we hear about most spread spam, propagate fake news, or create fake profiles and content on social media sites—often to influence public opinion, spark social unrest, or tamper with elections. During the 2016 US presidential election debates, bots were used on Twitter to promote one candidate over the other by 7:1.¹ Now, with the 2020 presidential election still two years away, the trolling continues in full force. In a recent 30-day period, 2% to 15% of negative tweets about declared candidates in the 2020 presidential race were traced back to bot accounts and their amplifiers.²

What are bots? Short for “robots,” they are software programs that run automated tasks (scripts) on the Internet. They were originally designed to do simple, mundane, repetitive tasks on the web that humans don’t want to do or can’t do as quickly. Beyond enabling automation, modern bots are often designed to simulate human behavior.

Depending on whose numbers you trust, bots of all kinds account for an estimated 21%³ to over 50% of all Internet traffic today:

Former Wall Street security analyst Mary Meeker says global bot-generated Internet traffic surpassed human-generated traffic in 2016.⁴
In December 2018, the Wall Street Journal noted Adobe’s claim that “…about 28% of website traffic likely came from bots and other non-human signals.”⁵
In 2017, Twitter estimated as many as 48 million Twitter accounts—15% of its total users—were bots, not real people.⁶
According to Verizon, 77% of all reported web application breaches in 2016 started with botnets.⁷

So, how did we get to the current state of affairs with bots? Do they exist just to cause mayhem on the Internet, or within business, politics, and society? Are they ever used for anything good?

Here’s a bit of clarification about the good, the bad, and the ugly of bots, along with some constructive tips for keeping the bad ones out of your network.

When Bots are Good

“Good” bots are designed to help businesses and consumers. They’ve been around since the early 1990s⁸ when the first search engine bots were developed to crawl the Internet. Google, Yahoo, and Bing wouldn’t exist without them. Other examples of good bots—mostly consumer-focused—include:

Chatbots (a.k.a. chatterbots, smartbots, talkbots, IM bots, social bots, conversation bots) interact with humans through text or sound. Early text-based uses included online customer service and messaging apps like Facebook Messenger and iPhone Messages. Siri, Cortana, and Alexa are chatbots, but so are mobile apps that let you order coffee and then tell you when it will be ready, let you watch movie trailers and find local theater showtimes, or send you a picture of the car model and license plate when you request a ride service.
Shopbots scour the Internet looking for the lowest prices on items you’re searching for.
Monitoring bots check on the health (availability and responsiveness) of websites. Downdetector.com is an example of an independent site that provides real-time status information, including outages, of websites and other kinds of services.

When Bots are Bad

History teaches us that anything that’s designed for good will eventually be used for evil, and this is definitely true of bots. “Bad” bots are so prevalent because they can be easily built by someone as young as 13⁹ with basic programming skills, or purchased for very little money. You can buy bots to boost product ratings or commit ad fraud for as little as $2.00; you can buy 5,000 Twitter followers for less than $50.¹⁰ And you don’t need to go to the dark web anymore to buy bots: several types have recently been advertised for sale on Instagram.¹¹

For attackers, the real power of bots lies in botnets: large collections of malware-infected devices (zombies) that operate in concert under the direction of an attacker, or bot herder. Botnets provide the collective processing power that’s essential to pull off large-scale attacks. From a command and control server, the bot herder tells the zombies what to do.

Computing devices of virtually every kind can be hijacked by bad bots and used for a variety of malicious activities, as outlined below. (Note: there is no “right” or standard way to categorize bots or their behavior. This is how we’ve grouped them for purposes of this article.)

Distributed denial-of-service (DDoS) attacks. Attackers can use botnets to make an application or network unavailable. In early 2000, botnets consisting of infected PCs launched successful DDoS attacks against Yahoo!, Amazon.com, CNN, E*TRADE, and eBay.¹² Today, many DDoS botnets are made up of infected Internet of Things (IoT) devices instead of PCs. With literally billions of vulnerable IoT devices on the market, attackers are able to build massive botnets (thingbots) to carry out enormous (1 Tbps and larger) DDoS attacks like the initial 2016 Mirai attacks on Krebs on Security, Dyn, and OVH, and the 2018 1.3 Tbps attack on GitHub. (F5 Labs reports extensively about the ongoing threat of thingbots (/content/f5-labs-v2/en/labs/articles/threat-intelligence/the-hunt-for-iot--multi-purpose-attack-thingbots-threaten-intern.html) and their expanded use beyond DDoS to mine cryptocurrency, collect credentials, and launch credential stuffing attacks.)
Credential stuffing. Taking advantage of the billions¹³ of stolen account credentials, attackers use bots to launch automated attacks by “stuffing” stolen username and password combinations into the login pages of multiple websites. The ultimate goal is account takeover, and because so many people use the same credentials for many accounts, the success rate and payoff for attackers are high.
Gift and credit card fraud. Attackers use bots to break into gift card accounts looking for credentials; afterward, they create counterfeit cards and steal the cash value of the card. In the case of credit cards, attackers use bots to test stolen credit card credentials with small transactions (for example, $1.00). Upon success, hackers use the stolen credentials to make large purchases or completely drain accounts of cash.
Spam relay involves any type of unwanted “spammy” behavior such as filling inboxes with unwanted email containing malicious links, writing fake product reviews, creating fake social media accounts to write fake or biased content,¹⁴ racking up page views (for example, on a YouTube video) or followers (such as on Twitter or Instagram), writing provocative comments on forums or social media sites to stir up controversy, rigging votes, etc.
Web scraping protected content involves attackers scanning and extracting (stealing) copyrighted or trademarked data from websites, storing it locally, and then reusing it—often for competitive purposes—on their own websites. Scraped data can include intellectual property, product information, and product prices. Airline, hospitality, online gaming, and ticketing websites are particularly vulnerable to web scrapers.
Click fraud typically involves advertising fraud—the fraud being that a bot, not a human, is clicking on an ad and therefore has no intention of purchasing the advertised product or service. Instead, the goal is to boost revenue for a website owner (or other fraudster) who gets paid based on the number of ads clicked. Such bots skew data reported to advertisers and cost companies a lot of money because they end up paying for non-human clicks. Even worse, those companies get no revenue from fake “shoppers.” Click fraud can also be used by companies to deliberately drive up the advertising costs of their competitors.
Auction sniping occurs on online auction sites where bad actors use bots to place perfectly timed, last-minute bids on goods or services to prevent humans from bidding.
Intelligence harvesting involves scanning web pages, Internet forums, social media sites, and other content to find legitimate email addresses and other information that attackers can use later for spam email or fraudulent advertising campaigns.

You get the idea: the possibilities for bad bots are nearly endless.

As you may have noticed, the line between the good and the bad lists can blur: the same technique or tool can be used for either good or evil, depending on the user’s intent. Selenium, for example, was originally designed as a portable software testing framework for web apps, but it can also be used as a web scraping tool or to create a bot. Popular scanner tools like Shodan and Shadow Security Scanner are considered semi-benign when used for research but can also be used by attackers to gather target information for attacks.

Why You Need to Care

Bad bots are wreaking havoc across enterprises and the Internet. Those that launch DDoS attacks (especially against web applications) can be particularly devastating, primarily because of the massive processing power that botnets possess. Most companies couldn’t begin to handle a 300–500 Gbps attack on their own, much less a 1 Tbps attack. And the cost can be equally devastating. In a 2017 survey of security professionals, 75% estimated a DDoS attack would cost their companies between $500,000 and $10 million.

And bad bots will cost your company in more ways than just financially. They create extra traffic on your network that can slow your site to a crawl, and it is traffic you have to deal with and pay for in bandwidth and cloud resources. Bots can drive traffic away from your site, stealing potential revenue from you; they can cost your company a significant amount in fraudulent advertising charges; they can sabotage your company’s reputation through fake news, bad reviews, and other underhanded tactics. And because so many of today’s bots are good at hiding by appearing to be legitimate users, they are much harder to detect.

How to Get Rid of Them

There is no single solution for getting bad bot traffic off your network, but there are effective methods for fighting them. It’s kind of like building a jigsaw puzzle: you collect lots of different clues and put them together to form a whole.

Here are some pieces to consider when solving your bot security puzzle:

Reverse lookup. Finding the source IP address is mostly ineffective for detecting bad bots as attacks are often launched from hundreds of different IP addresses to avert suspicion. But reverse lookup is an effective way to identify good bots, such as search engine traffic. These IP addresses can be allowlisted, but remember that allowlists must be updated regularly to be effective.
Signature-based detection. Bots can be identified by their signatures—unique or specific patterns that have been observed in the past. Although signature-based bot detection is low-risk and reliable, it can’t detect new, unknown bots. To keep up with new bots, you’ll need to continually update known bot signatures and create new ones based on your own traffic analysis.
Behavior-based bot detection involves looking for suspicious behavior such as high or irregular traffic volumes, opening of non-standard ports, attempts to start or stop processes, attempts to download executables or access restricted files, and robotic surfing patterns. These can all be signs of bot activity.
Examine the browser for faked User Agents and malicious browser extensions. Every browser has a signature of its own that identifies how it’s been built and configured on a particular device. Part of this signature includes the user agent, which identifies the name and version of the specific browser and operating system, but this and other information can be faked by a competent attacker. Suspicious browser extensions can also be a sign of bots. Many exist solely for the purpose of scraping web content or taking other malicious action. Signatures that appear nonsensical should be scored as possible bots.
Use CAPTCHAs to distinguish human from bot behavior on your website. This method is not foolproof, but it will help you to block some traffic.
Issue a JavaScript challenge to verify that a normal browser is being used. Most bots cannot respond to a JavaScript challenge, so if the browser sends back a response, the traffic is likely legitimate.
Throttle good bots such as search engines to reduce the load on your website. Good bots can still do their work while using a moderate amount of bandwidth.
Rate-limit suspicious traffic to a set threshold if you’re not sure yet whether it’s bot traffic. This can keep valid traffic flowing on your website while you investigate further, for example, whether a surge in traffic from a specific IP address is legitimate or is possibly a credential stuffing attack.
Scrape off “opportunistic” attacks, meaning those that are looking for systems running a particular application—for example, Outlook Web Access (OWA). If your organization doesn’t use OWA, you can create less noise and load on your network by automatically blocking any traffic that you know doesn’t apply to you.
Assign risk scores to sessions. Use a combination of the methods discussed above to add points to, or subtract points from, a risk score. From there, you can decide when to act against risky clients and what type of action to take.
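
To make the reverse lookup item in the list above more concrete, here is a minimal Python sketch of a forward-confirmed reverse DNS check, one common way to confirm that traffic claiming to be a search engine crawler really is one. The function name `is_verified_crawler`, the `TRUSTED_SUFFIXES` tuple, and the injectable `reverse` and `forward` resolver parameters are illustrative assumptions, not something described in this article; check each crawler operator's documentation for the domains it currently uses.

```python
import socket
from typing import Callable, Iterable, List

# Assumed example suffixes only; verify against each operator's documentation.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com", ".search.msn.com")


def _reverse(ip: str) -> str:
    """Return the PTR hostname for an IP address (raises OSError on failure)."""
    return socket.gethostbyaddr(ip)[0]


def _forward(hostname: str) -> List[str]:
    """Return every address the hostname resolves to (raises OSError on failure)."""
    return [info[4][0] for info in socket.getaddrinfo(hostname, None)]


def is_verified_crawler(
    ip: str,
    reverse: Callable[[str], str] = _reverse,
    forward: Callable[[str], List[str]] = _forward,
    trusted: Iterable[str] = TRUSTED_SUFFIXES,
) -> bool:
    """Forward-confirmed reverse DNS: IP -> hostname -> IPs must include the IP."""
    try:
        hostname = reverse(ip).rstrip(".").lower()
    except OSError:
        return False  # no PTR record, so the claim cannot be confirmed
    if not hostname.endswith(tuple(trusted)):
        return False  # hostname is not in a domain we trust
    try:
        # The PTR record alone can be forged by whoever controls the IP block,
        # so confirm that the hostname resolves back to the same address.
        return ip in forward(hostname)
    except OSError:
        return False
```

Addresses that pass a check like this are candidates for the allowlist mentioned above; because crawler IP ranges change, the check would need to be repeated periodically rather than cached forever.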

It’s also helpful to know which methods of keeping out bad bots aren’t so effective. Take geofencing, for example. In the past, you could stop a lot of bots by blocking all traffic from a country known for launching attacks. But today’s attackers are far more sophisticated, launching attacks from hundreds of IP addresses across multiple geographies. Also, many companies have customers, partners, and other legitimate traffic coming from specific countries, so it’s often impractical to block traffic entirely from a specific location. While geofencing still has a place in your security toolbox, on its own it isn’t your best tool for fighting bots.
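
Because no single signal is decisive (geofencing included), the session risk scoring described in the list above is one way to combine them. The following Python sketch shows the idea under stated assumptions: the `SessionSignals` fields, the point values, and the action thresholds are all hypothetical and would need tuning against your own traffic.

```python
from dataclasses import dataclass


@dataclass
class SessionSignals:
    """Hypothetical per-session observations gathered by the methods above."""
    verified_good_bot: bool = False            # passed reverse lookup
    matches_known_bad_signature: bool = False  # signature-based detection
    user_agent_inconsistent: bool = False      # faked or nonsensical browser signature
    failed_js_challenge: bool = False
    solved_captcha: bool = False
    requests_per_minute: int = 0               # behavior-based signal
    from_high_risk_region: bool = False        # geofencing as a weak signal only


def risk_score(s: SessionSignals) -> int:
    """Add points for suspicious signals, subtract for reassuring ones."""
    score = 0
    if s.matches_known_bad_signature:
        score += 50
    if s.failed_js_challenge:
        score += 30
    if s.user_agent_inconsistent:
        score += 20
    if s.requests_per_minute > 120:  # assumed threshold
        score += 20
    if s.from_high_risk_region:
        score += 5  # deliberately small: location alone proves little
    if s.solved_captcha:
        score -= 30
    if s.verified_good_bot:
        score -= 40
    return max(score, 0)


def action_for(score: int) -> str:
    """Map a score to an escalating response (assumed thresholds)."""
    if score >= 70:
        return "block"
    if score >= 40:
        return "challenge"
    if score >= 20:
        return "rate_limit"
    return "allow"
```

In this sketch a session from a flagged region with no other warning signs is still allowed, while a session that matches a known bad signature and fails the JavaScript challenge is blocked.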

Conclusion

With both good and bad bots accounting for as much as 50% of all Internet traffic, it’s critical for businesses to recognize that this trend is on an upswing. The cleverest, most experienced attackers will continue developing larger, more sophisticated, and more dangerous bots, especially as literally billions of vulnerable IoT devices (with default passwords and unconfigurable interfaces) continue to flood the market. Until IoT manufacturers are compelled by law to secure these devices, attackers will continue to use them to launch larger, multi-vector attacks that originate with bots. At the same time, the availability on the open Internet of simple bots for free or at ridiculously low prices to “script kiddies” is a growing problem, because even simple bots in the hands of inexperienced people can be extremely dangerous.

Our advice for organizations? Brace yourselves for an onslaught of more bot traffic of all types. Step up your security controls to effectively manage good bots and keep as much bad bot traffic as possible off your networks.

Recommendations

Technical

Preventative

use a web application firewall for signature-based and behavior-based bot detection
use CAPTCHAs
use JavaScript challenges
rate-limit suspicious traffic
block all known “opportunistic” traffic
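
As an illustration of the rate-limiting recommendation, here is a minimal token bucket sketch in Python. The token bucket is one common rate-limiting technique, not necessarily what a given web application firewall uses; the class name `TokenBucketLimiter`, its parameters, and the choice to key buckets by a client identifier such as an IP address are assumptions made for this example.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucketLimiter:
    """Per-client token bucket: allows short bursts, caps the sustained rate."""

    def __init__(
        self,
        rate_per_sec: float,
        burst: int,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate_per_sec  # tokens added back per second
        self.burst = burst        # maximum tokens a client can hold
        self.clock = clock        # injectable so the limiter can be tested
        self._buckets: Dict[str, Tuple[float, float]] = {}  # id -> (tokens, last seen)

    def allow(self, client_id: str) -> bool:
        """Return True if this request fits within the client's budget."""
        now = self.clock()
        tokens, last = self._buckets.get(client_id, (float(self.burst), now))
        # Refill in proportion to the time elapsed, never beyond the burst size.
        tokens = min(float(self.burst), tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self._buckets[client_id] = (tokens - 1.0, now)
            return True
        self._buckets[client_id] = (tokens, now)
        return False
```

A limiter created with `TokenBucketLimiter(rate_per_sec=1.0, burst=3)` would let a client make three requests immediately and then roughly one per second, which keeps valid traffic flowing while a suspicious surge is investigated.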

Authors & Contributors

Debbie Walkowski (Author)

Security Threat Researcher, F5

Raymond Pompon (Author)

Director of F5 Labs, F5

Not all bots are bad, but for those that are, you need a multi-pronged strategy for keeping them off your network.
