Manage and Prevent Web Scraping

Scraping 101

Web scraping (also known as price scraping, harvesting, mining, or mirroring) is the use of automated tools, often called scraper bots, to collect large amounts of data from a target application so that the data can be reused elsewhere.

Scraping can range from benign to malicious, depending on the source, objective, and frequency of the requests. For example, a search engine bot that respects the crawl rules and rates defined in the site’s robots.txt will likely be viewed as acceptable, whereas daily price scraping by a competitor is likely unwanted.
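As a rough illustration of what respecting robots.txt looks like from the bot's side, the sketch below uses Python's standard urllib.robotparser module to check whether a URL may be fetched and what crawl delay is requested. The robots.txt content, the bot name ExampleBot, and the example.com URLs are assumptions made up for this example; they do not describe any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for illustration only.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /search
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without making a network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# A well-behaved bot checks each URL before requesting it.
allowed_home = rules.can_fetch("ExampleBot", "https://www.example.com/")
allowed_search = rules.can_fetch(
    "ExampleBot", "https://www.example.com/search?from=SFO"
)

# Seconds the site asks bots to wait between requests (None if unspecified).
delay = rules.crawl_delay("ExampleBot")

print(allowed_home, allowed_search, delay)
```

Compliance is voluntary: an unwanted scraper can simply ignore these rules, which is why robots.txt separates cooperative bots from uncooperative ones rather than blocking anything.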

A top 5 US airline was losing money

Scrapers were increasing the airline’s infrastructure costs and affecting the airline’s ability to manage revenue, so the security team sought out F5.

Case Study: International Airline Fights Fare Scrapers

Key points:

Travel aggregators used bots to discover and publicize non-compliant ticketing options
Scraping accounted for 25% of traffic on main search URL
Unwanted scrapers evaded all existing security solutions before F5

The 3 Steps of Scraping

1. Write Attack Script

Using automated tools, off-the-shelf scripts, or even scraping-as-a-service providers, attackers can easily create scripts to discover and scrape website content including prices, promotions, articles, and metadata.

How attackers simulate users

A Distinguished VP Analyst at Gartner Research demonstrates techniques attackers leverage to imitate users.

2. Collect Data

Scraping campaigns can range from brazen to stealthy, depending on the attacker’s skill set and aims. Execution of the scraping script may be distributed among hundreds or thousands of servers so that the requests blend in with the traffic patterns of the enterprise’s entire user population.
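To make the difference between brazen and distributed scraping concrete, here is a minimal sketch of a naive per-IP request counter. All data is synthetic, and the threshold, IP addresses, and function names are assumptions chosen for illustration; this is not a description of how any particular product detects scrapers.

```python
from collections import Counter


def per_ip_counts(requests):
    """Count requests per source IP from (ip, path) tuples."""
    return Counter(ip for ip, _path in requests)


def flagged_ips(requests, per_ip_limit):
    """Return the IPs whose request count exceeds a simple per-IP limit."""
    return {ip for ip, n in per_ip_counts(requests).items() if n > per_ip_limit}


# Synthetic traffic (assumed numbers): 500 distinct IPs each send only
# 4 requests, while one brazen client sends 300 requests by itself.
distributed = [
    (f"10.0.{i // 250}.{i % 250}", "/search")
    for i in range(500)
    for _ in range(4)
]
brazen = [("203.0.113.7", "/search")] * 300
traffic = distributed + brazen

flagged = flagged_ips(traffic, per_ip_limit=100)
unflagged_requests = sum(1 for ip, _ in traffic if ip not in flagged)

print(flagged)             # only the brazen client is caught
print(unflagged_requests)  # distributed requests that slip past the limit
```

In this toy data set the single noisy client is flagged, but the distributed campaign sends far more requests in total while every individual IP stays well under the limit.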

Your marketing team may be the first to notice the symptoms of scraping attacks, including lower search rankings and reduced conversion rates.

3. Monetize

The extracted data may be sold, used for price-comparison sites, or even used to create imitation sites for fraudulent purposes.

Even if the scraper is a partner, enterprises may prefer that the party retrieve data from a specified API, rather than consume expensive resources by requesting data directly from web servers.