Global Privacy Assembly Warns of the Privacy Risks of Scraping

Jim Downey
Published September 13, 2023

Last year, the Global Privacy Assembly (GPA), an association of over 130 data protection and privacy regulators and enforcers, warned that credential stuffing by bots posed a global risk to data privacy and that organizations are obliged to take reasonable measures to prevent it. Now the International Enforcement Cooperation Working Group (IEWG) of the GPA has issued a joint statement warning that social media companies and other websites are responsible for protecting individuals’ personal information from unlawful data scraping.

What is Data Scraping?

Data scraping is the automated extraction of data from web and mobile applications. Scrapers extract data for many reasons, from gaining a competitive advantage to aggregating data for price comparisons. Organizations often try to prevent scraping because they do not want competitors to gain access to their pricing or inventory data, or because the volume of scraping traffic degrades app performance. In many cases, scraping traffic can account for over 95% of overall traffic and can slow down or even crash applications.

How Does Data Scraping Put Data Privacy at Risk?

Privacy regulations are intended to protect the data privacy rights of individuals, enabling them to decide who can use their data, for what purpose, and for how long. When users share data on social media, they intend for it to be viewed by a certain group of people as set forth in the app’s privacy settings and policies. When this data is scraped without their knowledge, individuals lose control of their personal information, as the scrapers may use it in ways unrelated to its intended purpose.

Data scraping also undermines the right of individuals to request the deletion or correction of their personal data. Once the data has been scraped, even if the creator deletes it from the social media site, the scrapers can continue using and sharing their copy.

This loss of control over personal data, according to the GPA’s joint statement, puts individuals at risk in several ways. Criminals may use this data for social engineering, targeted phishing, and identity fraud. The data may enable criminals to profile individuals with the intention of bypassing authorization systems. And less-than-scrupulous marketers may use the data for direct marketing and spam.

Does This Pose Compliance Risk for Organizations?

According to the GPA’s joint statement, personal information, even when publicly available on the internet, is subject to data protection laws. This means that the mass data scraping of personal information may constitute a reportable data breach in many jurisdictions.

The joint statement recommends measures to prevent data scraping. These measures are statutory requirements in many jurisdictions and may be “interpreted as such by courts and data protection authorities.”

What Does the GPA Recommend?

To mitigate the privacy risks of data scraping, the GPA recommends multi-layered technical and procedural controls beginning with the creation of a designated team to implement the controls and monitor their effectiveness. Other controls include rate limiting of users, monitoring for rapid linking within the social network, taking legal action against data scrapers to ensure the deletion of data, and notifying individuals and regulators of scraping activities that constitute a data breach.

As a component of the multi-layered controls, the GPA also recommends mitigating the bots that scrape data. The joint statement specifically mentions CAPTCHA and IP rate limiting. Organizations that want effective multi-layered protection should consider bot management solutions that use signal collection and AI to mitigate advanced bots that bypass CAPTCHA and IP rate limiting.
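To make the IP rate limiting mentioned above concrete, here is a minimal sketch of a sliding-window limiter in Python. It is an illustration only, not something taken from the GPA statement or from any F5 product: the class name `SlidingWindowRateLimiter`, the limit of 3 requests per 60 seconds, and the example IP address are all assumptions chosen for the example.

```python
from collections import defaultdict, deque


class SlidingWindowRateLimiter:
    """Allow at most `max_requests` per client key within `window_seconds`.

    Hypothetical helper for illustration; the key would typically be a
    client IP address or a user ID.
    """

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # key -> timestamps of requests that were allowed, oldest first
        self._hits = defaultdict(deque)

    def allow(self, key: str, now: float) -> bool:
        hits = self._hits[key]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        # Reject the request if the client has used up its quota.
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


# Assumed policy: 3 requests per 60 seconds per IP address.
limiter = SlidingWindowRateLimiter(max_requests=3, window_seconds=60.0)

# Four requests from the same address at t = 0, 1, 2, 3 seconds.
results = [limiter.allow("203.0.113.7", now=float(t)) for t in (0, 1, 2, 3)]
print(results)  # [True, True, True, False]
```

The fourth request is rejected because three requests already fall inside the 60-second window. A limiter like this keys on a single address, so a scraper that spreads its requests across many addresses stays under the limit, which is one reason it works best as one layer among several.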

What Has Changed to Make This Threat More Concerning?

If we consider data scraping in light of recent developments in phishing, the threat to individual privacy becomes even more alarming. Phishing-as-a-Service (PhaaS) vendors provide full toolsets for launching phishing attacks, including email templates, fake websites designed to look authentic, contact details of potential targets, detailed how-to instructions, and customer support, all of which make phishing more effective even for low-skilled attackers. Moreover, phishing has become the primary means of bypassing MFA, using real-time phishing proxies that fool users into submitting one-time passwords to attacker-controlled apps. (See the F5 Labs report on how phishing is rendering MFA ineffective.)

We should expect phishing to become ever more effective through the application of large language models. Reports of new attack tools for sale on the dark web, including WormGPT and FraudGPT, indicate criminals have begun to adapt generative AI for nefarious purposes, including phishing. These tools will almost certainly benefit from ingesting personal data scraped from multiple apps to craft effective, personalized phishing messages, a development that may lead to the automation of spear phishing.

Why Should We Care?

Data scraping and other uses of malicious automation, such as credential stuffing, threaten to expose our data to criminals. These criminals may aggregate that data and use it against us in social engineering schemes and phishing attacks that can result in financial fraud and other damages. The GPA is performing an important public service by issuing these warnings and making organizations aware of the harmful consequences of allowing bots to scrape our personal data. To learn about how F5 can help you address data scraping and data privacy, visit www.f5.com/go/solution/scraping.