Introduction
This year we are releasing our 2019 Application Protection Report as a series of short, tightly focused episodes. This helps ensure we provide timely threat intelligence that our readers can add to their own threat models and use to prepare appropriate defenses and responses. Last episode, we focused on PHP’s continuing run as one of the great weak points on the Internet. We also dived into the details of a widespread and unsophisticated reconnaissance campaign against PHP database targets.
This week we explore the world of data breaches, examining patterns across sectors, breach causes, and impacts. Our analysis is based on U.S. state-level breach notifications that organizations (or their lawyers) are legally obligated to provide whenever personally identifiable information under their control is exposed either to attackers, or to the public. Data in this form have some limitations—most specifically that these are legal documents, not technical ones, and lawyers often include only the legally obligated minimum amount of detail in order to protect their clients. Furthermore, the obligation to report varies from state to state. Some states mandate reporting for breaches involving their citizens, while others require reporting for any organization that “does business” in their state. Nevertheless, the legal underpinning for these notifications also means that they are the closest thing we have to verifiable and comparable breach data.
For the 2018 report, we reviewed 384 breaches in 2017 across four states: California, Idaho, Oregon, and Washington. Most U.S. states require that victims be notified of data breaches, but at the time, only a few states’ attorneys general collected and shared breach letters on their websites. Because California, Washington, Idaho, and Oregon represent more than 16% of the U.S. population, we felt this was still a sufficiently large sample size to allow us to draw some broader conclusions about what’s going wrong.
For this year’s report, we looked at 761 breaches reported in 2018 across 10 states representing 21.4% of the U.S. population: California, Washington, Wisconsin, Vermont, New Hampshire, Iowa, Maryland, Oregon, Idaho, and Delaware. Here are the conclusions we drew.
Top Threats
Last year, we found that two threat vectors, code-injected form-jacking and phishing, had become significant and growing problems.
In our 2018 report (which analyzed data gathered in 2017), we found that payment card skimming via injection was the single greatest threat to applications. These attacks exploit local web application vulnerabilities to load software payment card skimmers (sometimes referred to in breach reports as malware) into unsecured payment entry forms. Attacks of this type constituted 21% of the total set of breaches we analyzed, and included many of the most significant injection-based breaches of that year. Most of the compromised payment forms and shopping cart applications ran on PHP; additionally, we found that PHP exploit attempts made up 58% of the total attack traffic observed in 2017 by Loryka sensors.
Phishing and other access control attacks were the second greatest threat, representing 14% of all breaches that we analyzed last year. As 2018 went on, we found that phishing campaigns were growing in prevalence and sophistication, so it was no surprise to see that this past year phishing actually surpassed injection: phishing was responsible for 21% of breaches with a known root cause, whereas injection for payment card skimming was responsible for about 12%.
Specific exploits and tactics may shift slightly, but it looks as though the two weakest points on the internet—people and PHP-based payment card forms—are set to retain their unenviable crowns.
Sorting breaches by cause
As noted above, breach notification letters usually lack technical detail. Unfortunately, 13% of breach letters in 2018 did not attribute a specific cause, including the four largest breaches by number of exposed records. We cannot use these breach notifications to reach definitive conclusions, but we can at least organize them into categories by cause to get a sense of overall trends, if not specific diagnoses. Here are the significant breach cause categories as we found them, with our notes in parentheses:
Access-related breach causes included
- Email (yes, we also found this annoyingly vague)
- Phishing which resulted in access to email (no other details noted)
- Phishing to gain access to login credentials
- Social engineering by email to gain access (yes, this is probably the same as phishing)
- Brute forcing of credentials
- Credential stuffing
- Stolen access credentials (possibly from a phish?)
- Access credentials stolen from a third party (could be related to credential stuffing)
- Social engineering by telephone to gain access credentials (“Hi, I'm the county password inspector.”1)
Web breach causes included
- Web app code injection attacks (aka the form injection / Magecart / skimmer malware)
- Web hacking (no other details noted)
- Web application hacking
Accidental breach causes included
- Sending information to unintended recipients (wrong attachment or wrong recipient)
- Lost/stolen/misplaced physical assets (mostly laptops in cars)
- Access misconfigurations that allowed unauthorized access
Physical security breach causes included
- Physical infiltration (mostly burglary and laptops stolen out of cars)
- Point-of-sale device attacks and the placing of physical skimmer devices
Insider breach causes included
- Malicious data exfiltration
- Intentional misconfiguration/sabotage (more on this)
- Insiders at trusted third parties that abuse their authorized access
Malware breach causes included
- Any use of malware to manipulate or gain control of remote systems
- Ransomware attacks (which trigger a breach notification in some venues)
Third-party hacked refers to specific incidents where a cyber-related breach at a third party led to unauthorized access to organizational data.
Phishing (no other details given) covers cases where phishing was mentioned but it was not clear whether the phishing was used to obtain access credentials or to drop malware.

We found that access-related breaches constituted by far the largest proportion of the known breach causes, at 47%. The subcategories within access-related breaches obviously overlap significantly, but all of them reduce to one form or another of either social engineering (as in the case of phishing) or credential theft. This is partly a reflection of the strength of other technical controls: if it were easier to circumvent authentication completely, we would not see so many attackers getting through this way. As it stands, this year’s real breaches show that humans and their access structures remain one of the weakest points for applications.
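The grouping of reported causes into categories, and the calculation of each category's share of known-cause breaches, can be sketched roughly as follows. This is a minimal illustration and not the method actually used for the report: the cause labels are shortened paraphrases of a few entries from the list above, the function name `category_shares` is hypothetical, and the sample records are invented purely to show the arithmetic.

```python
from collections import Counter

# Shortened, paraphrased cause labels mapped to the top-level categories
# described in the text. The label strings are assumptions made for this sketch.
CATEGORY_BY_CAUSE = {
    "phishing for credentials": "access",
    "credential stuffing": "access",
    "brute force": "access",
    "web app code injection": "web",
    "misdirected email": "accidental",
    "lost or stolen asset": "accidental",
    "physical skimmer": "physical",
    "malicious exfiltration": "insider",
    "ransomware": "malware",
    "third-party hacked": "third-party",
}


def category_shares(causes):
    """Return {category: percent of known-cause breaches}.

    A cause of None means the breach letter gave no cause; those
    records are excluded from the denominator.
    """
    known = [CATEGORY_BY_CAUSE.get(c, "other") for c in causes if c is not None]
    if not known:
        return {}
    counts = Counter(known)
    return {cat: round(100 * n / len(known), 1) for cat, n in counts.items()}


# Hypothetical records for illustration only, NOT the report's data.
sample = [
    "phishing for credentials",
    "credential stuffing",
    "web app code injection",
    None,  # letter with no attributed cause
    "ransomware",
]
print(category_shares(sample))
```

Excluding unattributed letters from the denominator mirrors the "known breach causes" framing used above; including them would lower every category's percentage.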
Industry Profiles
While the lack of details in the dataset prevents us from identifying more specific trends, we were able to group the breaches into two rough target/vector profiles that corresponded to the two most common breach causes identified above.
Industry profile 1 – Organizations that accept payment cards on the web
One of the profiles we identified was a pattern of industry sectors with a high rate of compromise through payment form injection. The retail sector, which relies heavily on ecommerce transactions, had a disproportionately high rate of compromise by injection, with 72% of attributable breaches coming from that vector. Similar industries, such as manufacturing and technology, also tended to be breached this way. The public sector also had a high incidence of successful injection attacks, probably due to the prevalence of the Click2Gov exploit that haunted local government and utility sites in 2017 and 2018.1
In short, for organizations that accept credit cards for online payment, payment form injection attacks such as Magecart are a significant risk, and specific controls must be put in place to prevent these attacks and to monitor for them. We’re going to dive much deeper into this attack (and controls to mitigate it) in a future article.
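As one illustration of what monitoring for this kind of attack might look like, the sketch below computes a Subresource Integrity style hash (SHA-384, base64-encoded) for a script and compares it against a known-good baseline. This is an assumption about one possible control, not the set of controls the report goes on to recommend. The function names and the sample script contents are hypothetical, and a hash comparison like this only detects changes to a file that is already being tracked.

```python
import base64
import hashlib


def sri_hash(content: bytes) -> str:
    """Return an SRI-style integrity string: 'sha384-' plus the base64 digest."""
    digest = hashlib.sha384(content).digest()
    return "sha384-" + base64.b64encode(digest).decode("ascii")


def script_changed(content: bytes, expected_integrity: str) -> bool:
    """Return True when the script no longer matches its recorded baseline."""
    return sri_hash(content) != expected_integrity


# Hypothetical script contents for illustration only.
known_good = b"function submitPayment(form) { /* ... */ }"
baseline = sri_hash(known_good)

# Simulate an injected skimmer appended to the payment script.
tampered = known_good + b"; sendCardDataElsewhere();"

print(script_changed(known_good, baseline))  # False: matches the baseline
print(script_changed(tampered, baseline))    # True: content was modified
```

The same integrity string format is what browsers check in the `integrity` attribute of a script tag, so a baseline recorded this way could in principle serve both purposes.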