2020 Application Protection Report, Volume 1: APIs, Architecture, and Making Sense of the Moment

The Application Protection Research Series is an ongoing project at F5 Labs that provides an overarching view of the application security landscape. While detailed analyses of specific attacks are critical for defenders to adapt to emerging techniques, it is easy to overemphasize tactics over strategy if those kinds of analyses are the only thing we consume. In contrast, this research is particularly focused on longer timeframes for analysis, since time is an indispensable component in calculating risk. This perspective also allows us to escape the hype around the vulnerability, APT, or exploit of the week. Where we can, we use quantitative methods to be as objective as possible. Where necessary, we read between the lines of both data and human behavior to tease out the crux of application security in the present moment.

In 2019, this project began to focus on the subject of application architecture. This focus came about for two reasons. The first is that applications have changed. Specifically, applications have moved toward an increasingly distributed and decentralized model. The result is a growing emphasis on connecting disparate services over the web as opposed to creating large, multifunctional monoliths. The fact that this movement has been gradual, and the sum of a number of smaller, more evolutionary developments, should not obscure the magnitude of this transformation.

The second reason for our new focus is the fact that attackers have successfully adapted to take advantage of these changes, and in general have done so more quickly than defenders have. The evidence for this successful adaptation lies in both the number of significant successful attacks over the last few years and the characteristics of these attacks. Even though attack details vary significantly, the overarching pattern is that these architectural shifts are causing organizational breaches in ways that should have been easily avoidable. This holds true across the spectrum of expected competency and maturity for defenders. It’s not just small startups that can’t afford to prioritize security that are being compromised through emerging architectures. Some big players in our industry are finding out too late that their data are hanging out all over the place.

Application programming interfaces, or APIs, are the most prominent and grave example of an architectural change driving actualized risk. As a result, we are devoting a big part of our 2020 Application Protection Research Series to understanding the changing relationship between APIs and application security. To be clear, we want to emphasize early on that APIs are not inherently more or less risky, and they have an enormous range of advantages in the present business environment. Rather, we argue they represent a transferal of risk away from traditional security foci and into new realms, and until we understand that transferal better, APIs will continue to be a source of security breaches for organizations. Our goal is not to scare people away but to provide options and guidelines for application owners to maximize the benefit and minimize the risk of these architectures. This article is the first in a series that will explore the topic from a number of angles, but we will begin here with the big picture surrounding API incidents.¹

API security may seem like old news, and a prodigious body of literature already exists on the subject. Some of it is even from F5 Labs, like our piece on what APIs are and why they matter as well as the article on API breaches for the 2019 Application Protection Report. Some people are also ready to take your money to fix your API problems. The market for API management or mediation solutions is established and growing. What can we add to this discussion?

What we want to emphasize, in addition to merely echoing existing calls to take API security seriously, is that the architectural shift APIs represent has broad and lasting implications for application security and ultimately for the balance of risk and reward for putting services on the web. Taken as an individual component, the API is an evolutionary development—a slightly different angle on the same basic value proposition of web applications. Taken in context with other architectural changes as well as economic forces presently shaping the Internet, however, APIs represent more of a break with the past than continuity. Seeing these changes as discontinuous is important in recognizing that API security, and the implications it has for application security, is not a one-time project. API security will define the next phase in application security; this is the battlefield on which the next few years will be fought.

Having said that, to establish a baseline, we will begin by presenting some general trends in API breaches and incidents over the last few years. Then we’ll unpack what we can conclude from these data before establishing deeper questions that we’ll be investigating over the course of 2020. But first, a note on our sources for this project.

Sources and Methods

The API incident data we use here come from a collection of open-source reports on API incidents, including both confirmed breaches by malicious attackers and vulnerability discoveries by security researchers. We selected only incidents in which data were confirmed exposed or breached, as opposed to vulnerabilities whose exploitation depended on preconditions that made them largely theoretical. The dataset we used for this article lists 67 known incidents from 2018 to July 2020.²

One interesting facet of this dataset is that more than 75 percent of the incidents were reported by researchers as opposed to organizations that were breached by malicious attackers. Since we know that most organizations that are notified by security researchers will eventually publish their vulnerabilities (to keep the researcher placated, if for no other reason), we can assume that our data contain the bulk of those incidents that were responsibly disclosed. However, given the difficulty of identifying intrusions, and the enduring lag between intrusion and discovery, we feel that this dataset almost certainly underreports the number of malicious API incidents. As successful attacks are discovered, the ramifications of our current Wild West period of API deployment will doubtless reverberate into the future.

Findings

The first thing that stands out about the data is the growing number of events. We observed nine incidents in 2018, 35 in 2019, and 25 so far in the first half of 2020. At this rate, the number of publicly reported API incidents will approach 50 by the end of 2020 (see Figure 1).

Figure 1. API incidents, 2018 to mid-2020. API incidents are becoming more frequent. At the current rate, a greater number of API incidents will occur in 2020 than in the previous two years combined.
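
The projection above is simple enough to check by hand, but a short sketch makes the underlying assumption explicit. It uses only the counts reported in the text and assumes, as the extrapolation does, that the second half of 2020 matches the first half; the variable names are ours.

```python
# Incident counts as reported in the text above.
incidents_2018 = 9
incidents_2019 = 35
incidents_h1_2020 = 25  # first half of 2020 only

# Assumption: incidents continue at the same monthly rate for the rest of 2020
# (a simple linear extrapolation, not a forecast model).
months_observed = 6
projected_2020 = incidents_h1_2020 * 12 // months_observed

# Baseline for comparison: the previous two years combined.
prior_two_years = incidents_2018 + incidents_2019

print(projected_2020)                    # 50
print(prior_two_years)                   # 44
print(projected_2020 > prior_two_years)  # True
```

A linear extrapolation like this is only as good as its assumption. Reporting lag and disclosure habits could push the real figure in either direction.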

When we break these incidents down into categories, the most frequent problem is a complete lack of authentication in front of API endpoints, followed by broken authentication and broken authorization (see Figure 2). Note that these categories are necessarily rough in order to generalize across incidents that contain a lot of variation. For instance, many of the broken authentication and authorization events stemmed at least partly from misconfigurations, but we felt that the recurring commonalities in the specific nature of these misconfigurations warranted a separate category. The real conclusion from this view is that the most frequent causes of API incidents in the last two years are issues that we could safely characterize as reflecting a low level of security maturity. We can expect better from ourselves as an industry.

Sorting incidents by sector (see Figure 3) was not particularly illuminating, partly because the majority of incidents occurred at organizations that we generally lump into the “tech” sector, and partly because differentiating between tech companies and non-tech companies in terms of consumption models is increasingly difficult. In fact, as more and more organizations publish APIs for their systems in an effort to enable partnerships and integration with other services, even differentiating on the basis of tech production is becoming difficult. We will explore this theme more in forthcoming API articles.

We find similar problems when we sort the sectors by causes, looking for a relationship between sectors and the root cause of incidents (see Figure 4). While there are some obvious patterns here—the predominance of incidents at tech companies in which no authentication, bad authentication, or bad authorization were primary factors—we already knew from simpler views of the data that these three causes, and the tech industry in general, were responsible for the bulk of incidents. Given that other sectors account for a greater proportion of incidents in which no authentication was present at all than they do for the dataset overall, we can cautiously conclude that tech companies are slightly ahead of the curve in terms of placing some authentication in front of API endpoints, but ultimately there is not a great deal of meaningful intelligence in this view.

Figure 4. Number of API incidents by sector and root cause.

The same is true when we look at relationships between incident causes and time (see Figure 5). If we extrapolate the numbers for the first half of 2020 to a full calendar year, we might cautiously conclude that a greater number of organizations are overcoming the first hurdle and are at least attempting to implement authentication in front of APIs. Realistically, though, the differences are not significant enough to support any actionable conclusions here.

Figure 5. Time series analysis of API incidents by root cause.

In short, these open-source data do not reveal any statistically significant and practically meaningful relationships between target attributes and incident modes. (We also looked for patterns between incident types and geographic regions and found none; the overrepresentation of U.S.-based organizations was mirrored across all incident types in similar proportions.) While it is clear that certain root causes are more prevalent in incidents, these root causes consistently come to the fore regardless of variations in geographical location, industry, and time.

This consistency across inconsistent targets does, however, provide some interesting clues as to the nature and future of API security. These incidents seem to indicate a relative uniformity of risk across an otherwise varied landscape of organizations, no matter from what angle we examine it. This leads us to conclude, on a broad basis, that the security industry’s approach to API security is immature, and that the nature of the problem is currently not well understood. It is on this basis that we emphasize that the present model of API design and use, and architecture more broadly, needs to be understood as a fundamental transformation that demands, in turn, a new approach to securing applications, or even to conceptualizing trust and responsibility in networked computing. This transformation can be understood in many ways, but is best encapsulated in terms of a dramatic expansion in attack surface.

Mapping the New Attack Surface

In the current state of affairs, that is, in the context of a global economy that is integrating disparate services faster than security people can keep up, the most important thing about APIs is not the details of their exploitation, which vary from case to case even within the same categories. Rather, the most important thing is the most fundamental thing about them. An API’s sole purpose is to facilitate the transfer of information in and out of a network in ways that are necessarily and deliberately obscure to a normal user. As REST APIs are now the predominant style, that transfer of information is happening over the web, using HTTP methods. Even supposing every single partner for every single private API has its bases completely covered (a poor supposition that runs against the principles of Zero Trust and assume breach), each API endpoint represents, at a minimum, an expansion of the attack surface, and therefore demands controls like any other endpoint. Public APIs are even more extreme. These are not merely new openings in the perimeter; they are openings that are publicized to the community with the exact skill set necessary to find and exploit vulnerabilities or misconfigurations in them.

At the extreme end of the spectrum, many of the largest and most popular web applications that rely heavily on integrating third-party services now contain hundreds of APIs. Each of these represents a separate opportunity for attackers, a transferal of traffic to a different context (if not an outright increase), and a business and security dependency on an entity outside of a system owner’s control. This kind of architecture represents a risk model so different with respect to visibility, complexity, inventory, and business partnerships that it makes the old assumptions about baseline risk or threat modeling meaningless. This is why attributes like industry are not particularly meaningful. No matter what industry an organization is in or the value of its data, its API represents a path to something else, whether that is a partner, a customer, or a hardened piece of infrastructure that is not reachable any other way. The connective nature of APIs means that target attributes are less important and baseline controls are more important.

Understood in this way, API incidents of the type that we are seeing—authentication gaps, authentication failures, authorization failures, and unsanitized input—are a product of an old way of thinking colliding with a new reality of risk that is being driven by changes in business practices. We argue that if the existing approach were sufficient, we would not be seeing such consistent immaturity in API security across such a varied landscape of organizations, especially given that standards and tool sets already exist to manage this risk.

What Can I Do About It?

Of course, this doesn’t mean that every single organization that publishes an API is a security disaster. Plenty of organizations have implemented these design principles and technologies well. Future articles will explore detailed case studies of specific scenarios that exemplify some of the vectors that we feel currently present the greatest risk. In the meantime, the following are some broad guidelines that can serve as a baseline for managing risk around REST APIs in any context.

Inventory: You cannot secure what you do not know about. This might sound silly, but in the context of everything moving to DevOps (which can lead to shadow IT issues) and increasingly granular integrations that only apply to specific lines of business, such as SEO integrations for marketing teams, maintaining awareness of API endpoints is not a trivial problem.
Authentication: If we were to stress a single point in this article, it would be that all APIs require authentication. While in some specific use cases another authentication protocol might make sense, the emerging consensus is that OpenID Connect, which is based on the OAuth 2.0 authorization protocol, is the preferred (and tested) method for API authentication.
Authorization: After no authentication and broken authentication, broken or absent authorization controls were one of the leading causes of API incidents. Because of the obscure and decentralized nature of API traffic, maintaining strict control of agent permissions is critical to preventing tampering, enumeration attacks, or lateral movement. OAuth 2.0 is the preferred standard for managing API authorization, and within that standard, the JSON Web Token (JWT) format is becoming the preferred way to implement token-based authorization.
Encryption: Forcing HTTPS for API connections is wise however you cut it, but since OAuth 2.0 requires Transport Layer Security (TLS) to maintain the confidentiality of secret keys, HTTPS is increasingly mandatory for APIs.
API gateways/mediation: API gateways have quickly come to be regarded as the minimum for enterprise architects who need to manage a diverse spectrum of APIs and their traffic. Gateways are particularly suited to managing north-south API traffic (that is, traffic coming from outside connections), and many gateways offer integrated authentication and authorization functionality. These can also limit the impact of an intrusion in the event that a public-facing API endpoint is compromised by other means.
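
To make the authentication and authorization guidelines above more concrete, the following is a minimal sketch of the check a service might run on a JWT bearer token before serving an API request. It is not a definitive implementation. We assume an HS256 token signed with a shared secret so that the example runs with only the Python standard library, whereas real OAuth 2.0 and OpenID Connect deployments usually verify asymmetric signatures with a vetted library. The function names, the `orders:read` scope, and the demo secret are all hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # JWTs strip base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_token(claims: dict, secret: bytes) -> str:
    """Build an HS256-signed JWT (only here to keep the demo self-contained)."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    signing_input = f"{header}.{payload}".encode("ascii")
    signature = hmac.new(secret, signing_input, hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(signature)}"


def authorize(token: str, secret: bytes, required_scope: str,
              now: Optional[float] = None) -> bool:
    """Return True only if the token is authentic, unexpired, and has the scope."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        # Pin the algorithm; never let the token choose it (for example "none").
        if not isinstance(header, dict) or header.get("alg") != "HS256":
            return False
        signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
        expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
        # Constant-time comparison of the expected and presented signatures.
        if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
            return False
        claims = json.loads(_b64url_decode(payload_b64))
    except ValueError:
        # Malformed token: wrong segment count, bad base64, or bad JSON.
        return False
    if not isinstance(claims, dict):
        return False
    now = time.time() if now is None else now
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        return False
    # Assumption: scopes arrive as a space-delimited "scope" claim.
    return required_scope in str(claims.get("scope", "")).split()


SECRET = b"demo-secret-not-for-production"  # hypothetical key
token = sign_token(
    {"sub": "partner-42", "scope": "orders:read", "exp": 2_000_000_000}, SECRET
)
CHECK_TIME = 1_600_000_000  # fixed clock so the demo is deterministic

print(authorize(token, SECRET, "orders:read", now=CHECK_TIME))        # True
print(authorize(token, SECRET, "orders:write", now=CHECK_TIME))       # False
print(authorize(token, b"wrong-secret", "orders:read", now=CHECK_TIME))  # False
```

The sketch refuses by default: a request is served only when the signature, the expiry, and the scope all check out. That ordering mirrors the three leading causes in our data, which were missing authentication, broken authentication, and broken authorization.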

One thing we need to emphasize in discussing API security is that it is not valuable at this point to discuss the inherent security strengths and weaknesses of this architectural style. APIs are already the emerging de facto standard for business integration. The question is not, “Are these things safe?” but “How can we make these things safe enough?” With that in mind, the goal of this piece has been to stress two specific perspectives that we believe will help make them safe enough. The first is the idea that, in practice, APIs are a revolutionary development that demands a fundamental change in approach. The second is the realization that a significant number of organizations that use them have not recognized this change, as evidenced by the complete lack of rudimentary controls. If we view APIs as simply the most obvious and prevalent example of a deeper and broader shift in terms of system design, this moment of recognition is even more important. The next few articles will articulate specific examples of ways in which organizations failed to recognize the paradigm shift until it was too late.

The following security controls are recommended to protect against API attacks.

Technical

Preventative

Use API authentication to prevent unauthorized access to API resources. OpenID Connect is the de facto standard for this.
Use API authorization to control permissions and user agent actions post-authentication. OAuth 2.0 is the de facto standard for this.
Inventory your APIs to document both the risk and their configuration. This can require working with disparate lines of business because each team may be integrating with services that are unknown to IT.
Use API gateways or other API mediation techniques to integrate and streamline access control, routing, and isolation.
Use TLS to encrypt API traffic, particularly for OAuth 2.0 authorization traffic, which requires encryption to maintain confidentiality/integrity of secrets.
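
As an illustration of how the inventory, authentication, gateway, and TLS controls above fit together, the following sketch audits a small API inventory against those baselines. Every endpoint, field name, and hostname is hypothetical, and a real inventory would come from gateway configuration, service discovery, or interviews with lines of business rather than a hard-coded list.

```python
from dataclasses import dataclass
from typing import List
from urllib.parse import urlsplit


@dataclass(frozen=True)
class ApiEndpoint:
    url: str
    owner: str            # line of business responsible for the integration
    auth: str             # for example "oidc", "oauth2", "api_key", or "none"
    behind_gateway: bool  # True if traffic is routed through an API gateway


def audit(endpoint: ApiEndpoint) -> List[str]:
    """Return the baseline controls that this endpoint is missing."""
    findings = []
    if urlsplit(endpoint.url).scheme.lower() != "https":
        findings.append("traffic is not forced over TLS")
    if endpoint.auth.lower() == "none":
        findings.append("no authentication in front of the endpoint")
    if not endpoint.behind_gateway:
        findings.append("not routed through an API gateway")
    return findings


# Hypothetical inventory entries, including one marketing integration
# that IT might not otherwise know about.
inventory = [
    ApiEndpoint("https://api.example.com/v1/orders", "sales", "oidc", True),
    ApiEndpoint("http://seo.example.com/v2/keywords", "marketing", "none", False),
]

report = {endpoint.url: audit(endpoint) for endpoint in inventory}
for url, findings in report.items():
    print(url, findings or "ok")
```

A check like this can only find gaps in endpoints that have been inventoried, which is why the inventory itself comes first.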

Footnotes

¹ F5 Labs distinguishes between breaches and incidents in that a breach includes the exfiltration of sensitive information, whereas an incident is a broader category that indicates exposure of data but not necessarily exfiltration. This distinction is particularly important in the case of API security, because so many of the security events that we’ve examined for this analysis were reported by researchers, not malicious attackers, and therefore stopped short of exfiltration.

² A brief note on sources: while we strive to collect and analyze empirical, quantitative data, finding this kind of reliable and well-formed data is a constant challenge in our field. Many organizations won’t, or can’t, share their data. Many want to share, but have incomplete data or data that are structured for purposes other than research like ours. When we lack high-quality data, we occasionally turn to manually aggregating events from open-source intelligence. These kinds of sources also present their own challenges in terms of detail and consistency from source to source, which means that our researchers inevitably need to make some judgment calls in terms of categorization and reliability. The API breach information presented here is drawn from such open-source intelligence, which accounts for the necessarily high-altitude view of the events.
