Identifying Trends in Recent Cyberattacks
Web attacks vary widely by target, technique, objective, and attacker, which makes it difficult for system owners to assess the current risk to their particular combination of systems until they are attacked. To help defenders anticipate the risks they face, we analyzed several months' worth of global honeypot traffic from our research partner Effluxio as part of the 2020 Application Protection Report. While honeypots have some limitations, they also provide a view of cyberattacks that is difficult to get any other way.
What Are Honeypots Good For?
Honeypots are unique among hosts in that they don't actually serve a page or have a domain name associated with them, which means that no human is likely to stumble upon one. In exchange for losing practical utility as a server, honeypots solve one of the most difficult problems in security: differentiating malicious from normal traffic. Because all of the traffic that a honeypot logs is either automated, like search engine crawlers, or malicious (or both), honeypots are useful for seeing which combinations of tactics and targets are front of mind for attackers.
What Are Honeypots Not Good For?
The scope of the Internet makes it practically impossible to capture a significant chunk of its malicious traffic, no matter how many sensors you have or how you distribute them. Furthermore, because most honeypots don’t provide any actual services, they are unlikely to capture traffic from targeted attack campaigns that are seeking a specific asset as opposed to a category of asset. While honeypots are good for getting a sense of what less sophisticated attack traffic looks like, they will probably not catch traffic from state-sponsored actors or high-end cybercriminals. In short, honeypots can’t rule out attacks or threat actors, but they can rule them in.
Recent Cyberattacks: Breaking Down the Data
The set of Effluxio data that we analyzed contains server requests from mid-July through mid-October of 2020. Effluxio prefers not to disclose the locations or number of sensors, but we can still make a number of observations and draw a few conclusions based on this traffic. Here are the basic characteristics of the data set:
- About 1,090,000 connections logged
- 89,000 unique IP addresses
- 22,000 unique target path/query combinations [1]
- 42 unique countries represented in the targets
- 183 unique countries represented in the sources [2]
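For readers who want to compute the same kind of summary over their own logs, here is a minimal Python sketch. The `Record` fields and the `summarize` function are hypothetical names chosen for illustration; the actual schema of the Effluxio data is not described here, so treat the field layout as an assumption.

```python
from collections import namedtuple

# Hypothetical record layout; the real honeypot log schema is not published.
Record = namedtuple("Record", "src_ip src_country dst_country path")


def summarize(records):
    """Return basic characteristics of a collection of logged connections."""
    records = list(records)  # allow any iterable, since we pass over it several times
    return {
        "connections": len(records),
        "unique_ips": len({r.src_ip for r in records}),
        "unique_targets": len({r.path for r in records}),
        "target_countries": len({r.dst_country for r in records}),
        "source_countries": len({r.src_country for r in records}),
    }


if __name__ == "__main__":
    # Made-up sample using documentation-only IP ranges, not real traffic.
    sample = [
        Record("192.0.2.1", "AA", "XX", "/"),
        Record("192.0.2.1", "AA", "YY", "/admin"),
        Record("198.51.100.7", "BB", "XX", "/"),
    ]
    print(summarize(sample))
```

Each count is simply the size of a set built from one field, so the same approach extends to any other attribute a sensor records.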
Nearly every aspect of the data set is characterized by a long tail: a small subset of IP addresses, target paths, and geographical targets accounts for a disproportionate share of the traffic, while the vast majority appear only rarely. For instance, 42% of the traffic in the time period came from the top 1% of IP addresses, and 52% of the traffic targeted the top 1% of target paths. However, only 0.2% of the traffic featured both the top 1% of IP addresses and the top 1% of target paths, meaning that a huge proportion of the traffic was composed of either IP addresses or targets that the sensors logged only a few times. The tails of the traffic distributions were so long that, for both the target paths and the source IP addresses, the median number of instances was one. The single most frequently seen IP address made up 2% of total traffic, and the most frequently seen target path, the web root /, made up just under 20% of total traffic (see Figure 1).
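The long-tail measurements described above can be reproduced on any connection log with a few lines of Python. This is only a sketch under stated assumptions: each record is a hypothetical `(src_ip, path)` tuple, and `top_keys` and `concentration` are illustrative names, not the method actually used for the report.

```python
from collections import Counter
from math import ceil
from statistics import median


def top_keys(counts, fraction=0.01):
    """Return the keys in the top `fraction` by count (always at least one key).

    Ties at the cutoff are broken by Counter.most_common ordering.
    """
    k = max(1, ceil(len(counts) * fraction))
    return {key for key, _ in counts.most_common(k)}


def concentration(records, fraction=0.01):
    """Measure how concentrated traffic is among the busiest IPs and paths.

    `records` is a sequence of (src_ip, path) tuples; this layout is an assumption.
    """
    records = list(records)
    if not records:
        raise ValueError("records must not be empty")
    ip_counts = Counter(ip for ip, _ in records)
    path_counts = Counter(path for _, path in records)
    top_ips = top_keys(ip_counts, fraction)
    top_paths = top_keys(path_counts, fraction)
    total = len(records)
    return {
        # Share of all connections that came from the busiest IPs.
        "top_ip_share": sum(ip in top_ips for ip, _ in records) / total,
        # Share of all connections that hit the most requested paths.
        "top_path_share": sum(p in top_paths for _, p in records) / total,
        # Share of connections that were both from a top IP and to a top path.
        "both_share": sum(ip in top_ips and p in top_paths for ip, p in records) / total,
        # A median of one means at least half of the keys were seen only once.
        "median_per_ip": median(ip_counts.values()),
        "median_per_path": median(path_counts.values()),
    }
```

Computing the overlap separately from the two individual shares is what makes the contrast visible: the top sources and the top targets can each account for a large share of traffic while rarely appearing in the same connection.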
The most widely seen IP address, 184.108.40.206, logged more than three times as many connections as the next most active address. It is a known malicious IP address associated with Russian scanning networks, as shown in Figure 2. None of the other top 10 IP addresses stood out as confirmed malicious assets.