While investigating a recent threat campaign, F5 researchers encountered a strange behaviour where malicious requests were originating from legitimate Googlebot servers. This relatively infrequent behavior could potentially have serious consequences in environments where the trust level given to Googlebot influences an organization’s security decisions.
The Trust Paradox
Google’s official support site advises to “make sure Googlebot is not blocked”1 and provides instructions to verify that Googlebot is real.2 Both imply that trusting Googlebot traffic is somewhat mandatory if you’d like your site to show up in Google search engine results. Many vendors rely on the legitimacy of Google bot traffic and allowlist requests coming from genuine Googlebot servers. This means that malicious requests coming from Googlebot can bypass some security mechanisms without being inspected for content and potentially deliver malicious payloads. On the other hand, if an organization’s mitigation mechanism automatically denylists IP addresses delivering malicious requests, that organization could easily be tricked into blocking Googlebot, which may lead to a lower ranking in Google’s search engine.
Was Googlebot Hijacked?
After verifying that the requests we received on our threat intelligence system came from real Googlebot servers, we started looking into the possibility of an attacker creating such a scenario. It seems there are a couple of ways this could happened. One would be by controlling the Googlebot server, which seemed highly unlikely. Another would be by sending a fake User-Agent from another Google service. But since the requests originated from a Googlebot’s subdomain and Googlebot’s IP address pool and not from a different Google service (like Google Sites, for example), this possibility was also ruled out. That left us with the most likely scenario: that the service was being abused.
How Does the Googlebot Crawling Service Work?
Essentially, Googlebot follows every new or updated link on your website and then follows links from those pages, and so on. This is done to allow Google to add pages previously unknown to its search engine database. It also allows Google to analyze websites and later, make them available to users searching on Google’s search engine. Technically, “following links” means sending a GET request to every URL listed in the links on the website. So, Googlebot servers generate requests based on links they do not control and, as it seems, do not validate.
Given that Googlebot follows links, attackers figured out a simple method to trick Googlebot into send malicious requests to arbitrary targets. An attacker can add malicious links on a website, each link composed of the target’s address and a relevant attack payload. Here is a generic example of such a malicious link:
<a href="http://victim-address.com/exploit-payload">malicious link<a>
When Googlebot crawls through the page with this link, it follows the link and sends a malicious GET request holding exploit-payload to the attacker’s target of choice, in this case, victim-address.com.
Using this very method, we managed to manipulate Googlebot to send malicious requests to a target of choice. We used two servers when testing this method—one for the attacker and another for the target. Using Google Search Console, we configured Googlebot to crawl through our attacker’s server where we added a web page that contained a link to the target server. That link held a malicious payload. After some time sniffing traffic on the target server, we spotted the malicious request with our crafted malicious URL hitting the server. The origin of the request was none other than a legitimate Googlebot.