These days, many web security incidents involve automation. Web-scraping, password reuse, and click-fraud attacks are perpetrated by adversaries trying to mimic real users, and thus will attempt to look like they are coming from a browser. As a website owner, you want to ensure you serve humans, and as a web service provider you want programmatic access to your content to go through your API instead of being scraped through your heavier and less stable web interface.
Assuming that you have basic checks for cURL-like visitors, the next reasonable step is to ensure that visitors are using real, UI-driven browsers — and not headless browsers like PhantomJS and SlimerJS.
In this article, we’re going to demonstrate some techniques for identifying visits by PhantomJS. We decided to focus on PhantomJS because it is the most popular headless browser environment, but many of the concepts that we’ll cover are applicable to SlimerJS and other tools.
NOTE: The techniques presented in this article are applicable to both PhantomJS 1.x and 2.x, unless explicitly mentioned. First up: is it possible to detect PhantomJS without even responding to it?
As you may be aware, PhantomJS is built on the Qt framework. The way Qt implements the HTTP stack makes it stick out from other modern browsers.
First, let’s take a look at Chrome, which sends out the following headers:
In PhantomJS, however, the same HTTP request looks like this:
You’ll notice the PhantomJS headers are distinct from Chrome (and, as it turns out, all other modern browsers) in a few subtle ways:
Checking for these HTTP header aberrations on the server, it should be possible to identify a PhantomJS browser.
But, is it safe to believe these values? If an adversary uses a proxy to rewrite headers in front of the headless browser, they could modify those headers to appear like a normal modern browser instead.
We may not be able to trust the User-Agent value as delivered via HTTP, but what about on the client?
Unfortunately, it is similarly trivial to change user-agent header and navigator.userAgent values in PhantomJS, so this might not be enough.
navigator.plugins contains an array of plugins that are present within the browser. Typical plugin values include Flash, ActiveX, support for Java applets, and the “Default Browser Helper”, which is a plugin that indicates whether this browser is the default browser in OS X. In our research, most fresh installs of common browsers include at least one default plugin — even on mobile.
This is unlike PhantomJS, which doesn’t implement any plugins, nor does it provide a way to add one (using the PhantomJS API).
The following check might then be useful:
It’s also not difficult to imagine a custom build of PhantomJS with real, implemented plugins. This is easier than it sounds because the Qt framework on which PhantomJS is built provides a native API for implementing plugins.
After measuring several times, it appears that if the alert dialog is suppressed within 15 milliseconds, the browser is probably not being controlled by a human. But using this approach means bothering real users with an alert they’ll manually have to close.
PhantomJS 1.x exposes two properties on the global object:
However, these properties are part of an experimental feature and may change in the future.
One such method is Function.prototype.bind, which is missing in PhantomJS 1.x and older. The following example checks whether bind is present, and that it has not been spoofed in the executing environment.
This code is a little too tricky to explain in detail here, but you can find out more from our presentation.
For example, suppose that PhantomJS calls evaluate on the following code:
Note that this example uses a custom indexOfString() function, left as an exercise for the reader, since the native String.prototype.indexOf can be spoofed by PhantomJS to always return a negative result.
Now, how do you get a PhantomJS script to evaluate this code? One technique is to override some frequently used DOM API functions that are likely to be called. For example, the code below overrides document.querySelectorAll to inspect the browser’s stack trace:
To learn more, we recommend watching this recording of our presentation from AppSec USA 2014 (slides). We’ve also put together a GitHub repository of example implementations — and possible circumventions — of the techniques presented here.
Thanks for reading, and happy hunting.
Sergey Shekyan – @sshekyan
Ben Vinegar – @bentlegen
Bei Zhang – @ikarienator