Sampling Requests with NGINX Conditional Logging

NGINX | April 24, 2019

NGINX can record a very detailed log of every transaction it processes. Such logs are known as access logs, and you can fine‑tune the detail that is recorded for different services or locations with a customizable log‑file format.

By default, NGINX logs every transaction it processes. This might be necessary for compliance or security purposes, but for a busy website, the volume of data generated can be overwhelming. In this article, we show how to selectively log transactions based on various criteria, and how to use that knowledge to sample data points about requests in a quick and lightweight way.

Except as noted, this post applies to both NGINX Open Source and NGINX Plus. For ease of reading, we’ll refer to NGINX throughout.

Background – Quick Overview of NGINX Access Log Configuration

The format of NGINX access log entries is defined with the log_format directive. You can define several named log formats; for example, a full log format named main, and an abbreviated log format named notes that records three data points about a request.
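
As a sketch, the two formats might be defined like this in the http context. The main format mirrors the one in the nginx.conf that ships with NGINX. The three fields chosen for notes are an illustrative assumption, not necessarily the fields used in the original post.

```nginx
http {
    # Full format, as found in the nginx.conf shipped with NGINX
    log_format main  '$remote_addr - $remote_user [$time_local] "$request" '
                     '$status $body_bytes_sent "$http_referer" '
                     '"$http_user_agent" "$http_x_forwarded_for"';

    # Abbreviated format with three data points per request
    # (client address, request line, response status: an illustrative choice)
    log_format notes '$remote_addr "$request" $status';
}
```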

The log format can reference NGINX variables and other values calculated at logging time.

You then use the access_log directive to instruct NGINX to log a transaction once it completes. This directive specifies the location of the log file and the log format to use:
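
For example, the following hypothetical server block writes every completed request to two files, one per format. NGINX allows several access_log directives on the same configuration level. The file paths are placeholders, and the config assumes that the main and notes formats have already been defined with log_format.

```nginx
server {
    listen 80;

    # access_log <path> <format name>; the paths here are placeholders
    access_log /var/log/nginx/access.log main;
    access_log /var/log/nginx/notes.log  notes;
}
```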

By default, NGINX logs all transactions using the following configuration:
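
The access_log reference documentation gives this default as the directive below. The combined format is predefined and cannot be redefined, so its definition appears only as a comment for reference.

```nginx
# Implicit default when no access_log directive applies;
# the relative path is resolved against the NGINX installation prefix
access_log logs/access.log combined;

# Predefined "combined" format, shown for reference only:
# log_format combined '$remote_addr - $remote_user [$time_local] '
#                     '"$request" $status $body_bytes_sent '
#                     '"$http_referer" "$http_user_agent"';
```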

If you define your own access_log, it overrides (replaces) the default access log.

Conditional Logging

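Sometimes you might wish to log only certain requests. This is done with conditional logging. The if= parameter of the access_log directive names a condition, and NGINX does not log a request when that condition evaluates to 0 or an empty string.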
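
Here is a minimal sketch of this pattern, assuming the main and notes formats are defined. A map sets the hypothetical variable $secure to 1 for URIs under /secure, and the second access_log writes only those requests to a dedicated file. Because both access_log directives sit in the same context, the general log still records every request.

```nginx
http {
    # 1 for /secure and anything below it, 0 for everything else
    map $uri $secure {
        ~^/secure(/|$)  1;
        default         0;
    }

    server {
        listen 80;

        # Every request is written to the general log
        access_log /var/log/nginx/access.log main;

        # Requests under /secure are additionally written here; NGINX skips
        # the entry when the if= condition is "0" or empty
        access_log /var/log/nginx/secure.log notes if=$secure;
    }
}
```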

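Access Logs Do Not Stack

Access log settings do not stack. An access_log directive in one context replaces all access logs declared in parent contexts rather than adding to them. A context inherits its parent's access logs only if it declares no access_log directives of its own.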

For example, if you want to log additional information about traffic to the URI /secure, you might define an access log in a location /secure {...} block. This access log replaces the general access log defined elsewhere in the configuration.
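
The following hypothetical configuration illustrates the pitfall. Once the location block declares its own access_log, requests for /secure are no longer written to the server-level log at all. The paths and format names are assumptions.

```nginx
server {
    listen 80;
    access_log /var/log/nginx/access.log main;

    location /secure {
        # Replaces the inherited server-level log for this location:
        # /secure requests go only to secure.log, not to access.log
        access_log /var/log/nginx/secure.log notes;
    }

    location / {
        # No access_log here, so the server-level log is inherited
    }
}
```

Listing both access logs inside the location /secure block would restore the general log for those requests.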

The example in the previous section addresses this problem. It uses two access logs in the same context, with conditional logging that logs requests for /secure to a dedicated log file.

Challenges with Access Logs

Suppose that you wish to determine some statistical information about traffic to your website:

  • What’s the typical geographic split of users?
  • Which SSL/TLS ciphers and protocols do my users use?
  • What’s the split of web browsers?

The general access log is often not an appropriate place to log this information. You might not wish to pollute the access log with the additional fields needed for your study, and on a busy site, the overhead of logging all transactions would be too high.

In this case, you can log a limited set of fields to a specialized log. To reduce the load on the system, you might also wish to sample a subset of requests.
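
As a sketch, a specialized log for the questions above might record only a handful of fields. $ssl_protocol, $ssl_cipher, and $http_user_agent are standard NGINX variables. $geoip_country_code assumes that the optional ngx_http_geoip_module is compiled in and configured with a country database; the database path below is hypothetical.

```nginx
http {
    # Requires ngx_http_geoip_module; the database path is a placeholder
    geoip_country /usr/share/GeoIP/GeoIP.dat;

    # One short line per request: country, TLS protocol and cipher, browser
    log_format study '$geoip_country_code $ssl_protocol $ssl_cipher '
                     '"$http_user_agent"';

    server {
        listen 443 ssl;
        # ssl_certificate and ssl_certificate_key omitted for brevity
        access_log /var/log/nginx/study.log study;
    }
}
```

To keep the volume down, you can add an if= condition to this access_log.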

Sampling Techniques

Sampling from 1% of Requests

The following configuration uses the $request_id variable as a unique identifier for each request. It uses a split_clients block to sample data by logging just 1% of requests:
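
Here is a minimal sketch, assuming the main and notes log formats are defined; the variable and file names are hypothetical. split_clients hashes $request_id, which is random for each request, into percentage buckets, so roughly 1% of requests get $sample set to 1.

```nginx
http {
    # Hash the per-request random ID; about 1% of requests land in the
    # first bucket and get $sample = 1, the rest get 0
    split_clients $request_id $sample {
        1%  1;
        *   0;
    }

    server {
        listen 80;
        access_log /var/log/nginx/access.log main;
        access_log /var/log/nginx/sample.log notes if=$sample;
    }
}
```

Because $request_id is not tied to the client, each request has the same chance of being sampled. Clients that send many requests therefore contribute proportionally more entries.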

Sampling from 1% of Unique Users

Suppose that we wish to sample one data point from each user (or from 1% of users), such as the User-Agent header. We can’t just sample from all requests because users who generate a large number of requests are then over‑represented in our data.

We use a map block to detect the presence of a session cookie, which tells us whether a request comes from a new user or from a user we have seen before. We then sample requests coming from new users only:
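
Here is a sketch of this approach with its assumptions labeled. The session cookie is hypothetically named sessionid, so NGINX exposes it as $cookie_sessionid, and the application (not NGINX) is assumed to set it. Requests without the cookie are treated as coming from new users. A split_clients block and a second map narrow the sample to about 1% of new users. To log every new user instead, use $new_user directly as the if= condition.

```nginx
http {
    # Hypothetical cookie name; an empty value means no session cookie yet
    map $cookie_sessionid $new_user {
        ""       1;
        default  0;
    }

    # Pick about 1% of requests at random
    split_clients $request_id $one_percent {
        1%  1;
        *   0;
    }

    # Log only when the request is from a new user AND falls in the 1%
    map "$new_user:$one_percent" $log_new_user {
        "1:1"    1;
        default  0;
    }

    log_format agent '$remote_addr "$http_user_agent"';

    server {
        listen 80;
        access_log /var/log/nginx/access.log main;
        access_log /var/log/nginx/agents.log agent if=$log_new_user;
    }
}
```

A browser that fetches several resources before the cookie is set can contribute more than one entry. This sketch accepts that imprecision.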

Sampling Unique Things

Not all clients honor session cookies, however. For example, a web spider might ignore cookies, so every request it issues is identified as coming from a new user, skewing our results.

Wouldn’t it be great if we could sample a request only the first time we see a new thing? The thing can be a new IP address, a new session cookie value, a new User-Agent header, a previously unseen Host header, or even a combination of these. This way, we sample data for each thing only once.

Clearly, we need to store state (a list of the things we have seen), and for this we turn to the NGINX Plus key‑value store. The key‑value store maintains an in‑memory key‑value database that NGINX Plus configuration can access through variables. The database optionally supports automatic expiry of entries (the timeout parameter), persistent storage (state), and cluster synchronization (sync). For each thing that is not already in the store, we log the request and add the thing to the store so that it is not logged again.

In NGINX Plus R18 and later, you can add key‑value pairs to the store with the set directive while processing a transaction.
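
Here is a minimal sketch for NGINX Plus R18 or later. It tracks the client IP address as the thing, and it assumes a main log format is defined; the zone name, size, timeout, and variable names are hypothetical. The keyval directive binds $seen to the store entry for the current key, and $seen is empty the first time an address appears. A separate $log_this flag is needed because $seen is no longer empty once it has been set.

```nginx
http {
    # In-memory store; entries are removed after the timeout (1 day here)
    keyval_zone zone=seen_things:10m timeout=1d;

    # Look up the current client address; $seen is empty if not stored yet
    keyval $remote_addr $seen zone=seen_things;

    log_format thing '$remote_addr "$http_user_agent"';

    server {
        listen 80;

        set $log_this 0;
        if ($seen = "") {
            set $seen 1;      # R18+: writes this key into the store
            set $log_this 1;  # mark this request for logging
        }

        access_log /var/log/nginx/access.log main;
        access_log /var/log/nginx/things.log thing if=$log_this;
    }
}
```

To track a combination of things, the key can join several variables, as in the TLS example that follows.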

A Real-World Example – Sampling TLS Parameters

This article was inspired by a real‑world problem – how can I configure TLS according to good practices without excluding users with legacy devices?

TLS best practice is a moving target. TLS 1.3 was published as RFC 8446 in August 2018, but many clients support only earlier TLS versions. Ciphers are declared ‘insecure’ and retired, but older implementations rely on them. ECC certificates offer better performance than RSA certificates, but not all clients can accept ECC. Downgrade attacks rely on a “man in the middle” who intercepts the cipher negotiation handshake and forces the client and server to select a less secure cipher. Therefore, it’s important to configure NGINX Plus not to support weak or legacy ciphers, but doing so might exclude legacy clients.

In the following configuration example, we sample each TLS client, logging the SSL protocol, cipher, and User-Agent header. Assuming that each client selects the most recent protocols and most secure ciphers it supports, we can then evaluate the sampled data and determine what proportion of clients would be excluded if we removed support for older protocols and ciphers.

We identify each client by its unique combination of IP address and User-Agent, but identifying clients by session cookie or another method works just as well.
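
Here is a sketch of such a configuration for NGINX Plus R18 or later. It assumes a main log format is defined. The zone name and size, the timeout, the certificate paths, and the log file name are hypothetical. The key joins the client IP address and User-Agent header, so each combination is logged once until its entry expires.

```nginx
http {
    # One entry per client (IP address + User-Agent); expires after 1 hour
    keyval_zone zone=tls_clients:10m timeout=1h;
    keyval "$remote_addr:$http_user_agent" $tls_seen zone=tls_clients;

    # The fields under study: negotiated protocol and cipher, browser
    log_format sslparams '$ssl_protocol $ssl_cipher "$http_user_agent"';

    server {
        listen 443 ssl;
        ssl_certificate     /etc/nginx/ssl/example.crt;   # placeholder
        ssl_certificate_key /etc/nginx/ssl/example.key;   # placeholder

        set $log_tls 0;
        if ($tls_seen = "") {
            set $tls_seen 1;   # remember this client
            set $log_tls 1;    # and sample this request
        }

        access_log /var/log/nginx/access.log main;
        access_log /var/log/nginx/sslparams.log sslparams if=$log_tls;
    }
}
```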

This generates a log file with one entry for each newly seen client, recording the negotiated SSL protocol, the cipher, and the User-Agent header.

We can then process the file with standard text‑processing tools to determine the spread of data, for example by counting how many sampled clients used each protocol and cipher with sort and uniq -c.

We identify the low‑volume, less secure ciphers, check the logs to determine which clients are using them, and then make an informed decision about removing ciphers from the NGINX Plus configuration.
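
Once the data shows which protocols and ciphers are safe to drop, the configuration change itself is small. The following is only an illustration. The protocol list and cipher string are example values, not a recommendation derived from any sampled data. TLSv1.3 requires NGINX 1.13.0 or later built with OpenSSL 1.1.1 or later.

```nginx
server {
    listen 443 ssl;
    ssl_certificate     /etc/nginx/ssl/example.crt;   # placeholder
    ssl_certificate_key /etc/nginx/ssl/example.key;   # placeholder

    # Drop TLS 1.0 and 1.1; keep 1.2 for clients that lack 1.3
    ssl_protocols TLSv1.2 TLSv1.3;

    # Example TLS 1.2 cipher list (OpenSSL names); with OpenSSL, this
    # directive does not affect the TLS 1.3 cipher suites
    ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305;
}
```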

Conclusion

NGINX conditional logging can be used to sample a subset of the requests that NGINX handles and write them to a standard or special‑purpose log. This technique is useful whenever you need to take a quick sample of traffic for statistical analysis, such as determining the spread of SSL parameters.

You need to put some thought into how you sample data so that busy users or spiders are not over‑represented. You can use variables in NGINX configuration, along with the map and split_clients directives, to select and filter requests.

For situations where the decision is more complex, or where you need high confidence in the accuracy of the sample, you can build more sophisticated selectors in NGINX configuration. The NGINX Plus key‑value store enables you to accumulate state and, if needed, share it across NGINX Plus instances in a cluster.

Try out request sampling with NGINX Plus for yourself – start your free 30-day trial today or contact us to discuss your use cases.


About the Author

Owen Garrett, Sr. Director, Product Management
