Performance Testing NGINX Ingress Controllers in a Dynamic Kubernetes Cloud Environment

NGINX | September 22, 2020

Editor – This post is part of a 10-part series:

Reduce Complexity with Production-Grade Kubernetes
How to Improve Resilience in Kubernetes with Advanced Traffic Management
How to Improve Visibility in Kubernetes
Six Ways to Secure Kubernetes Using Traffic Management Tools
A Guide to Choosing an Ingress Controller, Part 1: Identify Your Requirements
A Guide to Choosing an Ingress Controller, Part 2: Risks and Future-Proofing
A Guide to Choosing an Ingress Controller, Part 3: Open Source vs. Default vs. Commercial
A Guide to Choosing an Ingress Controller, Part 4: NGINX Ingress Controller Options
How to Choose a Service Mesh
Performance Testing NGINX Ingress Controllers in a Dynamic Kubernetes Cloud Environment (this post)

You can also download the complete set of blogs as a free eBook – Taking Kubernetes from Test to Production.

As more and more enterprises run containerized apps in production, Kubernetes continues to solidify its position as the standard tool for container orchestration. At the same time, demand for cloud computing has been pulled forward by a couple of years because work-at-home initiatives prompted by the COVID‑19 pandemic have accelerated the growth of Internet traffic. Companies are working rapidly to upgrade their infrastructure because their customers are experiencing major network outages and overloads.

To achieve the required level of performance in cloud‑based microservices environments, you need rapid, fully dynamic software that harnesses the scalability and performance of the next‑generation hyperscale data centers. Many organizations that use Kubernetes to manage containers depend on an NGINX‑based Ingress controller to deliver their apps to users.

In this blog we report the results of our performance testing on three NGINX Ingress controllers in a realistic multi‑cloud environment, measuring latency of client connections across the Internet:

The NGINX Ingress Controller maintained by the Kubernetes community and based on NGINX Open Source. As in previous blogs, we refer to it here as the community Ingress controller. We tested version 0.34.1 using an image pulled from the Google Container Registry.
NGINX Ingress Controller based on NGINX Open Source, version 1.8.0 maintained by NGINX.
NGINX Ingress Controller based on NGINX Plus, version 1.8.0 maintained by NGINX.

Testing Protocol and Metrics Collected

We used the load‑generation program wrk2 to emulate a client, making continuous requests over HTTPS during a defined period. The Ingress controller under test – the community Ingress controller, NGINX Ingress Controller based on NGINX Open Source, or NGINX Ingress Controller based on NGINX Plus – forwarded requests to backend applications deployed in Kubernetes Pods and returned the response generated by the applications to the client. We generated a steady flow of client traffic to stress‑test the Ingress controllers, and collected the following performance metrics:

Latency – The amount of time between the client generating a request and receiving the response. We report the latencies in a percentile distribution. For example, if there are 100 samples from latency tests, the value at the 99th percentile is the next-to-slowest latency of responses across the 100 test runs.
Connection timeouts – TCP connections that are silently dropped or discarded because the Ingress controller fails to respond to requests within a certain time.
Read errors – Attempts to read on a connection that fail because a socket from the Ingress controller is closed.
Connection errors – TCP connections between the client and the Ingress controller that are not established.

Topology

For all tests, we used the wrk2 utility running on a client machine in AWS to generate requests. The AWS client connected to the external IP address of the Ingress controller, which was deployed as a Kubernetes DaemonSet on GKE-node-1 in a Google Kubernetes Engine (GKE) environment. The Ingress controller was configured for SSL termination (referencing a Kubernetes Secret) and Layer 7 routing, and exposed via a Kubernetes Service of Type LoadBalancer. The backend application ran as a Kubernetes Deployment on GKE-node-2.

For full details about the cloud machine types and the software configurations, see the Appendix.

Testing Methodology

Client Deployment

We ran the following wrk2 (version 4.0.0) script on the AWS client machine. It spawns 2 wrk threads that together establish 1000 connections to the Ingress controller deployed in GKE. During each 3‑minute test run, the script generates 30,000 requests per second (RPS), which we consider a good simulation of the load on an Ingress controller in a production environment.

where:

-t – Sets the number of threads (2)
-c – Sets the number of TCP connections (1000)
-d – Sets the duration of the test run in seconds (180, or 3 minutes)
-L – Generates detailed latency percentile information for export to analysis tools
-R – Sets the number of RPS (30,000)

For TLS encryption, we used RSA with a 2048‑bit key size and Perfect Forward Secrecy.

Each response from the back‑end application (accessed at https://app.example.com:443) consists of about 1 KB of basic server metadata, along with the 200 OK HTTP status code.

Back-End Application Deployment

We conducted test runs with both a static and dynamic deployment of the back‑end application.

In the static deployment, there were five Pod replicas, and no changes were applied using the Kubernetes API.

For the dynamic deployment, we used the following script to periodically scale the backend nginx deployment from five Pod replicas up to seven, and then back down to five. This emulates a dynamic Kubernetes environment and tests how effectively the Ingress controller adapts to endpoint changes.

Performance Results

Latency Results for the Static Deployment

As indicated in the graph, all three Ingress controllers achieved similar performance with a static deployment of the back‑end application. This makes sense given that they are all based on NGINX Open Source and the static deployment doesn’t require reconfiguration from the Ingress controller.

Latency Results for the Dynamic Deployment

The graph shows the latency incurred by each Ingress controller in a dynamic deployment where we periodically scaled the back‑end application from five replica Pods up to seven and back (see Back‑End Application Deployment for details).

It’s clear that only the NGINX Plus-based Ingress controller performs well in this environment, suffering virtually no latency all the way up to the 99.99th percentile. Both the community and NGINX Open Source‑based Ingress controllers experience noticeable latency at fairly low percentiles, though in a rather different pattern. For the community Ingress controller, latency climbs gently but steadily to the 99th percentile, where it levels off at about 5000ms (5 seconds). For the NGINX Open Source‑based Ingress controller, latency spikes dramatically to about 32 seconds by the 99th percentile, and again to 60 seconds by the 99.99th.

As we discuss further in Timeout and Error Results for the Dynamic Deployment, the latency experienced with the community and NGINX Open Source‑based Ingress controllers is caused by errors and timeouts that occur after the NGINX configuration is updated and reloaded in response to the changing endpoints for the back‑end application.

Here’s a finer‑grained view of the results for the community Ingress controller and the NGINX Plus-based Ingress controller in the same test condition as the previous graph. The NGINX Plus-based Ingress controller introduces virtually no latency until the 99.99th percentile, where it starts climbing towards 254ms at the 99.9999th percentile. The latency pattern for the community Ingress controller steadily grows to 5000ms latency at the 99th percentile, at which point latency levels off.

Timeout and Error Results for the Dynamic Deployment

This table shows the cause of the latency results in greater detail.

NGINX Open Source Community NGINX PlusConnection errors 33365 0 0Connection timeouts 309 8809 0Read errors 4650 0 0

With the NGINX Open Source‑based Ingress controller, the need to update and reload the NGINX configuration for every change to the back‑end application’s endpoints causes many connection errors, connection timeouts, and read errors. Connection/socket errors occur during the brief time it takes NGINX to reload, when clients try to connect to a socket that is no longer allocated to the NGINX process. Connection timeouts occur when clients have established a connection to the Ingress controller, but the backend endpoint is no longer available. Both errors and timeouts severely impact latency, with spikes to 32 seconds at the 99th percentile and again to 60 seconds by the 99.99th.

With the community Ingress controller, there were 8,809 connection timeouts due to the changes in endpoints as the back‑end application scaled up and down. The community Ingress controller uses Lua code to avoid configuration reloads when endpoints change. The results show that running a Lua handler inside NGINX to detect endpoint changes addresses some of the performance limitations of the NGINX Open Source‑based version, which result from its requirement to reload the configuration after each change to the endpoints. Nevertheless, connection timeouts still occur and result in significant latency at higher percentiles.

With the NGINX Plus-based Ingress controller there were no errors or timeouts – the dynamic environment had virtually no effect on performance. This is because it uses the NGINX Plus API to dynamically update the NGINX configuration when endpoints change. As mentioned, the highest latency was 254ms and it occurred only at the 99.9999 percentile.

Conclusion

The performance results show that to completely eliminate timeouts and errors in a dynamic Kubernetes cloud environment, the Ingress controller must dynamically adjust to changes in back‑end endpoints without event handlers or configuration reloads. Based on the results, we can say that the NGINX Plus API is the optimal solution for dynamically reconfiguring NGINX in a dynamic environment. In our tests only the NGINX Plus-based Ingress controller achieved the flawless performance in highly dynamic Kubernetes environments that you need to keep your users satisfied.

Appendix

Cloud Machine Specs

Machine Cloud Provider Machine TypeClient AWS m5a.4xlargeGKE-node-1 GCP e2-standard-32GKE-node-2 GCP e2-standard-32

Configuration for NGINX Open Source and NGINX Plus Ingress Controllers

Kubernetes Configuration

Notes:

This configuration is for NGINX Plus. References to nginx‑plus were adjusted as necessary in the configuration for NGINX Open Source.
NGINX App Protect is included in the image (gcr.io/nginx-demos/nap-ingress:edge), but it was disabled (the -enable-app-protect flag was omitted).