
How F5 NGINX Plus Powers AI Clusters

Liam Crilly
Published July 03, 2025

Over the past decade, NGINX Open Source has been one of the most widely used web servers in the world and a top application delivery solution by market share. It has helped load balance and reverse proxy everything from small startups and academic research projects to some of the world’s largest web applications.

Just as it became a default choice for application delivery, NGINX has quietly become a critical linchpin in training and serving AI applications. Leading AI frameworks, toolkits, libraries, and platforms—such as Intel OpenVINO Model Server, NVIDIA Morpheus, vLLM, NVIDIA Triton, and others—ship with native configurations for F5 NGINX Plus (and NGINX Open Source) to handle gRPC/HTTP proxying, SSL/TLS termination, health-check-aware load balancing, and dynamic reconfiguration out of the box. Numerous AI services and solutions that run on Kubernetes clusters list F5 NGINX Ingress Controller as one of their preferred options for managing traffic in and out of the AI clusters, both for model training and inference. Peel back the covers and you’ll find it running almost everywhere you find AI.

Across a wide array of AI use cases, NGINX is a key enabler in the AI stack. Whether you're fine-tuning foundation models, streaming token outputs from LLMs, or routing requests to real-time anomaly detection endpoints, chances are NGINX is already in the path.

Why AI teams choose NGINX Plus

  • Kubernetes-native ingress: Most AI platforms today run on Kubernetes, and NGINX remains a default or preferred ingress in tools like Run:ai, KServe, and Ray Serve. As AI apps expand to hybrid, multi-cloud, and edge environments, NGINX Gateway Fabric offers a Kubernetes-native implementation of the Gateway API with a lightweight footprint and fine-grained traffic control—giving AI teams better command over routing, retries, and observability without adding mesh complexity.
  • Dynamic rollouts at scale: AI inference workloads often involve high-value, GPU-bound sessions that require careful versioning and zero downtime. NGINX supports dynamic configuration reloads, weighted traffic splitting, and active health checks—allowing teams to roll out new model versions safely without disrupting in-flight sessions or overwhelming GPU queues.
  • Production-ready API handling: Model servers like Triton, vLLM, and OpenVINO rely on gRPC or HTTP/2 for fast, structured communication. NGINX brings mature, high-performance support for these protocols, along with connection reuse, session stickiness, TLS termination, and request buffering—all essential for handling bursty or long-lived AI inference traffic (see the configuration sketch after this list).
  • Operational control: NGINX Plus unlocks advanced features like RESTful configuration updates, live upstream management, and an enterprise-grade web application firewall (WAF). For teams managing dozens or hundreds of NGINX instances across clusters, F5 NGINX One adds a centralized console for managing configurations, health, and security policies—ideal for teams supporting multiple model types or AI use cases with different access and risk profiles.
  • F5 AI Gateway: Purpose-built for AI workloads, the AI Gateway extends NGINX with a security-first approach to AI traffic. It includes customizable protections against prompt injection and toxic output as well as rate limiting and usage quotas to help prevent scraping, flooding, or runaway queries in GPU-constrained environments. Different security rules can be applied to different inference routes—for example, using stricter policies for generative models while keeping vector APIs more permissive. All traffic can be logged at a token or request level, feeding into observability pipelines and supporting audit requirements.
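To ground these capabilities, here is a minimal NGINX Plus configuration sketch of the pattern described above, intended for the http context. The host names, certificate paths, and the 90/10 canary weighting are illustrative assumptions rather than settings from any particular integration, and Triton’s default gRPC port (8001) is assumed. It terminates TLS, proxies gRPC to two weighted model-server versions, runs active gRPC health checks, and exposes the NGINX Plus API for live upstream management.

```nginx
# Illustrative only: two model-server versions (Triton's default gRPC port
# 8001 assumed), weighted 90/10 for a canary rollout of the new version.
upstream models_grpc {
    zone models_grpc 64k;   # shared-memory zone enables live changes via the NGINX Plus API
    server triton-v1.models.svc.cluster.local:8001 weight=9;
    server triton-v2.models.svc.cluster.local:8001 weight=1;
}

server {
    listen 443 ssl;
    http2 on;               # gRPC requires HTTP/2 (directive available in recent NGINX releases)
    server_name inference.example.com;

    ssl_certificate     /etc/nginx/tls/inference.crt;
    ssl_certificate_key /etc/nginx/tls/inference.key;

    location / {
        grpc_pass grpc://models_grpc;
        health_check type=grpc;   # NGINX Plus active health checks using the gRPC health protocol
    }
}

# NGINX Plus API for RESTful configuration updates and live upstream management
server {
    listen 127.0.0.1:8080;
    location /api {
        api write=on;
    }
}
```

With this in place, a new model version can be shifted from 10% to 100% of traffic by adjusting upstream weights through the API or a configuration reload, without dropping in-flight sessions.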

Major AI frameworks, tools, and managed services integrate NGINX

NGINX is one of the default ingress choices for many of the leading AIOps stacks, tools, and managed services.

| AI framework | How NGINX is used | Practical benefit |
| --- | --- | --- |
| Intel OpenVINO Model Server | A demo by F5 and Intel deploys model shards behind NGINX Plus (YouTube) | One gateway can route to CPU, GPU, or VPU back ends. |
| NVIDIA Triton | Helm chart installs Triton with NGINX Plus Ingress for gRPC access (GitHub) | HTTP/2 multiplexing keeps GPU utilization high. |
| NVIDIA Morpheus | "How I Did It" guide secures Morpheus through NGINX Plus Ingress (F5 Community) | TLS offload and adaptive WAF in front of real-time security inference. |
| NVIDIA (XLIO) | Guide to deploying NGINX over NVIDIA Accelerated IO (XLIO) (docs.nvidia.com) | Enhanced TLS offload and performance tuning, including build instructions with OpenSSL support and sample files. |
| vLLM | Official docs explain balancing multiple vLLM instances via NGINX (vLLM) | Quick horizontal scaling for text-generation endpoints (see the sketch below the table). |
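As a concrete companion to the vLLM row, below is a minimal sketch of that load-balancing pattern: NGINX distributes requests across two vLLM replicas serving the OpenAI-compatible HTTP API. The host names are hypothetical, port 8000 is vLLM’s default, and proxy buffering is disabled so streamed token output is forwarded as it is generated.

```nginx
# Illustrative only: two vLLM replicas behind a single HTTP endpoint.
upstream vllm_backend {
    least_conn;              # send each request to the replica with fewer in-flight generations
    server vllm-0.models.svc.cluster.local:8000;
    server vllm-1.models.svc.cluster.local:8000;
}

server {
    listen 80;

    location / {
        proxy_pass http://vllm_backend;
        proxy_http_version 1.1;
        proxy_buffering off;           # stream tokens to the client as they arrive
        proxy_read_timeout 300s;       # allow long-running generations
    }
}
```

Scaling out is then a matter of adding more replica entries to the upstream block (or managing them dynamically with NGINX Plus).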

MLOps teams can drop in NGINX for the same reasons that teams managing microservices and APIs (both essential in AI deployments) have adopted it: it’s lightweight, modular, and portable, and it handles high token volumes across a wide variety of environments. AI developers and machine learning engineers can deploy NGINX as part of standing up their common AI recipes, pulling in a container image configured by their platform or MLOps team. NGINX integrates with hardware acceleration across most common platforms and processor architectures.

AI components that list NGINX as a default option span the full spectrum of AI infrastructure, from low-level GPU scheduling to high-level model serving, deployment orchestration, and enterprise-grade governance:

  • KServe: Deployment guides assume an existing NGINX Ingress Controller domain for routing inference services.
  • Ray Serve: Documentation includes instructions for configuring NGINX Ingress Controller to expose dashboards and model endpoints.
  • Seldon Core v2: Production deployment chapters describe setting up NGINX Ingress Controller via Helm, including for canary and shadow traffic scenarios.
  • Run:ai: Prerequisites list NGINX as a validated ingress controller for multi-tenant GPU-sharing Kubernetes clusters.
  • AWS SageMaker: Documentation provides examples using NGINX and Gunicorn to front custom inference containers.
  • Azure AKS: Microsoft offers a managed NGINX Ingress Controller as a built-in, out-of-the-box option for in-cluster ingress traffic.
  • DataRobot: Installation instructions recommend using the NGINX Ingress Controller (v4.0.0+) for path-based routing to Portable Prediction Servers on EKS.

NGINX offers a paved path to MLOps

Together, these platforms and tools demonstrate how NGINX supports a wide range of use cases: securely routing traffic to inference endpoints, enabling scalable and efficient model delivery, managing multi-tenant cluster access, and enforcing operational policies around version control, auditing, and regulatory compliance. The list is expanding, and we’re excited to see what the next generation of AI-native companies builds with NGINX.

Get help scaling your AI with F5 NGINX One.