F5 BIG-IP Next for Kubernetes joins the NVIDIA Enterprise AI Factory validated design

F5 Ecosystem | January 05, 2026

At NVIDIA GTC and again at CES, NVIDIA has been clear about where enterprise AI is headed: AI factories—purpose-built, validated infrastructure and software stacks designed to deliver predictable performance, lower costs, and operational confidence for AI inference and training.

NVIDIA Enterprise AI Factory validated design brings this vision to life, combining NVIDIA accelerated computing, networking, software, and orchestration into a full-stack validated design that enterprises can deploy on-premises with confidence. As part of this architecture, NVIDIA continues to expand its ecosystem of validated partners that solve real, production-scale challenges across AI infrastructure.

Today, we’re excited to share that F5 BIG-IP Next for Kubernetes has been validated to run on NVIDIA RTX PRO Servers featuring NVIDIA BlueField-3 DPUs and is now included in the NVIDIA Enterprise AI Factory validated design. The more than 20,000 organizations deploying F5 BIG-IP today can seamlessly extend the BIG-IP capabilities they already trust as they deploy NVIDIA Enterprise AI factories.

F5 BIG-IP Next for Kubernetes on NVIDIA BlueField combines high-performance traffic management, security, and AI-aware controls into a validated, enterprise-ready solution.

NVIDIA Enterprise AI Factory validated design is built around repeatable building blocks that ensure predictable performance, security, and scalability as enterprises move AI into production. Rather than assembling point solutions, the architecture defines how GPUs, DPUs, networking, and software work together as a cohesive system. F5’s inclusion in this validated design reinforces that approach.

Why AI factories need a full-stack approach

AI inference performance isn’t just about faster GPUs. It also depends on the full stack around them. In production environments, networking, security, and traffic management increasingly determine how quickly tokens are generated, how consistently models respond, and how efficiently infrastructure is used.

As AI services scale across tenants, models, and users, enterprises must manage a growing set of operational concerns. This is particularly critical for organizations deploying on-premises or sovereign AI environments, where performance, governance, and control cannot be delegated to public cloud services.

These needs include reclaiming host CPU cycles that today are consumed by networking and security tasks, delivering consistent latency for optimal time-to-first-token (TTFT), gaining visibility and control over token usage, and enforcing governance, fairness, and compliance.
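
For readers who want to see what TTFT means in practice, here is a minimal measurement sketch against a hypothetical OpenAI-compatible streaming endpoint; the endpoint URL and model name are placeholders, not part of BIG-IP Next for Kubernetes or the validated design.

```python
import time
import requests  # pip install requests

# Hypothetical OpenAI-compatible streaming endpoint; replace with your own.
ENDPOINT = "http://inference.example.internal/v1/chat/completions"

def measure_ttft(prompt: str) -> float:
    """Return seconds from sending the request to receiving the first streamed chunk."""
    payload = {
        "model": "example-model",  # placeholder model name
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,            # ask the server to stream tokens as they are generated
    }
    start = time.perf_counter()
    with requests.post(ENDPOINT, json=payload, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if line:  # first non-empty chunk marks the first token(s)
                return time.perf_counter() - start
    raise RuntimeError("stream ended before any tokens arrived")

if __name__ == "__main__":
    print(f"TTFT: {measure_ttft('Hello!') * 1000:.1f} ms")
```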

This is where DPUs—and the software that runs on them—become foundational to the AI factory.

Offloading the right work to the right silicon

F5 BIG-IP Next for Kubernetes accelerated on NVIDIA BlueField-3 DPUs offloads critical networking and security functions from the host CPU to the DPU’s programmable ARM cores and hardware acceleration engines.

These offloaded services include load balancing and traffic steering, TLS termination and encryption, firewall and security policy enforcement, and API protection and intrusion detection.

By moving these functions onto the NVIDIA BlueField DPUs, host CPUs are freed for general-purpose workloads, while GPUs remain fully focused on AI inference.

The results are measurable and material—more than a 30% increase in token generation throughput and up to a 60% reduction in TTFT.

These gains translate directly into faster responses, higher model utilization, and improved infrastructure efficiency—exactly what enterprise AI factories demand.

Beyond offload: token governance and intelligent LLM routing

Offloading is only the starting point.

The programmable data plane within BIG-IP Next for Kubernetes, running on NVIDIA BlueField DPUs, enables advanced AI-aware services that go beyond traditional networking.

Token governance built in

BIG-IP Next for Kubernetes introduces native token governance capabilities that allow enterprises to count and track tokens per tenant, per user, or per model; enforce token rate limits and usage policies; and support compliance, chargeback, and fairness requirements.
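
As a conceptual illustration only, not BIG-IP Next for Kubernetes configuration or APIs, the sketch below shows the kind of per-tenant token accounting and rate limiting this implies, using a simple token bucket keyed by tenant; the rates, capacities, and helper names are assumptions.

```python
import time
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class TenantBucket:
    """Per-tenant token budget, refilled continuously at `rate` tokens per second."""
    rate: float
    capacity: float
    tokens: float = -1.0
    last_refill: float = field(default_factory=time.monotonic)

    def __post_init__(self) -> None:
        if self.tokens < 0:
            self.tokens = self.capacity  # start each tenant with a full bucket

    def allow(self, requested: int) -> bool:
        """Refill based on elapsed time, then admit only if enough budget remains."""
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= requested:
            self.tokens -= requested
            return True
        return False

# Assumed per-tenant limits plus a running usage counter for chargeback and fairness reports.
buckets = defaultdict(lambda: TenantBucket(rate=1000, capacity=8000))
usage = defaultdict(int)

def admit(tenant: str, token_count: int) -> bool:
    """Admit or reject a request that would consume `token_count` tokens for `tenant`."""
    if buckets[tenant].allow(token_count):
        usage[tenant] += token_count  # track consumption per tenant
        return True
    return False
```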

As token-based pricing becomes the dominant economic model for AI, governance at the infrastructure layer is essential.

Intelligent LLM routing with NVIDIA NIM

Through integration with NVIDIA NIM microservices, BIG-IP Next for Kubernetes can dynamically route inference requests to the most appropriate model based on query complexity, performance requirements, or policy constraints.

This enables faster responses for simple queries, optimal model utilization across diverse workloads, and policy-driven routing aligned to cost, performance, and compliance goals.
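
To make the routing idea concrete, here is a sketch of complexity-based model selection under assumed model names and a deliberately naive heuristic; it is not the NVIDIA NIM or BIG-IP routing API, just an illustration of the policy logic.

```python
from dataclasses import dataclass

@dataclass
class Route:
    model: str       # placeholder model identifiers, not actual NIM endpoints
    max_cost: float  # relative cost ceiling this tier is allowed to incur

# Assumed two-tier policy: a small, fast model for simple queries, a larger one otherwise.
ROUTES = {
    "simple": Route(model="small-llm", max_cost=1.0),
    "complex": Route(model="large-llm", max_cost=10.0),
}

def classify(prompt: str) -> str:
    """Naive complexity heuristic: long or multi-question prompts count as 'complex'."""
    if len(prompt.split()) > 200 or prompt.count("?") > 1:
        return "complex"
    return "simple"

def route(prompt: str, compliance_tier: str = "standard") -> Route:
    """Choose a route from query complexity, with a policy override for regulated tenants."""
    if compliance_tier == "regulated":
        return ROUTES["complex"]  # policy-driven: regulated traffic always uses the governed model
    return ROUTES[classify(prompt)]

if __name__ == "__main__":
    print(route("What is the capital of France?").model)  # -> small-llm
```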

Together, token governance and intelligent routing transform BIG-IP Next for Kubernetes from a networking component into a control plane for AI inference traffic.

A validated building block of the NVIDIA Enterprise AI Factory

NVIDIA RTX PRO Servers, complemented by NVIDIA BlueField-3 DPUs, represent the key accelerated computing platform components of this full-stack validated design for enterprise AI factories.

With BIG-IP Next for Kubernetes now validated on NVIDIA BlueField and included in the Enterprise AI Factory design, customers gain a production-ready networking and security layer that delivers deterministic performance for AI workloads. They also get improved GPU and CPU efficiency and built-in controls for token economics and governance.

This validation reinforces a shared vision between F5 and NVIDIA: AI infrastructure must be designed, not assembled.

Looking ahead

As enterprises move from AI experimentation to production AI factories, the infrastructure stack must evolve to support performance, efficiency, and governance at scale. As NVIDIA continues to advance the Enterprise AI Factory, F5 will expand its role across traffic management, security, and AI-aware controls to support next-generation inference platforms.

F5 BIG-IP Next for Kubernetes on NVIDIA BlueField DPUs delivers exactly that—combining high-performance traffic management, security, and AI-aware controls into a validated, enterprise-ready solution.

We’re proud to collaborate with NVIDIA as part of the Enterprise AI Factory ecosystem and look forward to expanding joint innovation and go-to-market efforts in the months ahead.

Learn more about the NVIDIA Enterprise AI Factory validated design and how F5 fits in.

About the Author

Ahmed Guetari
Vice President, Product Management – Service Provider
