This blog post is jointly authored by Ahmed Guetari, Vice President of Product Management - Service Provider at F5 and Ash Bhalgat, Senior Director of AI Networking and Security Ecosystems at NVIDIA.
AI is entering an era where inference performance and security define success in delivering on customer expectations. In the evolving era of the token economy, AI infrastructure is no longer just about raw compute. It’s about orchestrating, securing, and scaling inference capabilities from cloud to edge data centers. Cloud operators building generative AI and inference platforms face an urgent need to maximize GPU efficiency, increase token capacity, reduce latency, and secure every layer of their AI infrastructure.
F5 addresses these challenges by scaling inference through the NVIDIA Cloud Partner (NCP) reference architecture. This essential blueprint defines how leading AI cloud providers design, build, and operate GPU-accelerated infrastructure. The reference architecture integrates best-in-class technologies spanning compute, networking, storage, and security to ensure NVIDIA Cloud Partners can deliver reliable, high-performance AI services at scale.
Through this collaboration, F5 BIG-IP now plays a critical role in enabling secure, high-throughput inference within the NVIDIA ecosystem.
F5 is tightly integrating networking, security, and application delivery capabilities with NVIDIA to power intelligent, token-driven AI platforms, redefining what it means to run AI at scale and delivering the speed, protection, and intelligence required to power the new token economy.
The NCP reference architecture provides a comprehensive framework for deploying AI clouds. It combines NVIDIA accelerated computing, networking, and software capabilities with complementary technologies from leading ecosystem partners, delivering high-performance, scalable, and secure AI solutions in the cloud.
As part of this architecture, F5 BIG-IP brings advanced traffic management, zero trust security, advanced services and observability to GPU-powered AI workloads, helping NCPs deploy, scale, and secure inference services with confidence.
For AI cloud providers and enterprises, tokens are the new currency—measured by throughput, total latency, time to first token, energy efficiency, and cost per token. Success hinges on the infrastructure connecting users to GPU clusters. To support high-performance AI services, traffic routing, protection, observability, multi-tenancy, and policy enforcement must all operate at line rate without introducing bottlenecks.
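These token-economy metrics can be derived from a handful of per-request measurements. The sketch below is illustrative only (the field names and sample values are hypothetical, not F5 or NVIDIA telemetry) and shows how throughput, time to first token, tokens per watt, and cost per token relate to one another:

```python
from dataclasses import dataclass

@dataclass
class InferenceRun:
    """Hypothetical measurements from a single inference request."""
    tokens_generated: int
    first_token_s: float   # time to first token, in seconds
    total_time_s: float    # total request latency, in seconds
    energy_j: float        # energy attributed to the request, in joules
    cost_usd: float        # infrastructure cost attributed to the request

def token_metrics(run: InferenceRun) -> dict:
    """Derive the headline token-economy metrics for one request."""
    return {
        "throughput_tok_per_s": run.tokens_generated / run.total_time_s,
        "ttft_s": run.first_token_s,
        # tokens per joule == tokens per watt-second
        "tokens_per_watt_s": run.tokens_generated / run.energy_j,
        "cost_per_token_usd": run.cost_usd / run.tokens_generated,
    }

metrics = token_metrics(InferenceRun(
    tokens_generated=512, first_token_s=0.25,
    total_time_s=4.0, energy_j=2000.0, cost_usd=0.004))
print(metrics["throughput_tok_per_s"])  # 128.0 tokens/s
```

In practice these figures would be aggregated per tenant and per model rather than per request, but the ratios are the same ones the infrastructure must optimize.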
The NCP reference architecture codifies this blueprint. It defines how sovereign clouds and AI clouds should interconnect compute, networking, storage, telemetry, and security. NVIDIA BlueField-3 DPUs serve as the linchpin for north-south traffic in these architectures.
F5 joins this ecosystem as a first-class infrastructure enabler, embedding networking, security, and AI-aware control directly into the reference fabric.
F5 has already demonstrated strong results from integrating NVIDIA technology. In April 2025, F5 announced the general availability of F5 BIG-IP Next for Kubernetes accelerated with NVIDIA BlueField-3 DPUs. This solution offloads network processing, security enforcement, and traffic intelligence onto the DPU, freeing CPUs for business applications. SoftBank, an NCP, recently conducted cloud proof-of-concept (PoC) testing of this solution and obtained outstanding performance results.
Beyond raw performance, F5 BIG-IP’s capabilities align tightly with the operational demands of NCPs: unified ingress/egress policy control, service mesh, distributed denial-of-service (DDoS) mitigation, zero trust enforcement, API protection, workload isolation, and multi-tenant observability, all in one pass.
In our engagement, we’ve refined large language model (LLM) routing logic, token-aware traffic metering and governance, and support for Model Context Protocol (MCP), bringing more control and intelligence into the data path itself.
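Token-aware metering in the data path can be pictured as a per-tenant token budget that admits or rejects requests before they ever reach a GPU. The following is a minimal sketch of that idea, not F5's implementation; the class, window sizes, and tenant names are assumptions for illustration:

```python
import time
from collections import defaultdict

class TokenBudget:
    """Hypothetical per-tenant token metering: admit a request only if the
    tenant still has budget in the current time window, then record usage."""

    def __init__(self, tokens_per_window: int, window_s: float = 60.0):
        self.limit = tokens_per_window
        self.window_s = window_s
        self.used = defaultdict(int)          # tenant -> tokens used this window
        self.window_start = defaultdict(float)

    def admit(self, tenant: str, estimated_tokens: int, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        # Start a fresh window once the previous one has elapsed.
        if now - self.window_start[tenant] >= self.window_s:
            self.window_start[tenant] = now
            self.used[tenant] = 0
        if self.used[tenant] + estimated_tokens > self.limit:
            return False                      # budget exhausted: reject or queue
        self.used[tenant] += estimated_tokens
        return True

budget = TokenBudget(tokens_per_window=1000)
print(budget.admit("tenant-a", 800, now=0.0))  # True
print(budget.admit("tenant-a", 300, now=1.0))  # False: would exceed 1000
print(budget.admit("tenant-b", 300, now=1.0))  # True: separate tenant budget
```

A production data path would also reconcile estimated against actual token counts after generation completes, but the admission decision above captures the governance principle.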
By supporting the NCP reference architecture, BIG-IP becomes an anchor component of how AI clouds are built, deployed, and governed.
Early validation results are highly compelling. When F5 BIG-IP services are deployed alongside the NVIDIA accelerated computing platform, token generation increases by over 30%, while time to first token (TTFT) drops by 60%.
These gains translate into longer, more context-aware responses, faster inference cycles, and a 30% reduction in cost per token. Combined with higher tokens per watt, this integration enhances both performance and energy efficiency, key factors in the new economics of AI.
For cloud customers, the benefits of running on NCPs are substantial: faster time-to-value, lower operational costs, and improved user experiences across every deployed model.
These gains reflect more than incremental improvements; they signal a structural uplift in how AI cloud can deliver, protect, and monetize services.
F5’s inclusion in the NCP reference architecture will lead to performance gains, enhanced security, and expanded functionality for customers deploying AI services, including the following key areas:
1. Performance gains built for the token economy: F5 BIG-IP integrates seamlessly with NVIDIA GPUs, DPUs, networking fabrics, and software platform components (NVIDIA Dynamo, NVIDIA NIM) to maximize AI inference throughput. The F5 solution optimizes prompt routing, load balancing, and inference efficiency, eliminating bottlenecks and delivering more tokens per second per cluster.
In an environment where every token contributes to revenue and model accuracy, these gains enable higher productivity, faster responses, and greater profitability. The collaboration between F5 and NVIDIA unlocks the full potential of accelerated infrastructure in the emerging token economy, where efficiency and performance directly drive business outcomes.
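One simple way to picture token-aware load balancing is routing each prompt to the endpoint with the fewest in-flight tokens. This sketch is a generic illustration under that assumption, not F5's routing algorithm; the endpoint names and token estimates are hypothetical:

```python
import heapq

class LeastLoadedRouter:
    """Hypothetical prompt router: send each request to the GPU endpoint
    with the fewest in-flight tokens, approximating even token throughput.
    (Completion accounting, which would decrement load, is omitted for brevity.)"""

    def __init__(self, endpoints):
        # Min-heap of (in_flight_tokens, endpoint_name).
        self.heap = [(0, ep) for ep in endpoints]
        heapq.heapify(self.heap)

    def route(self, estimated_tokens: int) -> str:
        load, ep = heapq.heappop(self.heap)
        heapq.heappush(self.heap, (load + estimated_tokens, ep))
        return ep

router = LeastLoadedRouter(["gpu-0", "gpu-1"])
print(router.route(500))  # gpu-0 (both idle; ties break by name)
print(router.route(200))  # gpu-1 (gpu-0 now has 500 tokens in flight)
print(router.route(100))  # gpu-1 again (200 in flight < 500)
```

Real LLM routers weigh more signals, such as KV-cache affinity, model variant, and tenant priority, but least-outstanding-tokens conveys why token counts, not request counts, are the right unit of load.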
2. Security reinvented for AI inference: AI inference at scale introduces new attack surfaces, from data exposure to abuse of model endpoints. F5 brings its deep expertise in application security and policy enforcement directly into NVIDIA’s reference architecture, providing multi-layered protection that fortifies GPU clouds from the core to the edge.
The result is a trusted AI fabric where performance and protection coexist, allowing NCP customers to scale confidently without compromising safety or compliance.
3. Expanded functionality and AI-native control: Beyond throughput and security, the F5-NVIDIA collaboration introduces new value-added services that extend control, visibility, and intelligence across GPU clusters. Capabilities such as LLM routing, granular token governance, adaptive observability, and context-aware traffic steering enable AI operators to optimize workload placement and resource usage in real time.
These features bring enterprise-grade application delivery to the AI domain, ensuring every token, every model, and every user interaction is handled with precision and efficiency. It’s how F5 turns complexity into control, bridging traditional applications with the fast-evolving world of generative AI.
By aligning F5’s capabilities with the NCP reference architecture, we’re helping to formalize a new ledger, one where throughput, cost per token, latency, power efficiency, and security are first-class citizens in AI infrastructure design.
AI clouds built on this foundation can scale more predictably, monetize more transparently, and adapt more swiftly as models and workloads evolve. NCPs, enterprises, and sovereign clouds now have a validated reference stack to lean on, one that delivers both trust and performance.
F5 isn’t simply joining the NVIDIA ecosystem. Together, we are advancing the token economy, where every microsecond, watt, and token counts. As a strategic enabler of secure, high-performance, and economically scalable AI infrastructure, F5 is collaborating with NVIDIA to help customers generate tokens faster, safer, smarter, and more efficiently.
We look forward to working with you—our customers, partners, and fellow innovators—to build the next generation of AI services without compromise. To learn more, visit our F5 and NVIDIA webpage.