F5 Unleashes Innovation with Powerful New AI Capabilities on BIG-IP Next for Kubernetes on NVIDIA BlueField-3 DPUs

F5 Ecosystem | June 11, 2025

Business leaders know they need to put AI front and center. But that’s easier said than done. AI can be complex, expensive, and risky. And both the technology and the ecosystem are evolving rapidly.

First, there is a clear shift away from a one-size-fits-all approach. Predictive AI/ML, generative AI, and now agentic AI are all being adapted for specific industries and applications. As purpose-built AI models proliferate, the AI landscape is becoming increasingly diverse.

It’s now clear that AI applications require tailored infrastructure, not only optimized for performance, cost, and energy efficiency, but also able to keep pace with the rapidly evolving needs of AI models, applications and agents. A perfect example is Model Context Protocol (MCP), a powerful innovation that didn’t even exist just a few months ago.

As organizations race to take advantage of generative AI and, increasingly, AI agents, some are building their own dedicated data centers. Others are turning to specialized providers deploying cloud-scale infrastructures tailored to support multiple large language models (LLMs). Often called AI factories or Neoclouds, these platforms feature massive investments in accelerated computing, networking, and storage, all purpose-built to meet the intense performance and scale demands of AI workloads.

Building sovereign, scalable AI and LLM inference infrastructure requires tackling four key challenges:

Latency and performance – Fast, responsive AI is essential, especially for interactive use cases. Nobody likes staring at a spinner waiting for an AI to think.
Data security and privacy – LLMs often handle sensitive data. Ensuring secure, private inference is critical and even more complex due to different security rules and compliance across cloud and on-premises environments.
Regulatory compliance – With AI expanding across industries, regulations like the European Union’s General Data Protection Regulation (GDPR) add strict rules around data use, model selection, transparency, and fairness. Navigating these is essential.
Model management and integration – AI models need ongoing management including versioning, monitoring, and updates, and they must integrate smoothly into existing systems. It's not plug-and-play, but protocols such as MCP are making it easier, despite the security challenges AI models face.

Deploying the best chip for the job

At F5, we are collaborating with NVIDIA to help ensure AI factories and cloud-scale AI infrastructure rise to the demands of modern AI. Today, at NVIDIA GTC Paris 2025, we’re unveiling the next level of innovation with new capabilities for F5 BIG-IP Next for Kubernetes deployed on NVIDIA BlueField-3 DPUs. This builds on the enhanced performance, multi-tenancy, and security that we introduced at GTC San Jose 2025. Part of the F5 Application Delivery and Security Platform, F5 BIG-IP Next for Kubernetes runs natively on NVIDIA BlueField-3 DPUs, which are powerful, programmable processors purpose-built for data movement and processing.

By offloading tasks like network processing, storage management, and security operations (e.g., encryption and traffic monitoring), DPUs free up valuable CPU cycles and GPU resources to focus on AI training and inference. This reduces bottlenecks, boosts performance, and improves latency, helping AI factories operate faster and more efficiently and deliver more tokens.

Located on network interface cards, DPUs manage data flow across servers and between external customers/users/agents and the AI factory, orchestrating networking and security at scale. F5 BIG-IP Next for Kubernetes deployed on NVIDIA BlueField-3 DPUs became generally available in April.

Routing AI prompts to the right place for the right outcome

LLMs have advanced rapidly in recent months, now offering a wide range of sizes, costs, and domain-specific expertise. Choosing the right model for each prompt not only ensures better responses and regulatory compliance but also optimizes resource consumption, cost, and latency.

With today’s integration of NVIDIA NIM microservices, organizations can now intelligently route each AI prompt to the LLM best suited to the task. For example, lightweight, energy-efficient models can handle simple requests, while larger, more complex, or specialized prompts are directed to larger or domain-specific models.

This approach allows AI factories to use computing resources more efficiently, reducing inference costs by up to 60%. It’s a win-win for model providers and model users alike: better responses, delivered faster, at lower cost.
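To make the routing idea concrete, here is a minimal Python sketch of the decision described above: simple prompts go to a small model, long prompts to a large one, and domain-specific prompts to a specialist. It is an illustration only and not how BIG-IP Next for Kubernetes or NVIDIA NIM implements routing. The model names, relative costs, keyword table, and word-count threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTarget:
    name: str              # hypothetical model identifier
    relative_cost: float   # illustrative cost weight, not a real price


# Hypothetical model pool for the example.
SMALL = ModelTarget("small-general", 0.1)
LARGE = ModelTarget("large-general", 1.0)
LEGAL = ModelTarget("legal-specialist", 1.5)

# Hypothetical keyword table mapping domain terms to specialist models.
DOMAIN_KEYWORDS = {"contract": LEGAL, "liability": LEGAL}


def route_prompt(prompt: str, long_prompt_words: int = 200) -> ModelTarget:
    """Pick a model: domain specialist first, then by prompt length."""
    words = prompt.lower().split()
    for word in words:
        target = DOMAIN_KEYWORDS.get(word.strip(".,;:!?"))
        if target is not None:
            return target
    if len(words) > long_prompt_words:
        return LARGE
    return SMALL
```

A production router would rely on richer signals than keywords and length, such as a classifier, policy rules, or tenant settings, but the shape of the decision is the same: classify the prompt, then choose the cheapest model that can answer it well.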

More for less: Caching eliminates redundant computation and boosts token output

In addition to GPUs, NVIDIA continues to innovate at the software level to tackle key challenges in AI inference. NVIDIA Dynamo and KV cache, which are included with NVIDIA NIM, are great examples. NVIDIA Dynamo introduces disaggregated serving for inference: it separates context understanding (prefill), which is GPU compute heavy, from response generation (decode), which is memory-bandwidth heavy, and runs them on different GPU clusters. This improves GPU utilization and simplifies scaling across data centers by efficiently handling scheduling, routing, and memory management. KV cache optimizes how model context is stored and accessed. By keeping frequently used data in GPU memory and offloading the rest to CPU or storage, it eases memory bottlenecks, allowing support for larger models or more users without the need for extra hardware.
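The tiering idea behind KV cache offloading can be sketched in a few lines of Python. This is a toy model under stated assumptions, not NVIDIA's implementation: the two tiers are plain dictionaries standing in for GPU and CPU memory, capacity is counted in entries rather than bytes, and the least-recently-used eviction policy is an assumption chosen for simplicity.

```python
from collections import OrderedDict


class TieredKVCache:
    """Toy two-tier cache: a small 'GPU' tier backed by a larger 'CPU' tier."""

    def __init__(self, gpu_capacity: int):
        if gpu_capacity < 1:
            raise ValueError("gpu_capacity must be at least 1")
        self.gpu_capacity = gpu_capacity
        self.gpu = OrderedDict()  # ordered from least to most recently used
        self.cpu = {}             # overflow tier for evicted entries

    def put(self, key, value):
        """Store an entry in the GPU tier, offloading the coldest if full."""
        self.cpu.pop(key, None)
        self.gpu[key] = value
        self.gpu.move_to_end(key)
        while len(self.gpu) > self.gpu_capacity:
            old_key, old_value = self.gpu.popitem(last=False)
            self.cpu[old_key] = old_value

    def get(self, key):
        """Return the entry, promoting it to the GPU tier; None on a miss."""
        if key in self.gpu:
            self.gpu.move_to_end(key)
            return self.gpu[key]
        if key in self.cpu:
            value = self.cpu.pop(key)
            self.put(key, value)
            return value
        return None
```

A hit in either tier avoids recomputing the context for that key, which is where the savings come from. Only a miss in both tiers forces the expensive prefill work to be done again.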

A powerful new capability of BIG-IP Next for Kubernetes is its support for KV caching, which speeds up AI inference while reducing time and energy use. Combined with intelligent routing from NVIDIA Dynamo, based on a few explicit metrics such as GPU memory usage and other criteria, this enables significantly lower time to first token (TTFT), higher token generation, and ultimately more prompt throughput. DeepSeek has shown gains of 10x to 30x in capacity.
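As a rough sketch of metric-based routing, the Python example below chooses an inference worker using two signals: whether the worker already holds cached context for the prompt's prefix, and how much GPU memory it is using. The Worker fields, the 0.9 memory limit, and the tie-breaking order are hypothetical choices made for illustration and do not describe NVIDIA Dynamo's actual routing logic.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    name: str
    gpu_memory_used: float  # fraction of GPU memory in use, 0.0 to 1.0
    cached_prefixes: set = field(default_factory=set)


def pick_worker(workers, prompt_prefix, memory_limit=0.9):
    """Prefer a worker with the prefix cached, then the least-loaded one."""
    if not workers:
        raise ValueError("no workers available")
    # Skip workers close to memory exhaustion unless every worker is.
    candidates = [w for w in workers if w.gpu_memory_used < memory_limit]
    if not candidates:
        candidates = list(workers)
    # False sorts before True, so cache hits win; memory use breaks ties.
    return min(
        candidates,
        key=lambda w: (prompt_prefix not in w.cached_prefixes, w.gpu_memory_used),
    )
```

Sending a request to a worker that already holds its context lets that worker skip part of the prefill step, which is one plausible way cache-aware routing lowers time to first token.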

Customers can use F5 programmability to extend and adapt F5 BIG-IP capabilities to meet their precise and unique needs at very high performance.

Operationalizing and securing MCP for safe and sovereign agentic AI

For most organizations, and particularly large ones such as financial services firms, telcos, and healthcare companies with complex legacy systems, agentic AI holds strong appeal. Built on LLMs, these AI agents can navigate complex databases, servers, tools, and applications to retrieve precise information, unlocking new levels of efficiency and insight.

Introduced by Anthropic in November 2024, MCP is transforming how AI systems interact with real-world data, tools, and services. Acting as standardized connectors, MCP servers enable AI models to access APIs, databases, and file systems in real time, allowing AI to transcend the limitations of static training data and execute tasks efficiently. As adoption grows, these servers require advanced reverse proxies with load balancing, strong security, authentication and authorization for data and tools, and seamless Kubernetes integration. This makes MCP a key pillar of sovereign AI infrastructure and of securing and enabling agentic AI.

Deployed as a reverse proxy in front of the MCP servers, BIG-IP Next for Kubernetes on NVIDIA BlueField-3 DPUs can scale and secure those servers by verifying requests, classifying data, and checking its integrity and privacy, thereby protecting both organizations and LLMs from security threats and data leaks. Meanwhile, F5 programmability makes it straightforward to ensure the AI application complies with the requirements of MCP and other protocols.
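To illustrate what verifying requests can mean in practice, the Python sketch below shows the kind of check a proxy could run before forwarding traffic to an MCP server. MCP messages use JSON-RPC 2.0, so the function validates the envelope and then applies a method allowlist. The allowlist contents and the function name are hypothetical policy choices for this example, not part of any F5 product API.

```python
import json

# Hypothetical policy: the only MCP methods this proxy will forward.
ALLOWED_METHODS = {"initialize", "tools/list", "tools/call"}


def check_mcp_request(raw: str):
    """Return (allowed, reason) for a raw JSON-RPC 2.0 request body."""
    try:
        message = json.loads(raw)
    except json.JSONDecodeError:
        return False, "not valid JSON"
    if not isinstance(message, dict):
        return False, "not a JSON object"
    if message.get("jsonrpc") != "2.0":
        return False, "missing or wrong jsonrpc version"
    method = message.get("method")
    if not isinstance(method, str):
        return False, "missing method"
    if method not in ALLOWED_METHODS:
        return False, f"method not allowed: {method}"
    return True, "ok"
```

A real deployment would layer authentication, authorization per tool, and inspection of the request parameters on top of this structural check.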

If tokens are the new currency, then let’s count them, govern them, and spend them wisely

In recent earnings announcements, some major organizations have begun disclosing the number of tokens generated each quarter, their growth, and the revenue tied to them. This reflects a growing need among our customers: the ability to track, manage, and control token usage just like a budget, avoiding the kind of unexpected costs that sometimes arise with public clouds.

That’s why BIG-IP Next for Kubernetes now includes new capabilities for metering and governing token consumption across the organization. When customers ask, we listen and deliver with care.
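Treating tokens like a budget can be pictured with a small Python sketch. It is a hypothetical illustration of per-tenant metering and enforcement, not the BIG-IP Next for Kubernetes feature itself. The class name, tenant names, and the policy of rejecting a request outright when it would exceed the limit are all assumptions.

```python
class TokenBudget:
    """Toy per-tenant token meter with a hard spending limit."""

    def __init__(self, limits):
        self.limits = dict(limits)                  # tenant -> token limit
        self.used = {tenant: 0 for tenant in limits}

    def charge(self, tenant: str, tokens: int) -> bool:
        """Record usage; return False without charging if over budget."""
        if tenant not in self.limits:
            raise KeyError(f"unknown tenant: {tenant}")
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        if self.used[tenant] + tokens > self.limits[tenant]:
            return False
        self.used[tenant] += tokens
        return True

    def remaining(self, tenant: str) -> int:
        """Tokens the tenant can still spend in this budget period."""
        return self.limits[tenant] - self.used[tenant]
```

In use, a gateway would call charge() with the token count of each request or response and refuse or throttle a tenant once it returns False. Softer policies, such as alerting at a threshold, fit the same structure.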

Building secure, fast, sovereign and flexible AI factories

As industries develop AI factories, countries build their sovereign AI, and AI agents emerge, infrastructure, ecosystems, and applications must be flexible and adaptable. Organizations that deploy AI efficiently will move faster, serve customers better, and reduce costs. But to realize this potential, AI must remain secure, scalable, and cost-effective without slowing the pace of innovation.

That’s where F5 comes in. In March, we delivered performance, multi-tenancy, and security. Now with BIG-IP Next for Kubernetes, we’re enabling innovation built to move at the speed of AI.

Our promise: More tokens per dollar, per watt. Try it and see the difference firsthand.

Attending GTC Paris 2025?

F5 is proud to be a Gold Sponsor of NVIDIA GTC Paris 2025. Visit us at Booth G27 to experience how the F5 Application Delivery and Security Platform supports secure, high-performance AI infrastructure, and attend our joint session with NVIDIA, Secure Infrastructure by Design: Building Trusted AI Factories, on Thursday, June 12 at 10:00 AM CEST.

To learn more about F5 BIG-IP Next for Kubernetes deployed on NVIDIA BlueField-3 DPUs, see my previous blog post. Also, be sure to read our press release for today’s announcement.

F5’s focus on AI doesn’t stop here—explore how F5 secures and delivers AI apps everywhere.

Featured Blog Posts

F5 accelerates and secures AI inference at scale with NVIDIA Cloud Partner reference architecture

Securing AI models and agents without compromise: How F5’s acquisition of CalypsoAI will deliver end-to-end AI runtime protection

Quantum ready: A practical guide to enabling PQC with F5

Tags: Kubernetes (k8S), 2025

About the Author

Ahmed Guetari
