Helping AI cloud providers reduce costs while driving profits

F5 Ecosystem | December 17, 2025

AI has moved from an interesting experiment to a core business capability in a very short time. To keep pace, cloud providers are building out GPU-as-a-Service (GPUaaS) and Large Language Model-as-a-Service (LLMaaS) offerings that form the backbone of modern AI factories. These environments turn data, compute, and models into high-value outcomes for customers.

These offerings dramatically simplify AI adoption. But they also expose providers to new financial and operational pressures. It is one thing to deploy AI infrastructure. It is another to run it profitably at scale with predictable economics. GPU costs, token-based monetization challenges, latency concerns, and multi-tenant security all converge into a complex business equation.

The cloud AI providers that achieve sustainable economics will be the ones that start architecting for scale now.

This is where F5 BIG-IP Next for Kubernetes, especially when deployed on NVIDIA BlueField-3 DPUs, has begun to reshape how AI clouds operate. The platform gives providers a structured way to grow revenue, reduce cost per token, and run complex AI environments more efficiently.

The shifting economics of AI factories

GPUaaS and LLMaaS took off because they remove huge barriers to entry.

GPUaaS lets customers train and fine-tune models without investing in GPU clusters or specialized infrastructure. Healthcare providers, industrial companies, and enterprises can pursue high-value AI projects such as diagnostic imaging or predictive automation without becoming infrastructure operators.

LLMaaS takes the opposite approach: it removes the need to train models at all. Customers integrate pre-trained models such as Gemini or GPT into their applications through APIs and focus on building experiences such as chatbots, copilots, or real-time analytics.

These models reduce friction for customers but concentrate the complexity on the provider. The economics become highly sensitive to GPU utilization, latency (especially time-to-first-token), token metering accuracy, multi-tenant security, and operational overhead.

This is where the architecture directly shapes profitability.

Driving revenue through more tokens

Revenue for GPUaaS and LLMaaS scales with consumption. The more tokens processed and workloads served, the higher the revenue. That only happens if the platform delivers fast, predictable experiences.

BIG-IP Next for Kubernetes improves revenue potential by keeping GPUs focused on AI computation rather than networking, routing, or inspection tasks. Offloading these functions to NVIDIA BlueField-3 DPUs frees compute cycles and reduces overhead on GPUs, allowing them to serve more workloads.

Testing has shown that this architecture can generate more than 30% additional tokens from the same GPU infrastructure and reduce time-to-first-token significantly.
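To see how a throughput gain of that size flows through to unit economics, here is a back-of-the-envelope sketch. Only the 30% figure comes from the testing cited above; the GPU-hour cost and baseline throughput are hypothetical round numbers chosen purely for illustration:

```python
# Back-of-the-envelope cost-per-token arithmetic.
# Only the ~30% throughput gain comes from the testing cited in this
# article; the dollar cost and baseline throughput are hypothetical.

GPU_HOUR_COST = 3.00               # hypothetical $/GPU-hour
BASELINE_TOKENS_PER_SEC = 1_000    # hypothetical per-GPU throughput
THROUGHPUT_GAIN = 0.30             # ">30% additional tokens" from testing

def cost_per_million_tokens(tokens_per_sec: float, gpu_hour_cost: float) -> float:
    """Dollars per one million tokens for a single fully utilized GPU."""
    tokens_per_hour = tokens_per_sec * 3600
    return gpu_hour_cost / tokens_per_hour * 1_000_000

before = cost_per_million_tokens(BASELINE_TOKENS_PER_SEC, GPU_HOUR_COST)
after = cost_per_million_tokens(
    BASELINE_TOKENS_PER_SEC * (1 + THROUGHPUT_GAIN), GPU_HOUR_COST
)

print(f"cost per 1M tokens before: ${before:.3f}")
print(f"cost per 1M tokens after:  ${after:.3f}")
print(f"reduction: {(1 - after / before):.1%}")
```

Under these assumptions, 30% more tokens from the same hardware works out to roughly a 23% lower cost per token, since the fixed GPU-hour cost is spread across 1.3x the output.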

Those improvements directly impact customer experience and consumption. When LLM responses feel fast and interactive, users naturally do more with the service. When throughput increases, the same GPUs deliver more billable activity. This is how AI services grow revenue without requiring more hardware.

Reducing costs by improving GPU utilization

GPUs are the financial core of any AI cloud. Every inefficiency, idle gap, or CPU bottleneck translates directly into higher cost per token and lower margins.

BIG-IP Next for Kubernetes helps reduce costs by shifting significant networking, security, and traffic processing onto the NVIDIA BlueField-3 DPUs. This eliminates overhead that would otherwise consume CPU and GPU cycles. It also improves traffic patterns and workload routing so providers can operate closer to optimal GPU utilization.

As a result, providers can achieve meaningfully lower cost per token and significantly higher efficiency across their AI clusters.

Improved GPU utilization compounds this effect. The more consistently GPUs are fed work, the lower the cost per customer and the higher the overall revenue generated per GPU hour. Architecture directly shapes economics.

Increasing operational efficiency

AI clouds must operate across multiple clusters, regions, GPUs, and model types. The complexity grows quickly and can overwhelm teams without the right control points.

BIG-IP Next for Kubernetes provides secure multi-tenancy, strong segmentation, API protection, and deep observability across Kubernetes environments. These capabilities give providers clear control over how workloads behave, how tenants are isolated, and how policies are enforced.

For LLMaaS specifically, BIG-IP Next for Kubernetes enables token-aware routing, improved cache efficiency, and consistent traffic handling across different GPUs and models. This makes inference workflows faster, more efficient, and more predictable.
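The article does not describe the routing algorithm itself, but the core idea behind token-aware routing can be sketched in a few lines. This is an illustrative simplification, not F5's implementation; all names and the heuristic are hypothetical. The point is that LLM requests vary enormously in cost, so a router that balances queued token work, rather than raw request counts, keeps GPUs more evenly loaded:

```python
# Illustrative-only sketch of token-aware routing: send each request to
# the backend with the least queued *token* work, not the fewest requests.
# All names and the heuristic itself are hypothetical simplifications.
from dataclasses import dataclass

@dataclass
class Backend:
    name: str
    queued_tokens: int = 0  # tokens waiting across all queued requests

def route(backends: list[Backend], prompt_tokens: int) -> Backend:
    """Pick the backend with the lightest token queue and book the work."""
    target = min(backends, key=lambda b: b.queued_tokens)
    target.queued_tokens += prompt_tokens
    return target

pool = [
    Backend("gpu-a", queued_tokens=4_000),
    Backend("gpu-b", queued_tokens=500),
]
chosen = route(pool, prompt_tokens=1_200)
print(chosen.name)  # "gpu-b": it had far fewer tokens already queued
```

A request-count balancer would treat a 50-token prompt and a 5,000-token prompt identically; weighting by tokens avoids piling long prompts onto an already saturated GPU.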

Operationally, this means providers onboard tenants faster, troubleshoot issues more quickly, and scale their infrastructure without multiplying overhead. All of this drives better economics and a more sustainable service model.

Turn AI factories into sustainable businesses

Demand for AI is growing faster than most providers can build infrastructure. The long-term winners will not simply be the platforms with the biggest clusters. They will be the ones that grow revenue through performance, reduce costs through efficiency, and run operations with clarity and control.

BIG-IP Next for Kubernetes, especially when used with NVIDIA BlueField-3 DPUs, is designed specifically for that business reality. It improves throughput, reduces latency, strengthens multi-tenancy, optimizes GPU efficiency, and lowers cost per token in ways validated through testing.

AI workloads are not slowing down. The providers that achieve sustainable economics will be the ones that architect for scale now.

This joint F5-NVIDIA architecture offers a clear path toward that outcome.

Learn how F5 and NVIDIA are laying the technical foundation for the next era of intelligent, goal-driven AI systems.

About the Author

Scott Calvet, Director, Product Marketing

