The Need for AI Infrastructure Solutions to Focus on GPU Optimization

F5 Ecosystem | July 11, 2024

Lori Mac Vittie, Distinguished Engineer and Chief Evangelist | F5

Generative AI is accelerating the impact of AI on infrastructure. We had already entered an infrastructure renaissance, with technologists reviving an interest in and admiration for the lowly network, compute, and storage layers of the data center. Mainly driven by the “death” of Moore’s Law and the emergence of edge computing, we were already seeing the rise of specialized processing units (xPUs) years ago.

Today, generative AI—and video gaming, to be fair—has made GPUs a household term and GPU optimization a new need.

That’s because GPUs are in high demand and low in supply. Organizations are already shelling out (or planning to shell out) significant percentages of their overall IT budget on this powerful piece of hardware. Some of that investment is in their own infrastructure, and some goes to support public cloud infrastructure.

But it all goes to support availability of GPU resources for operating AI applications.

But as we look around, we find that the introduction of a new type of resource into infrastructure poses challenges. For years, organizations have treated infrastructure as a commodity. That is, it’s all the same.

And it largely was. Organizations standardized on white boxes or name brand servers, all with the same memory and compute capabilities. That made infrastructure operations easier, as there was no need in traffic management to worry about whether a workload ran on server8756 or server4389. They had the same capabilities.

But now? Oh, GPUs change all that. Now infrastructure operations need to know where GPU resources are and how they’re utilized. And there are signs that may not be going so well.

According to the State of AI Infrastructure at Scale 2024, “15% report that less than 50% of their available and purchased GPUs are in use.”

Now, it’s certainly possible that those 15% of organizations simply don’t have the load required to use more than 50% of their GPU resources. It’s also possible that they do have the load and aren’t putting those GPUs to work.

Certainly, some organizations are going to find themselves in that latter category, scratching their heads about why their AI apps don’t perform as well as users expect when they have plenty of spare GPU capacity available.

Part of it is about infrastructure and making sure that workloads are properly matched to required resources. Not every workload in an AI app needs GPU capacity, after all. The workload that will benefit from it is the inferencing server, and not much else. So that means some strategic architecture work at the infrastructure layer, making sure that GPU-hungry workloads are running on GPU-enabled systems while other app workloads are running on regular old systems.
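As a rough illustration of matching workloads to resources, the sketch below filters candidate nodes by whether they carry a GPU. The `Node` and `Workload` types, the `has_gpu` and `needs_gpu` fields, and the choice of which server has a GPU are all hypothetical; a real scheduler would read this from its own inventory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    has_gpu: bool  # hypothetical capability flag


@dataclass(frozen=True)
class Workload:
    name: str
    needs_gpu: bool  # hypothetical requirement flag


def eligible_nodes(workload, nodes):
    """Return names of nodes whose GPU capability matches the workload's need.

    The match is strict in both directions: GPU-hungry workloads only land on
    GPU nodes, and everything else stays off them so GPU capacity is not
    consumed by work that cannot use it.
    """
    return [n.name for n in nodes if n.has_gpu == workload.needs_gpu]


# Assumed inventory: which server has a GPU is made up for this example.
nodes = [Node("server8756", has_gpu=True), Node("server4389", has_gpu=False)]
inference = Workload("inference-server", needs_gpu=True)
frontend = Workload("web-frontend", needs_gpu=False)

print(eligible_nodes(inference, nodes))
print(eligible_nodes(frontend, nodes))
```

The strict match is one possible policy. An organization with idle GPU nodes might relax it, but that is a trade-off to make deliberately rather than by default.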

That means provisioning policies that understand which nodes are GPU-enabled and which are not. That’s a big part of GPU optimization. It also means that the app services that distribute requests to those resources need to be smarter, too. Load balancing, ingress control, and gateways that distribute requests are part of the efficiency equation when it comes to infrastructure utilization. If every request goes to one or two GPU-enabled systems, not only will those systems perform poorly, but organizations will also be left with “spare” GPU capacity they paid good cash money for.
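One way to avoid piling requests onto one or two GPU-enabled systems is a least-loaded selection, sketched here under simplifying assumptions. The node names and the `in_flight` counters are hypothetical, and requests never complete in this toy version, so it only shows how new requests spread out.

```python
def pick_gpu_node(in_flight):
    """Pick the GPU node with the fewest in-flight requests.

    Ties are broken alphabetically by node name so the result is deterministic.
    """
    if not in_flight:
        raise ValueError("no GPU-enabled nodes available")
    return min(sorted(in_flight), key=in_flight.get)


def dispatch(in_flight, request_count):
    """Assign request_count requests one at a time, updating the counters."""
    assignments = []
    for _ in range(request_count):
        node = pick_gpu_node(in_flight)
        in_flight[node] += 1
        assignments.append(node)
    return assignments


# Hypothetical pool of three GPU-enabled nodes, all idle to start.
pool = {"gpu-a": 0, "gpu-b": 0, "gpu-c": 0}
print(dispatch(pool, 6))
print(pool)
```

A production load balancer would typically weigh richer signals than a request count, such as GPU memory pressure or queue depth, but the principle of steering toward the least-busy GPU node is the same.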

It also might mean leveraging those GPU resources in the public cloud. And doing that means leveraging networking services to make sure data shared is secure.

In other words, AI applications are going to have a significant impact on infrastructure in terms of how distributed it is and how it’s provisioned and managed in real time. There’s going to be increased need for telemetry to ensure operations has an up-to-date view of what resources are available and where, and some good automation to make sure provisioning matches workload requirements.
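To make the telemetry point concrete, here is a minimal sketch that averages utilization readings per node and flags the ones running below a threshold. The sample data, node names, and the `underused_nodes` helper are hypothetical; the 50% default simply echoes the survey figure quoted earlier, and real readings would come from whatever metrics pipeline operations already runs.

```python
def underused_nodes(samples, threshold=0.5):
    """Return sorted names of nodes whose mean utilization is below threshold.

    samples maps a node name to a list of utilization readings in [0.0, 1.0].
    Nodes with no readings are skipped rather than treated as idle, since
    missing telemetry is a different problem from low utilization.
    """
    means = {
        node: sum(readings) / len(readings)
        for node, readings in samples.items()
        if readings
    }
    return sorted(node for node, mean in means.items() if mean < threshold)


# Made-up readings for three GPU nodes; gpu-c has reported nothing yet.
samples = {
    "gpu-a": [0.9, 0.8],
    "gpu-b": [0.1, 0.3],
    "gpu-c": [],
}
print(underused_nodes(samples))
```

A report like this is only a starting point. Whether a flagged node reflects too little load or requests that never reach it is the question the automation and request distribution layers then have to answer.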

This is why organizations need to modernize their entire enterprise architecture. Because it isn’t just about layers or tiers anymore, it’s about how those layers and tiers interconnect and support each other to facilitate the needs of a digitally mature business that can harness the power of AI.

Tags: Office of the CTO, 2024
