Enterprises are moving quickly from AI experimentation to AI operations. That shift changes what organizations need from application delivery infrastructure.
For years, Kubernetes traffic management was often treated as a technical implementation detail. Platform teams selected an ingress controller, application teams configured routes, and security teams added controls where they could. That model worked when the primary challenge was serving web apps and APIs. It falls short when the same platform must also support modern applications, distributed APIs, security policy, and AI inference workloads that consume scarce and expensive compute.
The 2026 F5 State of Application Strategy Report shows why this matters now: 78% of organizations are operating AI inference themselves, and 52% are using multi-model orchestration to adapt and extend their AI models. In practical terms, enterprises are not just calling a single model through a single service. They are running inference across environments, coordinating multiple models, and making AI part of production application architecture.
AI brings new urgency to a long-standing enterprise challenge: how to give teams the speed they need while maintaining governance, security, and operational consistency.
F5 NGINX Gateway Fabric helps answer that question by bringing three important capabilities together: conformance with Gateway API 1.5, integration with F5 WAF for NGINX, and support for the Gateway API Inference Extension. Together, these capabilities point to a more standardized, secure, and AI-ready approach to Kubernetes traffic management.
Why standards matter
Gateway API is becoming an important standard for Kubernetes traffic management because it addresses a familiar enterprise challenge: too many teams solving the same problem in different ways.
Traditional ingress approaches often rely heavily on implementation-specific annotations and custom configuration. That can work for individual teams, but it becomes harder to govern at enterprise scale. Policies are difficult to audit. Migration paths become more complicated. Platform teams may struggle to offer self-service without giving up control. Application teams may move fast, but the operating model becomes fragmented.
Gateway API introduces a more role-oriented, Kubernetes-native model. Platform teams can manage shared gateway infrastructure. Application teams can define routing for their own services. Security and operations teams gain a clearer structure for applying policy. The value is not simply technical elegance. The value is a cleaner operating model.
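To make the role split concrete, here is a minimal sketch of how the two resources might look when built as plain Python dictionaries. The `Gateway` and `HTTPRoute` kinds and the `gateway.networking.k8s.io/v1` API version come from Gateway API; every name, namespace, hostname, and port below is hypothetical, and the `nginx` gateway class name is an assumption that should be checked against the actual installation.

```python
import json

# Platform team: owns one shared Gateway in its own namespace.
# All names and namespaces here are hypothetical placeholders.
gateway = {
    "apiVersion": "gateway.networking.k8s.io/v1",
    "kind": "Gateway",
    "metadata": {"name": "shared-gateway", "namespace": "platform-infra"},
    "spec": {
        "gatewayClassName": "nginx",  # assumed class name; verify in your cluster
        "listeners": [
            {
                "name": "http",
                "protocol": "HTTP",
                "port": 80,
                # Let routes from other namespaces attach to this listener.
                "allowedRoutes": {"namespaces": {"from": "All"}},
            }
        ],
    },
}

# Application team: owns an HTTPRoute in its own namespace and
# points it at the shared Gateway through parentRefs.
http_route = {
    "apiVersion": "gateway.networking.k8s.io/v1",
    "kind": "HTTPRoute",
    "metadata": {"name": "orders-route", "namespace": "team-orders"},
    "spec": {
        "parentRefs": [{"name": "shared-gateway", "namespace": "platform-infra"}],
        "hostnames": ["orders.example.com"],
        "rules": [
            {
                "matches": [{"path": {"type": "PathPrefix", "value": "/orders"}}],
                "backendRefs": [{"name": "orders-svc", "port": 8080}],
            }
        ],
    },
}


def attaches_to(route, gw):
    """Return True if the route names this Gateway as one of its parents."""
    gw_meta = gw["metadata"]
    for ref in route["spec"]["parentRefs"]:
        # A parentRef without a namespace refers to the route's own namespace.
        namespace = ref.get("namespace", route["metadata"]["namespace"])
        if ref["name"] == gw_meta["name"] and namespace == gw_meta["namespace"]:
            return True
    return False


if __name__ == "__main__":
    print(json.dumps(gateway, indent=2))
    print(json.dumps(http_route, indent=2))
    print("route attaches to gateway:", attaches_to(http_route, gateway))
```

The point of the sketch is the ownership boundary: the shared entry point and the per-service routing live in separate objects and separate namespaces, so each team can change its own piece without editing the other's.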
NGINX Gateway Fabric conformance with Gateway API 1.5 is important in that context. Conformance does not mean every implementation is identical, nor does it remove the need for architectural decisions. But it does give enterprises more confidence that they are building on a standard that is maturing and being tested across the Kubernetes ecosystem.
The significance is straightforward. Standards reduce avoidable complexity. They make platform decisions easier to scale across teams and clusters. They reduce the risk of locking every workflow to a proprietary pattern. And they create a foundation for long-term modernization, where teams can adopt newer traffic management models without forcing every application to move at once.
That matters because Kubernetes is rarely a single-cluster story in large enterprises. Organizations often run multiple clusters across business units, regions, development stages, and cloud environments. Without a common approach, every cluster can become its own exception. Gateway API, implemented through NGINX Gateway Fabric, gives platform teams a path to offer consistency without slowing down application delivery.
Security where apps run
Security is often strongest when it is embedded into the platform rather than bolted on after deployment. That is especially true in Kubernetes, where application teams expect self-service and rapid release cycles.
The integration between NGINX Gateway Fabric and F5 WAF for NGINX brings enterprise application security closer to the Kubernetes gateway layer. With this integration, security teams can define and manage application protection policies while platform teams make those protections available through Kubernetes-native workflows.
The business value is not that every IT leader needs to understand the technical mechanics of how policy is attached. The value is that security teams can maintain ownership of protection policies while platform and application teams continue to move quickly.
That separation of responsibilities is critical. If security depends on every application team implementing controls correctly, the enterprise inherits inconsistent protection. If security slows every release, teams look for workarounds. A better model is shared governance: security teams define and manage policy, platform teams provide the enforcement point, and application teams keep delivering software.
This is particularly relevant as more customer-facing applications, APIs, and AI-enabled services move onto Kubernetes. The risk profile expands as the platform becomes more strategic. Web application attacks, misconfigurations, unmanaged routes, and inconsistent security posture are not only technical concerns; they become business continuity, compliance, and reputation concerns.
A gateway-level WAF model does not replace secure development practices or broader application security programs. But it gives organizations a practical enforcement layer close to where applications are delivered. That means security can become part of the platform architecture instead of a separate process that has to chase every deployment.
Preparing for AI inference
AI inference changes traffic management because AI workloads do not behave like traditional web requests.
A conventional web request is usually short-lived and relatively predictable. Inference requests can vary significantly in duration, payload size, cost, and resource impact. A single request can tie up expensive GPU capacity. Multiple models may be used for different tasks. Some models may be optimized for cost, some for performance, and some for accuracy. The routing decision can affect user experience, infrastructure utilization, and operating cost.
This is why the Gateway API Inference Extension matters. It enables model-aware routing so requests can be directed by model version, model type, and cost-performance profile. With NGINX Gateway Fabric support, organizations can improve user experience and infrastructure utilization while routing lower-value or latency-tolerant requests to more cost-efficient models.
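The routing idea can be illustrated with a toy decision function. This is only a sketch of the concept, not the Inference Extension's API or NGINX Gateway Fabric's implementation: the backend names, cost figures, latency figures, and the `pick_backend` helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ModelBackend:
    """A hypothetical model-serving backend with illustrative numbers."""
    name: str
    cost_per_1k_tokens: float
    typical_latency_ms: int


# Made-up backends representing different cost-performance profiles.
BACKENDS = [
    ModelBackend("large-accurate", 0.60, 900),
    ModelBackend("small-fast", 0.05, 150),
    ModelBackend("batch-economy", 0.02, 3000),
]


def pick_backend(
    requested_model: Optional[str],
    latency_budget_ms: int,
    backends: List[ModelBackend] = BACKENDS,
) -> ModelBackend:
    """Choose a backend for one request.

    1. If the request names a model, honor it.
    2. Otherwise pick the cheapest backend that fits the latency budget.
    3. If nothing fits the budget, fall back to the fastest backend.
    """
    if requested_model is not None:
        for backend in backends:
            if backend.name == requested_model:
                return backend
        raise LookupError(f"unknown model: {requested_model}")

    eligible = [b for b in backends if b.typical_latency_ms <= latency_budget_ms]
    if not eligible:
        return min(backends, key=lambda b: b.typical_latency_ms)
    return min(eligible, key=lambda b: b.cost_per_1k_tokens)


if __name__ == "__main__":
    # An interactive request with a tight budget and a latency-tolerant one.
    print(pick_backend(None, 200).name)
    print(pick_backend(None, 5000).name)
```

In this sketch a latency-tolerant request lands on the cheapest backend, while an interactive request is kept on a faster one. A production gateway would also weigh live infrastructure signals rather than static figures like these.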
The point is not the underlying scheduling mechanism. It’s that AI delivery needs operational maturity. When 78% of organizations are operating AI inference themselves, inference is no longer just a data science concern. It becomes part of the enterprise application delivery stack. And when 52% are using multi-model orchestration, traffic management must account for a more dynamic AI environment.
NGINX Gateway Fabric support for the Inference Extension helps platform and machine learning teams bring AI workloads into a familiar Kubernetes operating model. Rather than creating a separate delivery architecture for AI, organizations can begin to manage AI services alongside apps and APIs, using a standards-aligned gateway approach.
That does not mean every enterprise AI challenge is solved at the gateway. Model governance, data governance, security testing, cost management, and compliance still require broader controls. But routing is a foundational part of the AI production environment. If requests are not directed efficiently, if infrastructure signals are ignored, or if every team builds its own path to inference, complexity compounds quickly.
The most important shift is architectural: AI becomes another class of production workload that needs secure, observable, policy-driven delivery.
Modern app delivery for the AI era
NGINX Gateway Fabric should not be evaluated only as a Kubernetes networking component. It should be viewed as part of a broader platform strategy for modern application delivery.
The combination of Gateway API 1.5 conformance, F5 WAF for NGINX integration, and Inference Extension support matters because it connects three priorities:
- Standardization: Reduce fragmentation by aligning Kubernetes traffic management to a maturing community standard.
- Security: Apply enterprise application protection closer to where Kubernetes applications are exposed.
- AI readiness: Prepare the delivery layer for inference workloads that are more dynamic, resource-intensive, and business-critical than traditional services.
The practical benefit is a more consistent operating model. Platform engineering can provide shared infrastructure and self-service patterns. Application teams can move faster without inventing their own traffic model. Security teams can apply policy through established controls. ML teams can bring inference workloads into Kubernetes without treating AI delivery as a one-off architecture.
This matters because the next phase of AI adoption will not be measured only by how many pilots an organization launches. It will be measured by how reliably, securely, and efficiently AI-enabled applications run in production.
Many enterprises already have the ingredients: Kubernetes platforms, application delivery infrastructure, security programs, AI initiatives, and teams responsible for each. The challenge is bringing those pieces together in a way that scales. NGINX Gateway Fabric helps by giving organizations a standards-aligned gateway layer that can support traditional applications, APIs, security policy, and emerging AI inference patterns.
That is a practical value proposition. Organizations do not need to start over or rebuild their application delivery architecture from scratch. They can evolve Kubernetes traffic management toward a model that is more consistent, more secure, and better prepared for AI workloads.
As AI moves deeper into enterprise applications, the infrastructure conversation has to move with it. The gateway is no longer only about getting traffic into a cluster. It is becoming a control point for how modern applications, APIs, and AI services are delivered, protected, and operated at scale.
To learn more, visit the F5 NGINX Gateway Fabric webpage.
Also, be sure to attend the F5 AI Summit, a three-hour virtual event on Tuesday, June 23, where we’ll talk in depth about delivering AI workloads on Kubernetes in the session “Delivering AI on Kubernetes with F5 NGINX.”