Insights from the 2026 State of Application Strategy Report
For years now, we’ve had the luxury of treating AI as something adjacent to the real system. It lived in pilots, proofs of concept, and carefully contained experiments where failure carried little consequence. That framing made sense when inference was sporadic and disconnected from the core application flow.
It no longer applies.
Inference has moved squarely into the request path, which changes everything. It now behaves like any other production workload, inheriting the same expectations around latency, availability, scalability, and security. And the data backs that up: roughly three-quarters of organizations are already running AI-enabled applications, with more than 80% expecting AI to be embedded across their portfolios in the near term. At that point, AI stops being optional. It becomes part of the system.
Observability has shifted from hindsight to input
One of the clearest indicators of this transition is what has happened to observability. Historically, it functioned as a diagnostic tool. You collected telemetry, analyzed it, and used it to understand what had already occurred.
According to our 2026 State of Application Strategy Report, that role has changed, and the numbers on how observability data is used make that unmistakable.
- Alerting jumped from 41% in 2023 to 96% in 2026.
- Automation driven by operational data rose from 51% to 97%.
- Insights climbed from 33% to 97%.
- Root cause analysis increased from 47% to 95%.
- SLO and SLI reporting moved from 41% to 96%.
Those are not incremental improvements. That is near-universal adoption, and more importantly, near-universal dependency.
Observability is no longer just reporting on the system; it is feeding the system. Operational data now drives runtime decisions, shaping automation, influencing routing, and informing policy enforcement. In an inference-driven architecture, that is not a nice-to-have. It is a prerequisite.
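To make "telemetry as input" concrete, here is a minimal sketch of the pattern. The metric names, thresholds, and actions below are my own illustrative assumptions, not anything prescribed by the report; real values would come from your SLOs.

```python
from dataclasses import dataclass

@dataclass
class EndpointStats:
    """Rolling telemetry for one inference endpoint."""
    name: str
    p95_latency_ms: float
    error_rate: float  # fraction of failed requests

# Hypothetical thresholds; in practice these come from your SLO/SLI reporting.
LATENCY_SLO_MS = 800
ERROR_BUDGET = 0.02

def runtime_decision(stats: EndpointStats) -> str:
    """Turn observability data into a routing/policy action, not a report."""
    if stats.error_rate > ERROR_BUDGET:
        return f"drain:{stats.name}"         # pull the endpoint from rotation
    if stats.p95_latency_ms > LATENCY_SLO_MS:
        return f"deprioritize:{stats.name}"  # steer new traffic to faster peers
    return f"keep:{stats.name}"

print(runtime_decision(EndpointStats("model-a", 950.0, 0.01)))
# -> deprioritize:model-a
```

The point is not the specific thresholds. It is that the telemetry is consumed by code at runtime rather than by a human after the fact.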
Operational pressure concentrates around inference
If you want to understand where a system truly matters, follow the operational pressure. In AI architectures, that pressure converges around the layers that wrap inference.
From an operational perspective, respondents identified where that pressure lands:
- Prompt layer: 29%
- Token and control layer: 23%
- Model routing: 17%
- Output moderation: 14%
Security professionals reported nearly identical priorities:
- Prompt layer: 25%
- Token and control layer: 23%
- Output moderation: 19%
- Model routing: 17%
That alignment matters. It shows that both operations and security teams recognize the same thing: the critical work happens at inference time.
This is where latency becomes visible, where cost accumulates, and where risk must be managed. Training pipelines and data preparation still matter, but they do not carry the same minute-by-minute operational weight. The system either responds correctly and quickly at inference time, or it does not.
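To make those wrapping layers concrete, here is a hedged sketch of a single request passing through them in order. Every function, rule, and model name here is invented for illustration; the report ranks the layers but does not prescribe an implementation.

```python
def guard_prompt(prompt: str) -> str:
    """Prompt layer: inspect input before it ever reaches a model."""
    if "ignore previous instructions" in prompt.lower():
        raise ValueError("prompt rejected by policy")
    return prompt

def enforce_token_budget(prompt: str, max_tokens: int = 4096) -> str:
    """Token and control layer: cap cost before inference runs."""
    tokens = prompt.split()  # crude stand-in for a real tokenizer
    return " ".join(tokens[:max_tokens])

def route_model(prompt: str) -> str:
    """Model routing: pick an endpoint per request (cost vs. capability)."""
    return "large-model" if len(prompt) > 500 else "small-model"

def moderate_output(text: str) -> str:
    """Output moderation: last check before the response leaves the system."""
    return text.replace("internal-only", "[redacted]")

def infer(prompt: str) -> str:
    prompt = enforce_token_budget(guard_prompt(prompt))
    model = route_model(prompt)
    raw = f"[{model}] response to: {prompt}"  # placeholder for the model call
    return moderate_output(raw)

print(infer("Summarize our internal-only roadmap"))
```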
Inference, in this context, is not a lifecycle phase. It is the workload.
Automation has already wrapped itself around AI
Another signal of maturity is how a workload is controlled. In the case of AI, that control is overwhelmingly programmatic.
Across use cases such as accelerating automation, adjusting policies, and generating suggestions, API-driven automation consistently ranks as the preferred approach. Even respondents who are not currently using AI for automation still prefer API-driven mechanisms.
That consistency reflects an expectation that these systems will operate with minimal human intervention. They are designed to act, react, and adapt in real time. Inference becomes operational the moment it is embedded within those automated loops.
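As an illustration of that kind of loop, consider this hedged sketch. The endpoints, payloads, and thresholds are invented for the example; they do not reflect any specific product's API.

```python
import time

def get_metrics() -> dict:
    """Stand-in for a GET against a telemetry API."""
    return {"p95_latency_ms": 1200, "queue_depth": 40}

def patch_policy(update: dict) -> None:
    """Stand-in for a PATCH against a policy API; here we just log it."""
    print(f"PATCH /policies/inference {update}")

def control_loop(iterations: int = 3) -> None:
    """Act, react, adapt: no human in the loop unless a change fails."""
    for _ in range(iterations):
        m = get_metrics()
        if m["p95_latency_ms"] > 1000:
            patch_policy({"rate_limit_rps": 50})        # shed load programmatically
        if m["queue_depth"] > 30:
            patch_policy({"autoscale_min_replicas": 4}) # scale out, no ticket filed
        time.sleep(0.1)  # a real loop runs on the platform's own cadence

control_loop()
```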
We have already crossed that threshold.
The multi-model reality reinforces the shift
Compounding this is the fact that organizations are not operating a single model. Running multiple models in parallel (often five to seven) is now common. Most adaptation techniques remain lightweight, relying on prompt engineering, retrieval-augmented generation, embeddings, or limited tuning.
This introduces familiar operational challenges: selecting the right model for a given request, implementing fallback logic, managing versions, balancing cost and performance, and enforcing policy across heterogeneous endpoints.
At that point, the system begins to resemble any other distributed application. It requires routing, orchestration, and governance. Inference becomes something that must be delivered, managed, and secured, not simply invoked.
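A minimal sketch of that routing-with-fallback pattern, with invented model names, a toy cost model, and a hypothetical health flag standing in for real endpoint checks:

```python
# Hypothetical fleet: name -> (relative cost, healthy?)
MODELS = {
    "fast-cheap": (1, True),
    "balanced": (3, True),
    "large-frontier": (10, False),  # imagine this endpoint is down
}

def pick_model(task_complexity: int) -> str:
    """Prefer the cheapest adequate model; fall back when an endpoint fails."""
    preference = (
        ["large-frontier", "balanced", "fast-cheap"]
        if task_complexity > 7
        else ["fast-cheap", "balanced", "large-frontier"]
    )
    for name in preference:
        _cost, healthy = MODELS[name]
        if healthy:
            return name
    raise RuntimeError("no healthy model endpoint")

print(pick_model(9))  # -> "balanced": frontier model is down, so fall back
print(pick_model(2))  # -> "fast-cheap"
```

Swap in real health checks, version pins, and per-token pricing, and this is structurally the same routing problem load balancers have been solving for decades.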
Inference has become the center of gravity
Taken together, the pattern is clear.
- Inference sits directly in the request path.
- Observability feeds it with real-time context.
- Automation governs its behavior through APIs.
- Multi-model architectures increase the need for routing and control.
That combination establishes inference as the operational center of gravity. Not as a forward-looking prediction, but as a present-day condition.
The industry continues to talk about operationalizing AI as though it is still ahead. The data suggests otherwise. Organizations have already integrated inference into their daily operations, and they are treating it with the same seriousness as any other critical component in the application stack.
The real question is not whether AI will become operational. It is whether organizations recognize that it already has, and whether they are applying the same discipline, rigor, and architectural thinking to inference that they have long applied to everything else they run.
To learn more, read the full 2026 State of Application Strategy Report.
About the Author

Lori MacVittie is a Distinguished Engineer and Chief Evangelist in F5’s Office of the CTO with deep expertise in application delivery, automation strategy, and infrastructure. She is known for turning complexity into clarity, whether she’s defining guardrails for AI agents, dissecting brittle multicloud architectures, or probing the limits of scalable systems. She brings more than thirty years of industry experience across application development, IT architecture, and network and systems operations. Before joining F5, she served as an award-winning technology editor. MacVittie holds an M.S. in Computer Science and is a prolific author whose publications span security, cloud, and enterprise architecture. She is also an avid tabletop and video gamer with unapologetically strong opinions about cheese.