AI App Delivery Top 10: Incomplete observability

Industry Trends | June 18, 2026

Lori MacVittie, Distinguished Engineer and Chief Evangelist | F5

If cloud-native apps were a handful of garden hoses, AI inference and agent systems are an industrial sprinkler array controlled by a toddler with a fire helmet and a juice box. You think you know where the water is going, but you don’t. By the time you realize it, something expensive is already soaked.

Inference workloads and agent-driven execution don’t behave like traditional services. They don’t follow predictable request paths, they don’t repeat workflows consistently, and they don’t fail politely. They shape their own runtime, redirect based on partial results, retry when they think it’s wise, and consume resources in highly variable bursts. If your observability strategy is built on averages, rollups, or 30-second dashboards, that’s operational malpractice.

Impact on performance

Inference latency isn’t a single metric, and it isn’t tied to a single tier. It’s shaped by model selection, token count, GPU queue depth, upstream data access time, and whatever prompt gymnastics an agent performs mid-flight. Agents make this even more chaotic by branching, chaining, or retrying without announcing their plans.

If you can’t see performance at the level of intent, token generation, model pathway, and execution lineage, you will blame the wrong component, optimize the wrong tier, or scale the wrong resource. Meanwhile, users (human or automated) will simply conclude: “AI is slow.”
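Here is one way to make "execution lineage" concrete. This minimal Python sketch assumes spans have already been captured by some tracer with parent/child links, in the spirit of OpenTelemetry-style tracing. The `Span` type, component names, and timings are all hypothetical. The sketch walks one step's lineage back to the root and ranks components by self time, so you can see where blame belongs before anyone scales a GPU pool.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Span:
    """Hypothetical trace record for one step of an AI workflow."""
    span_id: str
    parent_id: Optional[str]   # None for the root step of a workflow
    component: str             # e.g. "agent_plan", "retrieval", "gpu_queue", "model:large"
    self_ms: float             # time spent in this step itself, excluding child spans
    tokens: int = 0            # tokens generated in this step, if any


def lineage(spans: List[Span], leaf_id: str) -> List[str]:
    """Return the chain of components from the root span down to leaf_id."""
    by_id = {s.span_id: s for s in spans}
    path: List[str] = []
    current = by_id.get(leaf_id)
    while current is not None:
        path.append(current.component)
        current = by_id.get(current.parent_id) if current.parent_id is not None else None
    return list(reversed(path))


def self_time_by_component(spans: List[Span]) -> Dict[str, float]:
    """Sum self time per component, largest first."""
    totals: Dict[str, float] = defaultdict(float)
    for s in spans:
        totals[s.component] += s.self_ms
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


# Illustrative trace: an agent plans, retrieves data, waits for a GPU, then
# calls a large model twice (the second call is a retry).
trace = [
    Span("s1", None, "agent_plan", 40),
    Span("s2", "s1", "retrieval", 120),
    Span("s3", "s1", "gpu_queue", 650),
    Span("s4", "s3", "model:large", 900, tokens=512),
    Span("s5", "s3", "model:large", 300, tokens=128),
]

print(lineage(trace, "s4"))
print(self_time_by_component(trace))
```

Summing self time is the simplest attribution rule. It overstates a component's share when steps run in parallel, so a real system would also examine the critical path.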

Impact on availability

Traditional health checks answer only one question: "Did it respond?" AI workloads need to answer a harder one: "Did it respond within expectations for this task?"

Inference systems rarely go down; instead, they degrade into something polite but unusable. Slow answers, stale cache hits, or quiet hallucinations can all look like success unless you have visibility into task type, expected latency floor, and quality thresholds. With agents piling on decisions, retries become invisible loops that look like normal traffic right up until something collapses.
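A health check keyed by task type is one way to express "within expectations for this task." This sketch assumes a hypothetical quality evaluator that scores each answer from 0.0 to 1.0. It also assumes hypothetical per-task thresholds. Neither comes from a specific product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaskExpectation:
    max_latency_ms: float   # latency budget for this kind of task
    min_quality: float      # minimum acceptable score from a (hypothetical) quality evaluator, 0.0 to 1.0
    max_cache_age_s: float  # cached answers older than this count as stale for this task


# Hypothetical expectations per task type; real values would come from your SLOs.
EXPECTATIONS = {
    "chat": TaskExpectation(max_latency_ms=2_000, min_quality=0.7, max_cache_age_s=300),
    "code_review": TaskExpectation(max_latency_ms=15_000, min_quality=0.85, max_cache_age_s=0),
}


def task_health(task_type: str, responded: bool, latency_ms: float,
                quality: float, cache_age_s: Optional[float] = None) -> str:
    """Answer 'did it respond within expectations for this task?', not just 'did it respond?'."""
    if not responded:
        return "down"
    exp = EXPECTATIONS.get(task_type)
    if exp is None:
        return "unknown_task"
    problems = []
    if latency_ms > exp.max_latency_ms:
        problems.append("slow")
    if quality < exp.min_quality:
        problems.append("low_quality")
    if cache_age_s is not None and cache_age_s > exp.max_cache_age_s:
        problems.append("stale_cache")
    return "healthy" if not problems else "degraded:" + ",".join(problems)


print(task_health("chat", True, 4_000, 0.9))
print(task_health("chat", True, 800, 0.5, cache_age_s=900))
```

A classic liveness probe would report every call above as up. This check reports a slow, stale, or low-scoring answer as degraded instead.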

The operational nightmare here isn't downtime; it's incorrect, incomplete, or context-inappropriate success.

Impact on scalability

You can’t scale what you can’t see, and you definitely can’t cost-optimize what you can’t attribute. AI capacity planning must account for:

Token throughput
Model concurrency and the cost of model switching
GPU saturation
Agent retry multiplication
Cost per call
Execution stretch across chained steps
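The factors above multiply rather than add, which is why naive capacity math goes wrong. The back-of-the-envelope sketch below uses made-up averages for steps, retries, tokens, and per-call cost. It deliberately leaves out GPU saturation and model-switching cost, which need measured data.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Workload:
    """Illustrative inputs; every number here is an assumption you would measure."""
    requests_per_s: float     # requests entering the AI tier per second
    steps_per_request: float  # average chained model calls per request (execution stretch)
    retries_per_step: float   # average extra attempts per step (0.5 = one retry every other step)
    tokens_per_call: float    # average prompt + completion tokens per model call
    cost_per_call_usd: float  # average blended cost per model call


def plan(w: Workload) -> Dict[str, float]:
    # Chaining and retries multiply: every request fans out into steps,
    # and every step may be attempted more than once.
    calls_per_s = w.requests_per_s * w.steps_per_request * (1 + w.retries_per_step)
    return {
        "calls_per_s": calls_per_s,
        "tokens_per_s": calls_per_s * w.tokens_per_call,
        "cost_per_hour_usd": calls_per_s * w.cost_per_call_usd * 3600,
        "amplification": calls_per_s / w.requests_per_s,
    }


naive = plan(Workload(10, 1, 0.0, 1500, 0.002))    # one call per request, no retries
agentic = plan(Workload(10, 4, 0.5, 1500, 0.002))  # four chained steps, 50% retry rate
print(naive["tokens_per_s"], agentic["tokens_per_s"], agentic["amplification"])
```

With these assumed numbers, the agentic workload moves six times the tokens of the naive estimate at the same request rate.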

Without that level of visibility, you oscillate between over-provisioning (financial regret) and under-provisioning (support calls, angry PMs, escalation war rooms). Neither proves maturity; both prove observability debt.

Best practices

Complete observability for AI isn't about collecting more logs; it's about capturing faster, richer, behavior-aligned telemetry that understands what the AI was trying to do, what it actually did, and what it consumed to get there. That requires instrumentation that can explain not only what happened, but why, under what assumptions, and at what cost.

This means tracing execution lineage for agent-assembled workflows, elevating cost and latency to first-class operational signals, correlating model behavior to intent rather than endpoints, and incorporating agent-declared expectations into observability metadata. Metadata evolves from “request info” to “runtime contract” with latency budgets, retry preferences, data-sensitivity labels, agent priority tiers, and cost ceilings included at the call level.
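A runtime contract can be as simple as a small record that travels with the call and gets checked against what telemetry actually observed. The field names, `x-ai-*` metadata keys, and sensitivity levels below are hypothetical. They mirror the items listed above and are not a standard.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict, List


class Sensitivity(IntEnum):
    """Hypothetical data-sensitivity labels, ordered from least to most sensitive."""
    PUBLIC = 1
    INTERNAL = 2
    RESTRICTED = 3


@dataclass(frozen=True)
class RuntimeContract:
    """Hypothetical per-call contract an agent attaches to an inference request."""
    latency_budget_ms: int
    max_retries: int
    sensitivity: Sensitivity     # most sensitive data class in the request
    priority_tier: int           # 0 = most important
    cost_ceiling_usd: float


@dataclass
class CallOutcome:
    """What telemetry actually observed for that call."""
    latency_ms: float
    retries: int
    cost_usd: float
    path_clearance: Sensitivity  # highest data class the serving path is approved for


def to_metadata(c: RuntimeContract) -> Dict[str, str]:
    """Flatten the contract into string key/value call metadata (hypothetical key names)."""
    return {
        "x-ai-latency-budget-ms": str(c.latency_budget_ms),
        "x-ai-max-retries": str(c.max_retries),
        "x-ai-sensitivity": c.sensitivity.name.lower(),
        "x-ai-priority-tier": str(c.priority_tier),
        "x-ai-cost-ceiling-usd": f"{c.cost_ceiling_usd:.4f}",
    }


def violations(c: RuntimeContract, o: CallOutcome) -> List[str]:
    """List every clause of the contract that the observed call broke."""
    broken = []
    if o.latency_ms > c.latency_budget_ms:
        broken.append("latency_budget")
    if o.retries > c.max_retries:
        broken.append("retry_limit")
    if o.cost_usd > c.cost_ceiling_usd:
        broken.append("cost_ceiling")
    if o.path_clearance < c.sensitivity:
        broken.append("data_sensitivity")
    return broken


contract = RuntimeContract(2000, 2, Sensitivity.INTERNAL, 1, 0.05)
observed = CallOutcome(latency_ms=2600, retries=1, cost_usd=0.03, path_clearance=Sensitivity.INTERNAL)
print(to_metadata(contract))
print(violations(contract, observed))  # → ['latency_budget']
```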

As this evolves, we will see the rise of a high-speed telemetry plane: a purpose-built, sub-millisecond data stream capable of capturing per-token, per-step, and per-agent signals without melting storage or threatening SLOs. This telemetry plane will likely sit close to the inference layer, support semantic compression, and provide continuous, low-latency insight into cost, capacity, health, and intent alignment. Without this, AI observability will either overwhelm existing pipelines or arrive too late to be useful.
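Python won't hit sub-millisecond budgets, but it can show the shape of the idea. Per-token events are folded into a small, fixed-size summary per agent step instead of storing every token. This is plain statistical summarization standing in for what the post calls "semantic compression," not an implementation of it. The class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class StepSummary:
    """Fixed-size summary kept per (agent, step) instead of one record per token."""
    agent_id: str
    step_id: str
    tokens: int = 0
    first_ts: float = 0.0
    last_ts: float = 0.0
    max_gap_s: float = 0.0   # longest pause between consecutive tokens (a stall signal)

    @property
    def duration_s(self) -> float:
        return self.last_ts - self.first_ts

    @property
    def tokens_per_s(self) -> float:
        # Tokens divided by first-to-last token time; 0.0 for a single-token step.
        d = self.duration_s
        return self.tokens / d if d > 0 else 0.0


class TokenStreamAggregator:
    """Folds per-token events into one summary per open (agent, step) pair."""

    def __init__(self) -> None:
        self._open: Dict[Tuple[str, str], StepSummary] = {}
        self.closed: List[StepSummary] = []

    def on_token(self, agent_id: str, step_id: str, ts: float) -> None:
        key = (agent_id, step_id)
        s = self._open.get(key)
        if s is None:
            self._open[key] = StepSummary(agent_id, step_id, tokens=1, first_ts=ts, last_ts=ts)
            return
        s.max_gap_s = max(s.max_gap_s, ts - s.last_ts)
        s.tokens += 1
        s.last_ts = ts

    def on_step_end(self, agent_id: str, step_id: str) -> Optional[StepSummary]:
        # Close the step and hand its summary downstream; memory is freed for that key.
        s = self._open.pop((agent_id, step_id), None)
        if s is not None:
            self.closed.append(s)
        return s


agg = TokenStreamAggregator()
for ts in (0.0, 0.25, 0.5, 1.0):
    agg.on_token("agent-a", "step-1", ts)
summary = agg.on_step_end("agent-a", "step-1")
print(summary.tokens, summary.tokens_per_s, summary.max_gap_s)
```

Memory grows with the number of open steps, not the number of tokens, which is the property a telemetry plane near the inference layer would need.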

Systems will need to ingest, correlate, and reason over this telemetry in near real time, not batch mode, and use it to drive adaptive routing, agent throttling, selection of lower-cost inference paths, and early detection of runaway workflows before they become outages or invoices.
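Early detection of runaway workflows can start with running counters per workflow and a policy that picks the next action. This sketch assumes hypothetical limits for step count, spend budget, and retry ratio. It maps them to three actions: continue, route to a cheaper inference path, or halt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    """Illustrative guardrails; every default here is a made-up starting point."""
    max_steps: int = 25
    max_spend_usd: float = 1.0
    downgrade_at: float = 0.75     # fraction of the spend budget that triggers cheaper routing
    max_retry_ratio: float = 0.5   # retries per step above which we suspect a loop
    min_steps_for_ratio: int = 4   # don't judge the retry ratio on tiny samples


@dataclass
class WorkflowState:
    workflow_id: str
    steps: int = 0
    retries: int = 0
    spend_usd: float = 0.0


def record_step(state: WorkflowState, cost_usd: float, was_retry: bool) -> None:
    """Fold one completed step's telemetry into the running workflow state."""
    state.steps += 1
    state.spend_usd += cost_usd
    if was_retry:
        state.retries += 1


def decide(state: WorkflowState, limits: Limits) -> str:
    """Return 'continue', 'route_cheaper', or 'halt' for the workflow's next step."""
    if state.steps >= limits.max_steps or state.spend_usd >= limits.max_spend_usd:
        return "halt"
    if (state.steps >= limits.min_steps_for_ratio
            and state.retries / state.steps > limits.max_retry_ratio):
        return "halt"  # a retry ratio this high looks like a loop, not progress
    if state.spend_usd >= limits.downgrade_at * limits.max_spend_usd:
        return "route_cheaper"
    return "continue"


wf = WorkflowState("wf-1")
for _ in range(6):
    record_step(wf, 0.125, was_retry=False)
print(decide(wf, Limits()))  # → route_cheaper
```

In practice "halt" might mean throttling the agent or escalating to a person. Which action fits probably depends on the agent's priority tier.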

Delivery is different when AI is in the room

AI doesn't break cleanly, predictably, or loudly, and incomplete observability ensures you won't notice until the business does.

If intelligence is the new application tier, then observability must shift from "What happened?" to "What was supposed to happen, why did we take this path, and was it worth the latency, risk, and cost?"

When you can answer that in real time, you’re ready for AI.

Read more about the Top 10 Application Delivery challenges faced by organizations across the globe.


Tags: ADC Top 10, AI, Application Delivery, Observability, Office of the CTO

About the Author


Lori MacVittie is a Distinguished Engineer and Chief Evangelist in F5’s Office of the CTO with deep expertise in application delivery, automation strategy, and infrastructure. She is known for turning complexity into clarity whether she’s defining guardrails for AI agents, dissecting brittle multicloud architectures, or probing the limits of scalable systems. She brings more than thirty years of industry experience across application development, IT architecture, and network and systems operations. Before joining F5, she served as an award-winning technology editor. MacVittie holds an M.S. in Computer Science and is a prolific author whose publications span security, cloud, and enterprise architecture. She is also an avid tabletop and video gamer with unapologetically strong opinions about cheese.

