Managing context windows in AI isn’t magic. It’s architecture.

Industry Trends | August 27, 2025

Lori Mac VittieDistinguished Engineer and Chief Evangelist | F5

So let’s clear that up, because it turns out that understanding where context (session) lives in AI architectures is pretty dang important. And you know that because I used the word “dang” in a blog. Yeah, I really mean it.

If you’re building your own LLM-powered app and using the OpenAI API to do it, you’re not using ChatGPT. You’re talking to a model. A raw one. And if you think your app is going to behave like ChatGPT just because it’s calling GPT-4, you’re already off the rails.

ChatGPT isn’t a model; it’s a runtime.

People get this wrong all the time, because OpenAI branding hasn’t helped. The “ChatGPT” experience—the memory, the tone, the context, the way it gracefully tracks what you said six questions ago and doesn’t suddenly hallucinate a PhD in marine botany—isn’t magic. It’s architecture. And that architecture doesn’t come for free when you hit the API.

The ChatGPT app is a layered, managed experience. It stitches together:

Prompt engineering
Context management
Persistent memory
Guardrails and fallbacks
Summarization, truncation, and control logic

You don’t get any of that by default with the API. None.

When you call GPT-4 (or any other model, for that matter) directly, you’re feeding it a raw sequence of messages in a stateless format and hoping your context window doesn’t overflow or fragment along the way. It’s a blank model, not a digital butler.

The context window is on you.

In ChatGPT, the inference server manages the full context window. You type, it remembers (until it doesn’t), and you can rely on the app to mostly keep track of what’s relevant. There's a memory system layered on top, and OpenAI quietly curates which pieces get passed back in.

If you’re using the API? You are the inference server. That means:

You build the message stack.
You decide what to include.
You manage the token budget.
You truncate, summarize, or lose coherence.

And if you get it wrong (and you will, we all do early on) it’s on you when the model forgets the user’s name halfway through the session or starts inventing data you never fed it.

Memory isn’t a feature. It’s infrastructure.

ChatGPT has memory. You don’t. Not unless you build it. When you use the API, it’s stateless. You manage all context, memory, and flow. When you use the ChatGPT app, OpenAI does it all for you. It’s baked in.

The “memory” in ChatGPT isn’t just a note stuck to a prompt. It’s a system that stores user facts, preferences, goals, and constraints across sessions, and injects them contextually into the prompt at the right time. If you want something like that in your app, you’ll need:

A data store
Schema for memory entries
Control logic to determine when to include what
A way to keep it from bloating your token count

In other words: infrastructure.

This is why most homegrown AI apps feel brittle. Because they’re talking to an LLM like it’s a chatbot, not like it’s a tool in a system. The result? Lost threads, repetitive prompts, weird regressions, and user frustration.

So what?

If you’re building on the API, you’re building infrastructure whether you meant to or not. And if your users expect a ChatGPT-like experience, you’re going to have to deliver:

Persistent memory
Smart context compression
Turn-based state management
Guardrails, fallbacks, and injection logic

But that’s just the beginning. Because if you're building this on top of an enterprise-grade app delivery stack—and you are, right? RIGHT?—then the rest of the platform better show up, too.

This is where application delivery and security teams come into play.

App delivery: You’re not just serving HTML or JSON anymore

AI-native apps are stateful, chat-driven, and often real-time. Which means:

Session affinity suddenly matters again. You can't shard conversation state across stateless backends unless you're managing session tokens like it's 2009.
Latency is UX death. You're streaming multi-turn interactions, not serving static pages. Your load balancers and edge logic better know how to prioritize low-latency conversational flows.
Token cost = bandwidth + compute + cash. Efficient delivery matters now. Payload size isn’t just a network concern; it’s a billing event.

This isn’t a brochure site. This is a living interface. You can’t slap a CDN in front of it and call it a day.

Security: If you don’t own the prompt, you own the risk

LLMs are programmable via prompt. That means the prompt is now the attack surface.

Injection attacks are real, and they're already happening. If your input sanitization is lax, you’re letting users reprogram your AI on the fly.
Prompt manipulation can leak sensitive memory, derail workflows, or spoof intent.
Model output can be exfiltrated, manipulated, or poisoned if you don’t wrap it in policy enforcement and observability.

Security controls must evolve:

You need WAFs that understand AI traffic. Blocking bad JSON isn’t enough. You need guards on instruction sets, not just payload structure.
You need audit trails for LLM decisions, especially in regulated environments.
You need rate limiting and abuse protection, not just for API calls, but for prompt complexity and cost spikes.

And that’s not even touching the data governance nightmare that comes with injecting user records into context windows.

Build the right stack now before agentic AI runs amok.

Basically, ChatGPT is a polished product. The OpenAI API is a raw tool. Confuse the two, and your stack’s going to pop.

Build wisely. Build an enterprise grade app delivery and security stack to support your burgeoning AI portfolio. Trust me, it’s going to be even more important when you start building AI agents and then eagerly embrace agentic architecture. Because if you think context-drift and hallucinations are problems for end-user focused AI applications, just wait until you see what agentic AI can do to your business and operational AI applications.

Featured Blog Posts

Introducing the CASI Leaderboard

Extranets aren’t dead; they just need an upgrade

Navigating higher education during a time of tightening budgets: How F5 can help

Tags: API, Application Delivery, Office of the CTO, AI

About the Author

Lori Mac VittieDistinguished Engineer and Chief Evangelist | F5

More blogs by Lori Mac Vittie

Featured Blog Posts

Introducing the CASI Leaderboard

Extranets aren’t dead; they just need an upgrade

Navigating higher education during a time of tightening budgets: How F5 can help

Related Blog Posts

Industry Trends | 02/18/2026

From packets to prompts: Inference adds a new layer to the stack

Inference is not training. It is not experimentation. It is not a data science exercise. Inference is production runtime behavior, and it behaves like an application tier.

Application Delivery,

The State of Application Strategy,

AI,

Thought Leadership,

Office of the CTO

Industry Trends | 02/02/2026

Compression isn’t about speed anymore, it’s about the cost of thinking

In the AI era, compression reduces the cost of thinking—not just bandwidth. Learn how prompt, output, and model compression control expenses in AI inference.

AI,

Application Delivery,

Office of the CTO,

Operations

Industry Trends | 01/07/2026

The efficiency trap: tokens, TOON, and the real availability question

Token efficiency in AI is trending, but at what cost? Explore the balance of performance, reliability, and correctness in formats like TOON and natural-language templates.

AI,

Application Delivery,

Office of the CTO

Industry Trends | 12/09/2025

Programmability is the only way control survives at AI scale

Learn why data and control plane programmability are crucial for scaling and securing AI-driven systems, ensuring real-time control in a fast-changing environment.

ADC,

AI,

API,

Application Delivery,

Automation,

Operations,

Office of the CTO

Industry Trends | 12/03/2025

The top five tech trends to watch in 2026

Explore the top tech trends of 2026, where inference dominates AI, from cost centers and edge deployment to governance, IaaS, and agentic AI interaction loops.

Industry Trends | 11/13/2025

The influence of inference: APIs, DPUs, and context chaos

2025,

Office of the CTO,

AI,

API,

Application Delivery,

Infrastructure,

The State of Application Strategy