The latest AI Pulse Survey by EY indicates 87% of senior leaders cite adoption hurdles, ranging from cybersecurity and data issues to missing policy and governance. Integrating generative AI (GenAI) into business requires both the desire and the ability to make it happen. The gap between the two is filled with operational complexity and demand for trust in this new technology.
Businesses have built and deployed software for decades, but GenAI presents a new challenge: control. Unlike traditional software, GenAI applications exercise autonomy and are non-deterministic. Hence, GenAI applications cannot yet be trusted to do only what was intended and never do anything unintended. Lack of control fetters full-scale adoption. The control challenge is best understood by examining the power of context, and that understanding shows how businesses can expand traditional software design to accommodate GenAI.
With enough context, the uncertainty of GenAI applications becomes manageable because context bridges the gap between what GenAI applications can do and what the business intends. Context can be sourced from three domains: the GenAI application, the environment, and the business.
The context available inside GenAI applications differs significantly from that of traditional applications. Traditional applications pre-determine the allowed inputs as parameters for pre-defined operations. The same inputs will always produce the same results. In contrast, GenAI applications accept a much wider array of inputs into large language models (LLMs) which are, by design, non-deterministic in their outputs. The same inputs may or may not produce the same results.
Example of traditional application determinism:
This Python sample calculates a person’s age from a birthdate and the current date. Embellishments such as type handling and adjusting for birthdays that have not yet occurred in the current year are omitted for brevity.
from datetime import datetime

def calculate_age(birthdate: str, current_date: str) -> int:
    # Parse dates supplied as strings such as "June 30, 1972"
    birth_dt = datetime.strptime(birthdate, "%B %d, %Y")
    current_dt = datetime.strptime(current_date, "%B %d, %Y")
    # Difference in calendar years; birthday adjustment omitted for brevity
    age = current_dt.year - birth_dt.year
    return age
Given: birthdate = June 30, 1972, and current date = August 31, 2025
Age = 53 years.
Example of GenAI application non-determinism:
This application can suggest the probable age of a person using more generalized information without having to know the person’s birthdate or the current date.
Generative AI applications accept a wide variety of inputs for LLM inference.
Prompt: How old is a person who likes disco music and remembers who won Olympic Gold in the women’s 100M in Barcelona?
Fed this exact prompt three separate times, the LLM gave the following answers:
Response 1: 55 years old
Response 2: 40-60 years old
Response 3: 40s, 50s, or 60s
The Python function inside the traditional application accepts two inputs (birthdate and current date) and calculates the difference in years. It cannot deduce the age of a person or accept descriptions of the person’s preferences, habits, and so forth because it cannot synthesize or draw conclusions.
By comparison, the GenAI application can combine statements about popular music genres during childhood with common knowledge about major sporting events to deduce a response, but the answers vary: small differences in the input, such as a rephrased question or a changed temperature setting (which controls the level of randomness in LLM responses), can change the output. Even if the application code is unchanged between requests, responses may differ. This variation increases with the complexity of the prompt, the inclusion of additional input sources, and the behavior of routing components in the inferencing infrastructure.
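To make the non-determinism concrete, here is a minimal sketch in Python, assuming the OpenAI Python client and an illustrative model name. The same prompt is sent three times, and with a nonzero temperature the three responses can differ.

from openai import OpenAI

client = OpenAI()  # assumes an API key is available in the environment

prompt = (
    "How old is a person who likes disco music and remembers who won "
    "Olympic Gold in the women's 100M in Barcelona?"
)

# The same prompt, sent three times with a nonzero temperature,
# can yield three different answers.
for i in range(3):
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
        temperature=1.0,  # higher values increase randomness in the output
    )
    print(f"Response {i + 1}: {response.choices[0].message.content}")

Setting the temperature to 0 reduces, but does not eliminate, this variability, since the inferencing infrastructure itself introduces variation.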
Given this latitude, it is not hard to imagine cases where a GenAI application deviates far from its intended behavior without additional context. These two examples illustrate the extreme determinism of traditional software and the non-determinism of GenAI applications. Traditional software would not be expected to guess or predict a person’s age from contextual clues, but GenAI applications are expected to tackle those kinds of non-deterministic questions routinely. If the application does not contain enough context, we must look next to the application environment.
The environment consists of the infrastructure and services surrounding the application. Back in July 1982 at the computer industry seminar CreativeThink, Alan Kay said, “People who are really serious about software should make their own hardware,” because he believed so strongly in the close tie between the two. I would like to extrapolate from that idea to include the deployment environment.
Applications do not live in a vacuum. All manner of infrastructure services combine to provide a place for applications to run. These services are external to the applications while simultaneously critical to their proper function, just like hardware. Taking this line of thought one step further, these infrastructure services are fastidiously monitored using a variety of observability solutions that collect and track rich telemetry data over time. This data describes how the application is performing and how the surrounding environment may affect it.
Environmental context can be broken down into two types: passive and active. All forms of telemetry (e.g., logs, metrics, traces, and events) can be considered passive because they are emitted and collected continually as infrastructure and applications operate. Alerts, notifications, and dashboards are active. They use rules to interpret the telemetry and communicate operational context. The tuning of rules brings operational meaning to the information, judging when tolerances from anywhere across the environment are being approached or exceeded.
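A small Python sketch makes the passive-versus-active distinction concrete; the metric names, thresholds, and messages below are hypothetical, standing in for whatever an observability platform would actually supply.

# Passive context: raw telemetry emitted as infrastructure and applications operate (sample values)
telemetry = {
    "api.latency_ms.p95": 870,
    "api.error_rate": 0.031,
    "host.cpu_utilization": 0.64,
}

# Active context: tuned rules that give the telemetry operational meaning
rules = [
    {"metric": "api.latency_ms.p95", "threshold": 750, "message": "p95 latency above tolerance"},
    {"metric": "api.error_rate", "threshold": 0.05, "message": "error rate above tolerance"},
]

def evaluate(telemetry: dict, rules: list) -> list:
    """Return alert messages for every rule whose threshold is exceeded."""
    return [r["message"] for r in rules if telemetry.get(r["metric"], 0) > r["threshold"]]

for alert in evaluate(telemetry, rules):
    print("ALERT:", alert)

The telemetry is emitted regardless of whether anyone is watching; it is the tuned rule that turns it into an actionable signal.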
GenAI applications challenge the rules-based approach because reliable telemetry does not yet exist to cover the seemingly infinite possible behaviors of an autonomous component. That is not to say innovation is not occurring. Two approaches are being attempted: domain-based metrics for measuring GenAI success, and LLM-as-a-Judge. The former uses alternate means to establish ground truth and then compares GenAI results against it, capturing a tally of success and failure rates. The latter uses one LLM to judge the accuracy of another. Time will tell how far these divergent approaches go toward achieving true control over GenAI using environmental context.
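As a minimal sketch of the LLM-as-a-Judge approach, assuming the OpenAI Python client, one model’s answer is scored by a second model acting as the evaluator; the judging prompt and the 1-to-5 scale are illustrative choices, not a standard.

from openai import OpenAI

client = OpenAI()

def judge_answer(question: str, answer: str) -> str:
    """Ask a second LLM to rate the quality of another model's answer."""
    judging_prompt = (
        "You are an impartial evaluator. Rate the following answer to the "
        "question on a scale of 1 (poor) to 5 (excellent) and explain briefly.\n\n"
        f"Question: {question}\nAnswer: {answer}"
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative choice of judging model
        messages=[{"role": "user", "content": judging_prompt}],
        temperature=0,   # keep the judge as consistent as possible
    )
    return response.choices[0].message.content

print(judge_answer(
    "How old is a person who likes disco music and remembers the Barcelona Olympics?",
    "55 years old",
))

The judge’s scores can then be tracked over time like any other metric, folding GenAI behavior back into the environmental context described above.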
Business intent is defined by the phrase, “Everything the business wants and nothing the business does not want.” It covers the situations the application and the environment might miss.
Recent news highlights a case where the GenAI component of an application was granted enough power to perform highly damaging operations, such as deleting all customer records from a database. Here are some data points from that situation, illustrating the gap between business intent and potential actions by GenAI applications.
| What happened | The missing business context |
|---|---|
| AI coding tool deleted a production database during a code freeze. | |
| AI coding tool stated that database rollbacks were not supported and that no instances existed to restore from, but manual execution of a rollback did, in fact, work. | |
This case study illustrates a few of the many potential gaps that arise between what GenAI can be allowed to do and what a business may intend for it to do.
Designing an application with a function to delete a record is completely reasonable, but allowing AI to delete all records without asking for permission is not. In traditional software, permission to delete records is restricted to reduce the risk of business loss, but GenAI introduces autonomy that role-based access controls alone cannot adequately constrain. Business context helps here by outlining expected versus unexpected behaviors, which can be translated into specific requirements, including explicitly prohibited actions and required pauses for human intervention, as sketched below. It is the full context that enables this degree of control.
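Here is a minimal sketch of how that business context might be encoded as a guardrail. The action names, policy sets, and callbacks are hypothetical, but the pattern of blocking prohibited actions and pausing for human approval mirrors the requirements described above.

from typing import Callable

# Hypothetical policies derived from business intent
PROHIBITED_ACTIONS = {"drop_table", "delete_all_records"}
REQUIRES_HUMAN_APPROVAL = {"delete_record", "truncate_table"}

def execute_tool_call(
    action: str,
    run_action: Callable[[str], str],
    request_human_approval: Callable[[str], bool],
) -> str:
    """Check a GenAI-proposed action against business rules before running it."""
    if action in PROHIBITED_ACTIONS:
        return f"Blocked: '{action}' is explicitly prohibited by business policy."
    if action in REQUIRES_HUMAN_APPROVAL and not request_human_approval(action):
        return f"Paused: '{action}' is awaiting human approval."
    return run_action(action)

# Stubbed wiring: a real system would route approval requests to an operator
print(execute_tool_call(
    "delete_all_records",
    run_action=lambda a: f"Executed {a}",
    request_human_approval=lambda a: False,
))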
Application architectures conceived before the arrival of GenAI are simply insufficient. GenAI calls for a new architectural pattern that combines all three context domains: the application, its environment, and the business intent. This explanation by Phil Schmid, a Senior AI Relations Engineer at Google DeepMind, is particularly helpful in understanding the vital difference brought by GenAI applications. It highlights that LLMs receive context as part of their inputs, and it depicts how various GenAI application components have their own definition of, and use for, context.
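As a rough sketch of what combining the three domains might look like, the function below assembles application, environment, and business context into a single prompt before inference; the field contents are illustrative assumptions, not a prescribed schema.

def assemble_context(user_request: str) -> str:
    """Combine all three context domains into a single prompt (illustrative fields)."""
    application_context = (
        "Available tools: lookup_record, update_record. No deletion tools are exposed."
    )
    environment_context = (
        "Telemetry summary: error rates nominal; database in a scheduled maintenance window."
    )
    business_context = (
        "Business intent: never modify production data without explicit human approval."
    )
    return "\n".join(
        [application_context, environment_context, business_context, f"Request: {user_request}"]
    )

print(assemble_context("Clean up stale customer records."))

In a real system the orchestration layer would refresh each of these sections from live sources on every call, so the model never reasons from a stale or partial picture.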
Architecture and engineering for GenAI applications need to leverage all three context domains to produce outcomes that consistently align with business intent. Without a complete context picture, control is counterfeit, trust is non-existent, and full-scale adoption is unreachable.