AI Security Beyond the Model: What Enterprises Need to Care About — and Why

AI is moving fast, and most enterprises are securing only a fraction of their real risk surface. This NSS Labs research report, developed in collaboration with F5, AWS, and Microsoft, examines the threat vectors, governance gaps, and capability areas that matter most when AI moves from experimentation into production.

Authors

Bob Walder - Senior Analyst, NSS Labs
lan Foo - Chief Technology Officer and EVP of Product, NSS Lab

Contributors

This research was developed in collaboration with security, cloud, and AI leaders across the industry.

Cameron Delano - Sr. Solutions Architect, F5
Jeanette Hurr
- Global Solutions Architect, F5
Riggs Goodman III - Principal Solutions Architect, AI Security and Privacy, AWS
Raj Bagwe - Sr. Solutions Architect, AWS
Zachary Riffle - Security Architect, Microsoft

Executive Summary

Artificial intelligence has moved rapidly from experimentation to production deployment across enterprise environments. Generative AI models, conversational assistants, and increasingly autonomous agent-based systems are now embedded in customer-facing applications, internal workflows, software development pipelines, and decision-support processes. As a result, failures in AI systems already represent tangible business, legal, reputational, and operational risks, as well as a significant challenge to effective governance.

Much of the focus around AI security is on the AI models themselves, concentrating on how they are trained, what data they contain, and how they behave under controlled conditions. While security within the AI model is clearly very important, it addresses only a fraction of the risk surface. In production environments, the most consequential vulnerabilities will often appear outside of the model: in the data to which it is connected, the tools it can invoke, the permissions it inherits, and the governance practices (or lack of them) that constrain its behavior.

Failures in AI systems already represent tangible business, legal, reputational, and operational risks, as well as a significant challenge to effective governance.

This paper outlines those areas which should concern enterprises the most when evaluating AI security solutions and why those concerns matter. Without focusing on specific products or test mechanics, this paper presents a capability-driven view of AI security framed through the lenses of enterprise risk management, governance, and accountability. The aim is to provide concrete guidance to Chief Information Security Officers (CISOs), enterprise buyers, and Governance, Risk and Compliance (GRC) leaders on the questions to ask before real-world AI failures expose unwelcome answers under regulatory, legal, customer, or board-level scrutiny.

Introduction: Why AI security is an enterprise risk issue

Use of AI within enterprise systems is accelerating, providing significant challenges for those tasked with implementing the controls designed to manage it. Business units are deploying AI-enabled capabilities to improve productivity, automate decisions, accelerate software development, and differentiate customer experiences, and many of these systems are deployed well ahead of formal security, risk, or governance review. This pace creates a widening gap between innovation and accountability.

Securing the AI model itself is essential, but it is insufficient on its own. Most enterprises already apply controls directly at the model layer: curating training data; using safety tuning and alignment techniques; restricting who can access the model; rate-limiting usage; and relying on built-in content filters and refusal behavior. These measures shape what the model produces and reduce obvious misuse, but they do not address how AI systems operate in real enterprise environments where outputs interact with live data, trigger actions, and operate within complex authorization contexts.

In the context of AI security, Responsible AI (RAI) is the overarching governance and lifecycle framework, policies, and technical practices used to ensure artificial intelligence systems are safe, secure, resilient, and trustworthy throughout their lifecycle. It moves beyond just stopping attacks, to ensuring systems are robust against unintended harms, privacy breaches, and biases, while maintaining human oversight.

Core Pillars of Responsible AI in Security

When viewed specifically through a security lens, the RAI framework translates into several practical control domains: ensuring AI systems are resilient to errors, unexpected inputs, and intentional adversarial attacks; securing the training and deployment pipeline to ensure privacy and data protection; maintaining transparency and auditability; and establishing governance processes that define acceptable use and accountability.

Within the RAI framework, model-centric security capabilities proactively mitigate specific vulnerabilities that are unique to machine learning, such as:

  • Data poisoning: manipulating training data to corrupt the model
  • Model inversion/extraction: stealing proprietary models or leaking sensitive training data
  • Adversarial attacks: injecting small, invisible changes to inputs to force wrong decisions
  • Prompt injection: tricking generative AI into bypassing intended policy and instruction boundaries

These model security controls are necessary for baseline safety, but they operate largely without enterprise context and are typically opaque to independent or external validation. Enterprises often have limited visibility into how the controls are implemented or how they behave under edge conditions.

Within enterprise-level deployments, the AI model is only one component in a larger system that includes retrieval layers, connectors to enterprise data, identity and authorization signals, and tool invocation that can change state dramatically in downstream systems. The enterprise risk surface is dominated by these integrations and the surrounding AI system components. A well-aligned model can still be used unsafely if it is connected to the wrong data, if it is given excessive tool permissions, if policy enforcement is inconsistent, or if monitoring is insufficient to detect misuse.

This is where external runtime AI security controls (variously described as AI guardrails, AI firewalls, or AI protection systems) become essential. Runtime guardrails are specific, technical, real-time controls implemented outside of the model to enforce security policies and behavioral boundaries at the moment of application interaction where enterprise risk materializes. These controls observe and constrain AI behavior using system-level context, including user identity, data sensitivity, application state, and governance requirements. Runtime controls are where auditability, resilience, and accountability are introduced, forming the foundation of enterprise-grade AI security.

Model security reduces intrinsic model risk; runtime guardrails manage enterprise risk at the point of interaction.

Terminology varies across vendors and frameworks, but the practical point is simple: model security reduces intrinsic model risk; runtime guardrails manage enterprise risk at the point of interaction.

Why Model Security Alone Is Insufficient

At the lowest level, model security safeguards can block malicious prompts, filter disallowed content, and enforce simple usage rules. These capabilities are necessary, but many high-impact enterprise AI failures do not stem from a lack of rules, but from a lack of context, resilience, and visibility.

For example, an AI system may comply with all defined content policies while still disclosing sensitive information through aggregation, inference, or inappropriate context blending. Similarly, a system may enforce correct rules under normal conditions but fail unpredictably under load, partial outages, or dependency failures.

Scoping AI Security Responsibilities

Organizations consuming AI services face fundamentally different risks than those building custom models or training from scratch. Understanding the implementation scope determines which security controls are managed in-house versus which are managed by the provider

Five implementation scopes
  1. At the most basic level, consumer-facing AI applications like ChatGPT, Claude, or Gemini place nearly all security responsibilities with the provider. The provider manages model security, infrastructure protection, base guardrails, and data protection. The organization focuses on establishing acceptable use policies, controlling user access, and validating outputs for business context. Security efforts center on preventing employees from inputting sensitive data and monitoring for policy violations.
  2. Enterprise applications with AI features, such as Salesforce Einstein or Microsoft Copilot, represent a shared responsibility model. Providers manage model security, application-level controls, and integration security, while the customer manages role-based access, data classification, and workflow permissions. Security focus shifts to identity integration, data governance, and audit logging that demonstrates appropriate use.
  3. Organizations deploying pre-trained foundation models through platforms like Amazon Bedrock or Azure OpenAI take on significantly more security responsibility. While providers manage model security and infrastructure, the customer owns prompt engineering, retrieval architecture, output filtering, and tool permissions. Security focus expands to input integrity, output risk prevention, and comprehensive observability across the entire system.
  4. Fine-tuned models increase the security burden further for the organization. Providers manage only infrastructure and base model security, while the customer manages training data curation, fine-tuning safety, deployment controls, and all runtime security. Security efforts must address data poisoning prevention, model evaluation, and comprehensive guardrails across the development and production lifecycle.
  5. At the most complex level, self-trained models place nearly complete security responsibility on the organization. Providers manage only infrastructure, while all aspects of model development, training pipeline security, deployment, and runtime controls are owned by the customer. Security focus encompasses the complete lifecycle from data collection through production monitoring and incident response.
Understanding Shared Responsibility

The implementation scope fundamentally determines the security obligations for the customer. Consumer AI applications place most security responsibilities with the provider – the organization focuses on governing how employees use those services. Enterprise applications increase the customer’s responsibility for identity integration and data governance while providers secure the underlying infrastructure. Pre-trained foundation models shift significant security responsibility to the organization, including prompt engineering, retrieval architecture, and comprehensive runtime controls. Finally, fine-tuned and self-trained models place nearly all security responsibilities on the customer, from training data curation through production monitoring and incident response.

This shared responsibility model reflects how cloud security already works. Before proceeding with AI security planning, it is vital that the implementation scope is clearly identified. This determines which security capabilities must be built internally, which are managed by the provider, and where collaborative oversight is required. Without a clear understanding and adoption of scope, organizations may either over-invest in controls the provider already manages or leave critical gaps unaddressed.

The New Threat Model: Why Traditional Security Assumptions Break

Traditional application security is built on assumptions of bounded input, predictable execution paths, and deterministic outcomes. AI systems violate and often break each of these assumptions. They accept unstructured natural-language input, generate non-deterministic responses, and often maintain context across multiple turns of conversation. Even where the surrounding application is well engineered, the model introduces probabilistic behavior that is difficult to handle using classic threat modeling alone.
Modern AI deployments are also more complex than “prompt in, response out.” In real enterprises, AI systems are commonly connected to internal documents, databases, APIs, and external services. One common architecture is retrieval-augmented generation (RAG), where the model dynamically pulls in external content at runtime (such as policy documents, knowledge base articles, or customer records) in order to answer complex questions more accurately than the training data of the model alone would allow.

Many deployments will go still further, using autonomous or agent-based systems that can decide what steps to take next without direct human intervention or instruction. These AI agents determine their actions through interaction with external tools - programmatic interfaces that extend the AI's capabilities beyond text generation. Tools are discrete functions that allow AI systems to interact with external systems and perform actions, such as querying databases, calling REST APIs, executing code, accessing file systems, or invoking enterprise applications. Unlike retrieval-augmented generation (RAG), which pulls in contextual information to inform responses, tools enable AI systems to take action and change state in downstream systems.

Each tool represents a security boundary with its own permissions and “blast radius” - when an AI agent accesses a tool, it inherits the ability to perform any action that tool enables. While these capabilities increase efficiency and scale, they expand the attack surface in ways that are unfamiliar to many security teams and difficult to govern using traditional controls.

There is an old saying: “there is nothing new under the sun.” That applies equally to security threats as it does anything else in life, and that means that AI security engineers need to be just as cognizant of older attack surfaces such as buffer overflows and network-level evasions as they are AI-specific ones. At the same time, it is clear that threats to AI systems extend way beyond classic exploits.

They include instruction manipulation (prompt injection), context poisoning via retrieved content, data exfiltration through generated output, abuse of delegated authority in tool-enabled systems, and degradation attacks that undermine reliability rather than confidentiality. How many different ways could an AI encode or obfuscate an SSN number, for example? Moreover, how many ways could an AI be fooled into divulging such information in the first place when we are dealing with systems which can be potentially asked to role play the person whose PII is being extracted?

These threat patterns are documented in standardized test frameworks and methodologies from recognized independent testing organizations. These frameworks provide structured taxonomies of AI-specific vulnerabilities that enterprises should understand and address.

The Capability Areas That Actually Matter

When evaluating AI security solutions for enterprise deployment scenarios, the focus should be on a small number of capability areas that map directly to real-world risk. These capabilities reflect how AI systems are used in production and how failures occur in practice, and they must function as integrated layers, not standalone controls.

AI models cannot reliably enforce enterprise security boundaries on their own; by design they treat inputs as instructions and lack native, auditable access control semantics. Single-layer defenses create exploitable gaps: strong input integrity without output validation allows indirect data exfiltration through RAG poisoning; output filtering without observability prevents detection of policy drift; agentic controls without system resilience fail under stress conditions. Organizations implementing capabilities in isolation experience cascading failures when attackers bypass one layer and exploit the absence of downstream defenses. Mature AI security architectures treat these capabilities as interdependent components of a unified defense system.

These capabilities also map naturally to governance issues, providing answers to questions such as what is permitted, what is prohibited, what is monitored, what is logged, and what can be defended after the fact?

Input Integrity & Instruction Control

AI systems are uniquely sensitive to instructions, context, and framing. Unlike traditional applications, where inputs are strictly parsed and validated, AI systems interpret natural language flexibly by design. That flexibility enables powerful new use cases, but it also creates opportunities for manipulation.

Bad actors will increasingly exploit this flexibility through various threat vectors such as:

  • Prompt injection: the direct attempt to override policy by giving instructions such as “ignore previous instructions,” “act as an admin,” “reveal your system rules”.
  • Indirect instruction embedding: where malicious instructions are placed inside content that the system later consumes, such as a web page, a PDF, an internal wiki page, or a customer support ticket.
  • Context manipulation: which shapes a conversation so that untrusted input is mistaken for trusted policy or is elevated above system instructions.

The enterprise risk is not merely that the model says something inappropriate. The risk is that misconfigured security controls outside the model could allow the AI system to be convinced to treat untrusted instructions as policy and then use its elevated access to data and tools accordingly. For example, a poisoned knowledge-base article might instruct an assistant to “include full customer records for completeness,” or to “reveal troubleshooting steps for internal systems,” which could lead to unintended disclosure or operational harm.

Effective AI security controls enforce data access and tool permissions independently of prompt instructions. They must preserve the integrity of system instructions, clearly distinguish trusted system context from untrusted input, and prevent external content from implicitly modifying behavior. A practical evaluation question is whether the platform can detect instruction-override attempts, identify when retrieved content contains instructions rather than facts, and prove that system-level policies remained in effect under adversarial conditions.

Output Risk & Data Exfiltration Management

AI-generated output is one of the most visible sources of enterprise risk. While organizations often focus on blocking explicitly disallowed content, AI failures could result in more subtle disclosure that appears benign until someone asks, “Should that user have been able to learn that?”

AI systems may expose sensitive information through inference, aggregation, or partial reconstruction rather than direct quotation. A model might reveal internal prompts or policy text through “helpful explanation,” expose proprietary code patterns, summarize confidential documents in ways that leak strategy, or provide operational details that increase attacker effectiveness. In regulated contexts, even indirect disclosure can be problematic if it enables re-identification or reveals protected data.

AI can blur boundaries by combining multiple sources into a single narrative response.

When AI systems are connected to live enterprise data sources, these risks increase significantly. A support assistant that can query a CRM, ticket system, or document repository may inadvertently surface information that is well beyond the supposed access level of the requester, especially if retrieval is overly broad or identity signals are enforced inconsistently. Organizations must establish identity and access controls that are enforced consistently across the entire AI system, from user authentication, through data retrieval, to output generation. Without these controls, manipulation attempts will succeed. Even in organizations with strong access control, AI can blur boundaries by combining multiple sources into a single narrative response.

Output risk management requires more than simple keyword filtering. Controls should understand what data is being exposed, why it is sensitive, and under what conditions disclosure is acceptable. Enterprises should ask whether controls operate with the appropriate granularity (categories, sensitivity levels, jurisdictions), whether policies are role-aware, and whether the system offers defensible explanations when output is blocked or modified. From a governance perspective, the requirement is the ability to demonstrate that sensitive data controls exist and operate consistently.

System Resilience & Robustness

Not all AI failures are dramatic. Some of the most damaging failures occur quietly, as systems degrade under stress. Long-running sessions, malformed inputs (including those which are made impossibly large), repeated queries, or unusually complex tasks can undermine enforcement consistency without causing an obvious outage. In these situations, the system still works, but it may work unsafely or with reduced performance levels.

Degraded systems often fail open rather than closed. A safety classifier may time out and be bypassed; a retrieval step may return broader results than intended; rate limits may be applied inconsistently; or tool invocation may proceed even when a policy check fails. In agentic systems that chain steps together, small degradations can cascade into unsafe actions further downstream.

Resilience is not just an engineering quality; it is a governance issue. A control that only functions under ideal conditions is not a control. Security teams need to evaluate how AI security controls behave under non-ideal conditions and whether failures are detected early and handled predictably. That includes stress testing for volume spikes, malicious query patterns,

long-context interactions, and dependency failures such as identity provider latency, retrieval outages, or connector errors.

A mature AI security platform should help organizations understand and manage failure modes, not just attack modes. That includes safe defaults, clear error handling, consistent enforcement behavior when subsystems degrade, and operational telemetry that allows security teams to recognize degradation before it becomes a business problem.

Policy Accuracy & Trade-off Management

Security controls that are technically effective but operationally disruptive are rarely sustainable. In AI environments, this tension is acute because policy boundaries are often contextual rather than binary. The same request may be legitimate for one role and prohibited for another, for example, and a response may be acceptable when derived from public sources but unacceptable when derived from internal data. The “right” policy is ultimately a governance decision, not a purely technical one.

Overly restrictive controls can block legitimate business activity and encourage the use of “shadow AI” systems, while under-enforcement exposes organizations to legal and reputational risk. Striking the right balance is not just a tuning exercise, but also a governance decision informed by risk tolerance, business context, and accountability requirements.

AI security platforms must provide visibility into false positives and false negatives, explain why decisions were made, and allow policies to evolve under structured change control. If a system blocks a request, organizations need to know whether it was blocked because it contained sensitive data, because it appeared to be an instruction override attempt, because the user lacked authorization, or because it matched a high-risk pattern. Without that clarity, teams cannot tune controls responsibly or defend outcomes to leadership or external auditors.

From a GRC perspective, policy accuracy is also evidence of maturity. Controls that are opaque, unpredictable, or impossible to tune become liabilities because they cannot be audited or defended effectively. Conversely, controls that are tunable but lack discipline can be weakened through ad hoc exceptions. Mature platforms support structured policy management, such as versioning, approvals, role-based application, and reporting that allows leadership to see whether risk is being reduced over time.

Deterministic Verification for Security-Critical Decisions

Probabilistic AI guardrails detect most policy violations effectively, but certain security-critical decisions require mathematical certainty rather than statistical confidence. Authorization policies, regulatory compliance validation, and safety-critical infrastructure controls cannot tolerate the inherent uncertainty of machine learning systems.

Automated reasoning - using formal verification techniques - complements probabilistic guardrails by providing provable correctness for specific control points. Unlike ML-based detection which assigns confidence scores, automated reasoning proves mathematically whether a policy holds under all possible conditions.

Deterministic verification matters in the following situations:

  • Authorization decisions: Verifying that access control policies are correctly enforced across all user roles and data classifications
  • Compliance validation: Proving that regulatory requirements (GDPR data residency, HIPAA access restrictions) are satisfied before deployment
  • Safety-critical systems: Ensuring that AI systems controlling physical infrastructure or financial transactions cannot violate safety constraints

Organizations should identify which controls require proof versus detection: content filtering and general guardrails benefit from ML's flexibility; authorization enforcement and compliance validation require formal verification's certainty. Mature AI security platforms integrate both approaches - probabilistic detection for broad threat coverage and deterministic verification for controls that must never fail.

Organizations should identify which controls require proof versus detection: content filtering and general guardrails benefit from ML's flexibility; authorization enforcement and compliance validation require formal verification's certainty. Mature AI security platforms integrate both approaches - probabilistic detection for broad threat coverage and deterministic verification for controls that must never fail.

Agentic AI & Delegated Authority

As AI systems evolve from passive assistants to active participants in organizational workflows, a new class of risk emerges: delegated authority. Agentic AI systems can invoke external tools, call APIs, modify records, or trigger workflows autonomously. In effect, they operate as digital employees with broad access and high speed.

This raises familiar governance questions in an unfamiliar context. What actions is the system authorized to take, and under what conditions? How are permissions constrained? How is escalation prevented? How is separation of duties maintained? Without clear limits on autonomy, small mistakes can propagate rapidly across systems. A model that misinterprets a user’s intent may email the wrong audience, create a ticket containing sensitive information, approve an action that should have required review, or change a record in a way that triggers operational consequences downstream.

Effective controls must enforce least privilege, prevent unauthorized tool use, and support accountability. This includes: managing tool permissions explicitly; restricting high-impact actions (especially those that change system state); requiring confirmation for moderate-impact actions (such as drafting but not sending external communications, proposing configuration changes without applying them, or preparing but not executing data queries); and requiring human approval for sensitive operations (such as modifying financial records, changing access permissions, executing privileged infrastructure commands, initiating external communications on behalf of the organization, or approving transactions). It also includes detecting unsafe chains, where a sequence of individually permissible actions produces an outcome that should never occur without oversight.

A practical way to evaluate this capability is to ask whether the system can enforce graduated autonomy, perhaps allowing benign actions freely, requiring confirmation for moderate-impact actions, and prohibiting or escalating high-impact actions. Governance leaders should treat agentic AI as a delegation problem, not a model problem. If you would not grant a junior employee unilateral authority to take a certain action, you should not grant it to an AI agent either.

Observability, Audit & Forensics

When something goes wrong in an AI system, the most important question is not simply what happened, but whether the organization can clearly show how and why it happened. In enterprise environments, the ability to produce firm evidence of an issue is the difference between an incident that can be explained and closed, and an incident that becomes a prolonged governance problem.

Without sufficient visibility, security teams must rely on partial logs, assumptions, and intuition when responding to incidents. Governance efforts suffer because policies and controls cannot be demonstrated, defended, or improved if their behavior cannot be examined and measured after the fact. This is especially true for AI systems, where non-determinism and context can make outcomes difficult, if not impossible, to reproduce.

AI security platforms must provide detailed logging, clear attribution, and timely alerting, and enterprises should be able to reconstruct AI interactions end to end: the input received; the identity and authorization context applied; the retrieved content (where applicable); the system instructions in effect; the model output; and any tools or downstream actions triggered. When a control intervenes—blocking a request, modifying an output, or preventing an action—it should record why and under which policy.

From a GRC standpoint, auditability is essential. Organizations will be expected to demonstrate that controls existed, that they operated as intended, that exceptions were handled appropriately, and that changes were tracked over time.

Why AI Security Without GRC Will Fail

AI systems operate at speeds and scales that outpace traditional governance processes. When failures occur, organizations will be judged on their ability to demonstrate control, intent, and accountability. In that environment, AI security divorced from governance, risk, and compliance is fragile by design.

Regulators and boards will need to know whether risks were identified, whether controls were proportionate, and whether decisions can be explained. Security teams that cannot produce evidence will struggle to defend outcomes, regardless of intent.

Security teams that cannot produce evidence will struggle to defend outcomes, regardless of intent.

GRC functions play a critical role in translating technical controls into defensible enterprise practices. This includes defining acceptable use, documenting risk decisions, validating control effectiveness, and ensuring that changes are tracked, reviewed, and auditable over time.

Conclusion: Governing AI at Enterprise Scale

AI is no longer a peripheral technology; it is becoming core infrastructure, shaping how enterprises operate, compete, and make decisions.

As AI systems become embedded in business-critical processes, enterprises must move deliberately from experimentation to accountability. This requires clearer expectations of AI security vendors, including transparency around limitations, support for independent validation, and a commitment to iterative improvement rather than one-time assurance.

Accountability also requires internal alignment. Security, IT, legal, risk, and business stakeholders must agree on risk tolerance, escalation paths, and response strategies before incidents occur. AI security is not owned by a single team; it is a shared enterprise responsibility.

The organizations that succeed with AI will be those that pair innovation with discipline. Innovation alone does not create durable advantage; sustainable success comes from understanding, instrumenting, and governing risk as systems scale. Effective AI security is not about eliminating uncertainty, but about managing it transparently, responsibly, and in ways that can withstand scrutiny.

© 2026 NSS Labs. All rights reserved. The NSS Labs Terms of Service and Copyright and Quote policy can be found here: https://nsslabs.com/terms-of-service/

Download Part 2: Evaluating Enterprise AI Security: Questions Every Buyer Should Be Able to Answer

Part 2 translates this framework into action, giving security leaders, GRC teams, and enterprise buyers the specific questions to ask vendors, the red flags to watch for, and the evaluation criteria needed to make confident, defensible AI security decisions.




Deliver and Secure Every App
F5 application delivery and security solutions are built to ensure that every app and API deployed anywhere is fast, available, and secure. Learn how we can partner to deliver exceptional experiences every time.
Connect With Us
AI Security Beyond the Model: What Enterprises Need to Care About — and Why | F5