Evaluating enterprise AI security: Questions every buyer should be able to answer

Enterprise AI security is moving fast, and most organizations still struggle to evaluate vendor claims with rigor. This NSS Labs research paper, developed in collaboration with F5, AWS, and Microsoft, gives security leaders and enterprise buyers the specific questions to ask vendors, the red flags to watch for, and the evaluation criteria needed to make confident, defensible AI security decisions.

Executive summary

Enterprise AI security has a buyer problem. Organizations know AI security matters, but most lack a reliable framework for evaluating vendor claims, distinguishing meaningful controls from superficial ones, or holding vendors accountable to real-world evidence.

This paper — the second in a two-part series from NSS Labs, developed in collaboration with F5, AWS, and Microsoft — shifts from defining the problem to equipping buyers to act on it. Where Part 1 established why securing the AI model alone is insufficient, Part 2 translates that foundation into practical evaluation criteria across seven critical capability areas.

The core argument: AI security purchasing decisions are governance decisions, not just technical ones. Buyers should demand measurable evidence, not architecture diagrams. That means quantitative baselines for detection effectiveness, operational performance, audit requirements, and framework alignment — all validated against real-world conditions, not isolated test scenarios.

The seven capability areas the paper examines are input threat and instruction control, output and data exfiltration risk, system resilience under degradation, policy/filter governance, agentic AI and delegated authority, observability and forensics, and integration and interoperability. Each section pairs the core risk with specific questions buyers should put to vendors, characteristics of effective solutions, and concrete red flags to watch for.

The paper closes with two pointed mandates: demand independent validation before you buy, and treat that validation as an ongoing practice rather than a one-time checkpoint. Vendors who resist meaningful testing signal immaturity, not strength.

The bottom line is three sentences from the conclusion worth taking seriously: AI security purchasing is a governance decision with long-term consequences. Enterprises that ask the hard questions up front are better positioned to manage risk and avoid preventable failures. And AI security will not succeed through novelty or abstraction — only through accountability, transparency, and disciplined evaluation.

Introduction: From awareness to evaluation

Establishing quantitative evaluation criteria

Critical capability areas & questions buyers should be asking

Input threat, instruction control, & evasion detection
Output & data exfiltration risk management
Resilience, robustness, failure, and degradation handling
Policy/filter efficacy & governance alignment
Agentic AI, tool invocation security, & delegated authority
Observability, audit & forensics
Integration and interoperability

What buyers should demand before they buy

Preparing for independent validation

Conclusion: From claims to accountability

DOWNLOAD PART 1: AI security beyond the model: What enterprises need to care about — and why

Introduction: From awareness to evaluation

As AI increases penetration into enterprise systems in various forms, it opens up new threat surfaces forcing enterprises to consider AI security risk. Boards, regulators, and executive teams are coming to recognize that AI systems are introducing new categories of operational, legal, and reputational exposure, in addition to the expanding threat surface they create. As a result, AI security products intended to reduce these risks are now at front of mind for enterprise security teams.

That awareness does not guarantee readiness, however, and while enterprises increasingly accept that AI security is important, many struggle to evaluate effectively the claims made by vendors operating in this emerging space. Terminology is inconsistent, demonstrations are often narrowly scoped, and assurances are frequently grounded in architecture rather than real-world evidence. Buyers are left to make decisions without a clear way to distinguish meaningful controls from superficial ones, or just how effective the coverage really is.

“Failures in this area often occur without any obviously malicious prompt and instead emerge through the way untrusted content is introduced into the AI context.”

In our previous white paper entitled AI Security Beyond the Model, NSS Labs outlined why securing the AI model alone is insufficient, and why enterprise AI security must be treated as a system-level and governance challenge. The real security challenges emerge in the systems surrounding the model - where data is retrieved, tools are invoked, and automated decisions are executed.

This paper builds on that foundation by moving from understanding the problem to evaluating potential solutions and helping enterprise buyers formulate better questions when shortlisting AI security vendors. Consistent with AI Security Beyond the Model, this paper focuses primarily on runtime guardrails (the controls outside the model that enforce policy, protect data, and produce audit evidence) while recognizing that model security/Responsible AI reduces intrinsic model risk but cannot manage enterprise interaction risk alone. As AI systems become embedded in enterprise workflows, security decisions increasingly intersect with governance, risk management, and regulatory accountability. Organizations must be able to explain not only how AI systems function, but how risks are controlled, monitored, and audited.

The sections that follow attempt to reframe core AI security capability areas as evaluation criteria. Rather than prescribing specific technologies or architectures, they focus on what buyers should be able to understand, test, and defend when selecting AI security controls for production environments.

Establishing quantitative evaluation criteria

Evaluating AI security requires moving beyond qualitative claims to measurable validation. Buyers should establish baseline expectations that enable objective vendor comparison.

Detection effectiveness: Vendors should demonstrate detection capabilities against standardized test frameworks and methodologies from recognized independent testing organizations. Organizations should expect measurable accuracy rates on known attack patterns, with vendors disclosing test methodology, coverage limits, and false-positive tradeoffs. Sensitive data exposure prevention should distinguish between structured data types and contextual sensitive information, with detection accuracy remaining consistent across different foundation models and prompt complexity levels.

Operational performance: Buyers should establish latency thresholds that balance security with usability. Systems should demonstrate performance under production-scale load and not isolated test conditions. False positive rates also matter - excessive false positives create alert fatigue and drive shadow AI adoption. Vendors should be capable of providing production metrics over meaningful time periods.

Audit requirements: Regulatory compliance depends on comprehensive logging with sufficient detail to reconstruct incidents months later. Events should export to SIEM in standard formats with minimal latency. Retention requirements will vary by industry and use case.

Framework alignment: Vendors should map controls to established frameworks from recognized independent testing organizations. Buyers should demand vendors demonstrate these capabilities against standardized test sets and production traffic samples. Independent third-party validation provides reliable evidence of control effectiveness.

Critical capability areas & questions buyers should be asking

At the lowest level, model security safeguards can block malicious prompts, filter disallowed content, and enforce simple usage rules. These capabilities are necessary, but many high-impact enterprise AI failures do not stem from a lack of rules, but from a lack of context, resilience, and visibility.

For example, an AI system may comply with all defined content policies while still disclosing sensitive information through aggregation, inference, or inappropriate context blending. Similarly, a system may enforce correct rules under normal conditions but fail unpredictably under load, partial outages, or dependency failures.

Input threat, instruction control, & evasion detection

Most AI security controls claim to "filter prompts" or "block malicious inputs," but this framing often oversimplifies the problem. In most AI systems, risk comes not only from what users enter into a system directly, but also from instructions embedded in content consumed by the systems downstream. These can be documents, web pages, tickets, emails, or knowledge bases, the content of which needs to be trusted enough to be treated as authoritative context.

Authors

Bob Walder - Senior Analyst, NSS Labs
lan Foo - Chief Technology Officer and EVP of Product, NSS Lab

Contributors

This research was developed in collaboration with security, cloud, and AI leaders across the industry.

Cameron Delano - Sr. Solutions Architect, F5
Jeanette Hurr - Global Solutions Architect, F5
Riggs Goodman III - Principal Solutions Architect, AI Security and Privacy, AWS
Raj Bagwe - Sr. Solutions Architect, AWS
Zachary Riffle - Security Architect, Microsoft

INDEPENDENT TEST RESULTS | MARCH 19, 2026

Ready to see the evidence?

See how F5 AI Guardrails performed across 20,000 real-world attacks performed by independent third-party lab, SecureIQLab.

Get the efficacy report

Buyers should ask how a system distinguishes system instructions, developer intent, retrieved content, and user input at runtime. In architectures that use retrieval-augmented generation (RAG), this distinction is critical. If retrieved content can implicitly override system policies, the system is vulnerable to indirect prompt injection even if no overtly malicious query is ever submitted.

Enterprises should expect clear answers to questions such as:

How well can the system detect, evaluate, and block requests and instructions that are prohibited by policy?
How well can the system detect and prevent attempts to override intended system behavior through input prompts?
How well does the system handle ambiguous or borderline prompts where intent is unclear, while still minimizing disruption to legitimate use?
How effectively can the system prevent instructions embedded in content submitted from influencing the model in unintended ways?
How effectively can the system handle attempts to evade controls through unusual formatting, encoding, or transformations of the prompt?
How well does the system balance safety with legitimate role-based use cases (e.g., training simulations, red teaming, creative work) without creating easy bypass paths?
How effectively does the system prevent users (or external content) from influencing or overriding system/agent instructions, routing rules, tool-use constraints, or other "higher priority" controls?
Is the system able to effectively detect, identify, and regulate use of the model to prevent abusive or excessive (beyond policy) usage and cost?
Is the system able to show, in logs, which instruction source won when conflicts occur (system vs developer vs retrieval vs user)?

In practice, failures in this area often occur without any obviously malicious prompt and instead emerge through the way untrusted content is introduced into the AI context. An attacker may embed hidden instructions inside a document (a policy document, internal wiki page, or customer support article, for example) that is later ingested by the AI system through a complex multi-source retrieval pipeline. When the model retrieves that content, it may treat such embedded instructions as authoritative guidance rather than untrusted input. In enterprise environments where internal content is widely editable, this form of indirect prompt injection can be particularly difficult to detect.

“Failures in this area often occur without any obviously malicious prompt and instead emerge through the way untrusted content is introduced into the AI context.”

For example, it is not necessary, nor from a threat actor's point of view is it necessarily desirable, for a single changed source of information to raise a red flag, preventing serious damage from being inflicted immediately. Instead, minor changes to multiple sources of data may be a better strategy, resulting in subtle but persistent changes in system behavior over time, such as relaxing response constraints, exposing internal workflows, or prioritizing attacker-controlled context over system policies. The end result can be a larger-scale extraction of sensitive data.

Buyers should therefore evaluate whether AI security controls treat all retrieved content as potentially hostile, and whether instruction boundaries are enforced consistently regardless of where instructions originate.

Effective solutions distinguish user input from retrieved content with high accuracy. When instruction override attempts appear in user prompts, systems block them; when similar phrases appear in legitimate support articles, systems handle them appropriately. Detection works on semantic intent, not keywords, catching obfuscated attacks through encoding manipulation or role-play scenarios. Vendors need to demonstrate detection using the organization's actual RAG architecture with multi-source retrieval. Systems should provide production metrics over meaningful time periods showing detection rates, false positive rates, and latency.

Red flags include: reliance on generic prompt filtering; vague assurances that "the model handles it;" an inability to demonstrate enforcement of instruction boundaries under adversarial conditions; failure to provide production metrics over meaningful time periods; inability to demonstrate detection using in-house RAG architecture; and keyword-only detection without semantic intent analysis.

Input integrity is not about blocking malicious language, it is about preserving policy authority in dynamic, untrusted environments.

Output & data exfiltration risk management

AI-generated output clearly poses a significant risk given that a non-human agent may have pulled data from many different systems in order to formulate its answer, and that answer may reveal more information to external users than was originally intended. Many high-impact failures do not involve explicitly disallowed content, they involve instead subtle disclosure of information that should not have been revealed to a particular user, in a particular context, at a particular time.

An attacker or unauthorized user may ask a series of questions that may seem benign on the surface and that individually return permissible data but collectively allow sensitive information to be inferred or reconstructed. For example, an AI assistant connected to internal ticketing and documentation systems might summarize operational issues, personnel roles, or system configurations in ways that reveal internal architecture or regulated data indirectly. These failures are rarely caught by keyword filters. Instead, the risk emerges from aggregation and context. Buyers should assess whether output controls account for cumulative exposure and whether systems can explain why a response was restricted - an essential requirement for governance and auditability.

Buyers should expect clear answers to questions such as:

How do AI security controls reason about what actual information is being exposed contextually, rather than simply what key words or phrases appear in the output?
Can the system distinguish between public information, internal information, and regulated data?
How effectively does the system prevent the model from revealing sensitive data in responses, even when the user request appears legitimate or ambiguous?
What types of sensitive data does the protection cover (e.g., personal data, financial data, confidential corporate information), and how can it be tailored to the organization
Are controls sensitive to user role, jurisdiction, or data origin?
Can the system explain why an output was blocked or modified?
Can the system enforce controls on both requests and responses, including denial, redaction, and safe transformation with audit evidence?
What actions can the system take to prevent the model from outputting sensitive data (block, redact, transform, warn, require step-up review/approval)?
How well does the system prevent prompt tricks and evasion strategies that attempt to fool the model into disclosing data indirectly (e.g., asking for "examples," "templates," "test data," or "summaries" that contain real info)?
Can the system prevent and detect cross-session or cross-tenant data leakage, ensuring that information from one user's session, context, or retrieved data sources cannot appear in another user's responses?
How effectively does the system prevent the model from revealing system prompts, internal instructions, hidden policies, or operational configuration details?
How effectively does the system prevent exposure of credentials or secrets (e.g., API keys or tokens) that may exist in training data, system context, logs, connectors, or retrieved content?
How effectively does the system prevent the model from producing outputs that are harmful, derogatory, or strongly biased, especially in edge cases where a user is baiting the model?
How well can the system prevent the model from outputting large verbatim copyrighted passages or otherwise reproducing protected content?
If the model is connected to internal corporate content (e.g., documents, wiki or ticketing systems), can the system prevent it from reproducing restricted internal material beyond what the user is authorized to see?
How effectively can the system prevent the model from 'walking a user through' harmful workflows or producing operational artifacts (e.g., scripts, commands, or code) that could create security vulnerabilities or unsafe system behavior?

When AI systems are connected to live enterprise data sources, these risks increase significantly and output-related failures can arise when AI systems combine information from multiple sources into a single response.

Evaluating output controls requires understanding how mature solutions handle contextual data exposure. Context-aware systems understand semantic meaning, not just keywords. This should mean that structured PII is blocked while public information is allowed based on context. Role-based enforcement ensures different users see different outputs for identical queries based on authorization levels. Systems should detect cumulative exposure across sessions where individually benign questions collectively reveal sensitive information. Blocking decisions need to include specific policy references, data types, and authorization requirements. Integration with existing data classification policies eliminates duplicate configuration.

Red flags include: keyword-only filtering; opaque enforcement decisions; inability of the system to demonstrate role-based enforcement, e.g. same query returns same output regardless of user authorization level); inability of vendor to explain why specific output was blocked in human-readable terms; and the absence of audit trails. Enterprises should seek evidence that output controls operate with context, intent, and accountability.

Resilience, robustness, failure, and degradation handling

Enterprises are accustomed to evaluating security controls based on effectiveness under normal network conditions, but AI security demands a different mindset. Some of the most significant failures occur not when systems are attacked directly, but when they are stressed by volume, complexity, dependency failures, or unexpected patterns of interaction.

Key questions to ask would be:

How well does the system maintain availability and responsiveness during unusually long or complex user interactions (e.g., "marathon" sessions)?
Are controls applied consistently during long-running sessions or multi-step agent workflows?
How does the system prevent resource exhaustion or degraded service when the system receives bursts of repetitive, malformed, or nonsensical inputs?
What happens when AI security controls degrade?
Do policy checks time out, and does enforcement fail open or closed?
How does the system balance security and uptime during dependency failures, and is that behavior configurable and auditable?

Real-world AI systems operate under variable conditions, such as peak usage, upstream outages, slow identity providers, or degraded retrieval services. In such scenarios, enforcement components may fail partially rather than completely. For example, a policy evaluation service may time out, or a classification step may be skipped to preserve responsiveness. If these failures are not handled carefully, systems may default to permissive behavior without alerting operators, and in agentic workflows, this can allow actions to proceed without appropriate checks. These are not hypothetical edge cases, but common failure modes in distributed systems. Buyers should therefore insist on demonstrations of how AI security controls behave when dependencies degrade, and whether failure modes are predictable, visible, and aligned with enterprise risk tolerance.

“A resilient system should not simply block everything when something goes wrong, nor should it silently allow unsafe behavior.”

A resilient system should not simply block everything when something goes wrong, nor should it silently allow unsafe behavior. Instead, it should degrade predictably, report errors clearly, and continue to provide adequate protection as conditions deteriorate.

Well-designed systems should document explicit failure modes for every dependency. When services time out, systems should fail appropriately, log events, alert security teams, and display clear messages to users. Under stress testing, detection accuracy needs to remain high, latency must stay within acceptable thresholds, and requests must not be able to bypass security checks. Degradation needs to occur predictably with alerting, not silently, and circuit breakers must terminate runaway processes promptly. Vendors should be able to describe enforcement availability targets and failure behavior, and where enforcement is mission-critical, buyers may require contractual commitments.

Red flags include: an inability to describe failure modes; a lack of stress testing or assurances that "the system has not failed in practice;" silent degradation without alerting; lack of circuit breakers for runaway processes; inability to demonstrate that detection accuracy remains high under stress testing; and systems that allow requests to bypass security checks during degradation.

Policy/filter efficacy & governance alignment

AI security controls that are technically sophisticated but operationally misaligned rarely survive contact with the front line - the business. In enterprise environments, policy decisions are not typically binary or fixed and are ultimately governance decisions.

Buyers should ask:

How are policies defined, approved, versioned, and audited?
Can policies be scoped by role, data sensitivity, geography, or business context?
Can the system explain why a decision was made?
How well can the system enable policies that reflect organization-specific compliance posture (industry rules, internal policies, acceptable-use constraints), and how are those policies maintained over time?
How effectively does the system help prevent responses that could create legal or compliance exposure (e.g., advice or guidance that violates applicable rules/standards)?

Policy failures often emerge not from poor intent, but from drift between technical enforcement and business reality. For example, an AI system may correctly block access to certain data categories but do so in a way that prevents legitimate business workflows, leading users to bypass controls entirely through unsanctioned tools. Conversely, policies that are too loosely defined may allow sensitive information to be exposed in contexts that violate regulatory or contractual obligations. Over time, ad hoc tuning to address complaints can erode policy integrity and accountability. Buyers should therefore evaluate whether AI security platforms support structured policy governance, including documented approvals, version history, and the ability to review enforcement outcomes. Without this discipline, policy accuracy becomes subjective and difficult to defend after a security incident.

“AI security controls that are technically sophisticated but operationally misaligned rarely survive contact with the front line.”

Over-enforcement can create disruption and possibly tempt frustrated users to implement unauthorized "shadow" AI systems, while under-enforcement may result in regulatory and reputational risk. Both of these situations reflect governance rather than model failure.

Effective policy management should treat policies as code with version control, approval workflows, and audit trails. Policy changes should require documented approvals from security, legal, and business stakeholders before production deployment. Every enforcement decision should be traceable to specific policy clause, version, section, and authorization logic. Where possible, policy refinement should be data-driven from system dashboards providing information such as false positive trends, high-impact blocks and, where possible, tuning recommendations.

Red flags include: opaque decision-making; ad hoc tuning, or claims that policies are "self-learning" without accountability; absence of version control and approval workflows for policy changes; enforcement decisions that cannot be traced to specific policy clauses; and policy conflicts without documented resolution rules.

INDEPENDENT TEST RESULTS | MARCH 19, 2026

Ready to see the evidence?

See how F5 AI Guardrails performed across 20,000 real-world attacks performed by independent third-party lab, SecureIQLab.

Get the efficacy report

Agentic AI, tool invocation security, & delegated authority

As AI systems move beyond passive assistance and begin to act on behalf of users and other systems, a new category of risk emerges, known as delegated authority. Agentic AI systems can invoke other tools, modify records, and trigger complex workflows at machine speed.

Buyers evaluating agentic AI security should examine several key control areas:

What actions can the system take autonomously?
How are permissions constrained?
How well does the system enforce what an agent is, or is not, allowed to do when it is delegated authority to act on a user's behalf?
Can autonomy be tiered by risk or context, since without clear limits, small errors can propagate rapidly across systems?
How well do system controls prevent a prompt from steering the system into actions outside of intended boundaries (even if the user "asks nicely") when it is able to interact with or leverage external tools and agents?
Does the system effectively prevent or contain recursive loops or "runaway" behavior where an agent keeps acting without achieving the goal?
How effective is the system at preventing unauthorized tool use, API/action escalation, excessive autonomy/loops, and unsafe cross-tool chaining?
In multi-agent architectures, how effectively does the system prevent sensitive data retrieved or generated by one agent from being unnecessarily propagated to downstream agents or tools performing related tasks?
In multi-agent architectures, it is vital that authority propagates correctly across system boundaries. How effectively does the system preserve identity context across agent chains, ensuring downstream agents and tools operate strictly within the original user's permissions rather than elevated system privileges?

Delegated authority failures often occur when AI systems are granted broad tool access to enable automation without appropriate constraints. For instance, an AI agent tasked with resolving support issues might be able to query databases, update records, and notify users. A misinterpreted instruction or unexpected context could result in incorrect data modification or premature communication to customers. Because these actions occur at machine speed, errors can propagate before human oversight can intervene. In complex workflows, individual low-risk actions can combine to create high-impact outcomes. Buyers should assess whether AI security controls enforce least privilege, require confirmation for sensitive actions, and provide mechanisms to limit or revoke autonomy dynamically. Treating agentic behavior as a governance issue is essential to preventing silent escalation.

“Identity context preservation ensures agents inherit user permissions, not system-level permissions.”

In well-designed systems, these risks are mitigated through architectural controls. Properly constrained systems enforce least-privilege per tool with documented permission boundaries. Graduated autonomy should require confirmation for moderate-impact actions and human approval for high-impact actions, while identity context preservation ensures agents inherit user permissions, not system-level permissions. When users request sensitive operations, agents should retrieve only authorized information while blocking unauthorized access. Circuit breakers need to terminate runaway loops promptly and all agent actions should be logged fully, with reasoning, authorization checks, and outcomes.

These controls reflect an important distinction between architectural constraints, where the system technically prevents an agent from invoking certain tools or actions, and behavioral constraints, where the agent could perform the action but is instructed not to. Security-critical boundaries should rely on architectural controls wherever possible rather than behavioral instructions alone.

Red flags include: Warning signs of weak agentic governance include blanket tool access or reliance on model alignment alone to prevent misuse; absence of documented permission boundaries per tool; failure to preserve identity context ensuring agents inherit user permissions; inability to detect unsafe action sequences through chain analysis; and incomplete logging of agent actions without reasoning and authorization checks.

Observability, audit & forensics

When something goes wrong in an AI system, it is important to be able to demonstrate how and why it happened, and perhaps even reproduce it.

Buyers should ask whether AI interactions can be reconstructed end to end, including inputs, context, policies, outputs, actions, tools invoked, and data accessed. Effective systems should also support continuous monitoring and alerting, enabling security teams to detect abnormal behavior, unsafe action chains, or unexpected agent interactions before incidents escalate. Without sufficient visibility into how the system behaves, incident response becomes speculative and governance efforts lose credibility.

When AI-related incidents occur, organizations may be expected to provide evidence rather than explanations. Without detailed records of inputs, retrieved context, applied policies, outputs, and actions, incident response teams are forced to reconstruct events based on assumptions rather than facts, particularly problematic in AI systems, where behavior may not be reproducible due to non-determinism. For example, a regulator or internal auditor may ask why certain information was disclosed or why an automated action was taken, but without end-to-end observability, organizations cannot demonstrate intent, proportionality, or control effectiveness. Buyers should therefore evaluate whether AI security platforms provide the level of visibility required to support incident response, compliance reporting, and post-incident learning.

Comprehensive logging should enable end-to-end request reconstruction including original prompt, retrieved context with sources, identity/authorization context, policies evaluated with versions, model output, modifications made, final user response, and downstream actions. AI security events should be capable of integrating with SIEM systems with minimal latency, enabling correlation with network traffic and authentication events. Critical violations must trigger alerts promptly with sufficient context for immediate response, including unified dashboards to track security, performance, and quality metrics with trending analysis.

From a GRC perspective, systems that cannot be observed cannot be governed.

Integration and interoperability

AI security controls operating in isolation create visibility gaps and prevent coordinated response. Defense-in-depth architectures require controls that share telemetry and coordinate enforcement across security layers.

Identity and access management: Authorization boundaries must extend from user authentication through data retrieval to final output. Organizations should evaluate how systems integrate with identity providers and whether role changes propagate appropriately. Systems should maintain authorization context when AI agents act on behalf of users, with traceable permission chains; and access policies should inherit from existing RBAC models without duplicate configuration. At the retrieval layer, systems should enforce least privilege with documents filtered based on requesting user's authorization level.

SIEM integration: AI security events must correlate with broader security telemetry. To accomplish this, systems should export events to SIEM platforms in standard formats such as OCSF or CEF with minimal latency. Proprietary log formats prevent correlation with network traffic, authentication events, and data access patterns. Organizations should verify whether custom detection rules can be built on AI security events and whether alerts integrate with existing incident response workflows.

Compliance framework alignment: Solutions should map controls to established frameworks with specific control coverage and audit evidence, and compliance reports should be customizable for specific regulatory requirements. Generic compliance claims without specific control mappings are insufficient.

API and automation support: Modern security operations require programmatic control, with systems supporting policy-as-code workflows and infrastructure-as-code integration. Undocumented or frequently changing APIs create integration fragility.

What buyers should demand before they buy

Enterprises should demand evidence over architecture diagrams and glossy marketing materials, repeatability over slick demonstrations, and independent validation over self-assertion.

Buyers should expect vendors to explain limitations, support realistic testing, and provide artifacts that withstand scrutiny.

Security controls that cannot be tested or explained should not be trusted to protect business-critical AI systems.

Preparing for independent validation

Evaluating AI security is difficult, but that difficulty does not excuse the absence of validation. Independent testing remains one of the few ways enterprises can gain confidence that controls and features behave as claimed.

Buyers should expect validation approaches that reflect real-world conditions, including adversarial inputs, degradation scenarios, and governance requirements. Vendors unwilling to participate in meaningful validation signal immaturity, not strength.

Nor should validation be treated as a one-time exercise. AI security controls that perform well at initial assessment may degrade as threats evolve, model capabilities advance, and attack techniques emerge. The threat landscape changes rapidly, and AI security incidents may involve attack methods that have not been seen before. Organizations must therefore establish continuous validation requirements alongside initial testing.

Threat intelligence integration: Organizations should evaluate vendor update cadence for incorporating newly disclosed vulnerabilities from threat intelligence vendors and security research organizations. Vendors should demonstrate detection of recently published attacks, not just historical training data. Automated threat intelligence ingestion enables faster response than manual review processes.

Model evolution compatibility: Foundation model capabilities evolve rapidly, requiring security controls to adapt. Organizations should request model-specific detection effectiveness data across different foundation models and beware of generic claims without quantitative evidence which can hide blind spots. Vendors should provide explicit timelines for certifying new model versions and interim mitigations during certification gaps.

Buyer-driven validation: Organizations should continuously test controls rather than relying solely on vendor validation. Buyers should negotiate contractual rights for periodic red team exercises, continuous automated bypass testing, and A/B testing of policy changes in production. Vendors restricting customer testing or withholding granular telemetry signal immaturity.

Organizations should treat validation as an ongoing partnership rather than a one-time assessment to maintain effectiveness as the threat landscape evolves. In the absence of this practice, even well-designed controls can eventually prove insufficient against emerging attack techniques.

Conclusion: From claims to accountability

Buying AI security is not a technical decision alone; it is a governance decision with long-term consequences.

Enterprises that ask the hard questions up front are better positioned to manage risk, defend decisions, and avoid preventable failures.

AI security will not succeed through novelty or abstraction, but through accountability, transparency, and disciplined evaluation.

© 2026 NSS Labs®. All rights reserved. No part of this publication may be reproduced, copied/scanned, stored on a retrieval system, emailed, or otherwise disseminated or transmitted without the express written consent of NSS Labs ("us" or "we").

Evaluating enterprise AI security: Questions every buyer should be able to answer

Executive summary

Table of contents

Introduction: From awareness to evaluation

Establishing quantitative evaluation criteria

Critical capability areas & questions buyers should be asking

Input threat, instruction control, & evasion detection

Authors

Contributors

INDEPENDENT TEST RESULTS | MARCH 19, 2026

Ready to see the evidence?

Enterprises should expect clear answers to questions such as:

Output & data exfiltration risk management

Buyers should expect clear answers to questions such as:

Resilience, robustness, failure, and degradation handling

Key questions to ask would be:

Policy/filter efficacy & governance alignment

Buyers should ask:

INDEPENDENT TEST RESULTS | MARCH 19, 2026

Ready to see the evidence?

Agentic AI, tool invocation security, & delegated authority

Buyers evaluating agentic AI security should examine several key control areas:

Observability, audit & forensics

Integration and interoperability

What buyers should demand before they buy

Preparing for independent validation

Conclusion: From claims to accountability