How CISOs can measure AI security risk

F5 Research | June 11, 2026

Louise Scully, Senior Manager, PMM, GenAI Security | F5

AI security risk is becoming harder for CISOs to manage through policy alone. As organizations adopt more AI models, AI applications, and agentic workflows, security leaders need clearer ways to assess which models are appropriate for use, which vendors may introduce risk, and how AI systems behave when exposed to real-world threats.

That is where measurement matters. F5’s AI security benchmarks, including the Comprehensive AI Security Index (CASI) and the Agentic Resistance Score (ARS), help organizations evaluate AI model risks, compare AI security posture, and understand how models and agentic systems perform under adversarial pressure. When combined with ongoing AI threat intelligence from F5 Labs, these benchmarks give CISOs a stronger evidence base for model selection, vendor governance, runtime protection, and continuous AI risk assessment.

“CISOs cannot wait until deployment to measure AI risk. The strongest decisions happen earlier in the lifecycle.”

This matters because many AI security decisions are made long before a model reaches production. By the time an AI system is live, the organization may already have selected the model, approved the vendor, defined the architecture, mapped the data flows, connected tools, and granted system access. Runtime controls remain essential, but they are stronger when they are built on better upstream decisions.

For CISOs, the question is no longer only how to secure AI once it is deployed. It is how to make AI risk measurable across the full lifecycle, from procurement and model selection to deployment, monitoring, and continuous testing.

AI risk starts before runtime

AI security is often discussed in the context of runtime protection: monitoring prompts, controlling outputs, and preventing misuse once AI systems are in production. Those controls are critical, but they are only part of the AI security lifecycle.

AI risk is not only a runtime problem. It is also a procurement problem, a vendor governance problem, and an architecture problem. If security evaluation happens only after deployment, CISOs may be left managing risks that could have been reduced, challenged, or avoided earlier in the process. The goal is not to slow AI adoption. It is to help organizations move faster with clearer visibility into the risks they are accepting.

What the recent research shows

Each month, F5 Labs publishes AI security threat intelligence alongside the CASI and ARS leaderboards to track how model, agentic, and ecosystem risks are evolving. The May 2026 F5 Labs research points to a broader shift in how CISOs need to think about AI risk. The findings are not just technical signals for researchers; they are practical lessons for security leaders responsible for approving models, reviewing vendors, setting policy, and protecting AI applications as they move into production.

The first lesson is that model choice is now a security decision. The May CASI analysis showed a sharp difference in security posture across the Qwen model family, including a 78-point difference between Qwen3-2B and Qwen3-4B. For CISOs, the takeaway is clear: knowing the model name is not enough. Different sizes, versions, configurations, and deployment choices can carry different risk profiles.

A smaller or lower-cost model may look like an attractive substitute if it performs well enough for a specific task, but if that model introduces a materially different security posture, security teams may need to challenge or stop the swap. That decision is easier to defend with objective testing, comparative scores, and a clear view of how one model performs against another under adversarial conditions.

The second lesson is that AI safety behavior can fail in ways static policy will not catch. May’s threat intelligence highlighted adversarial techniques that continue to test the limits of model safeguards, including jailbreak approaches designed to bypass intended controls. The CISO takeaway is that AI security testing needs to evolve as attacks evolve, because a model that appears safe under one test may fail under another.

The third lesson is that AI supply-chain risk is now part of AI security. May’s threat intelligence pointed to risk across AI-adjacent development workflows, package ecosystems, CI/CD pipelines, registries, and AI-enabled developer tooling. The common pattern is trust: trusted packages, trusted automation, trusted registries, trusted tools, and trusted execution paths. For CISOs, this means AI risk does not only come from prompts, outputs, or model behavior at runtime. It can also come from the vendors, tools, packages, pipelines, and automation layers that support AI applications.

Together, these lessons point to the same conclusion: CISOs need to move from AI policy to AI measurement. Policies define what should happen, but measurement helps security teams understand what is actually happening, what risk is being introduced, and where decisions need to change.

Model risk is outpacing governance

As AI adoption accelerates, organizations are under pressure to move quickly, reduce costs, and support more use cases across the business. Teams are comparing models based on performance, latency, availability, ease of deployment, and price. But capability and cost do not tell the full story. A model that looks like a practical swap from a business perspective may introduce a different level of AI security risk, even when it comes from the same model family.

A vendor may say they use a known model. An internal team may say they are switching to a smaller model for efficiency. A product team may assume that models within the same family behave in broadly similar ways. From a security perspective, those assumptions can be risky, particularly when the model is connected to sensitive data, customer-facing experiences, business-critical workflows, or agentic systems that can take action on behalf of users.

CISOs need evidence that goes deeper than the model name. They need to understand how a model performs under adversarial pressure, how it compares with alternatives, and whether a proposed change materially affects the organization’s risk posture.

CASI makes model risk measurable earlier

CASI gives organizations a way to compare the baseline security posture of AI models before they are deployed. For CISOs, this turns model security from a subjective conversation into a measurable input for decision-making.

In practice, CASI supports several governance workflows. It helps security teams assess whether a model is suitable for a particular use case, recognizing that a model used for low-risk internal summarization may not require the same level of scrutiny as one connected to customer data, business-critical workflows, or autonomous agentic systems. It can also strengthen procurement and vendor governance by giving security teams a clearer basis for asking which models are being used, how they have been tested, and what evidence supports those choices.

If the business wants to adopt a lower-cost model, benchmarking can help determine whether that change is acceptable, whether additional controls are needed, or whether the risk outweighs the benefit.
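
The substitution decision described above can be sketched as a simple gate. This is a minimal illustration and not F5's method: the tier names, minimum scores, and allowed score drops are hypothetical values an organization would set for itself, and the sketch assumes a benchmark score where a higher number means a stronger security posture.

```python
from dataclasses import dataclass

# Hypothetical policy values; each organization would set its own.
# Assumes a benchmark score where a higher number means stronger posture.
TIER_POLICY = {
    "internal_low_risk": {"min_score": 40.0, "max_drop": 30.0},
    "customer_data": {"min_score": 70.0, "max_drop": 10.0},
    "agentic": {"min_score": 80.0, "max_drop": 5.0},
}


@dataclass(frozen=True)
class SwapDecision:
    outcome: str  # "accept", "accept_with_controls", or "reject"
    reason: str


def evaluate_swap(current_score: float, candidate_score: float, tier: str) -> SwapDecision:
    """Compare a candidate model's score with the current model's for one use-case tier."""
    policy = TIER_POLICY[tier]
    drop = current_score - candidate_score

    # A candidate below the tier minimum is not acceptable at all.
    if candidate_score < policy["min_score"]:
        return SwapDecision(
            "reject",
            f"score {candidate_score} is below the tier minimum {policy['min_score']}",
        )
    # A large drop is allowed only if additional controls are added.
    if drop > policy["max_drop"]:
        return SwapDecision(
            "accept_with_controls",
            f"score fell by {drop}, more than the allowed {policy['max_drop']}",
        )
    return SwapDecision("accept", "score meets the tier minimum and the drop is within tolerance")
```

The three outcomes mirror the three questions in the text: whether the change is acceptable, whether additional controls are needed, or whether the risk outweighs the benefit. In practice the inputs would come from published benchmark results, not hard-coded numbers.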

Threat intelligence reveals ecosystem risk

The recent threat intelligence is important because it shows that AI security risk is not limited to the model itself. The most relevant pattern for CISOs is that risk is expanding into the systems around AI: developer workflows, package ecosystems, CI/CD pipelines, registries, automation layers, and AI-enabled tooling.

Every step in these systems is a trust path: developers install packages, agents execute tasks, CI/CD pipelines publish code, registries distribute dependencies, and AI-enabled tools are given access to local environments and workflows. If those trust paths are not governed, tested, and monitored, risk can be introduced before an AI application ever reaches production.

For CISOs, the lesson from this threat intelligence is that AI governance needs to include the ecosystem that supports AI development, not just the model selected or the runtime controls placed around it. Vendor reviews should ask about model sources, development pipelines, package governance, tool permissions, and dependency management. Architecture reviews should look at what agents and AI-enabled tools can access, what they can execute, and where human approval is required. CASI helps evaluate model-level risk before deployment, while ARS helps assess how agentic systems behave under adversarial pressure.
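
One way to make the architecture review questions above concrete is to record what each tool can do and flag risky combinations. The following is a rough sketch under stated assumptions: the `ToolGrant` fields and the rule that "high impact" means execution or sensitive-data access are illustrative choices, not part of CASI, ARS, or any F5 product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolGrant:
    """One tool an agent can call. All field names are illustrative."""
    name: str
    can_execute: bool              # can run code or trigger actions, not just read
    touches_sensitive_data: bool   # can reach customer or business-critical data
    requires_human_approval: bool  # a person must approve each use


def review_findings(grants: list[ToolGrant]) -> list[str]:
    """Return findings for grants that pair high-impact access with no human approval."""
    findings = []
    for grant in grants:
        high_impact = grant.can_execute or grant.touches_sensitive_data
        if high_impact and not grant.requires_human_approval:
            findings.append(f"{grant.name}: high-impact access without human approval")
    return findings
```

A reviewer could run this over the tool list a vendor or internal team supplies and use the findings as prompts for follow-up questions, not as a pass or fail verdict.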

ARS measures agentic resilience

As organizations move from AI models to AI applications and agentic workflows, the measurement challenge becomes more complex. A model may appear acceptable in isolation, but risk can change when that model is connected to tools, APIs, data sources, business logic, or systems that allow it to take action.

ARS extends measurement into this operational layer by helping organizations assess how agentic systems perform under sustained adversarial pressure. These systems are not simply generating outputs; they may make decisions, call tools, retrieve data, trigger workflows, or influence business processes.

That means AI security testing needs to account for how systems behave over time and under attack, not only how an individual model performs in a controlled evaluation. ARS helps CISOs understand where agentic systems may be more resilient, where additional controls may be needed, and how AI applications behave when exposed to real-world adversarial techniques.

From policy to measurable action

The next step is to embed measurement into AI decisions from the start. AI security needs to become part of the full lifecycle, from model evaluation and vendor review to architecture design, deployment, runtime protection, continuous testing, and ongoing monitoring. This lifecycle view matters because AI risk changes over time. New attack techniques emerge, models are updated, vendors change their underlying systems, and agentic workflows introduce new paths for misuse.

For CISOs, this means AI security cannot be a one-time approval exercise. It needs to become a continuous discipline, with measurement and testing embedded across the lifecycle.
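
Treating approval as a continuous discipline implies some trigger for re-evaluation. The sketch below shows one possible shape for that check, assuming a hypothetical `ApprovalRecord` and a 30-day review interval chosen purely for illustration; the triggers follow the changes named above (model updates, vendor changes, and the passage of time as new attack techniques emerge).

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical review interval; not a value taken from the article.
MAX_ASSESSMENT_AGE = timedelta(days=30)


@dataclass(frozen=True)
class ApprovalRecord:
    """What was assessed when the model was approved. Field names are illustrative."""
    model_version: str
    vendor: str
    assessed_on: date


def reassessment_reasons(
    record: ApprovalRecord,
    deployed_version: str,
    deployed_vendor: str,
    today: date,
) -> list[str]:
    """List the reasons an approved model should be re-evaluated; empty means none."""
    reasons = []
    if deployed_version != record.model_version:
        reasons.append("model version changed since approval")
    if deployed_vendor != record.vendor:
        reasons.append("vendor changed since approval")
    if today - record.assessed_on > MAX_ASSESSMENT_AGE:
        reasons.append("assessment is older than the review interval")
    return reasons
```

An empty list means the existing approval still stands under these assumptions; any entry is a signal to re-run testing before continuing to rely on the earlier decision.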

Making AI risk defensible

CISOs are being asked to support AI adoption while managing a risk landscape that is still evolving. They need to enable innovation, but they also need to answer difficult questions from boards, regulators, customers, and internal stakeholders.

Which models are approved for use? Which vendors may be introducing AI risk? Which use cases require additional testing? When is a lower-cost or faster model an acceptable substitution, and when does it create too much risk? What evidence supports those decisions?

These are not purely technical questions. They are governance questions, procurement questions, and business risk questions. Increasingly, they require objective evidence.

The organizations that mature fastest will be those that move beyond informal AI reviews and begin embedding measurement into the way AI decisions are made. That means evaluating models before they are selected, testing systems before they are scaled, and continuously reassessing risk as models, threats, vendors, and use cases change.

CISOs cannot wait until deployment to measure AI risk. The strongest decisions happen earlier in the lifecycle. By combining CASI, ARS, and ongoing AI threat intelligence from F5 Labs, CISOs can make AI security risk more visible, more defensible, and more actionable across the full AI lifecycle.

To learn more, explore the F5 Labs research.



About the Authors

Lee Ennis, Sr Manager, Data Science | F5

Lee Ennis leads the AI data science and research team at F5, where he focuses on developing advanced AI security solutions that help enterprises safely deploy and monitor large language models (LLMs) and generative AI systems. His team combines expertise in machine learning, adversarial testing, and model assurance to identify vulnerabilities and strengthen trust in AI-driven environments.
