AI adoption is accelerating faster than any technology before it.
What started as a few large models and vendors has proliferated into a vast ecosystem of open-source and commercial AI models, each with their own advantages and risks. With millions of models to choose from, enterprises adopting AI need transparent risk insights that show exactly what threats each model brings into their environment.
Following F5’s acquisition of CalypsoAI, we are excited to introduce the Comprehensive AI Security Index (CASI) Leaderboard to give AI and GRC leaders detailed insights into the different risk compositions of the most prominent AI models. Founded in 2018, CalypsoAI has been a pioneer in AI security research, creating one of the largest AI vulnerability libraries and regularly updating it with 10,000+ new attack prompts each month. From this foundation, the leaderboard testing holistically assesses base model and AI system security, focusing on the most popular models and models deployed by our customers.
These tools were developed to align with the business needs of selecting a production-ready model, helping CISOs and developers build with security at the forefront. The leaderboards cut through the noise in the AI space, distilling complex model security questions into five key metrics:
CASI is a metric developed to answer the complex question: “How secure is my model?”. A higher CASI score indicates a more secure model or application. While many studies on attacking or red-teaming models rely on Attack Success Rate (ASR), this metric often overlooks differences in impact of each attack. Traditional ASR treats all attacks as equal, which is misleading. For example, an attack that bypasses a bicycle lock should not be equated to one that compromises nuclear launch codes. Similarly, in AI, a small, unsecured model might be easily compromised with a simple request for sensitive information, while a larger model might require sophisticated techniques like autonomous and coordinated agentic AI attackers to break its alignment. CASI captures this nuance by creating distinctions between simple and complex attacks, and establishing a model’s Defensive Breaking Point (DBP); the path of least resistance and minimum compute resources required for a successful attack.
Standard AI vulnerability scans provide a baseline view of model security but only scratch the surface in understanding how an AI system might behave under real-world attacks.
To address this gap, we leverage F5 AI Red Team, a sophisticated red-teaming technology commanding swarms of autonomous AI agents which simulate a team of persistent, intelligent threat analysts. These agents probe, learn, and adapt—executing multi-step attacks designed to reveal critical weaknesses that static tests often miss.
This rigorous testing process produces the AWR Score, a quantitative measure of an AI system’s defensive strength, rated on a scale of 0 to 100. A higher AWR score indicates that a system requires a more sophisticated, persistent, and informed attacker to compromise it. This benchmarkable number, derived from complex attack narratives, is calculated across three critical categories:
Our team at F5 Labs has a detailed analysis on the latest trends observed in our September testing. For in-depth insights into the techniques, vulnerabilities, and exploits on the rise, check back each month to stay up to date on the latest trends in AI security.
The AI attack surface will continue to evolve, and F5 is committed to empowering organizations with the insights they need to adapt AI security in stride. As is the case with any new technology, AI will always carry with it a “non-zero” degree of risk. The first step to comprehensive AI security is understanding where risks exist, and the CASI Leaderboards will continue to shape that understanding as the AI model landscape continuously shifts.
Interested in more insights? The same agentic red-teaming we use to evaluate base models can be applied and tailored to your AI environment for even more in-depth insights with F5 AI Red Team.