This article is from F5’s Office of the CTO, which shares perspectives on and analysis of the technology and trends that are part of digital transformation.
Security automation’s promises are laudable: reducing manual work, improving mean time to know and to remediate for detection programs, and lowering the technical knowledge a junior hire needs, which helps address the talent issues programs still face. Many security automation vendors are especially bullish on that last promise, investing in low-to-no-code user experiences that allow a junior security hire with limited programming ability to develop workflows without understanding each underlying system. For example, such a hire could automate account deprovisioning for employee offboarding to ensure full coverage without digging into the details of single sign-on, Active Directory, the email provider, and so on.
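The offboarding workflow just described is essentially a fan-out across identity systems with per-system success tracking. A minimal sketch, using illustrative stand-in provider classes rather than any real vendor SDK:

```python
from dataclasses import dataclass, field

@dataclass
class StubProvider:
    """Illustrative stand-in for an SSO, directory, or email admin API."""
    name: str
    deprovisioned: list = field(default_factory=list)

    def deprovision(self, user: str) -> bool:
        # A real integration would call the vendor's API here.
        self.deprovisioned.append(user)
        return True

def offboard(user: str, providers: list) -> dict:
    """Fan deprovisioning out across every identity system and report
    per-provider success so no account is silently missed."""
    return {p.name: p.deprovision(user) for p in providers}

providers = [StubProvider("sso"), StubProvider("active_directory"), StubProvider("email")]
result = offboard("jdoe", providers)
```

The value of the low-to-no-code platform is that it hides each provider’s API behind a uniform `deprovision` step; the sketch shows how little of the underlying systems the workflow author needs to understand.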
The return on this investment, though, is typically less than the security program expects. Low-to-no-code user experiences are sufficient for many first- and second-year problems, especially if an enterprise has a lot of low-hanging technical debt, but custom development inevitably becomes necessary to address environment specifics such as atypical vendor integrations, third-party appliances without APIs, and in-house developed applications. Detection teams will likely have to customize earlier than IT security teams, as their data sources, detection logic, and downstream actions grow more complex more quickly. The automation workflows themselves must also be maintained as the broader environment changes over time, requiring modifications and change management processes.
So, while the security program may not need to assign as much mid-to-senior security talent to these first-tier workflows, it will need to invest in a similarly expensive and challenging software engineering talent market to build and support the security automation itself. This is not to trivialize the benefit of reducing rote first-tier workflow hours. We should acknowledge, however, that this optimizes the return on workforce hours: time is shifted to higher-order, higher-impact work, which should also drive higher employee satisfaction and retention.
This is not so different from the tooling renaissance operational teams went through in the past decade as part of the DevOps conversation, finding new ways to improve both the speed and the quality of their work product (infrastructure as code, continuous integration and deployment, configuration management, etc.), but at a higher cost for this specialization. The business benefited in product velocity and quality, and by attracting employees with more modern and interesting toolchains in their day-to-day work, but it typically did not see a meaningful benefit in its aggregate workforce economics. Security automation is much earlier in its market journey, and only in the past few years have security leaders, as consumers, begun to realize that its economic benefits fall short of the promises.
Part of the reason for this is that many leaders confuse “automated” with “autonomous.” The perceived workforce benefits the leader bought with automation rested on the belief that it would not require the same expensive “care and feeding” from their people, allowing them to do more with less while also improving the speed and quality of their processes. Instead, they swapped expensive security talent for similarly expensive and difficult-to-retain software engineering talent.
Automation is an additive capability, not a replacement. It is another technical system with a lifespan that requires its own technical expertise. To achieve more of the promised workforce benefit, security programs must advance from automation to “autonomous automation,” finding closed-loop workflows that may safely run autonomously to proactively decrease technical risk. These are workflows that do not require human intervention, are imbued with knowledge of what should and should not happen in an environment, and are focused on discrete inputs and outputs that are safe to link without putting the environment's availability and integrity at risk.
Two examples of this:
- The Identity and Access Management (IAM) risk of an impactful Account Takeover (ATO) may be reduced by autonomously revoking granted access that has not been used for some time (e.g., access to an AWS S3 bucket), meaning the IAM surface area is quantitatively reduced over time. The organization’s IAM risk should then grow only when it decides to grant new access, which may also be autonomous and self-service, requiring manual review from security only for sensitive access.
- Your production environment has an observability system that understands which third-party services your applications connect to, and your security program wants to restrict network egress to mitigate malware and data exfiltration traffic. Your security automation, whether built into the observability system or the egress firewall, should baseline this egress traffic and automatically implement a deny-by-default policy that blocks all traffic except what it has observed or was configured to expect.
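The first example, revoking long-unused grants, amounts to partitioning access records by last-use age. A minimal sketch, using illustrative dict-shaped grant records rather than any real IAM provider’s schema:

```python
from datetime import datetime, timedelta, timezone

def revoke_stale_grants(grants, now, max_idle=timedelta(days=90)):
    """Partition access grants into (kept, revoked).

    A grant whose last_used timestamp is older than max_idle is revoked
    autonomously, so the IAM surface area shrinks over time without a
    human in the loop. The 90-day threshold is an assumed policy choice.
    """
    kept, revoked = [], []
    for grant in grants:
        target = revoked if now - grant["last_used"] > max_idle else kept
        target.append(grant)
    return kept, revoked

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
grants = [
    {"principal": "svc-reports", "resource": "s3://finance-bucket",
     "last_used": now - timedelta(days=200)},   # idle: revoke
    {"principal": "jdoe", "resource": "s3://app-logs",
     "last_used": now - timedelta(days=3)},     # recently used: keep
]
kept, revoked = revoke_stale_grants(grants, now)
```

Run on a schedule, this loop only ever shrinks the grant set; new access arrives solely through the (reviewed) request path, which is what makes the workflow safe to leave unattended.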
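The second example, egress baselining, reduces to building an allowlist from observed flows plus configured expectations and denying everything else. A minimal sketch with illustrative flow records and hostnames (not a real firewall’s schema or API):

```python
def build_egress_policy(observed_flows, expected_destinations=()):
    """Return an is_allowed(dest) check for a deny-by-default egress policy.

    The allowlist is the union of destinations seen during the baseline
    window and those explicitly configured; any other destination is
    denied, cutting off malware callbacks and exfiltration paths.
    """
    allowlist = {flow["dest"] for flow in observed_flows} | set(expected_destinations)
    return lambda dest: dest in allowlist

# Flows the observability system recorded during the baseline window.
observed = [{"src": "orders-svc", "dest": "api.stripe.com"},
            {"src": "orders-svc", "dest": "sqs.us-east-1.amazonaws.com"}]
is_allowed = build_egress_policy(observed,
                                 expected_destinations={"updates.vendor.example"})
```

An unbaselined destination such as `evil.exfil.example` would be denied by default, which is the closed-loop property that lets this workflow run autonomously.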
This approach aligns with the “assume breach” principle, a critical aspect of Zero Trust that is typically difficult for security practitioners to translate into practice. A practical view of “assume breach” is that, over time, the likelihood of some security event occurring in a system, whether you classify it as a breach, an incident, or something else, rises toward 100% regardless of how much you invest in reducing it (the likelihood never reaches 0%). It follows that security investment must include reducing or containing impact, since you are guaranteed that a risk will materialize. For example, despite security awareness training and email security controls, you are guaranteed that an employee will eventually click on a malicious link in an email, so you must ensure that when this happens the adversary does not have a short path to their objective, such as direct access to sensitive data on the employee’s laptop, an active VPN link to the production environment, or a domain controller.
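The “rises toward 100%” claim is simple compounding. With illustrative numbers (an assumed per-week click rate, not a measured one):

```python
# Illustrative arithmetic: suppose there is a 1% chance in any given
# week that some employee clicks a malicious link that gets through.
p_weekly = 0.01
weeks = 52 * 5  # a five-year horizon

# Probability the event happens at least once over the whole period:
# the complement of it never happening in any week.
p_at_least_once = 1 - (1 - p_weekly) ** weeks  # roughly 0.93
```

Even a small per-period probability compounds to near-certainty over a program-relevant horizon, which is why investment must shift from prevention alone to containing impact.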
“Autonomous automation,” or systems working together to proactively reduce surface area and breach impact, should realize greater returns on the security program’s workforce: less time is spent on impactful security events (assume breach), security assessments demand less attention because impact is contained, and systems coordinate basic workflows themselves. We should acknowledge that this requires vendors to implement “autonomous automation” in their products, to avoid creating a new, expensive “autonomous automation” specialty in the defender’s workforce. The incremental workforce cost reduction is nice, but the workforce’s quality-of-life improvement and the business’s reduced security impact, or improved cyber resiliency, are the practical strategic outcomes of these investments.