Jared Reimer is the founder and CTO of Cascadeo. His current areas of focus include large-scale distributed systems, cloud architecture and migration, deployment automation, predictive analytics, and development of Cascadeo’s next-gen managed services platform technology. Prior to Cascadeo, Jared worked extensively in leadership roles at telecommunications and Internet services and hosting providers. He holds a master’s degree in computer science and engineering from the University of Washington. Jared has served in board and advisory roles to the Seattle Internet Exchange, one of the world’s largest free IXPs, since its inception twenty years ago.
Every layer of the stack has been compromised, methodically and with astounding results. Even “air-gapped” systems, such as those in Iran targeted by the Stuxnet cyber-weapon, are no longer guaranteed to be immune to remote intrusion. Individual whistleblowers (most notably, Edward Snowden) and numerous unintentional leaks of sensitive NSA data have further demonstrated that even the most sophisticated security systems can be undone by one human making the wrong decision at the wrong moment. There is no amount of money or effort that can be spent fully securing electronic data or systems—period. Anyone who claims to be selling a silver bullet might as well be selling snake oil.
This insight can be difficult for many information security professionals to accept, as the history of the industry has been filled with processes, audits, certifications, and vendors all aiming to prevent what we now know to be an inescapable reality. Shifting an organization’s thinking away from yesterday’s perimeter-focused security model can be a very tough sell, especially for those ultimately held accountable when things go wrong. The CxO who refuses to accept today’s reality, however, is doubly at risk of being caught unprepared and overwhelmed when a security incident occurs, and thus should be particularly mindful of the situation as we now know it to be. Repeating the same infosec approach used a decade ago is functionally identical to burying your head in the sand and hoping things don’t go wrong on your watch.
Many of the most significant issues with conventional IT SecOps are self-inflicted wounds, born not of malice but of ignorance and complacency. Virtual machines and bare-metal servers are usually permanent fixtures that run perpetually, rather than being replaced programmatically. This means that compromised systems are likely to remain compromised, often silently, for a very long period of time. Anti-malware/IDS systems are typically looking for a signature or known pattern, rather than looking for deviations from the system’s normal operating profile. This means that these systems are only effective against problems they have seen before, and are easily defeated by “zero day” exploits, which we now know are quietly stockpiled and deliberately weaponized by our own government and many others. Employees generally assume the systems haven’t been breached until there is some reason to believe otherwise. Non-technical (and even some technology) executives tend to overestimate the skill set, compliance posture, and general readiness of their infosec teams and infrastructure, and are routinely surprised by predictable negative outcomes.
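To make the contrast with signature matching concrete, here is a minimal sketch of flagging deviations from a system's normal operating profile. It is an illustration only: the function name `find_anomalies`, the window size, the threshold, and the traffic figures are all assumptions chosen for the example, and a simple rolling z-score stands in for whatever statistical or machine learning model a real deployment would use.

```python
from statistics import mean, stdev


def find_anomalies(samples, window=20, threshold=3.0):
    """Return indices of samples that deviate from the trailing baseline.

    Each sample is compared with the mean and standard deviation of the
    `window` samples that precede it, so no signature of a previously
    seen attack is required.
    """
    anomalies = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # Perfectly flat baseline: any change at all is a deviation.
            if samples[i] != mu:
                anomalies.append(i)
            continue
        if abs(samples[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical outbound traffic in MB per minute, with one unusual burst.
traffic = [10, 11, 9, 10, 12, 10, 11, 9, 10, 11] * 3 + [95] + [10, 11, 9]
print(find_anomalies(traffic))  # [30]
```

Nothing in the check knows what an attack looks like; the burst at index 30 is flagged only because it does not resemble the preceding twenty minutes. The same property is what makes this style of detection noisy in practice, which is why it needs ongoing tuning and human review.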
If you fully accept that you cannot keep the most high-value data secure, you begin to think differently about where, how, and when you store and share that data. Buying anti-malware software and firewalls isn’t nearly enough—if anything, it provides a false sense of security and often enables the wrong behavior. Static security perimeters are no match for the nation-state hackers and other non-governmental adversaries who are motivated to get your data or exploit your infrastructure for other purposes. The situation we are all faced with dictates a very different strategy.
Perhaps the four most important activities for executive leadership involved with security are:
1. Accept that breach is inevitable and design for it. This is often accomplished through decomposing complex environments into isolated microservices. By doing so, you limit the extent of information exposed in a breach. The goal is to fully understand the blast radius for any potential compromise, and to have a predetermined response to it rather than being blindsided by it. Cascading failures are to be avoided at all costs. Microservices should be developed, deployed, scaled, and operated independently of each other, and should have a healthy degree of mutual distrust—even behind the firewall.
2. Frequently redeploy infrastructure via end-to-end automation and configuration management. With frequent automated deployments, operational expenses and downtime decrease while business agility increases. This approach solves most disaster recovery (DR) and business continuity (BC) challenges implicitly and, as an added bonus, enforces best practices around configuration management. Most importantly, it dramatically limits the potential for long-term compromise. This is diametrically opposed to “lift and shift” virtual machine migrations to cloud and requires real discipline to implement and enforce. It is as much a cultural change as it is a technical one, requiring ongoing participation by developers, operations personnel, and leadership.
3. Leverage machine learning and predictive analytics. Most enterprise IT infrastructure follows predictable patterns in terms of the many time-series metrics, Netflow data, and event logs produced. Rather than defining thresholds and hoping they cover all the bases, it is far more efficient to use machine learning to learn the normal patterns in these workloads and to flag anomalies as they emerge, instead of alerting only after an outage or incident has already occurred. Accomplishing this means a commitment to instrumenting every layer of the infrastructure and pipelining that data into a common analytics framework, rather than implementing siloed point solutions for different aspects of the environment. This, too, is an ongoing program requiring human oversight and skilled talent, and not a one-shot deployment of an application or appliance.
4. Get serious about cloud/SaaS governance, and provide great self-service options. Prohibitions against moving data outside of the perimeter (for example, uploading to Google Drive, Google Docs, Dropbox, Slack chat, S3 buckets, etc.) are generally not enforceable. Merely documenting a policy prohibiting self-service in an employee handbook is no substitute for proactively providing viable alternatives. Giving people the tools they need to do their jobs greatly reduces the risk of employees “going rogue” and using unauthorized cloud or SaaS services. While data loss prevention (DLP) tooling can be useful, it is not an adequate replacement for proper planning, governance, and training as to the use of third-party SaaS, IaaS, and other services. Ubiquitous encryption and “bring your own device” policies make surveillance of data movement nearly impossible to achieve consistently. Again, the correct posture is to assume breach and plan for it in order to avoid a sudden crisis scenario. Banning the use of all external SaaS and cloud-based services is not realistic for most companies; proper governance and approved tooling greatly reduce the risk of a well-intentioned employee unintentionally causing a large-scale breach.
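The second activity, frequent redeployment, ultimately comes down to enforcing a maximum lifetime on every server. The sketch below shows one way that rule might be expressed. The `Instance` type, the `due_for_replacement` function, the instance names, and the seven-day limit are hypothetical; a real pipeline would read launch times from the cloud provider's inventory and hand the resulting list to its deployment automation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Instance:
    name: str
    launched_at: datetime


def due_for_replacement(instances, max_age, now=None):
    """Return the names of instances that have outlived `max_age`."""
    now = now or datetime.now(timezone.utc)
    return [inst.name for inst in instances if now - inst.launched_at > max_age]


# Hypothetical fleet, evaluated at a fixed point in time for repeatability.
now = datetime(2024, 1, 8, tzinfo=timezone.utc)
fleet = [
    Instance("web-1", now - timedelta(days=1)),
    Instance("web-2", now - timedelta(days=9)),
    Instance("db-1", now - timedelta(days=30)),
]
print(due_for_replacement(fleet, max_age=timedelta(days=7), now=now))
# ['web-2', 'db-1']
```

The maximum age puts an upper bound on how long a silently compromised server can persist, provided each replacement is rebuilt from a trusted, version-controlled definition rather than cloned from the running machine.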
There are many more aspects to modern infrastructure security, but the four points identified above cover the most important ones today: full-stack telemetry, short-lived infrastructure, proper cloud governance, and severely limiting the blast radius. By discovering anomalies (rather than looking for past signatures and patterns), eliminating permanent fixtures in favor of frequent redeployment, providing great self-service SaaS/cloud options, and decomposing solutions into discrete microservices, the impact of a breach can often be quantified and contained immediately.
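As an illustration of mutual distrust between microservices, the sketch below has a receiving service verify every internal request instead of trusting it because it arrived from behind the firewall. This is one possible approach under stated assumptions, not a prescribed design: the `sign` and `verify` functions, the service names, the shared secret, and the five-minute clock-skew allowance are all hypothetical, and many organizations would use mutual TLS or a service mesh for the same purpose.

```python
import hashlib
import hmac


def sign(secret: bytes, sender: str, timestamp: int, body: bytes) -> str:
    """Sign a request with the sender's secret using HMAC-SHA256."""
    message = f"{sender}\n{timestamp}\n".encode() + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(secrets_by_sender, sender, timestamp, body, signature, now, max_skew=300):
    """Accept a request only from a known sender, with a fresh timestamp
    and a signature that matches the exact body received."""
    secret = secrets_by_sender.get(sender)
    if secret is None:
        return False  # unknown caller, even if it is on the internal network
    if abs(now - timestamp) > max_skew:
        return False  # stale or replayed request
    expected = sign(secret, sender, timestamp, body)
    return hmac.compare_digest(expected, signature)


# Hypothetical example: only the billing service is a permitted caller.
secrets_by_sender = {"billing": b"billing-demo-secret"}
now = 1_700_000_000
body = b'{"invoice": 42}'
sig = sign(secrets_by_sender["billing"], "billing", now, body)

print(verify(secrets_by_sender, "billing", now, body, sig, now))         # True
print(verify(secrets_by_sender, "billing", now, b"tampered", sig, now))  # False
print(verify(secrets_by_sender, "inventory", now, body, sig, now))       # False
print(verify(secrets_by_sender, "billing", now, body, sig, now + 3600))  # False
```

Because each caller has its own secret and each service keeps its own list of permitted callers, compromising one service does not automatically grant access to the others, which helps keep the blast radius small and measurable.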