5 Questions to Ask to See If Your Security Team is Cloud Incident Ready

The F5 2019 State of Application Services Report found that more than half (53%) of respondents were confident about protecting applications on premises, compared with only 38% for applications in the public cloud. Unease about cloud security is normal. Security in the cloud is a double-edged sword: it can render traditional security measures impotent, but it can also enable powerful new techniques. IT security teams need to know what they’re doing, because the same old, same old won’t cut it. Immature cloud practices lead to expensive security breakdowns in the form of critical application outages, bloated operational costs, and data breaches.

But what makes the cloud so different? Well, for one, there’s nothing tangible to touch: everything in the cloud is software glued together via application programming interfaces (APIs) that use tons of automation. Many security tools rely on network pathways and endpoints that are monitored or gated, but in the cloud, all software objects are online. Your mistakes are more visible to attackers because there is no perimeter to hide behind, and important data stores and applications can quickly become Internet-exposed through the slightest misconfiguration.

Access control and visibility, already a challenge for traditional IT and security teams, become paramount in the cloud. A good security program must include access control, role management, and logging of user activity. Whatever you do on premises, you should also do in the cloud. This means applying these principles to the connections between different public cloud infrastructures, as well as to connections with the APIs that interface with your application. Invisible processes, including security functions, also need to be tracked and configured correctly.

How can you be assured this is all being done correctly?

Talk to the tech folks about their cloud security strategy. Cloud security should be an embedded attribute of the overall cloud strategy, which itself should be one aspect of the entire IT strategy. Strategies for IT, security, and cloud should not simply be silos bolted together, but part of a holistic plan addressing every part of IT.

Pay attention to how the plan was derived and to its underlying principles. If you hear “We can do it ourselves,” treat it as a warning sign that the plan was based on apprehension rather than strategy. To be effective at cloud security, you need to move at the speed of your business.

The business often expects the cloud providers to take care of security requirements. However, providers use a “shared responsibility model” for security, which means they do some, and you do some. The different security roles and responsibilities of the business and cloud providers need to be very clear.

Given this background, let’s look at ways to manage risk. Security people call these controls (things like firewalls and passwords), but we’re going to speak at a strategic level about control objectives. This is a more abstract approach that focuses on outcomes without getting down into the weeds of implementation.

There’s a lot to figure out here, so to keep it simple and actionable, we’ve formulated this into five questions to ask the IT and security teams. Use your natural executive baloney detector—probe for details on their answers.

Question 1: Can you describe our attack surface, and how have you reduced it to the bare minimum?

Your attack surface consists of all of your stuff that can be seen and touched by attackers: it’s your visible silhouette on the target range that is the Internet. So, naturally, smaller is better. But most organizations, cloud or otherwise, do not have an accurate and up-to-date picture of their attack surface. How do you get that? Well, as we’ve said more than once, to secure your network, you must know your network.

It is very easy to start using the cloud. But using the cloud without a strategy in place can lead to shadow IT and cloud sprawl, where everyone is putting stuff up, building their own infrastructure, and generally smearing apps and data around like mud tracked all over the carpets. This is far from optimal from a security standpoint. To have security in a cloud environment, you need visibility into all these moving parts. How can you get the visibility you need when nothing is local or, worse, things are spread across multiple clouds or split between cloud and on-premises systems (a.k.a. hybrid cloud)? The good news is that since everything in the cloud is code, every configuration, server, network, account, and data file can be found by reviewing that code. The bad news is that doing this by hand is inefficient, ineffective, and slow. This process needs automation, especially since you should be doing it continuously. If your organization uses manual inventory and audit processes for cloud resources, a big red flag is waving in your face.

Once you have an automated audit process in place, the obvious next step is to use what it finds to minimize exposure and reduce the attack surface. Look for unexpected ingress points and misconfigurations. Any deviation from your expected baseline (an abandoned project, a pilot promoted to production, a maintenance or debugging change, a temporary change left in place, an unchanged default, or anything else unexpected) should be flagged and automatically corrected. Cloud-native tools (supplied by your cloud provider) are a good place to start, but you often still need to go deeper and gather more intelligence. Remember, apps are complicated.
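To make the idea of an automated baseline audit concrete, here is a minimal Python sketch. It assumes a hypothetical, simplified inventory record (the Resource class) and a hand-written BASELINE. In practice, the inventory would come from your provider's APIs or configuration service, and the "automatically corrected" step would pass each finding to your remediation tooling instead of just returning it.

```python
from dataclasses import dataclass, field

@dataclass
class Resource:
    """Hypothetical, simplified record for one discovered cloud resource."""
    resource_id: str
    kind: str                                  # e.g. "vm", "bucket", "database"
    public: bool = False                       # reachable from the Internet?
    open_ports: set = field(default_factory=set)
    uses_default_credentials: bool = False

# Hypothetical baseline: resources that should exist, and how exposed each may be.
BASELINE = {
    "web-frontend": {"public": True, "ports": {443}},
    "orders-db": {"public": False, "ports": set()},
}

def audit(inventory):
    """Return one finding per deviation from BASELINE."""
    findings = []
    for r in inventory:
        expected = BASELINE.get(r.resource_id)
        if expected is None:
            # Unknown resource: shadow IT, an abandoned pilot, or leftover debugging.
            findings.append(f"{r.resource_id}: not in baseline")
            continue
        if r.public and not expected["public"]:
            findings.append(f"{r.resource_id}: unexpectedly Internet-exposed")
        extra_ports = r.open_ports - expected["ports"]
        if extra_ports:
            findings.append(f"{r.resource_id}: unexpected open ports {sorted(extra_ports)}")
        if r.uses_default_credentials:
            findings.append(f"{r.resource_id}: default credentials unchanged")
    return findings

inventory = [
    Resource("web-frontend", "vm", public=True, open_ports={443, 22}),
    Resource("orders-db", "database", public=True),
    Resource("old-test-bucket", "bucket", public=True),
]
for finding in audit(inventory):
    print(finding)
```

Run against the sample inventory, this prints three findings: port 22 open on web-frontend, orders-db exposed to the Internet, and old-test-bucket missing from the baseline. Scheduling a check like this continuously, rather than running it once, is what turns it into the ongoing visibility described above.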

Question 2: How are we managing access control?

Security people speak of AAA (“triple-A”), for Authentication, Authorization, and Accounting. This is a fancy way of saying that you know who everyone is, you know what they can do, and you record everything they do. It goes well beyond making sure everyone has a good password. At the highest level, AAA gives administrators a single place to create and suspend all logins, manage their rights, and see what every account is doing.
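The three A's can be shown together in a toy Python class: one place that issues and revokes logins, checks rights, and logs every attempt. This is a hypothetical illustration, not a production design. Real deployments delegate authentication to an identity provider and send the log to a central pipeline. Also note that the token lookup here is a plain dictionary access.

```python
import secrets
import time

class AccessControl:
    """Toy single point of control for Authentication, Authorization, and Accounting."""

    def __init__(self):
        self.tokens = {}      # Authentication: session token -> user
        self.rights = {}      # Authorization: user -> set of allowed actions
        self.audit_log = []   # Accounting: (timestamp, user, action, allowed)

    def create_login(self, user, rights):
        token = secrets.token_urlsafe(32)
        self.tokens[token] = user
        self.rights[user] = set(rights)
        return token

    def suspend_login(self, user):
        # Revoke every token the user holds and drop their rights.
        self.tokens = {t: u for t, u in self.tokens.items() if u != user}
        self.rights.pop(user, None)

    def request(self, token, action):
        user = self.tokens.get(token)                                        # who is this?
        allowed = user is not None and action in self.rights.get(user, set())  # may they?
        self.audit_log.append((time.time(), user, action, allowed))          # record it
        return allowed

    def activity(self, user):
        """Everything a given account has attempted, allowed or not."""
        return [(action, ok) for _, u, action, ok in self.audit_log if u == user]

ac = AccessControl()
token = ac.create_login("alice", {"read:reports"})
ac.request(token, "read:reports")      # allowed
ac.request(token, "delete:reports")    # denied, but still logged
ac.suspend_login("alice")
ac.request(token, "read:reports")      # denied: token revoked
print(ac.activity("alice"))
```

One simplification: after suspension, the revoked token no longer maps to a user, so the last attempt is logged with an unknown user. A real accounting system would record the presented credential too, so that attempts to use revoked logins stay attributable.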

You (hopefully) already do this for your on-premises systems. It’s worth exploring whether you can extend those efforts to the cloud, because running a separate access control infrastructure there duplicates work for everyone involved: employees request access twice, IT operations manages two sets of accounts and permissions, and your security team’s access review burden grows significantly. Access control touches everyone in the company, and it is the most time-consuming control objective. You don’t want to do this over and over, so plan carefully.

Question 2a: Tell me about our authentication systems.

To begin: all of your systems need authentication. This sounds obvious, but in the context of cloud security it carries more weight than you might expect. In a world of zero trust and perimeters that look like fractals, very few systems, if any, can do without some kind of authentication. As more environments rely on APIs and webhooks to tie disparate services together into a packaged experience, more of those component services connect with one another over the same Internet that attackers use. So when we say all your systems need authentication, we are really recommending that your team review exactly what is connected to the Internet and ensure that some form of authentication is in place for each of those systems.
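Webhooks are a good example of a machine-to-machine connection that is easy to leave unauthenticated. A common approach is to have the sender sign the raw request body with a shared secret using HMAC, and the receiver recompute and compare the signature. The Python sketch below uses only the standard hmac and hashlib modules. The secret and payload are hypothetical, and real providers each define their own header names and signing details, so check your provider's documentation.

```python
import hashlib
import hmac

def sign(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 signature over the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()

def verify_webhook(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Accept the request only if the sender proves knowledge of the shared secret."""
    expected = sign(secret, body)
    # compare_digest runs in constant time, so it does not leak how many characters matched.
    return hmac.compare_digest(expected.encode(), signature_header.encode())

secret = b"hypothetical-shared-secret"
body = b'{"event": "order.created"}'
signature = sign(secret, body)
print(verify_webhook(secret, body, signature))          # True
print(verify_webhook(secret, body + b" ", signature))   # False: body was altered
```

This proves who sent the message and that it was not modified in transit. It does not stop an attacker from replaying a captured request, which is why many schemes also sign a timestamp and reject stale requests.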

Question 2b: Tell me about our authorization systems.

In the case of user authorization, it is important to balance speed and agility with security. Some cloud deployments look like the early days of on-premises endpoint sprawl, in which every single user in the company “needed” full admin privileges to do their job. The capabilities the cloud offers, and the enormous amounts of money those capabilities represent, make these kinds of quick workarounds tempting. It is easy to clone user accounts and their permissions to scale up development or testing work. However, malware, especially ransomware, has taught organizations why that is a bad idea: an attacker who compromises any over-privileged account inherits all of its access.

There is also the question of machine authorization. Very few servers or services should require admin privileges to perform their roles. As we noted in our Application Protection Research Series piece on APIs, restricting API permissions to specific HTTP methods is a start. It is even better to define acceptable sequences of actions and flag deviations from those sequences for review.
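Here is a minimal Python sketch of both ideas for a single machine client. The sketch uses an allowlist of HTTP methods and routes, plus a table of which call may follow which. Every route and transition in it is hypothetical. It also assumes incoming requests have already been matched to route templates such as /orders/{id}. Calls outside the allowlist are blocked. Allowed calls that arrive out of the expected order are only flagged for review.

```python
# Hypothetical allowlist for one machine client (e.g. a billing service).
ALLOWED = {
    ("GET", "/orders"),
    ("GET", "/orders/{id}"),
    ("POST", "/invoices"),
}

# Hypothetical acceptable workflow: which call may follow which (None = session start).
NEXT_ALLOWED = {
    None: {("GET", "/orders")},
    ("GET", "/orders"): {("GET", "/orders/{id}")},
    ("GET", "/orders/{id}"): {("GET", "/orders/{id}"), ("POST", "/invoices")},
    ("POST", "/invoices"): {("GET", "/orders")},
}

def review_session(calls):
    """Return (blocked, flagged): calls outside the allowlist, and allowed calls out of sequence."""
    blocked, flagged = [], []
    previous = None
    for call in calls:
        if call not in ALLOWED:
            blocked.append(call)
            continue  # a blocked call never executes, so it does not advance the sequence
        if call not in NEXT_ALLOWED.get(previous, set()):
            flagged.append(call)
        previous = call
    return blocked, flagged

session = [
    ("GET", "/orders"),
    ("GET", "/orders/{id}"),
    ("DELETE", "/orders/{id}"),   # method not permitted for this client
    ("POST", "/invoices"),
    ("POST", "/invoices"),        # permitted, but a second invoice without a lookup is unusual
]
print(review_session(session))
```

Keeping the method allowlist and the sequence check separate mirrors the text: the first is a hard restriction, while the second produces leads for a human to review rather than automatic denials.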

Question 2c: Tell me about our accounting systems.

Accounting is probably the aspect of access control that is most neglected. Accounting aids in situational awareness, forensics, and auditing. The added visibility also helps shape the design of other controls, as well as the implementation of least privilege, since it tells you, in practice, who is actually connecting to what assets.

The first step of access control accounting in the cloud is to take responsibility for accounting rather than assuming it will be taken care of by the vendor. Cloud vendors do not necessarily have the capability to log all of the pertinent information you need to run your systems, and if they do, it is not necessarily turned on. Verify that you’re getting the information you need as early as possible—ideally while vetting vendors or setting the cloud architecture up for the first time.

Next, you must configure your logging to suit your needs. Every organization is unique, so no logging solution will capture what you need out of the box, and before tuning, a large number of the alarms it raises will be either false or insignificant. We recently spoke to a security team that told us their anti-phishing tool had a 95% false positive rate. Finding the right balance among false positives, false negatives, and analysis time is difficult and takes time to get right.
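One practical starting point for tuning is to measure, per detection rule, how many of its alerts analysts ended up marking as false. A rule where 19 of every 20 alerts are false, like the anti-phishing tool above, is a prime candidate. The Python sketch below assumes a hypothetical export of triaged alerts as (rule name, was it real?) pairs. It measures false positives among alerts, which is how "false positive rate" is usually meant in this context (statisticians would call it the false discovery rate). It says nothing about false negatives, which need separate testing.

```python
from collections import Counter

def noisiest_rules(triaged_alerts, min_alerts=20):
    """Rank detection rules by the share of their alerts that analysts marked false.

    triaged_alerts: iterable of (rule_name, was_true_positive) pairs.
    Rules with fewer than min_alerts alerts are skipped as too sparse to judge.
    """
    total, false_alarms = Counter(), Counter()
    for rule, was_true in triaged_alerts:
        total[rule] += 1
        if not was_true:
            false_alarms[rule] += 1
    rates = {rule: false_alarms[rule] / n for rule, n in total.items() if n >= min_alerts}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)

# Hypothetical triage history.
alerts = (
    [("phishing-link", False)] * 19 + [("phishing-link", True)]
    + [("impossible-travel", True)] * 15 + [("impossible-travel", False)] * 5
    + [("rare-rule", False)] * 3
)
print(noisiest_rules(alerts))   # [('phishing-link', 0.95), ('impossible-travel', 0.25)]
```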

In general, budgeting time for access control accounting is important. This will be both a significant one-time project and an ongoing maintenance task, as both systems and needs will evolve over time.

Question 3: How do we mitigate the most likely threats?

There are lots of tools that can help secure cloud environments. Many of them are good. None of them will automatically solve your problems. We geeks love our toys and tools, but make sure you choose your technological gadgets and processes specifically to match both what you are running and what is coming after you. In 2018, we found that the top threats to web applications were attacks against the access tier, such as phishing or credential stuffing, and injection.¹ In addition to minimum baseline controls, therefore, ensure that strong access controls are in place (as noted above), and run a vulnerability management program that prioritizes injection flaws.

One of the key challenges of adapting existing environments to the cloud, with all the changes that come with it, is that traditional controls and practices may not apply. It is a bad idea to take old-fashioned “data center” tools, move them into the cloud, and expect them to work as effectively as they did in their original environment. Instead, take a big-picture approach: focus on what you were trying to achieve with the previous control, not on the specific technology or process. Other controls can often achieve the same objective in ways better suited to the cloud.

There are also new capabilities that can work in your favor and raise standards of observability and efficiency. Much of what used to be manual can now be automated, which means that as cloud environments scale up and begin to sprawl in terms of complexity and configurations, there are scalable, automated security processes that can mitigate the risks. Detecting changes and throwing away systems that deviate from a baseline should be an entirely automated process.
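Drift detection can be as simple as fingerprinting each running instance's configuration and comparing it to the approved baseline. The Python sketch below does that with a SHA-256 hash of canonical JSON, then replaces every drifted instance instead of repairing it in place. The configuration fields, instance IDs, and the replace callback are hypothetical. In a real environment, that callback would call your provisioning tooling to destroy the instance and launch a fresh one from the baseline image.

```python
import hashlib
import json

def fingerprint(config):
    """Stable hash of a configuration dict; key order does not matter, list order does."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

def find_drift(baseline, running):
    """Return IDs of running instances whose config no longer matches the baseline."""
    expected = fingerprint(baseline)
    return [iid for iid, cfg in running.items() if fingerprint(cfg) != expected]

def reconcile(baseline, running, replace):
    """Throw away (replace) every drifted instance rather than patching it."""
    drifted = find_drift(baseline, running)
    for iid in drifted:
        replace(iid)
    return drifted

baseline = {"image": "app:1.4", "ports": [443], "debug": False}
running = {
    "i-1": {"ports": [443], "image": "app:1.4", "debug": False},    # same config, different key order
    "i-2": {"image": "app:1.4", "ports": [443, 22], "debug": True},  # someone was debugging
}
print(reconcile(baseline, running, replace=lambda iid: print(f"replacing {iid}")))
```

Replacing instead of repairing keeps the logic trivial: the system never has to understand *why* an instance drifted, only that it did.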

The F5 2019 State of Application Services Report also found that nearly half of organizations (48%) with a digital transformation initiative are troubled by the difficulty of achieving consistent security for applications distributed among multiple cloud platforms.² If your environments are “hybridized,” meaning any combination of multi-cloud or a mixture of on-premises, virtual private cloud (VPC), and public cloud, you need to make sure the control objectives are met adequately in all environments. This may entail different controls and tools in each environment—which is why it is so important to build from control objectives and not just specific control implementations.
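One way to keep the focus on control objectives rather than specific tools is to record, for each objective, which control meets it in each environment, and to flag the gaps. The Python sketch below is a hypothetical illustration. The objectives, environment names, and controls are invented for the example, and a real version would pull this mapping from your governance or configuration records.

```python
# Hypothetical mapping: control objective -> environment -> control that meets it.
COVERAGE = {
    "restrict Internet exposure": {
        "on-premises": "perimeter firewall",
        "public-cloud-a": "security groups",
        "public-cloud-b": "network policies",
    },
    "centralized authentication": {
        "on-premises": "corporate directory",
        "public-cloud-a": "federated SSO",
    },
    "access logging": {
        "on-premises": "SIEM forwarding",
        "public-cloud-a": "provider audit logs",
        "public-cloud-b": "provider audit logs",
    },
}
ENVIRONMENTS = ["on-premises", "public-cloud-a", "public-cloud-b"]

def coverage_gaps(coverage, environments):
    """List (objective, environment) pairs that have no control assigned."""
    return [
        (objective, env)
        for objective, controls in coverage.items()
        for env in environments
        if not controls.get(env)
    ]

print(coverage_gaps(COVERAGE, ENVIRONMENTS))
# [('centralized authentication', 'public-cloud-b')]
```

The controls differ from one environment to the next, and that is fine. The check only asks whether each objective is met everywhere.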

Question 4: What do we do when systems or security controls fail?

Despite what you’ve been told, automation will never replace humans and will never be able to deal with every real-world scenario. The same is true for security defenses. Given the size, complexity, and transitory nature of most large IT deployments, it is wise to prepare for failures. Even before the rise of the cloud, organizations cobbled together components as cheaply and quickly as possible to meet business needs. As organizations grow, computing environments grow organically as well. That organic growth increases the likelihood of bugs, incompatibilities, and misconfigurations. Expect technology to fail. Expect people to fail. Even expect your processes to fail. This stance is called assume breach, and it requires that you treat security failures as inevitable.

What does this mean in practical terms? First off, when you ask your IT or security team this question, the two wrong answers are “it’ll never happen” and “we don’t know.” The basic principle of assume breach should inform all design, policy, and defense plans. This means aiming for faster detection, better containment, and quicker recovery.

This assumption of failure also extends to partners, especially the cloud provider. Responsibility and liability need to be clearly spelled out for when things go wrong. This also means that key third parties should be carefully assessed. One of the worst things you can do is carelessly presume that nothing will ever go wrong and therefore not plan for failure. Things will go wrong, and without a deep, well-tested response strategy, the problem will be compounded. Which brings us to the next question:

Question 5: What kinds of response plans are in place?

In the event of a security incident, when everyone is running around with their hair on fire, a plan that is tailored to your environment and, most of all, practiced by your response team can make all the difference. First, be sure to listen for vulnerability or breach reports from outside your organization. Many “hackers” will reach out to you before going public, given the opportunity.
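One widely used way to give outside researchers an obvious place to send reports is a security.txt file, standardized in RFC 9116 and served at /.well-known/security.txt. The standard requires a Contact field (a URI such as a mailto: address) and an Expires field, and it allows an optional Policy field. The Python sketch below renders a minimal file. The contact address and policy URL are placeholders, and the 180-day default reflects the RFC's recommendation to keep Expires less than a year in the future.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

def build_security_txt(contact: str, days_valid: int = 180,
                       policy: Optional[str] = None,
                       now: Optional[datetime] = None) -> str:
    """Render a minimal RFC 9116 security.txt (Contact and Expires required, Policy optional)."""
    now = now or datetime.now(timezone.utc)
    expires = (now + timedelta(days=days_valid)).strftime("%Y-%m-%dT%H:%M:%SZ")
    lines = [f"Contact: {contact}", f"Expires: {expires}"]
    if policy:
        lines.append(f"Policy: {policy}")
    return "\n".join(lines) + "\n"

print(build_security_txt("mailto:security@example.com",
                         policy="https://example.com/disclosure-policy"))
```

Because Expires is mandatory, the file also forces a periodic review of whether the contact route still reaches someone who will act on it.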

Next, having an up-to-date and precise inventory is key to a successful response to a security incident, as it will give you an idea of the required remediation tasks and an understanding of where to begin. This applies not just to your own systems, but to those of vendors and partners as well.

Question 5a: Tell me about our incident response strategy.

Responses to critical vulnerabilities need to be particularly fast in the cloud, since elasticity and automation could scale up your attack surface exponentially. Because of the need for rapid action, creating playbooks with coordinated responses can help. It is a good idea to develop specific playbooks for known scenarios such as ransomware, natural disasters, or zero-day attacks, as well as general response procedures for less cut-and-dried incidents. These playbooks should provide guidance for critical decisions as more information about a particular security incident becomes available. These include decisions such as whether to keep the system live or shut it down to prevent more damage, when to contact law enforcement, when to release a statement, and what to include in each statement at each point.
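Playbooks work best when the decision points, and who owns each decision, are written down ahead of time. The Python sketch below is a hypothetical illustration of that structure. There are scenario-specific playbooks, and unrecognized incidents fall back to a general playbook. The decisions mirror the ones listed above, while the owners and guidance text are placeholders for whatever your own teams agree on and rehearse.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Decision:
    question: str
    owner: str      # who is authorized to make this call
    guidance: str   # placeholder for the pre-agreed criteria

# Hypothetical playbooks; real ones are written and practiced by your own response team.
PLAYBOOKS = {
    "ransomware": [
        Decision("Keep affected systems live or shut them down?", "incident lead",
                 "Apply the containment criteria agreed in advance."),
        Decision("Contact law enforcement?", "legal",
                 "Follow counsel's pre-agreed thresholds."),
        Decision("Release a statement, and what should it say?", "communications",
                 "Use the approved template for this stage of the incident."),
    ],
    "general": [
        Decision("Is the incident still spreading?", "incident lead",
                 "If yes, prioritize containment."),
        Decision("Who must be informed now?", "incident lead",
                 "Notify legal and communications."),
    ],
}

def playbook_for(scenario):
    """Return the scenario-specific playbook, or the general one for unknown scenarios."""
    return PLAYBOOKS.get(scenario, PLAYBOOKS["general"])

for step in playbook_for("zero-day"):   # no specific playbook, so the general one applies
    print(f"[{step.owner}] {step.question} -> {step.guidance}")
```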

The playbook should also include resources and workflows that will aid in forensics and investigation. This is the point when the logs that we discussed above will earn their keep.

Question 5b: Tell me about our disaster recovery strategy.

Disaster recovery in the cloud should be straightforward. As more and more of the overall environment turns into code, it should be easier to reset to default and fail forward. Of course, this only works in the moment if you’ve tested your failure modes, so build your environments with the intent of running routine failover and backup tests.

As we noted above, there is no argument for keeping virtual machines or containers that deviate from standard configurations.³ If a virtual machine, container, or service looks unusual, it is best to throw it away and start again.

Question 5c: Tell me about our public relations strategy.

Your security incident response playbook should include a public relations response strategy. This should not only loop in legal and communications teams but also describe how the PR response will evolve over time as the gravity and cause of the incident become clearer. It is important to establish who is authorized to speak to customers, other employees, the press, and law enforcement. Make clear who has executive authority over the incident, and ensure that messaging goes through individuals with that authority.

These days, as the number of breaches grows inexorably, it is important to be factual and clear when communicating about incidents. Vagueness or obfuscation, about either technical details or impact, tends to get a very cold reception. We are in favor of transparency about breaches, even radical transparency, in an era when customer frustration about privacy stays dormant right up until the point that risk is realized. Of course, that’s easy for us to say; you should also consult with executive leadership and legal when choosing the level of transparency that is right for you.

Conclusion

To sum up: from a security standpoint, the cloud is neither better nor worse; it’s different. It presents opportunities to replace practices and tools that, in all honesty, should have been shelved years ago. It also presents difficulties in visibility and vagueness around the boundaries of responsibility.

Applying existing controls to this new environment is not going to be terribly successful. Listening to the hype and assuming that the cloud has it all sorted for you is even more dangerous. However, if you use the scalability, automation, and integration capabilities the cloud offers to reimagine solutions to your old control objectives, you can enjoy the business and operational benefits of the new capabilities while bringing your risk management up to a standard that, in the past, only the best few could maintain…at least until the cloud providers change everything and we have to start over.

The renowned security and cyberintelligence researcher, the Grugq, has noted that “if you want to improve anything at all, tighten your feedback loops.”⁴ The manner in which your organization chooses to engage (or not engage) with cloud capabilities will determine whether your new environments bring you more information or less. Since your developers are almost certainly going to the cloud in some form already, it is past time to ask these questions and start gathering as much feedback about your operations as you can.

Footnotes

¹ Between 2017 and 2018, attackers largely shifted away from SQL to other forms of injection, particularly formjacking. For a deeper discussion on emerging injection tactics and modern architectures, see https://www.f5.com/labs/articles/threat-intelligence/application-protection-report-2019--episode-3--web-injection-attacks

² https://www.f5.com/state-of-application-services-report/interactive-report-2019

³ Increasingly, data centers no longer consider bare metal servers, or even entire shipping containers of servers, worth fixing. They just let them fail in place and remove them when the entire unit is no longer sufficiently productive to be worth the operational costs. Whether that approach to hardware and raw materials is sustainable in the long run remains to be seen, but for the moment even bare metal is disposable.

⁴ https://medium.com/@thegrugq/opaque-at-both-ends-bb3e2d6e0d58