Difficult security incidents are unique and valuable opportunities. They are the sort of testing you can’t buy: real-world, un-simulated, and direct. No pen-test or code review is going to do what a serious incident will. They are priceless jewels, but only if you use them for all they’re worth.
Capturing that value is only possible if precise and comprehensive notes about what went wrong are available. In the heat of an incident, note-taking is often the last thing responders are concerned with, but it is critical to improving both the incident response process and your organization’s resiliency and security.
This requires a willingness to confront the weaknesses in your organization and a dedication to addressing those weaknesses. It takes finesse, managerial buy-in, and top-notch communication. It also requires a much more expansive view of what areas incident response should inform, including cross-team communication, service level agreements, vendors, law enforcement, internal coding standards, and access to data within your organization.
In this article, we’ll look at where this value can be captured, using the incident response model from the National Institute of Standards and Technology (NIST).1 While the process your organization uses may differ from the one described below, the general method of capturing gaps and driving improvements should be clear.
Steps of Incident Response
NIST breaks incident response into four steps, each with several activities and sub-steps. At a high level, these are:
- Preparation
- Detection and Analysis
- Containment, Eradication, and Recovery
- Post-incident Activity
While these steps are performed in order, each informs the others. The Post-incident Activity is used to review each of the preceding steps. This takes the form of updating policies or identifying areas for improvement in the other steps. For example, updating tooling, improving monitoring, and adding or reconfiguring controls all improve the incident response process.
However, these improvements usually remain focused on the incident response process itself. They often fail to address some of the most important gaps encountered, because these gaps are not technical in nature.
These gaps include cross-team communication and collaboration, access to resources and expertise from those other teams, and the delivery of requirements to the teams responsible for infrastructure, operations, and development.
Let’s look at each phase in turn.
Preparation
This phase includes identifying contacts for both internal and external resources; preparing technical tools for incident response, forensics, and evidence gathering; defining incident reporting mechanisms; and gathering baselines and documenting the environment.
This information quickly becomes out of date. New vendors are on-boarded without being added to the process documentation. Staffing changes mean that contacts may now have different roles. Reorganization of departments can mean that entirely different groups are now responsible for systems involved in the incident.
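One way to keep this information from going stale is to record when each contact was last confirmed and to flag entries that have not been checked recently. The following is a minimal sketch, not a prescribed tool: the `Contact` fields, the sample entries, and the 90-day threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Contact:
    """One entry in a hypothetical incident response contact list."""
    name: str
    role: str
    system: str
    last_verified: date


def stale_contacts(contacts, today, max_age_days=90):
    """Return contacts last confirmed more than max_age_days before today."""
    cutoff = today - timedelta(days=max_age_days)
    return [c for c in contacts if c.last_verified < cutoff]


# Illustrative data only; the names and dates are made up.
contacts = [
    Contact("A. Rivera", "Network operations lead", "edge routers", date(2024, 1, 10)),
    Contact("Vendor support desk", "Firewall vendor", "perimeter firewall", date(2024, 5, 2)),
]

for c in stale_contacts(contacts, today=date(2024, 6, 1)):
    print(f"Re-verify {c.name} ({c.role}, {c.system}); last checked {c.last_verified}")
```

Running a check like this on a schedule, or after any reorganization or vendor change, turns contact maintenance into a routine task rather than something discovered mid-incident.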
Delays in the incident response process often come from being unable to identify the correct people to talk to about a given system, or the inability to reach those people in a timely manner. In some cases, even when the correct resource is reached, they do not have the necessary level of access to perform the needed task or may not understand their role in the incident response.
During an incident, it is critical to make a note of where communication broke down so that these gaps can be addressed. It doesn’t matter that the correct person was eventually contacted. What matters most is how long it took to get the appropriate individual engaged.
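If the notes taken during an incident include timestamps, the time it took to engage each team can be calculated directly during the review. The sketch below assumes a hypothetical `EngagementNote` record. The field names and sample data are invented for illustration and do not come from any particular ticketing or paging system.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class EngagementNote:
    """A hypothetical note recording an attempt to reach a team during an incident."""
    team: str
    first_attempt: datetime
    engaged_at: Optional[datetime] = None  # None if the team was never reached
    obstacle: str = ""


def minutes_to_engage(note):
    """Minutes from the first contact attempt to engagement, or None if never engaged."""
    if note.engaged_at is None:
        return None
    return (note.engaged_at - note.first_attempt).total_seconds() / 60


# Illustrative data only.
notes = [
    EngagementNote("network ops", datetime(2024, 6, 1, 2, 15),
                   datetime(2024, 6, 1, 3, 40), "on-call number out of date"),
    EngagementNote("database team", datetime(2024, 6, 1, 2, 20),
                   datetime(2024, 6, 1, 2, 30)),
]

# List the slowest engagements first so the review starts with the largest gaps.
engaged = [n for n in notes if n.engaged_at is not None]
for n in sorted(engaged, key=minutes_to_engage, reverse=True):
    print(f"{n.team}: {minutes_to_engage(n):.0f} min {n.obstacle}".rstrip())
```

Recording the obstacle alongside the delay matters as much as the number itself, since it points to what needs fixing: a stale phone number, a missing escalation path, or an unclear role.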
Detection and Analysis
This phase is where the skills of incident responders have the most effect, and it is typically where their attention is focused. While their skill at detecting, analyzing, and rapidly identifying the source of an attack is usually highly developed, there are often delays due to lack of access to, or visibility into, the environment.
For example, a network-based attack may be best analyzed by collecting data at a router managed by a network operations team. It may be that the incident responders have logs from this device, but do not have access to run a packet capture on the device without engaging the network ops team.
If the responsible team cannot be contacted quickly, an important opportunity to gather actionable data may be delayed—or missed entirely.
Make note of where the speed of response slowed due to having to deploy a new sensor or gain access to a log. Consider the possibility of granting restricted access to deploy these measures to the incident response team in the future.
No less important is reviewing how the incident was initially detected and whether it could have been detected earlier or more easily by some other means. Frequently, attacks become a problem because detection methods proved inadequate or because a key piece of data was missing from monitoring. Improving where, how, and what data is routinely collected can make handling future attacks much easier.
While the vast number and types of incidents and attack vectors make it difficult to provide specific recommendations, a general approach to improving visibility is often fruitful. Asking not only how a specific incident could have been detected earlier but also how similar incidents could manifest is a valuable exercise. If your organization has any members who are skilled with offensive techniques, this is a place where they can provide a great deal of value.
Containment, Eradication, and Recovery
This phase is where an attack is stopped, remediations and defenses are deployed, and recovery is performed on affected resources. It is the phase that involves the most contact with other teams, both internal and external to your organization.
Internal teams such as operations and network staff, service owners, and developers may all need to be called in to implement new defenses, develop and install patches, and deploy new configurations. Management will have to sign off on proposed changes, downtime, and new purchases. Communications and marketing teams will need to help develop plans for telling customers what happened. Vendors may need to be brought in to provide support, configuration, or equipment replacement. Finally, law enforcement may need to be involved to gather evidence.
Whenever this much cross-team communication and collaboration takes place, there are bound to be issues with information sharing, chain of command, authorization, and responsibility. Production timelines will be impacted and damage to the continuity of business operations is a possibility.
While remediation efforts happen, keep detailed notes on what accelerated and eased the process and what impeded it. Care must be taken to avoid blame, of course, and an emphasis on working together during a serious security incident is the best approach.
Incident response is almost never solely the activity of the official Incident Response team. With a severe enough incident, nearly every part of a company will be involved, and each should know what their responsibilities are in an incident. Regular training and review are critical for keeping the whole organization ready to respond, and may—as an unanticipated advantage—encourage a spirit of collaboration and cross-team communication that will prove beneficial in non-emergency situations.
Post-incident Activity
Traditionally, this phase is where the entire incident is reviewed, gaps are identified, and changes are recommended.
However, without good notes from each of the preceding stages, important details are missed and the pace of improvement slows. It is critical that this step be done as soon as possible after the previous phase concludes: while the events of the incident are fresh in everyone’s mind, while resources are still assigned, and while the context for the notes and records created during the earlier phases still exists and can be addressed with a sense of urgency.
Should this step be delayed, process improvements will be given less priority as the business returns to normal operations. Important details will fade from memories.
If the full value of an incident is to be gained, this phase cannot become an afterthought. It must be given as high a priority as the others, and the incident should not be considered closed until the outputs of this phase are reviewed, implemented, and tested.
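The closure rule described above can be expressed as a simple gate on the action items that come out of the review. This is only a sketch under stated assumptions: the `Status` stages mirror the reviewed, implemented, and tested sequence from the text, while the `ActionItem` fields and the sample items are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum


class Status(IntEnum):
    """Stages an improvement passes through; higher values are further along."""
    OPEN = 0
    REVIEWED = 1
    IMPLEMENTED = 2
    TESTED = 3


@dataclass
class ActionItem:
    """A hypothetical improvement identified during post-incident review."""
    description: str
    owner: str
    status: Status = Status.OPEN


def blocking_items(items):
    """Return the action items that have not yet reached the TESTED stage."""
    return [item for item in items if item.status < Status.TESTED]


def can_close(items):
    """The incident may be closed only when no action item is still blocking."""
    return not blocking_items(items)


# Illustrative data only.
items = [
    ActionItem("Update on-call contact list", "IR team", Status.TESTED),
    ActionItem("Grant IR team packet-capture access", "Network ops", Status.IMPLEMENTED),
]

print("Can close incident:", can_close(items))
for item in blocking_items(items):
    print(f"Blocking: {item.description} ({item.owner}, {item.status.name})")
```

Note that this sketch treats an incident with no action items as closable. An organization could reasonably choose the opposite and require at least one recorded outcome before closure.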
Further, a robust incident response process will create detailed and informative documentation to be delivered to senior staff, addressing specifically the cause of the incident, its effects, how it was managed and addressed, and where the process worked and failed. This should be compiled by the incident response team, but with input from all the other teams involved. Buy-in from other teams that support the suggested changes goes a long way toward prioritizing improvements that will help the entire enterprise.
Practice Makes Perfect
Even in the most mature organizations, incident response is only effective if it is regularly drilled and tested. The incident response team itself may be highly practiced, but equally important parts of the organization almost certainly are not. Practice can be done with small activities, such as by reaching out to other team members and asking for a specific document, by requesting logs from a specific machine, or even engaging in informal conversations that provide insight into role changes and newly deployed devices.
If All You Do Is Fight Fires, That’s All You Will Ever Do
Ultimately, no process is perfect, and no environment has visibility into every system. There will always be moments in incident response where a tool will fail, a log will be unavailable, or an important detail will be missed. However, with detailed notes during incidents, training across the whole organization, well-established and practiced lines of communication, and a spirit of collaboration, organizations can prevent fires—as well as fight them.