Hawaii’s recent missile alert debacle had security professionals shaking their heads and asking, “How could an error like this possibly have happened?” On January 13, 2018, nearly 1.5 million residents of Hawaii feared for their lives when they received text messages from the Hawaii Emergency Management Agency (HIEMA) warning them of an inbound missile attack.
Thankfully, this alert was a mistake and there was no real danger, but the incident raises a far broader question: how many of our critical systems are this vulnerable to human error, poor software design, and insufficient security controls, all of which were factors in the HIEMA incident?
Many of the real-world systems we depend on (air traffic control systems, public power and water utilities, financial and healthcare systems, to name just a few) are considered “too big to fail.” Yet many are destined to fail because no matter how much we’d like technology and its human operators to be perfect, that has never been the case, nor will it ever be. What new technology has ever been free of defects, even after years of testing and usage? Just think about all the product recalls that happen every year in the auto industry alone. This is why certain industries (particularly those involving public safety) have far more stringent regulatory compliance requirements than others. Among the steepest penalties are those of the North American Electric Reliability Corporation (NERC), which can fine a utility company up to $1 million per day for security non-compliance.1
One reason many systems are destined to fail is that our insatiable demand for new and better technology has made systems (IT and otherwise) overly complex—some say too complex for humans to manage. Often these systems are so tightly coupled and interdependent that one small slip-up can quickly spiral out of control. There should be no room for error in these systems, yet errors do occur—a lesson that HIEMA learned the hard way.
We cannot afford mistakes with critical systems and infrastructure. So, what can be done to minimize them?
Adopt an “assume breach” approach.
There was a time when some organizations believed they were too small or too insignificant to be breached. Today, with so many organizations moving to web-based apps, and with hacking tools cheap and easily accessible, it’s not a question of if but when your systems will be breached, whether through failures in technology, human behavior, or both. Every organization is a potential target for attackers, especially those that manage public safety systems and critical infrastructure.
Avoid startup-like behavior.
Security teams would also do well to apply the “precautionary principle” borrowed from environmental science: assume that a new process or product is harmful until it has been tested. In technology, we typically do the opposite. Startups, in particular, are known for putting new software and systems out there in the real world and then trying to fix them after they fail. For years, the developer’s mantra at Facebook was “move fast and break things.”2 Fortunately, Facebook and others have matured beyond this philosophy, but sadly it remains an almost expected part of tech startup culture.
Using new technologies to save money and provide better service is fine, but government and civilian organizations that run our critical infrastructure should not operate like tech startups. They have an obligation to provide services that citizens can (and do) depend on for their lives and well-being. They should stay away from bleeding-edge technology and technology-driven processes that prioritize speed3 and time to market above safety and security. Critical infrastructure systems need to be thoroughly tested, and the entire process verified, before they go into production.
Train, practice, test.
When it comes to public safety and critical infrastructure systems, self-teaching or even classroom-style training is not enough. The people who operate these systems and make critical decisions need extensive training with simulated practice sessions, and they should be tested on their proficiency. In 2016, the Public Company Accounting Oversight Board (PCAOB)4 recognized this need when it advised Sarbanes-Oxley (SOX) auditors to start testing the effectiveness of the control owners, not just the control. The plan was to start issuing significant deficiencies to organizations that assigned control ownership to people who were not adequate control owners; however, we are not sure how prevalent this practice is in current SOX auditing. Operator training should also cover incident response for when things go wrong. Make sure rapid detection and reversal procedures are in place and tested as well.