Hawaii’s recent missile alert debacle had security professionals shaking their heads and asking, “How could an error like this possibly have happened?” On January 13, 2018, nearly 1.5 million residents of Hawaii feared for their lives when they received text messages from the Hawaii Emergency Management Agency (HIEMA) warning them of an inbound missile attack.
Thankfully, this alert was a mistake and there was no real danger, but the incident raises a far broader question: how many of our critical systems are this vulnerable to human error, poor software design, and insufficient security controls, all of which were factors in the HIEMA incident?
Many of the real-world systems we depend on (air traffic control systems, public power and water utilities, financial and healthcare systems, to name just a few) are considered “too big to fail.” Yet many are destined to fail because no matter how much we’d like for technology and its human operators to be perfect, that has never been the case, nor will it ever be. What new technology has ever been free of defects, even after years of testing and usage? Just think about all the product recalls that happen every year in the auto industry alone. This is why certain industries (particularly those involving public safety) have far more stringent regulatory compliance requirements than others. Among the steepest penalties are those of the North American Electric Reliability Corporation (NERC), which can fine a utility company up to $1 million per day for security non-compliance.1
One reason many systems are destined to fail is that our insatiable demand for new and better technology has made systems (IT and otherwise) overly complex—some say too complex for humans to manage. Often these systems are so tightly coupled and interdependent that one small slip-up can quickly spiral out of control. There should be no room for error in these systems, yet errors do occur—a lesson that HIEMA learned the hard way.
We cannot afford mistakes with critical systems and infrastructure. So, what can be done to minimize them?
Adopt an “assume breach” approach.
There was a time when some organizations believed they were too small or too insignificant to be breached. Today, with so many organizations moving to web-based apps and hacking tools being cheap and easily accessible, it’s not a question of if but when your systems will be breached, whether through failures in technology, human behavior, or both. Every organization is a potential target of attackers, especially those that manage public safety systems and critical infrastructure.
Avoid startup-like behavior.
Security teams would also do well to apply the “precautionary principle” borrowed from environmental science: assume that a new process or product is harmful until it has been tested. In technology, we typically do the opposite. Startups, in particular, are known for putting new software and systems out there in the real world and then trying to fix them after they fail. For years, the developer’s mantra at Facebook was “move fast and break things.”2 Fortunately, Facebook and others have matured beyond this philosophy, but sadly it remains an almost expected part of the culture of tech startups.
Using new technologies to save money and provide better service is fine, but government and civilian organizations that run our critical infrastructure should not operate like tech startups. They have an obligation to provide services that citizens can (and do) depend on for their lives and well-being. They should stay away from bleeding edge technology and technology-driven processes that prioritize speed3 and time to market above safety and security. Before being put into production, critical infrastructure systems need to be thoroughly tested, and the entire process needs to be verified before going live.
Train, practice, test.
When it comes to public safety and critical infrastructure systems, self-teaching or even classroom-style training is not enough. The people operating these systems (and making critical decisions) need extensive training with simulated practice sessions, and they should be tested on their proficiency. In 2016, the Public Company Accounting Oversight Board (PCAOB)4 recognized this need when it advised Sarbanes-Oxley (SOX) auditors to start testing the effectiveness of the control owners, not just the control. The plan was to start issuing significant deficiencies to organizations that assigned control ownership to people who were not adequate control owners; however, we are not sure how prevalent this practice is in current SOX auditing. Operator training should also cover incident response for when things go wrong. Make sure rapid detection and reversal procedures are in place and tested as well.
Design intuitive user interfaces.
Nowhere is it more important for developers to design foolproof, well-tested user interfaces than in systems that control critical infrastructure. The user interface used by HIEMA employees was blamed in part for the false missile alert. Although the general public has not seen the actual interface (the agency only released a mock-up of it, shown here), it reportedly did not clearly delineate “drills” from actual alerts. And there were no warnings built into the system asking the operator to confirm the decision to send an actual warning. Here, a simple pop-up screen saying, “Are you sure…?!” would likely have prevented this calamity.
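As a rough illustration of the kind of safeguard described above, the following Python sketch keeps drills and live alerts as distinct modes and refuses to send anything unless the operator types a confirmation phrase that names the mode. This is not HIEMA's software; every name here (`AlertMode`, `AlertRequest`, `send_alert`, the phrases themselves) is hypothetical and chosen only for the example.

```python
from dataclasses import dataclass
from enum import Enum


class AlertMode(Enum):
    """Hypothetical modes: a drill and a live alert are never the same option."""
    DRILL = "DRILL"
    LIVE = "LIVE"


@dataclass(frozen=True)
class AlertRequest:
    mode: AlertMode
    message: str


def confirmation_phrase(request: AlertRequest) -> str:
    """Phrase the operator must type; it names the mode explicitly."""
    return f"SEND {request.mode.value} ALERT"


def send_alert(request: AlertRequest, typed_confirmation: str) -> str:
    """Return the text to dispatch only if the typed phrase matches the mode."""
    expected = confirmation_phrase(request)
    if typed_confirmation.strip() != expected:
        raise PermissionError(f"Confirmation mismatch: type '{expected}' to proceed.")
    # Drill messages are visibly labeled so they cannot be mistaken for real ones.
    prefix = "[DRILL - NOT A REAL EMERGENCY] " if request.mode is AlertMode.DRILL else ""
    return prefix + request.message
```

Requiring a typed phrase rather than a single click is one possible design choice: an operator who believes a drill is under way would type the drill phrase, which a live request rejects.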
Enforce separation of duties.
This core principle of security operations says that no one person should ever have control of a critical process from beginning to end. Instead, certain tasks can be completed only by two or more authorized personnel. This could be as simple as a second person standing over an employee’s shoulder and approving an action, or physically “pushing the button” themselves. This practice was clearly not in place at HIEMA, and it probably isn’t in many organizations, especially those with limited security staff or informal (undocumented or lax) security policies and procedures.
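In software, the two-person rule can be enforced by refusing to run a critical action until someone other than the requester has approved it. The sketch below is a minimal, hypothetical model (the `CriticalAction` class and its methods are invented for illustration, not taken from any real system).

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class CriticalAction:
    """A hypothetical action that needs independent approval before it runs."""
    description: str
    requested_by: str
    approvals: Set[str] = field(default_factory=set)

    def approve(self, approver: str, authorized: Set[str]) -> None:
        """Record an approval from an authorized person other than the requester."""
        if approver not in authorized:
            raise PermissionError(f"{approver} is not authorized to approve")
        if approver == self.requested_by:
            raise PermissionError("requester cannot approve their own action")
        self.approvals.add(approver)

    def execute(self, required_approvals: int = 1) -> str:
        """Run the action only once enough independent approvals exist."""
        if len(self.approvals) < required_approvals:
            raise PermissionError("not enough independent approvals")
        return f"EXECUTED: {self.description}"
```

For example, if `alice` requests an action, `execute()` raises an error until an authorized colleague such as `bob` has called `approve`.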
Separate your development, testing, and production systems.
If it’s not obvious, separation should apply to application development, too. Applications should never be developed, tested, or changed on production systems. It’s too easy to make mistakes and push changes that could bring the system down, introduce critical security flaws, or have a negative effect on features that might impact safety or revenue. If possible, have someone other than the developers move code into the production environment to ensure processes are followed and no undocumented fixes are put into production. Failure to separate these environments has had such significant adverse impacts on organizations over the years that separation is now a standard security control in virtually every compliance regulation.
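A release process can check these rules automatically before anything reaches production. The following sketch assumes a hypothetical `Change` record and a `promotion_violations` helper, both invented for this example; a real pipeline would pull the same facts from its version control and ticketing systems.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Change:
    """Hypothetical record of a code change awaiting promotion to production."""
    author: str
    tested_in_staging: bool
    ticket_id: Optional[str]


def promotion_violations(change: Change, deployer: str) -> List[str]:
    """Return the reasons, if any, that this change must not go to production."""
    violations = []
    if deployer == change.author:
        violations.append("deployer must not be the author")
    if not change.tested_in_staging:
        violations.append("change was not tested outside production")
    if not change.ticket_id:
        violations.append("change has no documented ticket")
    return violations
```

An empty list means the change may be promoted; anything else blocks the release and tells the team which rule was broken.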
Train employees in basic security procedures.
Most employees will never have the same security “mindset” that security professionals have themselves, but every organization should provide basic security training for all employees. It was discovered in the days following the HIEMA incident that employees had Post-It notes stuck on their monitors with passwords clearly written on them. This shouldn’t happen in any organization, much less one that manages a public safety system. At a minimum, organizations should require employees to take annual security awareness training that covers topics like password strength and security, safe web browsing and use of Wi-Fi networks, recognizing phishing and other scams, avoiding malware, protecting company confidential data—the list goes on. Training should include a testing component.
We’d all like to believe that what happened at HIEMA was just an outlier, but is it really? There’s no way to know how many other critical systems and infrastructure could suffer the same misfortune. But because we don’t know, these organizations need to do all they can now rather than later—before the attackers infest these critical systems and do real damage. The rest of us should be doing the same.