Last week, a bug introduced in a routine product update for the endpoint detection and response product Crowdstrike Falcon prevented Windows computers around the world from starting without resorting to a clunky and manual process. The incident became the largest information technology outage in recent memory, taking down airlines, airports, 911 call centers, hospitals, and other businesses around the world. Bad updates happen. Crowdstrike is working the problem.
Early communications from Crowdstrike CEO George Kurtz claimed two things that we dispute: that this was “not a cybersecurity incident” and that Crowdstrike customers “remain protected.” This post is not about semantics. It matters that executives understand why those statements are untrue and unhelpful.
Supply Chain Risk
As highlighted by the SolarWinds hack of 2020 (in which a compromised update to network management software relied upon by business and government gave attackers a way into customer networks), software supply chain problems can wreak havoc on systems, leading to loss of availability, system breaches, data theft, and other security incidents. The Crowdstrike update is a prototypical supply-chain case study: organizations running Falcon found that the tool they use to protect machines became the cause of the loss of those machines’ availability. Organizations must become adept at handling software supply chain incidents, as these have become more frequent and often have high impact.
Availability is Security. Lack of it is an Incident.
When a system, service, application, or data that you intend to be available as part of your business is not available, that is a loss of availability. The US National Institute of Standards and Technology Computer Security Resource Center (NIST CSRC) defines a computer security incident as:
“An occurrence that results in actual or potential jeopardy to the confidentiality, integrity, or availability of an information system or the information the system processes, stores, or transmits or that constitutes a violation or imminent threat of violation of security policies, security procedures, or acceptable use policies. See cyber incident. See also event, security-relevant, and intrusion.” [Emphasis added]
Unfortunately Phrased
While it was almost certainly Kurtz’s intention to state that the incident was not a cybersecurity attack (most likely to differentiate the self-inflicted foot-wound that caused the Crowdstrike event from the nation-state sponsored attack that led to the SolarWinds debacle), telling executives that this is not a “cybersecurity incident” may have led some of them to downplay the impact of the issue and to not authorize the use of incident response procedures that can help companies improve how they handle incidents. That is a terrible outcome.
And make no mistake: Kurtz’s claim that Crowdstrike Falcon customers “remain protected” was just flat-out wrong. You’re not protected when the tool you use to protect machines renders them entirely unusable.
If You Do it Right, IR Makes You Better At IR
Companies with more mature, rehearsed, organized, and well-managed incident-handling capabilities can move faster to triage, mount a response, solve the issues, communicate to internal and external stakeholders, determine root causes, and iteratively improve their incident handling based on novel conditions. Simply put, companies that handle incidents better have fewer incidents and lower overall incident impact. Investment in incident handling doesn’t just help you handle the next incident better; it makes you better at handling every incident you will ever have.
Root Cause Analysis Informs Vulnerability Management
Firms that did run a formal IR on the Crowdstrike fiasco would almost certainly come away with findings about their vulnerability and patch management procedures that might not have been obvious previously. Certainly, organizations with better-organized patch management, in which rollouts begin on lower-criticality resources so that any problems can be observed first, would have fared better in this situation. There are countless other lessons here about how organizations manage their vulnerability and patching programs. Those lessons can make them more resilient in the future, provided they note them and revise their procedures to reflect these observations and conclusions.
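To make the staged-rollout idea concrete, here is a minimal sketch in Python. It is an illustration under our own assumptions, not a description of how Falcon or any particular patch management product works: the `Host` type, the numeric `criticality` rings, and the `apply_update` and `is_healthy` callbacks are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Host:
    name: str
    criticality: int  # hypothetical scale: 1 = lowest, larger = more critical


def staged_rollout(
    hosts: Iterable[Host],
    apply_update: Callable[[Host], None],
    is_healthy: Callable[[Host], bool],
) -> list[str]:
    """Patch hosts ring by ring, lowest criticality first.

    Returns the names of the hosts that received the update. The rollout
    halts after the first ring in which any host fails its health check,
    so more critical rings are never touched by a bad update.
    """
    # Group hosts into rings keyed by criticality.
    rings: dict[int, list[Host]] = {}
    for host in hosts:
        rings.setdefault(host.criticality, []).append(host)

    patched: list[str] = []
    for level in sorted(rings):
        for host in rings[level]:
            apply_update(host)
            patched.append(host.name)
        # Observe the whole ring before moving to anything more critical.
        if not all(is_healthy(h) for h in rings[level]):
            break
    return patched


# Hypothetical fleet: the update breaks the kiosk, so the rollout stops
# before it reaches the most critical machine.
fleet = [Host("dispatch-911", 3), Host("lab-01", 1), Host("kiosk-07", 2)]
updated: list[str] = []
patched = staged_rollout(
    fleet,
    apply_update=lambda h: updated.append(h.name),
    is_healthy=lambda h: h.name != "kiosk-07",
)
```

In this example the lab machine and the kiosk receive the update, the kiosk fails its health check, and the dispatch system is left alone. A real program would also need a soak period between rings and a rollback path, both of which this sketch leaves out.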
Communications Integrity
Some have said that, at the level of Crowdstrike, it’s hard for the CEO to communicate properly in an incident because the communications and legal people won’t let them. We couldn’t disagree more strongly. And the heartfelt, if long, post from CSO Shawn Henry demonstrates that Crowdstrike executives absolutely can take responsibility and apologize. Our only comment would be to get more quickly to the part about earning back trust, and to focus a little less on the personal story.
No incidents are so bad that they cannot be made worse by dishonest, inaccurate, misleading, or self-serving communications. Your communications, at the worst of times, are the measure of your company’s integrity and its value.
And remember, Kurtz is the Crowdstrike CEO. If he’s listening to bad comms advice and bad legal advice, it’s because his comms and legal people are telling him things he wants to hear.