Equifax was really a DevOps problem
If you have not heard it yet then all your news feeds must have been turned off or you must have been on a silent retreat. It was recently announced that Equifax will have to pay up to $700,000,000 as a result of their enormous data security breach. It was one of the largest security breaches in US history. The cause of the breach was a failure to implement DevOps correctly. What went wrong, and how can you prevent a similar disaster happening to your company? Side note: at the time I am writing this Capital One is experiencing its own disastrous security breach but I’ll save that story for another day.
How a lack of DevOps caused the disaster
If you have ever flown on a commercial airline you know that the cockpit has two seats: one for the pilot and one for the co-pilot. I personally like knowing that there is a level of redundancy in the cockpit ensuring that every human action is double-checked. The aviation industry figured out a long time ago that there is no room for error when you are carrying precious cargo and every measure must be taken to ensure safety. But even then problems can occur if you are holding on to a cultural legacy that hampers your ability to perform (in other words, you need more DevOps). If you read Malcolm Gladwell’s book, Outliers, then you are already familiar with Gladwell’s account of what happened to Korean Air. Just as Korean Air had to after its problems back in the ’80s and ’90s, companies need to open up their communication channels, eliminate technical debt, and start taking DevOps seriously if they want to avoid a total disaster. Korean Air fixed its problems and is now ranked as a world-class airline, and you can fix your DevOps problems too.
So how did it happen?
The short answer is that a server was not patched properly. The longer answer is that this happened because a standard DevOps process was not being followed and the train went off the rails. All software needs to be updated on a continuous basis. All software is written by humans and humans make mistakes. In this particular case, the software in question was open-source: the breach exploited the Apache Struts vulnerability CVE-2017-5638. Does this make open-source bad? Absolutely not. 80% of commercial enterprises are using at least some open-source software and there are many reasons to do this. However, that is no excuse for lacking proper open-source management and security tools. You would think that Equifax would have noticed some anomalies due to this breach but that didn’t happen either. Why? Because the device that was used to monitor network traffic had been inactive for 19 months because of an expired security certificate. Once the expired security certificate was updated, the monitoring device began to function and eventually noticed the traffic related to the breach.
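An expired certificate that goes unnoticed for 19 months is the kind of thing a small scheduled check can catch. Below is a minimal sketch, assuming you already keep an inventory that maps each device or domain to its certificate's expiry time. The function name, the inventory format, and the 30-day warning window are illustrative assumptions, not anything taken from Equifax's systems.

```python
from datetime import datetime, timedelta, timezone


def find_expiring_certs(certs, now, warn_within=timedelta(days=30)):
    """Return (name, status) pairs for certificates that need attention.

    `certs` is a hypothetical inventory: a dict mapping a device or domain
    name to the timezone-aware datetime at which its certificate expires.
    """
    findings = []
    for name, not_after in sorted(certs.items()):
        if not_after <= now:
            # Already expired: anything relying on this certificate is blind.
            findings.append((name, "expired"))
        elif not_after - now <= warn_within:
            # Still valid, but close enough to expiry to schedule a renewal.
            findings.append((name, "expiring soon"))
    return findings


# Illustrative inventory; the names and dates are made up for this example.
example_now = datetime(2017, 7, 1, tzinfo=timezone.utc)
example_inventory = {
    "traffic-monitor": datetime(2016, 1, 31, tzinfo=timezone.utc),
    "customer-portal": datetime(2017, 7, 20, tzinfo=timezone.utc),
    "internal-wiki": datetime(2018, 3, 1, tzinfo=timezone.utc),
}
example_findings = find_expiring_certs(example_inventory, example_now)
```

Run on a schedule and wired to an alerting channel, a check like this turns a silent expiry into a ticket someone has to close.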
Wizards are not Superheroes
It would be hard to find someone who doesn’t love a story where someone with supernatural talent or power comes in and saves the day. We feed endlessly on these types of movies - Hello Avengers. You probably know an IT wizard who seems to have all the answers and holds the keys to the kingdom. That’s alright in fairy tales but when it comes to running a commercial IT organization there is no room for IT wizards. In fact, they are downright dangerous. IT wizards are not superheroes. They are human and they will eventually make a mistake. In the case of Equifax, former CEO Richard Smith actually had the nerve to blame a single IT person for failing to patch the server. He must have thought that employee was an infallible IT wizard. Really, were there no co-pilots in the organization to double-check that all the proper procedures and policies were being followed? Even Batman had Robin as a fail-safe. Where were all the tools that are specifically designed to prevent these disasters? This is bad engineering and a blatant disregard for a solid DevOps process.
The House Oversight and Government Reform Committee has released a staff report after a 14-month investigation into the Equifax breach, which has been well documented as one of the largest security incidents in U.S. history, affecting over 148 million consumers. As most readers are already aware, Equifax failed to patch a known vulnerability in Apache Struts, a very commonly used open source Java web framework.
The committee sifted through over 122,000 pages of documents and interviewed three former Equifax employees, who were directly involved with Equifax’s IT teams, to inform its investigation and produce the full report.
The report findings also describe an “execution gap between IT policy development and operation”. This information paints a picture of Equifax running siloed engineering, operations and security teams that do not integrate operational or security practices into their development workflows or application life-cycles.
These are the core issues that DevSecOps aims to solve by shifting much of the security testing left, so that vulnerabilities in code and open source libraries can be identified and fixed as early and as quickly as possible. The security responsibilities are therefore shared across *all* developers, and security tests are baked into all stages throughout the development workflow. This is a stark contrast to how Equifax relied on a single member of staff.
The Solution: Integrating Security into the Development Workflow
Let’s use good DevSecOps practices to show how the Equifax breach via the vulnerable Struts library could have been prevented by a development team, rather than relying on an individual. There were two days between the Apache Struts vulnerability disclosure and the first exploit on the Equifax application. Let’s work our way from development through to production to see where this security issue should have been identified and fixed.
Development: If the application code was still being actively developed, development teams would be locally developing, building and testing the application. Integrating security testing to identify vulnerable dependencies would flag issues via notifications in IDEs and builds, making it clear to whole teams of developers that the new vulnerability exists as well as offering automated remediation advice via pull requests or directly in IDEs.
CI: Any new build run by a CI server would automatically test application dependencies via a CI server plugin or a CLI invocation as a task. This would immediately flag the new vulnerability, breaking the CI job and forcing a remediation action before continuing.
Monitoring: Whether or not applications are being actively developed, if they’re running in production they should be actively monitored. Any new vulnerabilities that are disclosed can then be dealt with immediately. Notifications would be sent to development teams to fix the vulnerabilities via preferred channels, such as automatic PRs, emails or Slack messages, along with the upgrades required to eliminate the issue.
Runtime: Using run-time security tools, any abnormalities in behavior or vulnerable function invocations would immediately be flagged allowing teams to react to security incidents as they happen.
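The CI stage described above can be sketched as a small gate that fails the build when a known-vulnerable dependency version is present. This is an illustration only: the affected ranges follow the Apache Struts advisory for CVE-2017-5638 (2.3.5 through 2.3.31 and 2.5 through 2.5.10), while the function names, the dependency dictionary format, and the purely numeric version parsing are assumptions made for this example. A real pipeline would use a dedicated dependency scanner fed by a vulnerability database.

```python
# Version ranges (inclusive) affected by CVE-2017-5638, keyed by artifact name.
VULNERABLE_RANGES = {
    "struts2-core": [((2, 3, 5), (2, 3, 31)), ((2, 5), (2, 5, 10))],
}


def parse_version(text):
    """Turn '2.3.31' into (2, 3, 31). Assumes purely numeric, dot-separated versions."""
    return tuple(int(part) for part in text.split("."))


def find_vulnerable(dependencies):
    """Return sorted (name, version) pairs that fall inside a known-vulnerable range."""
    findings = []
    for name, version in sorted(dependencies.items()):
        parsed = parse_version(version)
        for low, high in VULNERABLE_RANGES.get(name, []):
            if low <= parsed <= high:
                findings.append((name, version))
                break
    return findings


def ci_gate(dependencies):
    """Return a process exit code: 1 breaks the CI job, 0 lets it continue."""
    findings = find_vulnerable(dependencies)
    for name, version in findings:
        print(f"VULNERABLE: {name} {version} (CVE-2017-5638), upgrade required")
    return 1 if findings else 0
```

In a CI job, the last step would be something like `sys.exit(ci_gate(deps))`, where `deps` is read from the project's build manifest, so that a non-zero exit code stops the pipeline until the dependency is upgraded.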
The report produces five key findings from the security incident, as documented by the committee, as follows:
- Entirely preventable. Equifax failed to fully appreciate and mitigate its cyber security risks. Had the company taken action to address its observable security issues, the data breach could have been prevented.
- Lack of accountability and management structure. Equifax failed to implement clear lines of authority within their internal IT management structure, leading to an execution gap between IT policy development and operation. Ultimately, the gap restricted the company’s ability to implement security initiatives in a comprehensive and timely manner.
- Complex and outdated IT systems. Equifax’s aggressive growth strategy and accumulation of data resulted in a complex IT environment. Both the complexity and antiquated nature of Equifax’s custom-built legacy systems made IT security especially challenging.
- Failure to implement responsible security measurements. Equifax allowed over 300 security certificates to expire, including 79 certificates for monitoring business-critical domains. Failure to renew an expired digital certificate for 19 months left Equifax without visibility on the ex-filtration of data during the time of the cyber attack.
- Unprepared to support affected consumers. After Equifax informed the public of the data breach, they were unprepared to identify, alert and support affected consumers. The breach website and call centers were immediately overwhelmed, resulting in affected consumers being unable to access information necessary to protect their identity.
Thousands of Companies are at Risk for Similar Failures
Now anyone who says "this won’t happen to me" is overlooking an alarming fact. Companies are breached every day and just because you don’t hear about it in the news every day doesn’t mean you are safe.
Earlier this year, Tesla’s Cloud was hijacked and used to mine cryptocurrency, exploiting a vulnerability in the company’s Kubernetes cluster. A mountain of FedEx data was recently exposed, affecting 119,000 individuals. The Equifax breach garnered international attention after an estimated 145.5 million Americans were jeopardized. In other news, we’ve reported on how the Vine Docker registry was hacked, leaving an embarrassing PR trail in its wake.
What is common across all these scenarios? They arguably could have been avoided with basic safeguards underpinned by a healthy dose of DevSecOps. In hindsight, let’s see what specific strategies developers can adopt to avoid such horrendous leaks.
Basic DevSecOps Can Prevent Vulnerabilities
DevSecOps is described as placing security at the forefront of every action. Within the world of cloud tooling, this means instilling protection at every point of the build life cycle. You would think this equates to advanced container and orchestration security, powerful access management and establishing hardened oversight for internal applications, right? Actually, many modern exploits simply involve a lack of basic password protection.
Case Study No. 1: Tesla Cloud Cryptojacking
In early 2018, Tesla’s Amazon servers were hijacked by malware and used to mine cryptocurrency for rogue agents. Similar cryptojacking has transpired at Gemalto, Aviva and others. According to RedLock’s report, the hackers infiltrated the Kubernetes console, which was not password-protected. In one pod, they found access credentials to Tesla’s AWS environment. The breach exposed sensitive telemetry data, though it was limited to “internally used engineering test cars only,” according to Tesla.
How to Prevent
Password protection for Kubernetes administrative consoles is a readily apparent lesson here. All cloud accounts, even those used only internally, must be locked down, and access credentials must be kept out of places where an intruder can read them. Instilling a “security by design” mentality can help limit the number of such consoles left accessible.
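One way to keep credentials from sitting in plain view is to scan configuration files and manifests for them before they are deployed. The following is a minimal sketch, assuming the only thing we look for is the common `AKIA` prefix used by long-term AWS access key IDs. The function name is hypothetical, and real secret scanners cover many more credential types.

```python
import re

# Long-term AWS access key IDs are 20 characters: "AKIA" plus 16 uppercase
# letters or digits. Other key types are deliberately ignored in this sketch.
AWS_ACCESS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def find_exposed_keys(text):
    """Return (line_number, key_id) pairs for every access key ID found in `text`."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in AWS_ACCESS_KEY_ID.finditer(line):
            findings.append((line_number, match.group()))
    return findings
```

Running a check like this over pod specifications and environment files in CI gives a chance to move the credential into a secrets manager before anyone else finds it.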
Case Study No. 2: Vine Registry Breach
In 2016 a white-hat hacker going by the alias avicoder was able to infiltrate Vine, the video-sharing social network. The hacker used tools to discover subdomains and behind one, https://docker.vineapp.com, found an open Docker private registry that housed the Vine source code. Such a vulnerability could have been used to collect user details or inject malware for malicious purposes. Thankfully, no users were compromised, and the Twitter bug bounty program awarded avicoder a handsome sum for the discovery.
How to Prevent
In this particular incident, there was nothing at fault with Docker or Docker containers. The service (which was meant to be private) was simply left public and devoid of access controls. The lesson is that URLs on the world wide web are just that: publicly exposed to the world. It does not matter if no documentation links to them; such subdomains are easy to discover with crawlers and will be exploited if found.
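A quick way to audit your own registries for this mistake is to request the registry API root without credentials and see what comes back. In the Docker Registry HTTP API V2, an unauthenticated `GET /v2/` returns 200 when no authentication is required and 401 when it is. The sketch below relies on that behavior; the function names are hypothetical, and it should only be pointed at registries you own.

```python
import urllib.error
import urllib.request


def classify_registry_response(status_code):
    """Map the status of an unauthenticated GET /v2/ to a simple verdict."""
    if status_code == 200:
        return "open"        # anyone on the internet can talk to the registry
    if status_code in (401, 403):
        return "protected"   # the registry demanded credentials
    return "unknown"         # not a V2 registry, or something else is in front


def probe_registry(base_url, timeout=5):
    """Send an unauthenticated request to the registry API root and classify it.

    Connection failures (urllib.error.URLError) are left to propagate so the
    caller can tell "unreachable" apart from the three verdicts above.
    """
    request = urllib.request.Request(base_url.rstrip("/") + "/v2/", method="GET")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return classify_registry_response(response.status)
    except urllib.error.HTTPError as error:
        # urllib raises for 4xx/5xx responses; the status code is still useful.
        return classify_registry_response(error.code)
```

Any registry that reports "open" and is meant to be private needs access controls before it needs anything else.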
Case Study No. 3: Equifax Exploits
In the Equifax breach, millions of customer records were stolen. The Equifax report notes that hackers exploited a “website application vulnerability.” Further reports detailed that the 2017 breach exploited flaws in Apache Struts. Early statements, including one from the Apache Struts Project Management Committee, raised the possibility of a zero-day exploit, but the flaw was later identified as CVE-2017-5638, a known vulnerability for which a patch was already available.
How to Prevent
Some note that the deserialization of untrusted data in Apache Struts applications opens the door to major vulnerabilities, including potentially malicious code execution.
Containers, due to their short lifespan, are not persistent, which makes it harder for an attacker to maintain a foothold. They can also isolate functionality so as to reduce the attack surface. Regarding the Equifax breach, FP Complete speculates that container usage could have lessened the impact:
“… it’s likely that such a breach would have been more difficult with containers, and that if successful, it would have been less persistent, not as widespread, and mitigated sooner.”
Some view containers as a formidable force if armed with behavioral analysis and firewalls for network connections. Still, others note flaws in assuming containers are safer.
Regardless of architecture, in DevSecOps, security for customer data isn’t relegated to the background; rather, it is built into the entire tooling process and holds equal footing within each build.
Instilling a DevSecOps Culture
Instead of a build-now, secure-later attitude, DevSecOps seeks to elevate security into every decision. With each build, security is melded with greater forethought regarding the repercussions for how data is treated.
Admin consoles for Kubernetes clusters must be armed and password-protected, and the same goes for private GitHub repositories or Docker registries. Any traversal of data through cloud SaaS services must have authorization in place. A most obvious lesson is to password protect those “hidden” registries with discoverable URLs.
Bounty programs for white-hat hackers continue to lead to successful bug detection. At least for the institutions that run them, there is less evidence of compromised user data and remediation is quicker.
Much of this comes down to due diligence and to breeding a culture of security. While that’s a little fluffy for some, it apparently still bears repeating for institutions that continually deprioritize these basic safeguards.
How correcting DevOps can prevent these disasters
Applying FP Complete's rapid DevOps success strategies is the best way to achieve productive results in the shortest time possible. FP Complete offers various levels of its DevOps Success Program, which cover how to build and maintain a secure DevOps environment and prevent disasters. The program addresses:
- Continuous Integration
- Continuous Delivery
- Infrastructure as Code
- Monitoring and Logging
- Communication and Collaboration
- DevOps Steps & Stages Success Strategies
Wasting time will only increase your chances of suffering a disaster! Get started today and enroll in one of FP Complete's Success Programs to ensure you have done everything possible to prevent an attack on your organization!
What this teaches us all
Infrastructure tends to work quietly, receiving less maintenance and fewer upgrades than it requires, until one day a terrible problem happens. Whether it’s a blackout, a bridge falling down, or a cloud exposing private data, the problem suddenly appears — and as with Equifax, the problem may cost hundreds or thousands of times the price of prevention.
DevOps is not trivial or obvious, but not unusually hard either. Like all engineering, it requires adequate attention, early enough, in the right places. At FP Complete we got into automated cloud DevOps in 2012 first to manage our own systems, and since then we’ve helped dozens of other companies with their open-source cloud-based systems, gathering and sharing expertise continuously. Helping other companies use state-of-the-art tools and techniques is what we’re all about.
Free and inexpensive tools are available but need to be integrated and used right. Lots of best practices are easy to copy but slow and costly to work out from scratch. Through this blog, our free webinars, our knowledge-spreading Success Programs, and our custom engineering consulting, we are available to help companies strengthen and accelerate their DevOps quickly and with less pain. With our help or without it, FP Complete urges you to invest in shoring up your deployment automation, your log analysis, your system monitoring and response, your security programs, and your other DevOps work.
Treat your engineering infrastructure seriously, and invest in automation and security before problems manifest. Don’t be the next Equifax.
Take Action! File a Claim.
Was your data breached? Check here https://eligibility.equifaxbreachsettlement.com/en/eligibility
Upcoming GovCloud Webinar!
Save a seat at FP Complete's upcoming webinar, DevOps, FedRAMP Compliance, and Making the Migration to GovCloud Successful, presented by Senior FP Complete Engineer Jason Boyer, which airs Wednesday, November 20th, at 10am PST!
Learn many successful compliance strategies for the confusing and difficult process of migrating your DevOps to the government-designated cloud.