Data analytics systems are the cutting edge of modern corporate computing. While many people may feel they are behind the “state of the art” they read about, the truth is that these are projects we’re implementing right now for prominent companies in life sciences, finance, healthcare, Internet services, and aerospace. They have a lot in common with each other, and likely even with your computing environment.
That’s the truth on the ground. Meanwhile, we are constantly seeing buzzwords in the tech media, as writers struggle to help everyone understand what’s going on out there. Big Data (BD) and Business Intelligence (BI) get talked about a lot -- but many people are unclear on what they mean. So let’s cut through the clutter and look at what these projects are really about: who are they for, what problems do they solve, what do they have in common, and how are they different?
My friend and Deerfield schoolmate Doug Laney, a distinguished analyst on Gartner’s data team, famously defined big data as having volume, velocity, and variety.
Raw data points are available to us all in a volume no one really anticipated, as nearly every object and action in an enterprise is tracked. Recently the CTO of a hospital group described to me their world, in which 22,000 medical devices are putting out logs of data all the time. The volume of information on hand is overwhelming. How do we move it around and store it? How does one make sense of it?
Adding to this is a non-stop stream of new data points, coming in at high velocity. Our friend and client Tom Doris, founder of financial analytics firm OTAS and now a LiquidNet executive, says many stock analysts use their systems to organize the millions of new data points emerging every day from each of several stock markets.
This seems hard enough, and then you add in the enormous variety of data sources relevant to making good decisions. The delightful Chris Mackey, CEO of our client Mackey RMS, focuses on organizing the extensive research and collaboration that goes into major decisions for hedge funds and the like. How do you choose a course of action when crucial data is in a dozen different formats, on numerous different servers, behind a range of APIs and addresses?
Big data is the ultimate “be careful what you wish for” scenario. Do you wish you knew what was going on? Okay -- now what would you do if you knew almost everything that was going on? It’s like buying the daily output from a gold mine. No human, without massive machine assistance, can extract most of the value from that torrent of gold ore.
The idea of business intelligence predates computers, but has been made much more important -- and more useful -- by the vast amount of data we now have. BI is the system of making better decisions through better decision-support systems.
These systems can be as simple as reporting and charting software, or as elaborate as machine learning and artificial intelligence. And they rely on organized streams of input data -- which don’t even have to be “Big” to be extremely useful. In fact, a lot of BI involves digesting the complexity of the raw data, bringing it down to human-usable tools like dashboards, metrics, and exception detection. Many BI systems are hierarchical -- presenting decision-makers with a summary of the current situation, and features to filter or explore the data to learn more about any part.
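To make "metrics and exception detection" a little more concrete, here is a minimal sketch in Python. It assumes the raw data is just a list of numeric readings, and it uses a simple "more than two standard deviations from the mean" rule, which is only one of many possible definitions of an exception. The function names and the sample readings are invented for illustration.

```python
from statistics import mean, stdev

def summarize(readings):
    """Reduce a list of raw numeric readings to a few dashboard-style metrics."""
    return {
        "count": len(readings),
        "mean": mean(readings),
        "stdev": stdev(readings),
    }

def find_exceptions(readings, threshold=2.0):
    """Return (index, value) pairs lying more than `threshold` sample
    standard deviations away from the mean of the readings."""
    mu = mean(readings)
    sigma = stdev(readings)
    if sigma == 0:
        return []  # every reading is identical, so nothing stands out
    return [(i, x) for i, x in enumerate(readings)
            if abs(x - mu) > threshold * sigma]

# Hypothetical readings, invented for this example.
readings = [10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 50.0]

summary = summarize(readings)          # the top-level view
outliers = find_exceptions(readings)   # the items worth drilling into
```

The summary is what a decision-maker would see first; the list of exceptions is where they would start exploring. A production system would likely use a more robust rule and per-source baselines.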
Our client Seattle Cancer Care Alliance, for example, provides life-saving treatments at several leading cancer-care institutions. From the start, they provide outstanding care to a great many patients. But wouldn’t it be even more exciting to constantly learn from the outcomes of all these treatments, to see which therapies are working best for what sorts of cases -- and then to use this knowledge to deliver the best possible course of care for every future patient? While a typical analysis might only involve thousands of patients in total -- hardly enough to sound like Big Data -- the caliber of insight that must be provided is exceptionally high.
For a very different example, consider the project we’re working on right now with a multi-billion-dollar manufacturing company. As is typical these days, their big expensive machines have a computer on board that constantly logs their performance. But a lot of this data just goes into storage, with no one looking at most of it. What they want is to understand the leading causes of breakage and downtime, and gradually eliminate these -- through offline analysis to discover best practices for maintenance, and near-real-time analysis to improve operations plans during the workday -- making their operators into computer-assisted super-operators. That’s business intelligence, turning available data into better decision-making.
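As a rough illustration of the offline side of such an analysis, the sketch below ranks causes of downtime by total minutes lost. It assumes the machine logs have already been reduced to `(cause, minutes)` pairs; the cause names and numbers are made up, and this is not the client's actual system.

```python
from collections import defaultdict

def rank_downtime_causes(events):
    """Total the downtime per cause and return a list of
    (cause, total_minutes, share_of_all_downtime), largest first."""
    totals = defaultdict(float)
    for cause, minutes in events:
        totals[cause] += minutes
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []  # no downtime recorded, so nothing to rank
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(cause, minutes, minutes / grand_total) for cause, minutes in ranked]

# Hypothetical downtime events: (cause, minutes lost).
events = [
    ("bearing failure", 120),
    ("operator error", 30),
    ("bearing failure", 60),
    ("coolant leak", 90),
]

ranking = rank_downtime_causes(events)
```

Here the first entry in `ranking` is the cause responsible for the largest share of lost time, which is the natural place to begin a maintenance best-practices review.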
As you’ve probably guessed, BD and BI aren’t competing approaches -- they are IT architectures that play well together, with Business Intelligence as essentially a layer on top of Big Data.
We find that most companies already have good IT organizations in place, with the skills to develop new software when needed, and to integrate existing Commercial Off-the-Shelf (COTS) tools when available. The problem, then, isn’t lack of building blocks. Anyone can obtain or write a program to input a table of data and graph it, or compute subtotals. The problem is how to put these building blocks together, and especially, how to scale up trivial solutions to production scale.
DevOps is another jargon term in constant use -- including here at FP Complete. It means the engineering that happens *after* you’ve written some code but *before* your end user receives the final results on-screen. DevOps is a set of tools and best practices for scaling up: from a data analysis that runs one time, on one user’s machine, to a system that runs all the time, on a reliable, scalable, and secure cloud-based system, to support everyone who needs the answers. If you’re still using manual processes and mysterious “IT wizards” to scale up your analyses from the laptop to the data center, you’re not going to reach Big Data scale or achieve much Business Intelligence. DevOps is a proven set of techniques and technologies for integration, deployment, scale-up, and continuous operations.
DataOps is a newer concept -- it’s “DevOps for data.” Just as numerous tools can clean up and scale up your analytics apps, a parallel set of tools can clean up and scale up your actual data feeds. DataOps includes data cleansing, schema enforcement, storage and replication, warehousing and repositories, metadata management, version management, uniform API provision, security and monitoring -- all the tools and processes to turn your “pile” of data into an “answer factory” capable of responding to any reasonable query, and constantly ingesting and incorporating the latest data streams.
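Two of the items on that list, data cleansing and schema enforcement, can be sketched in a few lines of Python. This is only an illustration under simple assumptions: each record arrives as a dictionary of raw values, and the schema is a hypothetical mapping from field name to a conversion function. Real DataOps tooling does far more, but the shape of the idea is the same: good records go forward, bad ones are set aside with a reason.

```python
# Hypothetical schema: field name -> function that converts a raw value
# to the expected type. The field names are invented for this example.
SCHEMA = {
    "device_id": str,
    "timestamp": int,    # assumed to be seconds since the epoch
    "reading": float,
}

def cleanse(records, schema=SCHEMA):
    """Split raw records into (clean, rejected) lists according to the schema.
    Each rejected item is a (raw_record, reason) pair."""
    clean, rejected = [], []
    for raw in records:
        try:
            row = {}
            for field, convert in schema.items():
                value = raw[field]                # KeyError if the field is missing
                if isinstance(value, str):
                    value = value.strip()         # trim stray whitespace
                if value is None or value == "":
                    raise ValueError(f"empty value for {field}")
                row[field] = convert(value)       # ValueError if it cannot convert
            clean.append(row)
        except (KeyError, ValueError, TypeError) as err:
            rejected.append((raw, f"{type(err).__name__}: {err}"))
    return clean, rejected

# Hypothetical raw feed, invented for illustration.
raw_feed = [
    {"device_id": " pump-7 ", "timestamp": "1700000000", "reading": "3.5"},
    {"device_id": "pump-8", "timestamp": "not-a-time", "reading": "4.0"},
    {"device_id": "pump-9", "reading": "1.0"},
]

clean, rejected = cleanse(raw_feed)
```

Keeping the rejected records, rather than silently dropping them, is a deliberate choice: the reasons they failed are themselves useful data about the quality of each feed.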
Cloud application architecture means designing your distributed system -- servers, apps, tools, work processes, jobs, and data flows -- into a sensible whole. These days, almost no one should be designing a major new IT system from scratch. If your company is mostly writing brand-new software code from a blank-screen start, you’re wasting work and losing time. Understanding best practices and existing IT architectures, and picking components from the existing inventory, will usually get you 80% of the way toward a good solution. Reuse makes all the difference! Cloud features and distributed, service-oriented architectures make building-block-style development productive and fast. Bug-resistant architectures, with clear separation of responsibilities, will allow you to break your IT system into pieces -- most not written from scratch -- each maintainable on its own schedule, and improvable at will.
The good news is that Big Data is not an all-or-nothing proposition, and neither is Business Intelligence. You can make stepwise progress on both, which is exactly what we encourage our clients to do.
Phase 1 will be BI with the limited portion of your data that’s already in good condition. It’s fairly straightforward to create new IT solutions -- I don’t say new apps here, because these solutions will use existing code for much of the work -- that will answer whatever you feel are the most pressing questions about your data. You are probably already doing some of this, without even calling it business intelligence. Most companies stay in Phase 1 for years, never really getting the answers they wish they had, but at least answering a few crucial questions with hand-built systems.
Phase 2 will be basic DevOps -- turning your IT work into an IT factory, in which any analysis that runs for *someone* can be turned into an analysis that runs for *everyone, all the time* -- maintainably, reproducibly, reliably, scalably, securely. Likely steps here include Version Control, Continuous Integration, Continuous Deployment, Automated Testing, Cloud Scalability, System Monitoring, and possibly Security Auditing. With many of these things implemented, you will see your BI productivity go way up, with new solutions coming online regularly and predictably.
Phase 3 will be basic DataOps, launched when you rapidly discover that the questions you really want answered require data that’s “somewhere around here” and not yet organized. You can expect to do an inventory of the many formal and informal data feeds you depend on, what format they’re in, how they arrive, how accurate they are, and how they are accessed. A set of automated systems will be set up to filter, correct, or “cleanse” these feeds, and then to make them available on high-powered, typically cloud-based, distributed data servers. A set of metadata or “tables of contents” will be set up to help your team locate and tap into the data sources needed to answer a particular query. Data sources will likely always be federated, with no one format conquering all, and with cloud services stitching up the differences. With DataOps implemented, you can expect to describe any reasonable question about “what’s really going on,” and if the data is present somewhere, a system that answers your questions will be feasible.
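The "table of contents" idea can be sketched as a small catalog that records, for each feed, its format, where it lives, and which fields it offers. Everything in the example below (feed names, locations, field names, and the `locate` helper) is hypothetical and only meant to show how a team might look up which sources can answer a given question.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FeedEntry:
    """One line in the catalog: what a feed is, where it is, what it holds."""
    name: str
    fmt: str
    location: str
    fields: frozenset

# Hypothetical catalog; the names and locations are placeholders.
CATALOG = [
    FeedEntry("machine_logs", "csv", "s3://example-bucket/logs/",
              frozenset({"machine_id", "timestamp", "error_code"})),
    FeedEntry("maintenance", "json", "https://example.invalid/api/maintenance",
              frozenset({"machine_id", "service_date"})),
]

def locate(catalog, needed_fields):
    """Return (matches, missing): matches maps each useful feed name to the
    needed fields it can supply; missing lists fields no feed provides."""
    needed = set(needed_fields)
    matches = {
        entry.name: sorted(needed & entry.fields)
        for entry in catalog
        if needed & entry.fields
    }
    covered = set().union(*(entry.fields for entry in catalog))
    missing = sorted(needed - covered)
    return matches, missing

matches, missing = locate(CATALOG, ["error_code", "service_date", "operator"])
```

In this made-up case the query needs two feeds in two different formats, and one field (`operator`) is not available anywhere yet, which is exactly the kind of gap a data inventory is meant to surface.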
We find that mastery of data streams is more and more central to every industry. Whether you’re in financial technology (FinTech), aerospace, life sciences, or healthcare, your world is likely to look more and more like the world of secure Internet services and cloud computing. People in every industry tell us that this is where they’re going.
As automation increases, Big Data will become the norm, and we’ll soon just be calling it Data. Just as DevOps is becoming the norm for innovative IT groups, so will DataOps. IT departments will more and more resemble a two-sided “zipper,” marrying ever-improving data inputs with ever-improving software inputs, into ever-improving online solutions that run in their data centers and in the cloud.
It will be a long road, but realistically we can look forward to a future in which any question you have about your operations, your customers, your patients, your research, can be answered with real data -- reliably, reproducibly, and all the time.