Monitoring and Logging In DevOps
- What’s the point of monitoring and logging every moment of your software build?
- How exactly do you do it?
- How will it affect your bottom line?
FP Complete’s Definitive Guide to Monitoring & Logging In DevOps
DevOps Monitoring and Logging: FP Complete’s comprehensive, easy-to-understand guide to how these two DevOps practices help you quickly solve problems affecting your customers’ experience.
Meet FP Complete's Chairman, Aaron Contorer.
Aaron Contorer is the Founder and Chairman of FP Complete, which focuses on helping companies use state-of-the-art DevOps and blockchain tools and techniques to produce secure, lightning-fast, feature-rich software, faster and more often.
Before founding FP Complete, Aaron was an executive at Microsoft, where he served as program manager for distributed systems and as general manager of Visual C++, the leading software development tool at that time. Aaron also architected MSN’s move to Internet-based server software, served as the full-time technology adviser to Bill Gates, and founded and ran the company's Productivity Tools Team for complex software engineering projects.
Let’s begin this presentation by naming the cases where logging and monitoring get too little attention:
We aren't using logging properly.
We're not monitoring those logs.
We don't have any way to know what constitutes a troubling sign or what to do about it.
Is this happening to you?
Here are some questions to ask about your own organization: Is this happening to me now?
Is this going to happen to me tomorrow?
Am I going to be the next Equifax?
First, ask yourself:
Do you have logs running at all?
Are there files or databases you can turn to that record all the events, or at least all the important events, that happen in your app and in the infrastructure that runs that app, such as the databases, networks, and operating systems?
If I wanted to know...
- Did something run out of memory recently?
- Was there a huge surge in traffic recently?
- Did something behave abnormally recently?
- Did some sub-routine get used 500 times more than it normally gets used?
- Is that being logged?
If no one is logging the things that are happening in your systems—if that's not turned on—then you know that you have a lack of awareness, and you'll find out far too late. You'll be like Equifax, finding out weeks after a break-in that somebody broke in. Or you'll find out the day after your users couldn't connect to the system that you just lost a bunch of business and a bunch of customers.
The problem is...
- How many logs do you have?
- How many pieces are in your distributed system?
- How can you possibly keep track of all of that?
That's where monitoring comes in. There's a practice called log aggregation, where we bring that data together. Then there's log monitoring, where we have pieces of software that are simply configured to look for trouble signs in the data. People can monitor logs too, but a large distributed system may generate millions of log records every day, and a very large system may generate millions an hour.
We need an automated system that monitors the logs of events happening on our servers and infrastructure and sends out rapid alerts when something looks wrong: whether it looks like a system is down or, and this is the most common problem, the workload of some component has suddenly shot up to a bizarre and unprecedented level. That's a clear sign of a bug, a configuration problem, or somebody maliciously messing with your system. There need to be systems running for you, looking at all the different logs all around your distributed system, and rapidly responding with alerts that something looks wrong.
Sometimes that can even be an automated response. For example, we've done systems for clients where we'll just bring up more servers automatically if the workload gets too high. Sometimes it's a non-automated response, like when we see that there is a strange bug in one of our components and we have to decide whether to roll back to yesterday's build. But without automated monitoring of real logs that contain real data, honestly, we don't know what's happening in our systems.
So I'm going to suggest that monitoring and logging are a natural combination. They're like peanut butter and jelly. They're good on their own, but when you combine them, you've really got a delicious dish that you're going to want to have over and over.
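To make the "workload suddenly shot up" check concrete, here is a minimal sketch of the kind of rule a log monitor might apply. It is an illustration only, not how any particular tool works: the event shape (`component`, `minute`), the function name `find_spikes`, and the default threshold of 10x the earlier average are all assumptions made for the example.

```python
from collections import Counter
from statistics import mean


def find_spikes(events, factor=10.0, min_baseline=1.0):
    """Flag minutes where a component's event count far exceeds its history.

    `events` is an iterable of dicts with hypothetical keys "component"
    and "minute" (an integer minute number). A minute is flagged when its
    count is more than `factor` times the mean of that component's
    earlier minutes.
    """
    # Count events per (component, minute).
    counts = Counter((e["component"], e["minute"]) for e in events)

    # Group the per-minute counts by component.
    by_component = {}
    for (component, minute), n in counts.items():
        by_component.setdefault(component, []).append((minute, n))

    alerts = []
    for component, series in by_component.items():
        series.sort()  # oldest minute first
        for i in range(1, len(series)):
            minute, n = series[i]
            # Baseline is the mean of all earlier minutes, with a floor
            # so that very quiet components do not alert on tiny counts.
            baseline = max(mean(c for _, c in series[:i]), min_baseline)
            if n > factor * baseline:
                alerts.append((component, minute, n, baseline))
    return alerts
```

A real monitor would also handle minutes with no events and would probably exclude flagged minutes from later baselines. The point is only that the rule is simple enough to run continuously across every log source, and that its output is what would feed an alert or an automated scale-up.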
Are you logging enough data?
Do your developers build log events into your apps, both to monitor routine behavior and to record unusual things that have just happened? I don't mean just that the app crashed. I mean every time a user orders something on your e-commerce site, or every time an administrator logs in, or whatever your application does. Every time a significant event happens, is it getting logged? If not, there's no data to learn from. And if the events are logged but you haven't configured your monitoring tool to look at those logs, then the data is there but you're not learning from it.
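As one possible way to do this, the sketch below emits each significant event as a single JSON line using Python's standard `logging` and `json` modules. The logger name `shop.events`, the helper names, and the field names are hypothetical, chosen only for illustration.

```python
import json
import logging
from datetime import datetime, timezone

# Hypothetical logger name; any name your aggregation setup expects works.
logger = logging.getLogger("shop.events")


def format_event(event, **fields):
    """Build one JSON log line describing a significant application event."""
    record = {"ts": datetime.now(timezone.utc).isoformat(), "event": event}
    record.update(fields)
    return json.dumps(record, sort_keys=True)


def log_event(event, **fields):
    """Send the event to the logging system at INFO level."""
    logger.info(format_event(event, **fields))


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(message)s")
    # Routine behavior and notable events are both worth recording.
    log_event("order_placed", order_id="A-1001", total_cents=4599)
    log_event("admin_login", admin="ops-1", source_ip="203.0.113.7")
```

One JSON object per line keeps the entries easy for a monitoring tool to parse, which matters once software rather than a person is reading them.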
Tips for integration
These are a few practical tips for people who want to make progress on logging and monitoring.
- We want to make sure that your logs are not just going to separate files where they have to be looked at separately.
- You have to decide on a central place, such as an error monitoring tool or log aggregator, that's going to look at all of your logs.
- And you want to have your developers create log entries, as mentioned.
- Log entries should also contain lots of data. Someone should be able to look at them and have a sense of what's actually happening.
- If a component is being overloaded, who's calling us and what are they asking us to do?
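The tips above can be sketched in a few lines: bring per-component logs into one time-ordered stream, then use the data in each entry to answer "who's calling us and what are they asking us to do?" This is a toy model under stated assumptions, not a replacement for a real aggregator. The entry keys (`ts`, `component`, `caller`, `request`) and both function names are hypothetical.

```python
import heapq
from collections import Counter


def aggregate(*streams):
    """Merge per-component log streams into one time-ordered list.

    Each stream is assumed to be already sorted by its "ts" field.
    """
    return list(heapq.merge(*streams, key=lambda entry: entry["ts"]))


def top_callers(entries, component, n=3):
    """Return the most frequent (caller, request) pairs hitting a component."""
    calls = Counter(
        (e["caller"], e["request"])
        for e in entries
        if e["component"] == component
    )
    return calls.most_common(n)
```

The second function only works if developers put the caller and the request into each log entry in the first place, which is why rich entries matter as much as central collection.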
FP Complete customizes monitoring solutions
We advise anybody who wants to move forward on better log creation, log monitoring, and alerting to look at open-source tools. The state of the industry in commercial apps is not too bad. There are some good commercial apps for doing these things, but they're terribly expensive, to be honest with you. And we find that our clients are usually happier with open-source tools for logging and monitoring. The ELK stack (Elasticsearch, Logstash, and Kibana) is a great place to start. Just be aware that most open-source tools are partial solutions: when you get them, there's a lot of configuration to be done. The ELK stack is a great example. There's a lot to do to get it working on your system. We really like Grafana as a tool for looking at the logs, including manual inspection of them. We'd recommend that you look into those tools, have someone on your team look into them, or have us look into them for you if you prefer. That's fine too.
Make sure your application developers understand that logs are important, that somebody is going to be looking at them, and that if they're not creating entries that describe what the system has been doing, they're going to make life hard for operations. The whole idea of DevOps is to integrate auditors, developers, and operations early on into a big solution that works for everybody.
We do like the alerting features from ElastAlert and Amazon Web Services. You can also find similar features on other clouds. So there are a lot of tools out there to be used. We're happy to help you if you don't know where to look.
Also we would like to mention just one special tip: Good logs are so chock-full of information that you want to be careful with their security because they can start to leak information.
- What user just logged on and what features did they access?
- What product did they buy?
- What article did they read?
Things like that. So you can start to see almost too much information in logs. It's good to warn your application developers not to put really private personal information in logs. But if you're really thorough, it's also good to have somebody look at the logs from time to time and go back and tell those people: Hey, scrub this stuff, it's a little bit too much information. That, to be honest with you, is a good problem to have compared to where most people are.
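Scrubbing can also be partly automated. The sketch below uses a `logging.Filter` from Python's standard library to redact things that look like email addresses or card numbers before a record is written. The two regular expressions are deliberately simple, illustrative assumptions; they will miss some private data and may redact some harmless text, so they complement the human review described above instead of replacing it.

```python
import logging
import re

# Illustrative patterns only; real PII detection needs more care.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
CARD = re.compile(r"\b(?:\d[ -]?){13,16}\b")


def scrub(text):
    """Replace email-like and card-number-like substrings with placeholders."""
    text = EMAIL.sub("[email redacted]", text)
    return CARD.sub("[card redacted]", text)


class ScrubFilter(logging.Filter):
    """Logging filter that redacts the formatted message of each record."""

    def filter(self, record):
        # Format the message first so values passed as args are scrubbed too.
        record.msg = scrub(record.getMessage())
        record.args = ()
        return True  # keep the record, now redacted


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.addFilter(ScrubFilter())
    log = logging.getLogger("shop.audit")  # hypothetical logger name
    log.addHandler(handler)
    log.setLevel(logging.INFO)
    log.info("login by %s", "bob@example.com")
```

Attaching the filter to the handler means every record passing through that handler is scrubbed, regardless of which part of the application produced it.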