Monitoring systems that collect metrics usually support nice features like flexible graphing, and in some cases, even more advanced options such as trending and Holt-Winters Forecasting. But alerting is usually very primitive in comparison. Typical alerting systems do a static health check based on a preconfigured threshold, which we all know is never the right number—it’s just something that seems as reasonable as possible. The result: you get false positives when nothing’s wrong, and you don’t get alerts when something’s abnormal.
Why is this? The root cause is the primitive notion of healthy or sick. System health simply can’t be defined by a threshold (“85% CPU, oh noes!”). It needs to be based on three things: knowing how the system usually behaves, knowing how much the system is deviating from normal, and knowing whether the deviation is actually bad.
What if we could calculate normal behavior in real-time? If your thoughts jump to a Hadoop cluster and some kind of impressive Big Data processing, think again. There’s a way to do this that’s as simple as a couple of basic arithmetic operations on each incoming metric—just a few CPU cycles! It involves simple stuff you probably already know.
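The abstract doesn't say which technique the talk uses. One approach that fits the description of "a couple of basic arithmetic operations on each incoming metric" is an exponentially weighted moving average and variance, sketched below. The class name `EwmaBaseline`, the smoothing factor `alpha=0.1`, and the choice to score deviation in standard deviations are all assumptions made for illustration, not details from the talk.

```python
import math


class EwmaBaseline:
    """Tracks an exponentially weighted mean and variance of a metric stream.

    Hypothetical helper for illustration; not taken from the talk.
    """

    def __init__(self, alpha=0.1):
        self.alpha = alpha  # smoothing factor (assumed value): higher = forgets faster
        self.mean = None    # running estimate of "normal"
        self.var = 0.0      # running estimate of spread around normal

    def update(self, x):
        """Score x against the current baseline, then fold x into the baseline.

        Returns the deviation in standard deviations (0.0 while there is
        not yet enough history to estimate spread).
        """
        if self.mean is None:
            self.mean = float(x)
            return 0.0

        diff = x - self.mean
        std = math.sqrt(self.var)
        score = diff / std if std > 0 else 0.0

        # Constant-time incremental update: a handful of multiplies and adds.
        incr = self.alpha * diff
        self.mean += incr
        self.var = (1 - self.alpha) * (self.var + diff * incr)
        return score


baseline = EwmaBaseline(alpha=0.1)
for cpu_percent in [50, 52, 49, 51, 50, 51, 49, 50, 90]:
    score = baseline.update(cpu_percent)
    print(f"{cpu_percent:>3}%  deviation = {score:+.2f} standard deviations")
```

Each sample costs a fixed number of arithmetic operations and no stored history, so the score can be computed per metric as the data arrives. Whether a large score is actually bad is a separate question that this sketch does not answer.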
Here’s what’s really surprising: when you can actually measure normality as a number (instead of tri-valued OK/WARN/CRIT logic), you can do all kinds of useful stuff with it. Imagine tracking the normality as a metric itself, so you can quickly compare your system’s behavior to its historical performance; for example, is the system behaving less consistently since the latest release was deployed? There’s a lot more you can do. This is really powerful juju.
Here’s an outline of the topics we’ll cover.
This presentation has a little bit of math, but you won’t need to think hard to understand it (trust me). I’ll show lots of pictures and explain everything with simple concepts like concurrency and how much work is queued up in a system. And I will share my slides.
Baron is co-founder and CTO of VividCortex, a provider of SaaS database administration tools. He is the lead author of High Performance MySQL and continues to research and publish under the O’Reilly imprint. He has created several open-source software tools, including Maatkit, and has authored features for MySQL and InnoDB. He is an Oracle ACE, and the founder of the worldwide OpenSQL Camp conference series. He holds a degree in Computer Science from the University of Virginia. Baron lives in Charlottesville, Virginia with his family.