Winning the Metrics Battle (finally)

Simon Hildrew (The Guardian), Nick Satterly (The Guardian)
Operations and Culture
Location: Buckingham Room

The Guardian didn’t set out to have a full-time metrics and monitoring role, but when we interviewed someone who fitted that role we realised it was a missing piece in our strategy. The best thing we’ve done since is listen to him and become obsessed with metrics and monitoring.

Three years ago The Guardian Operations team had a legacy monitoring system and a newly deployed but inadequate metrics system. We knew the “right” thing was to:

  • gather metrics on everything
  • gather metrics frequently
  • alert on the same metrics that we graph
  • include copious quantities of developer-provided application metrics in that “everything”
  • provide visibility to everyone

The problem was how to make that a reality.

In this talk we tell the story of our mistakes and successes along the way. We’ll talk about the products we evaluated, those we tried and failed to use, and those we have currently settled on (Ganglia, Graphite and some custom software). We’ll also cover the friction and roadblocks that operations and developers have come across and the steps we’ve taken to reduce and eliminate them, including self-service dashboards and a standard, automated way of collecting application metrics.
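
To give a flavour of what feeding an application metric into Graphite can look like, here is a minimal sketch using Graphite's plaintext protocol, in which each sample is one line of the form "path value timestamp" sent to the Carbon listener (port 2003 by default). This is not the custom software mentioned above: the function names, the metric path and the host name are all hypothetical and chosen purely for illustration.

```python
import socket
import time


def graphite_line(path, value, timestamp=None):
    # Format one sample as a Graphite plaintext line: "<path> <value> <timestamp>",
    # terminated by a newline. The timestamp is whole seconds since the epoch.
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metrics(samples, host="graphite.example.com", port=2003):
    # Send an iterable of (path, value, timestamp) tuples to a Carbon
    # plaintext listener. The host name is a placeholder (assumption).
    payload = "".join(graphite_line(*sample) for sample in samples)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload.encode("ascii"))
```

An application might call send_metrics([("apps.frontend.requests", 42, None)]) on a timer. Because the same stored series can then be both graphed and checked against a threshold, this kind of single pipeline fits the goal of alerting on the same metrics that are graphed.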

The result of this work is that we have massively improved our visibility across both our data centres and the cloud; we can troubleshoot and fix problems more quickly; and developers are able to provide application metrics and have an interest in watching them. All of this has brought developers and operations closer together.

Simon Hildrew

The Guardian

Simon has worked at the Guardian for four years in multiple guises. For two of those years he led the ops team through a phase that saw it grow to three times its size. He now develops internal deployment, monitoring and management tools in an effort to make life easier for developers and operations alike.

Nick Satterly

The Guardian

Nick Satterly is a monitoring engineer in the web operations team at the Guardian. He has spent more than a decade designing and implementing enterprise monitoring systems at large telcos and financial institutions. He now enjoys applying those skills in an agile environment with open source solutions.
