The goal of this talk is to give you a reusable framework that you can take back to your company to prepare your operations department, and the company as a whole, for downtime events. Downtime may be unexpected, but it is inevitable. Beyond keeping your systems as stable as possible, all you can do is be prepared for it. That preparation is what separates companies that suffer potentially business-ending downtime from those that come out stronger.
By examining past downtime events, we will develop a framework that includes pre-downtime preparation, intra-downtime communication, and post-downtime transparency. We will also describe the best approaches for getting it adopted inside your company.
As head of R&D at Webmetrics, Lenny Rachitsky has played a key role in the evolution of the external web monitoring space over the past decade. He is currently focused on extending the definition of monitoring beyond the IT department, and into business, social, and mobile.
Comments
Great presentation, one of the best so far at the conference. Great examples and very practical points with a good flow.
@bret: Appreciate the feedback. Hadn’t seen the Facebook dashboard before, really nice to see this and super glad you guys are further along than I thought. I’m going to try to write a blog post about it to help people find out about this, and will try to think through some specific advice that could take your dashboard to the next level.
BTW, added your dashboard to my collection: delicious.com/lennysan/heal...
First, great presentation. We have been circulating it internally here at Facebook, and your constructive criticism resonated with the team here. Clearly we need to promote it more, but have you seen our Platform status dashboard? It contains the status of all known issues, latency and error graphs, and supports subscriptions via email and RSS. I agree we have a long way to go, but I think this dashboard (which existed at the time of the outage you mention in your preso) is actually quite valuable and addresses a lot of the issues you mention.
I will forward your slides to our corporate communication department, as suggested, to deal with crises ;-)
Thanks Bjoern! I’d love to hear any sort of feedback on the framework, hoping to spark some discussion around it.
Good, fresh presentation. And the best part: the listener got something to take home, like checklists.