On-Call Nightmares Podcast

Being on-call in a tech team can lead to some interesting stories. On this podcast we'll talk to a variety of people from the world of technology, discuss their experiences in on-call and find out some nightmares they survived. Hosted by Jay Gordon - Twitter @jaydestro

The best episodes ranked using user listens.

Episode 21 - Arup Chakrabarti - PagerDuty

Who wakes up the people who get woken up for on-call? The folks at PagerDuty are responsible for providing pager notifications to teams across the globe. In this interview I talk with Arup Chakrabarti, who's dedicated to getting you your alerts. Arup has been working in the space of software operations since 2007. He started out as an Operations Engineer at Amazon, helping to reduce customer defects with multiple teams for the Amazon Marketplace. Since then, he has managed and built operations teams at Amazon and Netflix to help improve availability and reliability. He currently works at PagerDuty, where he is part of the Infrastructure Engineering group. Twitter: https://twitter.com/arupchak https://pagerduty.com


25 Apr 2019

Rank #1

Episode 9 - Charity Majors - Honeycomb.io

Infrastructure Week, Episode 2! Charity and Jay sit down for a discussion on her career and a deep dive into a database incident. You'll get some interesting thoughts on how monitoring has changed in operations. Charity is cofounder and CEO of Honeycomb.io, a startup aimed at debugging complex systems. (“It’s like strace for systems!”) Previously, Charity ran infrastructure at Parse and was an engineering manager at Facebook. She also worked with the RocksDB team to build and deploy the world’s first Mongo + Rocks in production. She likes single malt scotch. https://honeycomb.io https://twitter.com/mipsytipsy


31 Jan 2019

Rank #2

Episode 41 - JJ Asghar - IBM

On-Call Nightmares returns to talk to the man from Texas who represents Big Blue, JJ Asghar. JJ and I discuss his start as a 15-year-old in technology and how on-call has morphed over the years. JJ works at IBM on the IBM Cloud as a Developer Advocate. He’s focusing on the IBM Kubernetes Service, trying to help companies and users have a successful onboarding to the Cloud Native ecosystem. He lives and grew up in Austin, Texas. He enjoys a good strong stout, hoppy IPA, and some team building Artemis, modding Dwarf Fortress, Rimworld, or Factorio. He’s a member of the Church of Emacs, though jumps into Vim on remote machines. He usually chooses Ubuntu over CentOS, but secretly wants FreeBSD everywhere. He’s always trying to become a better Ruby developer, but experiments with Go, Python, and only when he has to, Node. A father and husband, if he’s not trying to automate his job away he’s always trying to convince his daughters to “be button makers not button pushers.” http://www.github.com/jjasghar https://twitter.com/jjasghar https://www.deliveryconf.com


24 Oct 2019

Rank #3

Episode 40 - Ryan Kitchens - Netflix

A big milestone, episode 40! This week I speak with Netflix SRE Ryan Kitchens about birds, DR and movies! Ryan Kitchens has been in a variety of positions in software over the past ten years, allowing him to experience the good and the bad, the amazing and the bizarre. As an SRE with a film degree, he currently works at Netflix on the CORE team, focused on ensuring availability. The background of the team spans incident management and analysis, resilience engineering, and human factors & systems safety. https://twitter.com/this_hits_home


10 Oct 2019

Rank #4

Episode 3 - Chris Short - Red Hat

Chris Short has been a proponent of open source solutions throughout his over two decades in various IT disciplines including systems, security, networks, and DevOps engineering and advocacy across the public and private sectors. He currently works on the Ansible team at Red Hat. Chris is a partially disabled US Air Force veteran living with his wife and son in Greater Metro Detroit. Chris writes about DevOps and other topics at chrisshort.net. He also runs the DevOps, Cloud Native, and open source focused newsletter DevOps’ish. Twitter: ChrisShort Web: https://chrisshort.net, https://devopsish.com


27 Dec 2018

Rank #5

Bonus Episode - Jay Gordon - "bits of //build, Overcoming Failure"

Bonus! ME!!! I spoke at Microsoft's community event "bits of //build" about overcoming failure. This is a culture talk I have been working on that really focuses on my personal road through failure and recovery. Thanks to all who sat in the room and took part. https://twitter.com/jaydestro


9 May 2019

Rank #6

Episode 27 - Joseph Marhee - Packet

This week, I bring a friend from a past job to share his insights on observability and other aspects of a weird life in technology. This is one of my favorite chats because Joe is one of my favorite people in tech. "Customer-concerned Operations and Systems workers turned Cloud Native lab-rat at Packet, previously of DigitalOcean, IBM, Recurly, Platform9 Systems. Approach to Production engineering relies on an iterative combination of programmatically-led audits, collaborative remediation, and mental health check-ins to ensure the observability scheme is serving the organization and its workers, and not leaving burnt out engineers at the on-call rotation's mercy." Transcript: https://aka.ms/AA5q31e https://twitter.com/jmarhee https://github.com/jmarhee https://www.packet.com/


13 Jun 2019

Rank #7

Episode 12 - Baron Schwartz - VividCortex

Content Warning: This episode does contain some graphic description of the work done by an EMT - if you find this troubling you may want to check out another episode! On this episode, I speak with the CTO and founder of VividCortex on his life down on the farm and as an EMT. Baron gives us some insight into how that prepared him for his time on-call in different roles to ensure databases are fast and reliable. Baron is the CTO and founder of VividCortex, the best way to see what your production database servers are doing. Baron has written a lot of open source software, and several books including High Performance MySQL. He’s focused his career on learning and teaching about scalability, performance, and observability of systems generally (including the view that teams are systems and culture influences their performance), and databases specifically. Twitter: @xaprb Website: xaprb.com


21 Feb 2019

Rank #8

Episode 42 - John Willis - Red Hat

The number 42 has a huge meaning for baseball fans. Jackie Robinson wore 42, Mariano Rivera wore 42 and now one of the greatest in DevOps, John Willis, wears the On-Call Nightmares podcast episode #42! Learn from John's past, his present and his future at Red Hat. We got together at the 2019 DevOps Enterprise Summit in Las Vegas to chat about all things DevOps and a lil Yankees baseball (not much). By far one of the most important episodes of the podcast yet. John Willis has worked in the IT management industry for more than 35 years. Currently he is part of Red Hat's Global Transformation Office, which will be focused on accelerating our customers' digital visions while bringing holistic change across their technological AND social systems. He was formerly Director of Ecosystem Development at Docker. Prior to Docker, Willis was the VP of Solutions for Socketplane (sold to Docker) and Enstratius (sold to Dell). Prior to Socketplane and Enstratius, Willis was the VP of Training and Services at Opscode, where he formalized the training, evangelism, and professional services functions at the firm. Willis also founded Gulf Breeze Software, an award-winning IBM business partner, which specializes in deploying Tivoli technology for the enterprise. Willis has authored six IBM Redbooks on enterprise systems management and was the founder and chief architect at Chain Bridge Systems. https://twitter.com/botchagalupe Beyond the Phoenix Project - Audiobook https://itrevolution.com/book/beyond-phoenix-project-audiobook/ Maslach Burnout Inventory - https://www.mindgarden.com/117-maslach-burnout-inventory


31 Oct 2019

Rank #9

Episode 30 - Tim Yocum - InfluxDB

Episode 30 is a waterfall of information you'll soak up and learn a ton from. Things get a bit wet and wild for Tim in this episode of On-Call Nightmares! A great discussion about a long history in tech, the things you just can't plan for and more. Tim is an engineering manager at InfluxData with over 20 years of experience. His technical interests include high-performance, scalable, fault-tolerant cloud infrastructure, interconnected hybrid architecture, containerization (c14n?) all the way down, and always winning buzzword bingo. Helping teams achieve their highest potential is his true calling, which often means planting ideas and staying out of the way. transcript: https://raw.githubusercontent.com/jaydestro/oncallnightmares/master/episode30.tim.yocum.txt https://twitter.com/tkyocum https://www.influxdata.com/ https://tky.io


11 Jul 2019

Rank #10

Episode 13 - Damian Schenkelman - Auth0

Welcome back to another podcast about downtime! Once again we meet with another technologist who's building a new product and getting it out to the world. This time we meet Damian of Auth0, who's been working with his team to ensure identity services. Damian is a Software Engineer who loves to solve hard problems of any type, especially those related to making software and teams scale. He is a Director of Engineering at Auth0, helping make identity simple for developers. Before Auth0, Damian spent many years working for and at Microsoft on Azure, Media and patterns & practices related initiatives. He spends his spare time with family, friends, exercising and catching up on all things NBA. Twitter: @dschenkelman auth0.com


28 Feb 2019

Rank #11

Episode 38 - Gene Kim - IT Revolution

Live from DevOpsDays Portland, I speak with Gene Kim, author of "The Phoenix Project" and the upcoming book "The Unicorn Project." When I started this podcast, one of my goals was to talk to Gene about his own experiences in IT; thankfully, this trip to DevOpsDays in PDX helped that happen. Cameos by Jennifer Davis, Matty Stratton, Jason Yee and Terri Haber! Gene Kim is a multiple award-winning CTO, researcher and author, and has been studying high-performing technology organizations since 1999. He was founder and CTO of Tripwire for 13 years. He has written five books, including “The Phoenix Project: A Novel About IT, DevOps, and Helping Your Business Win”, “The DevOps Handbook”, “Accelerate” and the upcoming “The Unicorn Project”. Since 2014, he has been the organizer of the DevOps Enterprise Summit, studying the technology transformations of large, complex organizations. https://twitter.com/RealGeneKim Transcript - https://aka.ms/AA6107c The Unicorn Project - https://itrevolution.com/book/the-unicorn-project/ DevOps Enterprise Summit Las Vegas - https://events.itrevolution.com/us/


12 Sep 2019

Rank #12

Episode 31 - Jason Yee - Datadog

Datadog Dash was this week which meant I was lucky enough to catch up with my friend, Jason Yee. We discuss his time in tech, measuring everything and a lot more! Jason is a technical evangelist at Datadog, where he works to inspire developers and ops engineers with the power of metrics and monitoring. Previously, he was the community manager for DevOps & Performance at O'Reilly Media and a software engineer at MongoDB. He's currently exploring the world while living as a nomad and would love to hear about where you live. transcript: https://raw.githubusercontent.com/jaydestro/oncallnightmares/master/episode31.jason.yee.txt https://twitter.com/gitbisect https://www.datadoghq.com/


18 Jul 2019

Rank #13

Episode 28 - Jason Hand - Microsoft

This week my homie supreme, Jason Hand, joins me on On-Call Nightmares. We talk monitoring, SRE and getting in the van. Jason has spent the last 5 years connecting with technologists around the world on ideas related to balancing system and service reliability with the speed and agility required in today's digital world. Previously at VictorOps, Jason authored four books on the subjects of Site Reliability Engineering, Post-Incident Reviews, and ChatOps and was named "DevOps Evangelist of the Year" in 2016 by DevOps.com. Co-organizer and emcee of the annual DevOpsDays Rockies conference, the Frontrange Site Reliability Meetup, Denver DevOps Meetup, and DevOps Road Trip, Jason enjoys connecting storytellers and actionable ideas with those who are hungry to learn. Co-host of the podcast "Community Pulse", Jason helps to bring together ideas and expertise as it relates to building community within tech (i.e. advocacy, evangelism). In his spare time, you'll find Jason soaking up the beautiful Colorado outdoors on a trail, lake, river, or mountain by day and enjoying craft IPAs and bluegrass music by night. Transcript: https://aka.ms/AA5q317 https://twitter.com/jasonhand


27 Jun 2019

Rank #14

Episode 8 - Melissa Palmer - Veeam

Does this VM bring me joy? Melissa is Product Strategy Technologist at Veeam and an information technology infrastructure enthusiast, with a focus on virtualization, security, and emerging technologies. Melissa is a VMware Certified Design Expert (VCDX #236), and has held roles such as VMware Engineer, Systems Engineer, Solutions Architect, and Technical Marketing Engineer prior to joining Veeam. You can find Melissa on twitter @vMiss33 or at her blog https://vMiss.net.


28 Jan 2019

Rank #15

Episode 5 - Kolton Andrus - Gremlin Inc

Fear, chaos and pain: common subjects in the Christopher Nolan Batman films, especially when the Joker appears. How do we avoid the moments of fear, chaos and pain in real time? By preparing for them. Today we talk with Gremlin Inc founder and CEO Kolton Andrus. Kolton is co-founder and CEO of Gremlin. Previously, he was a Chaos Engineer at Netflix, improving streaming reliability and operating the Edge services. He designed and built F.I.T., Netflix's failure injection service. Prior to that, he improved the performance and reliability of the Amazon Retail website. At both companies he has served as a 'Call Leader', managing the resolution of company-wide incidents. Gremlin.com Twitter: @gremlininc


10 Jan 2019

Rank #16

Episode 22 - Mike Julian - The Duckbill Group

This week I get a chance to speak to someone who just wants to save you some money on your cloud bills. Mike shares some great stories and gives insight to what he and Corey Quinn are working on at the Duckbill Group. Mike is the CEO of The Duckbill Group, a consultancy helping companies fix the horrifying AWS bill by both lowering the size of it and helping them understand where the money is going. Mike also hosts the Real World DevOps Podcast, is the author of O’Reilly’s Practical Monitoring, and editor/analyst at Monitoring Weekly. He was previously an SRE/DevOps Engineer/system administrator for companies such as Taos Consulting, Peak Hosting, Oak Ridge National Laboratory, and many more. Mike is originally from Knoxville, TN (Go Vols!) and currently resides in Portland, OR. Twitter: https://twitter.com/mike_julian https://www.duckbillgroup.com https://monitoring.love https://www.realworlddevops.com


9 May 2019

Rank #17

Episode 24 - Nathen Harvey - Google

Live from ChefConf 2019, I talk with Nathen Harvey about outages, lunch and a life spent in technology. This was one of my favorite podcast interviews because Nathen is one of my major influences and mentors in what we do in Developer Advocacy and Relations in technology. He's taught me so much over the years and has done his best to check in with me during the tough moments, like another member of the on-call team might do during a rough incident. Nathen Harvey, Cloud Developer Advocate at Google, helps the community understand and apply DevOps and SRE practices in the cloud. Nathen is a co-host of the Food Fight Show, a podcast about Chef and DevOps, and is part of the DevOps Days conferences global organizing committee. Nathen is part of the Google DevRel team and can be found at the following links: https://twitter.com/nathenharvey https://linkedin.com/in/nathen


23 May 2019

Rank #18

Episode 36 - Michael Stahnke - CircleCI

Live from DevOpsDays Chicago! I meet up with Ops veteran Michael Stahnke as we discuss his career in technology. From the weird days of AIX systems all the way to his time now at CircleCI, Michael has plenty of great stories. Special cameos by Jason Yee and Joshua Zimmerman (our laugh track). Michael Stahnke is VP of Platform Engineering at CircleCI. Prior to this role, he was at Puppet running engineering for Puppet Enterprise, Puppet Open Source, and SRE. He is an author of the State of DevOps Report in 2018 and 2019. Michael also helped get the Extra Packages for Enterprise Linux (EPEL) repository off the ground in 2005, is the author of Pro OpenSSH (Apress, 2005), and is an organizer of DevOpsDays Madison. You can reach him @stahnma on nearly any service online. Transcript: https://aka.ms/AA5yha2 https://twitter.com/stahnma


29 Aug 2019

Rank #19

Episode 39 - Daniel Bentley - tilt.dev

This week I speak with Dan Bentley of tilt.dev! Dan is a software engineer who's currently fixing microservice development as CEO of Tilt ( https://tilt.dev ). Before that, he was at Google for 11 years and then Twitter, working on tools for devs and tools for non-developers. He's opened for The Who and has checks from Donald Knuth. Transcript: https://aka.ms/AA64hk6 https://tilt.dev https://twitter.com/dbentley


25 Sep 2019

Rank #20