Google Cloud Platform · 1.17 million
Published October 2, 2018, 17:02
In the previous video, Liz and Seth discussed how to make systems observable and how observability helps diagnose failing systems, but they didn't cover what to do when an incident grows beyond what one person can handle. In this video, you will learn about the most important part of the incident management process: humans.
In the stressful moments of a system failure, it is important to define clear, concise roles for everyone involved in an incident. With too few people, responders quickly become overloaded; with too many, work may be duplicated (i.e., too many hands on the keyboard). Learn how SREs effectively manage incidents with clearly defined roles and responsibilities such as the operations lead, planning lead, communications lead, logistics lead, and more. Seth and Liz also discuss techniques for managing long-running and exponentially complex incidents.
Reach out to Liz and Seth:
twitter.com/lizthegrey
twitter.com/sethvargo
Watch more episodes from the playlist here → bit.ly/2PPL6f0
Subscribe to the Google Cloud Platform channel for more Cloud content → bit.ly/GCloudPlatform