Ten things we've learned from running production infrastructure at Google
Google's production infrastructure might be the most complex machine that humanity has built so far, and it is constantly evolving. Site Reliability Engineers (SREs) are the specialists who manage and improve the architectures, tooling, and operational procedures that enable Google to keep its products reliable, scalable, efficient, and agile. This talk discusses a number of fundamental organizational principles that Google SRE has learned over the years.
What will the audience learn from this talk?
How to set up an organization to run production reliably and without burning out the team.
Does it feature code examples and/or live coding?
Prerequisite attendee experience level: