Ten Things We've Learned from Running Production Infrastructure at Google
Google's production infrastructure might be the most complex machine humanity has built so far, and it is constantly evolving.
Site reliability engineers (SREs) are the specialists who manage and improve the architectures, tooling, and operational procedures that keep Google's products reliable, scalable, efficient, and agile. Christof Leng will discuss a number of fundamental organizational principles that Google SRE has learned over the years.
In this talk, you'll learn:
- How to set up an organization to run production reliably and without burning out the team