There are many reasons to monitor a system, including:

Analyzing long-term trends: How big is my database and how fast is it growing? How quickly is my daily-active user count growing?

Comparing over time or experiment groups: Are queries faster with Acme Bucket of Bytes 2.72 versus Ajax DB 3.14? How much better is my memcache hit rate with an extra node? Is my site slower than it was last week?

Alerting: Something is broken, and somebody needs to fix it right now! Or, something might break soon, so somebody should look soon.

Building dashboards: Dashboards should answer basic questions about your service, and normally include some form of the four golden signals (discussed in The Four Golden Signals).

Conducting ad hoc retrospective analysis (i.e., debugging): Our latency just shot up; what else happened around the same time?

System monitoring is also helpful in supplying raw input into business analytics and in facilitating analysis of security breaches. Because this book focuses on the engineering domains in which SRE has particular expertise, we won’t discuss these applications of monitoring here.

Monitoring and alerting enable a system to tell us when it’s broken, or perhaps to tell us what’s about to break.

Push: Any change to a service’s running software or its configuration.
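To make the trend and alerting questions above concrete ("how fast is my database growing?" and "something might break soon"), here is a minimal Python sketch. It is only an illustration under stated assumptions: the names Sample, growth_rate_gb_per_day, and classify, the straight-line growth model, and the 1-day and 30-day thresholds are all hypothetical and do not come from the text.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    day: float      # days since the first measurement (hypothetical unit)
    size_gb: float  # database size observed on that day


def growth_rate_gb_per_day(samples):
    """Least-squares slope of size over time, assuming roughly linear growth."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(s.day for s in samples) / n
    mean_y = sum(s.size_gb for s in samples) / n
    var_x = sum((s.day - mean_x) ** 2 for s in samples)
    if var_x == 0:
        raise ValueError("samples must cover more than one point in time")
    cov = sum((s.day - mean_x) * (s.size_gb - mean_y) for s in samples)
    return cov / var_x


def classify(samples, capacity_gb, now_days=1.0, soon_days=30.0):
    """Return 'fix_now', 'look_soon', or 'ok'.

    The thresholds are illustrative assumptions, not recommended values.
    """
    latest = max(samples, key=lambda s: s.day)
    if latest.size_gb >= capacity_gb:
        return "fix_now"  # already broken: the disk is full
    rate = growth_rate_gb_per_day(samples)
    if rate <= 0:
        return "ok"  # not growing, so no projected exhaustion
    days_left = (capacity_gb - latest.size_gb) / rate
    if days_left <= now_days:
        return "fix_now"
    if days_left <= soon_days:
        return "look_soon"
    return "ok"


if __name__ == "__main__":
    history = [Sample(0, 100), Sample(1, 110), Sample(2, 120), Sample(3, 130)]
    print(growth_rate_gb_per_day(history))
    print(classify(history, capacity_gb=200))
```

The same stored samples serve both purposes: the slope answers the long-term trend question, and the projected time to exhaustion separates "fix it right now" from "look soon".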