I was able to attend Monitorama PDX 2017 this past week and had a blast. If you are interesting in monitoring, metrics, related data analysis, alerting, and of course logs then this is the conference for you. It struck me at Monitorama that many of us came of age in the pre-microservices (Service Oriented Architecture or SOA) world. But services in a SOA environment are different and should be monitored differently than what we may be used to.
Here’s my take on the basic tenets of monitoring infrastructure and best practices for Service Oriented Architectures. I’m an Operations Engineer doing visibility work for a fairly large client so this comes from the viewpoint of the caretaker of monitoring services. If you are a developer and don’t agree, let me know!
Designing systems at scale often means building some sort of “cluster.” These are common and important questions to answer:
One of the killer app features of Prometheus is its native support of histograms. The move toward supporting and using histograms in metrics and data based monitoring communities has been, frankly, revolutionary. I know I don’t want to look back. If you are still relying on somehow aggregating means and percentiles (say from StatsD) to visualize information like latencies about a service, you are making decisions based on lies.
I wanted to dig into Prometheus’ use of histograms. I found some good, some bad, and some ugly. Also, some potential best practices that will help you achieve better accuracy in quantile estimations.
I’ve updated Buckytools, my suite for managing at scale consistent hashing Graphite clusters, with a few minor changes.
In Go 1.5 we have the beginings of vendoring support. The easiest way to incorperate other projects into your Git repo is by using the following command. Short and dirty, but it works:
$ git subtree add --prefix vendor/gopkg.in/check.v1 \ https://gopkg.in/check.v1 master --squash