I’ve been experimenting with Thanos over the last few months to add long-term storage of Prometheus metric data and a single query endpoint for a large Prometheus fleet. It looks like a really well-designed solution to end the 30-ish day retention limits and to answer the oft-asked question “What host name is my Prometheus box?”

Thanos requires that each Prometheus VM or machine in the fleet have a unique set of external labels. These are special labels that are added to metrics as they leave a Prometheus VM for any reason and help identify the source. As Prometheus is usually sharded by function or team and HA/DR is implemented by having a pair of identically configured Prometheus VMs in each shard, it is recommended to use two external labels:

• role: This identifies the function or team this Prometheus VM serves. I use monitor instead here, but same difference.
• replica: This is an index or FQDN of the VM, used so that each VM in a single role is unique. Do yourself a favor and use the FQDN here.
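In prometheus.yml these live under global.external_labels. A minimal sketch of the pair above, using the role/replica names from the list (the host name is a made-up example):

```yaml
global:
  external_labels:
    # Which function/team shard this Prometheus serves.
    role: monitor
    # FQDN of this VM, so each member of the HA pair is unique.
    replica: prom-a.example.com
```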

This was the model I deployed. Then came the headdesk moment! This didn’t completely uniquely identify all my Prometheus VMs, and Thanos said so, loudly. To Thanos, I had non-unique, overlapping TSDB blocks as well as non-unique Prometheus VMs. I had already let each Prometheus VM upload data (those TSDB blocks) to Google Cloud Storage (GCS). I found myself in a bit of a pickle.

In my Prometheus fleet, I usually shard by team, and teams have a pair of PROD Prometheus VMs and, at their option, a non-production Prometheus VM for their testing of rules and such. I had forgotten to account for this, and the replica="0" versions of the production and non-production VMs were conflicting. Thus, another bit of wisdom: use FQDNs in the replica external label.

Fixing this was twofold. The first step was easy, or at least easier than the second: each Prometheus VM needed to be reconfigured with a third external label present. The Thanos Sidecar component also needed a good restart to pick up these changes.

• promenv: The environment of the Prometheus Role.

This label was purposely not named environment. Users and teams monitor both production and non-production targets from a single Prometheus Role; it turns out teams really do want production monitoring system quality for monitoring of non-production targets. So relabeling rules attach an environment label to each metric at scrape time, namespacing metrics by the environment of the target. This means the environment of the Prometheus Role doesn’t indicate the environment of the target scrape jobs.
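As a sketch of how the two labels coexist in prometheus.yml (job name, hosts, and environment values here are hypothetical): promenv travels as an external label on everything this Prometheus emits, while environment is attached per target at scrape time:

```yaml
global:
  external_labels:
    role: monitor
    replica: prom-a.example.com
    # Environment of this Prometheus Role itself.
    promenv: prod

scrape_configs:
  - job_name: team-app-staging
    static_configs:
      - targets: ['app-stage.example.com:9100']
    relabel_configs:
      # Environment of the *target*, which need not match promenv.
      - target_label: environment
        replacement: nonprod
```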

Secondly, each TSDB block already stored in GCS had the old set of external labels attached by Thanos, and these needed to be updated. Thanos stores them in the meta.json file in each TSDB block, so this isn’t too crazy. I ended up writing a bit of Go code to work through this.

This code is passed a GCS bucket name, either with or without the gs:// prefix. The value for the promenv label is encoded in the GCS bucket name, so some regular expression magic validates that the bucket name follows my patterns and extracts the correct promenv value. If the -confirm holodeck safety flag is given, the code then finds every meta.json object in the given bucket, parses it, and checks whether a promenv external label exists. If that label exists, the code moves on to the next TSDB block. If it is absent, the label is added to the JSON, which is marshaled back into text and uploaded. Backups of the metadata files are kept on local disk.
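The bucket-name check might look something like this; the regular expression is my guess reconstructed from the example run below, assuming the environment is the last hyphen-separated token of the bucket name:

```go
package main

import (
	"fmt"
	"regexp"
)

// A guess at the naming convention: optional gs:// prefix, a
// hyphen-separated bucket name ending in the environment token.
var bucketRE = regexp.MustCompile(`^(?:gs://)?([a-z0-9-]+-(dev|stage|prod))/?$`)

// parseBucket strips an optional gs:// prefix, validates the bucket
// name against the naming pattern, and extracts the promenv value.
func parseBucket(arg string) (bucket, promenv string, err error) {
	m := bucketRE.FindStringSubmatch(arg)
	if m == nil {
		return "", "", fmt.Errorf("bucket %q does not match the expected naming pattern", arg)
	}
	return m[1], m[2], nil
}

func main() {
	bucket, promenv, err := parseBucket("gs://bruce-thanos-lts-global-dev/")
	if err != nil {
		panic(err)
	}
	fmt.Println(bucket, promenv)
}
```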

See my rewritemeta command source on GitHub!

$ ./rewritemeta -confirm gs://bruce-thanos-lts-global-dev/
2019/05/26 21:59:13 GCS Bucket   : bruce-thanos-lts-global-dev
2019/05/26 21:59:13 Promenv Value: dev
2019/05/26 21:59:38 Found: 01DBV5YHD90P9YPEE955XPJX60/meta.json
2019/05/26 21:59:38 Writing backup to: 01DBV5YHD90P9YPEE955XPJX60-meta.json

The only caveat is that this code does not write the meta.json files back with the fields in exactly the same order that Prometheus and Thanos use (if the JSON is unmarshaled into Go maps, encoding/json sorts the keys on the way back out). It’s JSON, so it will still parse, but I did find it surprising that the data wasn’t in the same order. After running this on a bucket, you will need to restart any Thanos Store and Compact components that are running against it.