LinuxCzar

Python, Linux, and Operations

New Project: StatsRelay

Introducing StatsRelay, a proxy daemon for Statsd-style metrics, written in Go.

What does it do?

StatsRelay is designed to help you scale out your ingestion of Statsd metrics. It is a simple proxy that you send your Statsd metrics to. It will then forward your metrics to a list of backend Statsd daemons. A consistent hashing function is used with each metric name to determine which of the Statsd backends will receive the metric. This ensures that only one Statsd backend daemon is responsible for a specific metric. This prevents Graphite or your upstream time series database from recording partial results.
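
To make the routing idea concrete, here is a minimal consistent-hash ring in Go. This is an illustrative sketch, not StatsRelay's actual code: the backend addresses and the FNV hash are my own choices, and a real implementation would add virtual nodes (multiple hash points per backend) for better balance.

package main

import (
    "fmt"
    "hash/fnv"
    "sort"
)

// ring is a minimal consistent-hash ring: each backend is hashed onto a
// circle, and a metric name is routed to the first backend at or after
// its own hash point.
type ring struct {
    points   []uint32
    backends map[uint32]string
}

func hash32(s string) uint32 {
    h := fnv.New32a()
    h.Write([]byte(s))
    return h.Sum32()
}

func newRing(backends []string) *ring {
    r := &ring{backends: make(map[uint32]string)}
    for _, b := range backends {
        p := hash32(b)
        r.points = append(r.points, p)
        r.backends[p] = b
    }
    sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
    return r
}

// pick returns the backend responsible for a metric name.
// It assumes at least one backend was registered.
func (r *ring) pick(metric string) string {
    h := hash32(metric)
    i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
    if i == len(r.points) {
        i = 0 // wrap around the circle
    }
    return r.backends[r.points[i]]
}

func main() {
    r := newRing([]string{"10.0.0.1:8125", "10.0.0.2:8125", "10.0.0.3:8125"})
    // The same metric name always routes to the same Statsd backend.
    fmt.Println(r.pick("app.requests.count"))
}

With a ring like this, the same metric name always hashes to the same backend, and removing a backend only remaps the names that fell in that backend's arc.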

Why would you use it?

Do you have an application tier of multiple machines that send updates for the same metric into Statsd?

When you need to engineer a scalable Statsd ingestion service, you need a way to balance load across more than one Statsd daemon. StatsRelay provides that functionality. You can also run multiple StatsRelay daemons behind a UDP load balancer like LVS to further scale out your infrastructure.

StatsRelay is designed to be fast, which is the primary reason it is written in Go. The StatsRelay daemon has been benchmarked at handling 200,000 UDP packets per second. It batches the metrics it receives into larger UDP packets before sending them off to the Statsd backends. Because string processing is cheaper than system calls, this further increases the number of metrics that each Statsd daemon is able to handle.
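
Here is a hedged sketch of that batching idea in Go. The 1400-byte cutoff, the names, and the backend address are illustrative, not StatsRelay's; the point is that many newline-separated metric lines share a single write, so the system-call cost is paid once per datagram rather than once per metric.

package main

import (
    "bytes"
    "log"
    "net"
)

// maxDatagram keeps each UDP packet under a typical Ethernet MTU.
// The exact cutoff here is illustrative.
const maxDatagram = 1400

// batcher collects newline-separated metric lines and sends them to a
// single Statsd backend as one UDP datagram per batch.
type batcher struct {
    conn *net.UDPConn
    buf  bytes.Buffer
}

// add appends a metric line, flushing first if the line would not fit.
func (b *batcher) add(metric string) {
    if b.buf.Len() > 0 && b.buf.Len()+len(metric)+1 > maxDatagram {
        b.flush()
    }
    if b.buf.Len() > 0 {
        b.buf.WriteByte('\n')
    }
    b.buf.WriteString(metric)
}

// flush sends the accumulated batch with a single write (one system call).
func (b *batcher) flush() {
    if b.buf.Len() == 0 {
        return
    }
    if _, err := b.conn.Write(b.buf.Bytes()); err != nil {
        log.Printf("send to statsd backend failed: %v", err)
    }
    b.buf.Reset()
}

func main() {
    // 10.0.0.1:8125 is a placeholder Statsd backend address.
    addr, err := net.ResolveUDPAddr("udp", "10.0.0.1:8125")
    if err != nil {
        log.Fatal(err)
    }
    conn, err := net.DialUDP("udp", nil, addr)
    if err != nil {
        log.Fatal(err)
    }
    b := &batcher{conn: conn}
    b.add("app.requests.count:1|c")
    b.add("app.request.time:5|ms")
    b.flush()
}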

When shouldn't you use StatsRelay?

In many cases you might want to run Statsd on each client machine and let it aggregate and report metrics to Graphite from that point. If each client only produces unique metric names, this is the approach you should use. This doesn't work, however, when you have multiple machines that need to increment the same counter, for example.

What's wrong with Statsd?

Etsy's Statsd tool is really quite excellent. It's written in NodeJS which, event-driven though it may be, is not what I would call fast. The daemon is a single process, which only scales so far. Testing showed that the daemon would drop packets as it approached 40,000 packets per second, pegging the CPU core it ran on at 100%. I needed a solution for an order of magnitude more traffic.

But, Hey! Statsd comes with a proxy tool!

New versions of Etsy's Statsd distribution do come with a NodeJS proxy implementation that does much the same thing. Like the Statsd daemon, the proxy in single-process mode topped out around 40,000 packets per second and 100% CPU. Testing showed that the underlying Statsd daemons were not getting all of that traffic either.

I checked back on this proxy after it had been developed further and found that it had a forkCount configuration parameter and what looked like a good start at a multi-process mode. I tested it again with my Statsd load generator, which produced about 175,000 packets per second, well within the packet rate I needed to support in production. With forkCount set to 4, I found four processes, each consuming 200% CPU and 2 GB of memory. The code was still dropping packets.

At about 175,000 packets per second this Go implementation uses about 10 MB of memory and about 60% CPU. No packets lost.

Contributing

Fork the StatsRelay repository and submit a pull request on GitHub.

Things that need work:

  • Add health checking of the underlying Statsd daemons
  • Profile and tune for speed and packet throughput

Writing Documentation

Documentation is every IT professional's job. I keep a 3x5 notecard with a generic layout I use to write documentation, and all of my IT-related writing follows this pattern. I'm actually tired of keeping up with the card, so I'm going to put it here.

I lifted and simplified this layout from Tom Limoncelli's Ops Report Card section on documentation.

Overview or Summary

  • A summary of what this is.
  • Where does this service live?
  • Why do we need it?
  • Upstream documentation
  • Design (Perhaps its own section outright)
  • Diagrams of logic or data flow
  • Other moving parts that make the whole
  • Subject Matter Expert contacts

Common Tasks or Process

  • Common tasks needed for care and feeding.
  • If this is a documented process, that process goes here, step by step.

Deployment or Building

  • Do we build the software locally, and if so, how?
  • How do we deploy more of these machines or replace busted ones
  • Where are our configs in Puppet/Chef/Ansible
  • Hardware Requirements

Pager Playbook

  • How might the system fail?
  • What does failure mean?
  • What risks do we run?
  • What to do to restore each service or part
  • What side effects happen when specific parts are down or malfunctioning

Disaster Recovery Plans

  • How is (or isn't) this system recoverable from a disaster situation?
  • What disasters have we planned for?
  • HA plans can fit here too
  • Steps that need to happen to recover
  • Risks
  • Service Level Agreement (either real/legal or social)

Notes

  • Any notes about the service
  • Things that don't fit well above
  • Future to do or improvements
  • Uncommonly needed tasks

Logging with Docker 1.0.1

Docker encourages its users to build containers that log to standard out or standard error. In fact, it's now common practice to do so. Your process controller (uWSGI/Gunicorn) should combine the logs from all processes and do something useful with them, like writing them all to a file without your app having to worry about locking, or maybe even sending them to Syslog.

Docker supports this practice and collects the logs for us, in JSON, which adds the missing timestamps and works well with LogStash. But the show-stopping issue for us is that these files grow boundlessly. You cannot use the logrotate utility with them because the Docker daemon will not re-open the file. Well, unless you stop/start the specific container. Docker logging issues are an ongoing topic and this is clearly an area where Docker will improve in the future.

There are two other widely accepted ways of working around this:

  • Bind mount /dev/log into the container and offload logs to the host's Syslog
  • Mount a volume from the host or a different container where logs will be processed.

The second option is out. It has the same problem of not being able to easily tell the app to re-open its files for log rotation without restarting the container.

Using /dev/log and offloading logs to the system's log daemon sounds like a good idea. The Docker host can provide this service to all containers, and the containers need not deal with (much) logging complexity inside them.

This approach has multiple problems.

Offloading logs to the host's Syslog most likely means that you want to add some additional configuration to rsyslog (say, to stick your logs in a specific, app-specific file), which requires a restart of the rsyslog daemon. The first thing rsyslog does when it starts is (re-)create the /dev/log socket. At that point, any running Docker container that has already bind mounted /dev/log is holding the old socket, not the newly created one. In any case, rsyslog is no longer listening to any of the currently running containers for logs. Full stop. This method doesn't pass the smoke test.

What ended up working for me was using the network, but it added complexity to the Docker host. I'm managing Docker hosts with Ansible so this wasn't a huge problem. I'd rather tune my Docker hosts than alter each image and container. I set the network range on the docker0 bridge interface to a specific and private IP range. Now, my Docker hosts always have a known IP address that my Docker containers can make connections to. In /etc/default/docker:

DOCKER_OPTS="--ip 127.0.0.1 --bip 172.30.0.1/16"

I configured rsyslog on the host to listen for UDP traffic and bind only to this private address:

$ModLoad imudp
# The bind address must be set before $UDPServerRun starts the listener
$UDPServerAddress 172.30.0.1
$UDPServerRun 514

I then built my image to run the process with its output piped to logger using the -n option to specify my syslog server. Guess what. No logs.

The util-linux in Ubuntu Trusty (and other releases) is version 2.20, which dates from 2011 or so. Its logger utility has known bugs: specifically, the -n option is silently ignored unless you also specify a local UNIX socket to write to. This version of util-linux also lacks the nsenter command, which is very handy when working with Docker containers. This is a pretty big frustration.

The final solution was to make my incantations in my Dockerfiles slightly more complex for apps that do not directly support Syslog. But, it works.

CMD foobar-server --options 2>&1 \
    | logger -n 172.30.0.1 -u /dev/null -t foobar-server -p local0.notice

I promise I'm not logging to /dev/null.

Packages in Their Glory

I've been thinking about and wanting to write about packages for a long time. DEBs. RPMs. Pip for Python. CPAN for Perl. Galaxy for Ansible. Registry and Docker. Puppet modules from Puppet Forge. Vagrant Boxes. Every technology comes with its own distribution format and tool it seems.

My recent transition from RHEL to Ubuntu has made one thing very clear. This mess of packages is intractable. No package format is aware of the others, yet they usually have dependencies that interconnect different package types. Pip has no knowledge of the C libraries required to build many Python packages. We SAs usually end up crossing the streams to produce a working environment. Or we spend hours building packages of one specific type. (Only to spend even more time on them later.) The end result is often different package management systems stepping on each other and producing an unmaintainable, unreproducible system.

I've spent, probably, years of my career doing nothing but packaging. The advantages of packages are still just as relevant today as they were in the past. It's a core skill set for running large infrastructures.

Recently, I've just about given up trying to deal with packages. Throw-away VMs. Isolation environments. Images. Advanced configuration management tools. Applications with conflicting requirements. Does maintaining a well-managed server even matter anymore?

I believe it does. A well-managed host system keeps things simple and the SAs sane. However, I believe that there should be a line drawn in the sand to keep the OS -- and the tools that manage the OS -- separate from the applications running on that machine or VM. On the OS side of the line, RPMs or DEBs rule. Configuration management has an iron fist. Your configuration management and automation should also deploy your application containers. But now we find the line in the sand.

Your applications, their crazy requirements, and whatever abominable package management scheme is needed to get the job done should live in Docker containers. Here, your configuration management is a git repo from which you can easily rebuild your images. Here, we can use the tools that work best for the situation at hand without causing harm to the host system or another application.

Perhaps Docker "packages" are, finally, the one packaging system to rule them all.

There's just one thing that itches. I know Fedora outright bans the practice. Packaging libraries with your applications means that when OpenSSL has a security vulnerability, you have to patch your OS -- and find everywhere else that library has been stuffed. Itch. Docker containers seem reasonable about this, but it still means rebuilding and restarting all containers. Itch.

Ansible and EC2

The last several months have been a deep dive into Ansible. Deterministic. Simple. Ideally push-based. Uses standard YAML. (I've never been much for inventing your own language.) Most of this work has been with Amazon's EC2 service, and Ansible's included dynamic inventory plugin for EC2 has been indispensable.

However, there's a lot more this inventory plugin could do. I've developed a series of patches, some of which I've submitted pull requests for. All of them can be had by cloning the master branch of my GitHub Ansible fork.

  • Do you have more than one AWS account? Need to manage machines in all of them? The multiple-aws-accounts branch teaches ec2.py to query multiple accounts given the proper IAM credentials.
  • Making groups from various tags and attributes in AWS is handy. But I wanted a way to just make arbitrary groups happen. The arbitrary_groups branch supports reading groups from the ansible_groups AWS tag on your EC2 instances.
  • Need additional information about the EBS stores mapped to your instances? The block_device_mapping branch exposes this as inventory variables from ec2.py.