This is a update of an old post so its back at the top of the blog. Original posting was 2015-08-19.

I’m considering swapping out Statsd with Bitly’s statsdaemon for better performance. But, because Bitly’s version only accepts integer data I wanted to analyze our Statsd traffic. I figured I’d use my friend tcpdump to capture some trafic samples and replay them through a test box for analysis. Also, figuring out what are our hot metrics is very handy.

# tcpdump -s0 -w /tmp/statsd.pcap udp port 9125

Wireshark confirmed that this was the traffic I was looking for. A spot check looks like I have good integer data. How to dump out the traffic data so I can at least run grep and other common unix tools on the text data?

The Tcpreplay tools look very powerful. However, it can’t replay TCP traffic at a server daemon because it cannot synchronize the SYN/ACK numbers with the real client. But this is UDP taffic! UDP does provide checksums for data integrity so after changing the IP and MAC address via tcprewrite I had packets that my Linux box dropped because the checksum didn’t match.

Back to my friend Wireshark:

$ tshark -r /tmp/statsd.pcap -T fields -e data > data

This dumps out newline separated dump of the data field of each packet which is exactly what I need. Just not as hexadecimal encoded binary data.

import binascii
import sys

for s in open(sys.argv[1], "r").readlines():
    print binascii.unhexlify(s.strip())

Finally, I have newline separated list of the Statsd metrics in the pcap data and can finally run grep!

$ python data | gawk -F: '/.+/ { print $1 }' | sort | uniq -c | sort -n

Now I also have a frequency distribution chart of the packet capture showing me what the most common metrics are.

Previous | Back | Next

comments powered by Disqus