Analyzing or Replaying UDP Statsd Data
This is an update of an old post, so it’s back at the top of the blog. The original posting was 2015-08-19.
I’m considering swapping out Statsd for Bitly’s statsdaemon for better performance. But because Bitly’s version only accepts integer data, I wanted to analyze our Statsd traffic first. I figured I’d use my friend tcpdump to capture some traffic samples and replay them through a test box for analysis. Also, figuring out which of our metrics are hot is very handy.
# tcpdump -s0 -w /tmp/statsd.pcap udp port 9125
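Since statsdaemon only accepts integer values, the question the capture ultimately needs to answer is which metrics send anything else. Here is a minimal sketch of that check, assuming the common Statsd line format `name:value|type[|@rate]` with one metric per line. `parse_statsd_line`, `is_integer_value`, and `non_integer_metrics` are hypothetical helpers for illustration, not part of Statsd or statsdaemon:

```python
from typing import NamedTuple, Optional


class Metric(NamedTuple):
    name: str
    value: str                 # kept as text so we can see how it was written
    mtype: str                 # e.g. "c", "ms", "g", "s"
    sample_rate: Optional[str]


def parse_statsd_line(line: str) -> Optional[Metric]:
    """Parse 'name:value|type[|@rate]'; return None if the line doesn't match."""
    name, sep, rest = line.strip().partition(":")
    if not sep or not name:
        return None
    fields = rest.split("|")
    if len(fields) < 2:
        return None
    rate = None
    if len(fields) >= 3 and fields[2].startswith("@"):
        rate = fields[2][1:]
    return Metric(name, fields[0], fields[1], rate)


def is_integer_value(value: str) -> bool:
    """True if the value parses as a (possibly signed) integer, e.g. '3' or '+3'."""
    try:
        int(value)
        return True
    except ValueError:
        return False


def non_integer_metrics(lines):
    """Collect the names of metrics whose values are not integers."""
    bad = set()
    for line in lines:
        m = parse_statsd_line(line)
        if m is not None and not is_integer_value(m.value):
            bad.add(m.name)
    return bad


if __name__ == "__main__":
    sample = ["api.requests:1|c", "api.latency:12.5|ms", "queue.depth:+3|g"]
    print(sorted(non_integer_metrics(sample)))
```

Values are kept as strings on purpose: the interesting fact is how a client wrote the number, not what it parses to.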
Wireshark confirmed that this was the traffic I was looking for, and a spot check suggests I have good integer data. How do I dump out the traffic data so I can at least run grep and other common Unix tools on the text data?
The Tcpreplay tools look very powerful. However, they can’t replay TCP traffic at a server daemon because they cannot synchronize the TCP sequence and acknowledgment numbers with the real client. But this is UDP traffic, so that shouldn’t matter! Unfortunately, UDP does provide checksums for data integrity, and the checksum covers a pseudo-header that includes the source and destination IP addresses. So after I changed the IP and MAC addresses via tcprewrite, my Linux box dropped the packets because the checksums no longer matched.
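To see why rewriting addresses breaks the packets, here is a sketch of the UDP checksum as described in RFC 768: a 16-bit one's complement sum over an IPv4 pseudo-header (source IP, destination IP, protocol, UDP length), the UDP header, and the payload. The function names, IP addresses, and ports are made-up examples; real stacks often offload this to the NIC, but the arithmetic is the same:

```python
import socket
import struct

UDP_PROTO = 17  # IANA protocol number for UDP


def ones_complement_sum16(data: bytes) -> int:
    """16-bit one's complement sum of `data`, zero-padding an odd trailing byte."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total


def checksummed_bytes(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
                      payload: bytes, checksum: int) -> bytes:
    """IPv4 pseudo-header + UDP header + payload: everything the UDP checksum covers."""
    udp_len = 8 + len(payload)
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, UDP_PROTO, udp_len))
    header = struct.pack("!HHHH", src_port, dst_port, udp_len, checksum)
    return pseudo + header + payload


def udp_checksum(src_ip, dst_ip, src_port, dst_port, payload):
    """Checksum a sender would put in the UDP header."""
    total = ones_complement_sum16(
        checksummed_bytes(src_ip, dst_ip, src_port, dst_port, payload, 0))
    return (~total & 0xFFFF) or 0xFFFF  # a computed zero is sent as 0xFFFF


def checksum_is_valid(src_ip, dst_ip, src_port, dst_port, payload, checksum):
    """Receiver-side check: summing everything, checksum included, gives 0xFFFF."""
    if checksum == 0:  # zero means "no checksum" for UDP over IPv4
        return True
    total = ones_complement_sum16(
        checksummed_bytes(src_ip, dst_ip, src_port, dst_port, payload, checksum))
    return total == 0xFFFF


if __name__ == "__main__":
    payload = b"api.requests:1|c"
    csum = udp_checksum("10.0.0.5", "10.0.0.9", 40000, 9125, payload)
    print(checksum_is_valid("10.0.0.5", "10.0.0.9", 40000, 9125, payload, csum))   # True
    # Rewrite only the destination IP and keep the old checksum:
    print(checksum_is_valid("10.0.0.5", "10.0.0.10", 40000, 9125, payload, csum))  # False
```

Changing one address word by even a single bit shifts the sum, so a packet whose IPs were rewritten without recomputing the checksum fails the receiver’s check and is dropped.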
Back to my friend Wireshark:
$ tshark -r /tmp/statsd.pcap -T fields -e data > data
This writes a newline-separated dump of the data field of each packet, which is exactly what I need, just not as hexadecimal-encoded binary data. A few lines of Python (saved as unhex.py) decode it:
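import binascii
import sys
# Each line of tshark output is one packet's payload, hex-encoded; decode it to text.
with open(sys.argv[1]) as f:
    for line in f:
        hexdata = line.strip()
        if hexdata:  # skip packets with an empty payload
            print(binascii.unhexlify(hexdata).decode("utf-8", errors="replace"))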
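If tshark isn’t available, the same payloads can be pulled straight out of the capture file with only the standard library. This is a minimal sketch, assuming the classic libpcap file format (what tcpdump -w normally writes, not pcapng), an Ethernet link type, untagged IPv4 frames, and unfragmented datagrams; `udp_payloads` and `_udp_payload_from_ethernet` are hypothetical names:

```python
import struct
import sys
from typing import Iterator, Optional

LINKTYPE_ETHERNET = 1


def _udp_payload_from_ethernet(frame: bytes, port: int) -> Optional[bytes]:
    """Return the UDP payload if this frame is IPv4/UDP to or from `port`."""
    if len(frame) < 14:
        return None
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != 0x0800:  # IPv4 only; VLAN-tagged and IPv6 frames are skipped
        return None
    ip = frame[14:]
    if len(ip) < 20 or ip[9] != 17:  # too short, or not UDP
        return None
    (flags_frag,) = struct.unpack("!H", ip[6:8])
    if flags_frag & 0x3FFF:  # MF flag or fragment offset set: skipped here
        return None
    ihl = (ip[0] & 0x0F) * 4  # IP header length in bytes
    udp = ip[ihl:]
    if len(udp) < 8:
        return None
    src_port, dst_port, udp_len, _ = struct.unpack("!HHHH", udp[:8])
    if port not in (src_port, dst_port):
        return None
    return udp[8:udp_len]  # UDP length field trims any Ethernet padding


def udp_payloads(path: str, port: int = 9125) -> Iterator[bytes]:
    """Yield UDP payloads to/from `port` from a classic libpcap file."""
    with open(path, "rb") as f:
        header = f.read(24)
        magic = header[:4]
        if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
            endian = "<"  # little-endian file (microsecond or nanosecond)
        elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
            endian = ">"
        else:
            raise ValueError("not a classic pcap file")
        (linktype,) = struct.unpack(endian + "I", header[20:24])
        if linktype != LINKTYPE_ETHERNET:
            raise ValueError(f"unsupported link type {linktype}")
        while True:
            rec = f.read(16)  # per-packet record header
            if len(rec) < 16:
                break
            _, _, incl_len, _ = struct.unpack(endian + "IIII", rec)
            payload = _udp_payload_from_ethernet(f.read(incl_len), port)
            if payload is not None:
                yield payload


if __name__ == "__main__":
    for payload in udp_payloads(sys.argv[1]):
        print(payload.decode("utf-8", errors="replace"))
```

Its output can feed the same gawk pipeline as unhex.py; tshark remains the more robust choice for anything beyond this narrow case.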
Now I have a newline-separated list of the Statsd metrics in the pcap data and can finally run grep!
$ python unhex.py data | gawk -F: '/.+/ { print $1 }' | sort | uniq -c | sort -n
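For a box without gawk, the same count can be done in Python with `collections.Counter`. This sketch mirrors the pipeline’s logic: skip empty lines, take the text before the first colon as the metric name, and print counts in ascending order. The script name count_metrics.py is hypothetical:

```python
import sys
from collections import Counter
from typing import Iterable


def metric_name_counts(lines: Iterable[str]) -> Counter:
    """Count metric names: the text before the first ':' on each non-empty line."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line:  # like gawk's /.+/ filter
            counts[line.split(":", 1)[0]] += 1
    return counts


if __name__ == "__main__":
    counts = metric_name_counts(sys.stdin)
    # Ascending by count, like `sort | uniq -c | sort -n`
    for name, n in sorted(counts.items(), key=lambda kv: (kv[1], kv[0])):
        print(f"{n:7d} {name}")
```

Usage: `python unhex.py data | python count_metrics.py`. `counts.most_common(10)` is a quick way to get just the hottest metrics.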
That pipeline produces a frequency distribution of the metric names in the packet capture, showing which metrics are the most common.
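Replaying the captured metrics at a test box is still possible without tcprewrite: resending each decoded payload through an ordinary UDP socket lets the kernel build fresh IP and UDP headers, checksums included, for the new destination. This is a minimal sketch, assuming the hex dump written by the tshark command; `payloads_from_hex_dump`, `replay`, and the test-box host name are hypothetical, and the original packet timing and source addresses are not preserved:

```python
import socket
import sys
import time
from typing import Iterable, Iterator


def payloads_from_hex_dump(path: str) -> Iterator[bytes]:
    """Yield one raw payload per non-empty line of the tshark hex dump."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                yield bytes.fromhex(line)


def replay(payloads: Iterable[bytes], host: str, port: int = 9125,
           delay: float = 0.0) -> int:
    """Send each payload as its own UDP datagram to host:port; return the count sent."""
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for payload in payloads:
            sock.sendto(payload, (host, port))  # kernel builds fresh headers/checksums
            sent += 1
            if delay:
                time.sleep(delay)  # crude pacing so the test box isn't flooded
    return sent


if __name__ == "__main__":
    # Usage: python replay.py data test-box.example.com
    count = replay(payloads_from_hex_dump(sys.argv[1]), sys.argv[2])
    print(f"sent {count} datagrams", file=sys.stderr)
```

Keeping one captured payload per datagram preserves any multi-metric packets exactly as the clients sent them.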