While working with a large Graphite cluster that uses consistent hashing, I came across corrupt Whisper files. A corrupt file prevents carbon-cache.py from storing data to that specific metric database file, and the resulting backlog was starting to tank the cluster. I whipped out find and removed all zero-length files, as that is a common corruption case.
find /opt/graphite/storage/whisper -depth -name '*.wsp' -size 0c -type f -delete
However, I had a few more cases that were not zero-length files, and a quick bit of Googling did not turn up much. Reading the header of a WSP file is usually enough to make the Whisper code throw an exception on a corrupt file, so I used that to write Whisper-FSCK.
It scans your tree of Whisper files and looks for corrupted ones. With the -f argument, it moves those files out of the way.
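To illustrate the idea, here is a minimal Python sketch of a header-based scan. It is not the Whisper-FSCK code itself. The function names (read_header, find_corrupt) are hypothetical, and the header layout (a 16-byte metadata block followed by 12-byte archive entries) is my understanding of the Whisper file format, so treat it as an assumption. The sketch only reports suspect files and does not move anything.

```python
import os
import struct

# Assumed Whisper header layout (not taken from Whisper-FSCK itself):
# metadata = aggregation type, max retention, xFilesFactor, archive count
METADATA_FORMAT = "!2LfL"
# one entry per archive = offset, seconds per point, number of points
ARCHIVE_INFO_FORMAT = "!3L"
METADATA_SIZE = struct.calcsize(METADATA_FORMAT)
ARCHIVE_INFO_SIZE = struct.calcsize(ARCHIVE_INFO_FORMAT)


def read_header(path):
    """Parse the header of one file; raise ValueError if it looks corrupt."""
    file_size = os.path.getsize(path)
    with open(path, "rb") as fh:
        packed = fh.read(METADATA_SIZE)
        if len(packed) != METADATA_SIZE:
            raise ValueError("truncated metadata")
        _agg, _retention, _xff, archive_count = struct.unpack(
            METADATA_FORMAT, packed
        )
        if archive_count == 0:
            raise ValueError("header declares no archives")
        header_size = METADATA_SIZE + archive_count * ARCHIVE_INFO_SIZE
        if header_size > file_size:
            raise ValueError("header is larger than the file")
        # Read each archive entry declared by the metadata block.
        return [
            struct.unpack(ARCHIVE_INFO_FORMAT, fh.read(ARCHIVE_INFO_SIZE))
            for _ in range(archive_count)
        ]


def find_corrupt(root):
    """Walk root and return the sorted paths of .wsp files that fail to parse."""
    corrupt = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".wsp"):
                continue
            path = os.path.join(dirpath, name)
            try:
                read_header(path)
            except (ValueError, struct.error, OSError):
                corrupt.append(path)
    return sorted(corrupt)
```

Calling find_corrupt("/opt/graphite/storage/whisper") would return the list of suspect files. A zero-length file fails the first length check, so this covers the find case above as well.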
Pull requests welcome!