While working with a large Graphite cluster that uses consistent hashing, I came across corrupt Whisper files. A corrupt file prevents data from being stored in that metric's database file, and the resulting backlog was starting to tank the cluster. I whipped out find and removed all zero-length files, as that is a common corruption case.

find /opt/graphite/storage/whisper -type f -name '*.wsp' -size 0c -delete

However, I had a few more cases that were not zero-length files. A quick bit of Googling did not turn up much. Usually, reading the header of the WSP file is enough to make the Whisper code throw an exception, so I used that to write Whisper-FSCK.

It will scan your tree of Whisper files and look for corrupted ones. With the optional -f argument it will move those files out of the way.
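To illustrate the header-check idea, here is a minimal standalone sketch in Python. It is not the Whisper-FSCK code itself, and it does not call the Whisper library. Instead it assumes the on-disk layout that library uses: a 16-byte metadata block packed as `!2LfL`, followed by one 12-byte `!3L` entry per archive, with 12-byte data points. The names `check_header` and `find_corrupt` are hypothetical, and the sanity rules are my own guesses at what makes a header unusable.

```python
import os
import struct

# Assumed Whisper header layout (network byte order, no padding).
METADATA_FORMAT = "!2LfL"  # aggregation type, max retention, xFilesFactor, archive count
ARCHIVE_FORMAT = "!3L"     # offset, seconds per point, number of points
METADATA_SIZE = struct.calcsize(METADATA_FORMAT)
ARCHIVE_SIZE = struct.calcsize(ARCHIVE_FORMAT)
POINT_SIZE = struct.calcsize("!Ld")  # timestamp + value


def check_header(path):
    """Return None if the header looks sane, otherwise a short reason string."""
    file_size = os.path.getsize(path)
    with open(path, "rb") as fh:
        raw = fh.read(METADATA_SIZE)
        if len(raw) < METADATA_SIZE:
            return "truncated metadata"
        _agg, _max_retention, xff, archive_count = struct.unpack(METADATA_FORMAT, raw)
        # NaN fails this comparison as well, which is what we want.
        if not 0.0 <= xff <= 1.0:
            return "bad xFilesFactor"
        if archive_count == 0:
            return "no archives"
        for _ in range(archive_count):
            raw = fh.read(ARCHIVE_SIZE)
            if len(raw) < ARCHIVE_SIZE:
                return "truncated archive info"
            offset, seconds_per_point, points = struct.unpack(ARCHIVE_FORMAT, raw)
            if seconds_per_point == 0 or points == 0:
                return "empty archive definition"
            if offset + points * POINT_SIZE > file_size:
                return "archive extends past end of file"
    return None


def find_corrupt(root):
    """Walk root and yield (path, reason) for every suspicious .wsp file."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".wsp"):
                continue
            path = os.path.join(dirpath, name)
            try:
                reason = check_header(path)
            except OSError as exc:
                reason = "unreadable: %s" % exc
            if reason is not None:
                yield path, reason
```

Something like `for path, reason in find_corrupt("/opt/graphite/storage/whisper"): print(path, reason)` would list the candidates. Reporting first and moving files only as a separate, explicit step mirrors the optional -f behaviour described above.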

Pull requests welcome!
