Engineering Software, Linux, and Operations. The website of Jack Neely.    

Installing Cyanite: A Scalable Graphite Storage Backend

I’ve been experimenting with Cyanite to make my Graphite cluster more reliable. The main problem I face is that when a data node goes down, the Graphite web app more or less stops responding to requests. Cyanite is a daemon written in Clojure that runs on the JVM. The daemon is stateless and stores timeseries data in Cassandra.

I found the documentation a bit lacking, so here’s how to set up Cyanite to build a scalable Graphite storage backend.

  1. Acquire a Cassandra database cluster. You will need at least Cassandra 3.4. The Makefile tests use Cassandra 3.5. I used Cassandra 3.7 in my experiments, which is the current release as of this writing. (Note Cassandra’s new Tick-Tock based release cycle.)

    Parts of the documentation indicated that Elasticsearch was required. That is no longer the case. Cyanite must store a searchable index of the metrics it has data points for, so that it can resolve glob requests into a list of concrete metric names.

    This is now done in Cassandra using SASI indexes, which enable CQL SELECT statements to use the LIKE operator. This is the feature that requires a more recent Cassandra version than you may be running in production.
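    As a sketch of what this looks like at the CQL level (table and column names here are illustrative, not Cyanite’s actual schema), a SASI index allows prefix matching with LIKE:

    ```sql
    -- Illustrative only: a SASI index enabling LIKE on a text column
    CREATE CUSTOM INDEX IF NOT EXISTS path_idx ON metric.segment (path)
        USING 'org.apache.cassandra.index.sasi.SASIIndex';

    -- A glob such as servers.*.cpu then becomes, roughly, a prefix query:
    SELECT path FROM metric.segment WHERE path LIKE 'servers.%';
    ```

    SASI indexes default to PREFIX mode, which is what makes the trailing-wildcard LIKE above possible without a full table scan.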

  2. Clone the Cyanite Git repository. There are no tags or releases. However, the rumor at Monitorama 2016 is that Cyanite is a stable and scalable platform. So I just grabbed the master branch.

    git clone https://github.com/pyr/cyanite.git
  3. Create a Cassandra user depending on your local policy. Import the schema, found in the Cyanite repository, to initially create the keyspace you will use.

    Here, I altered the schema to set the replication factor I wanted. So I created my keyspace like this:

    CREATE KEYSPACE IF NOT EXISTS metric WITH replication =
    {'class': 'SimpleStrategy', 'replication_factor': '3'}
    AND durable_writes = true;

    I’m only replicating in a Cassandra database that lives in a single data center. No cross data center replication strategies here…yet.
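    If that changes, moving to a topology-aware strategy is a straightforward keyspace alteration (the data center names below are hypothetical):

    ```sql
    -- Replicate 3x in each of two (hypothetical) data centers
    ALTER KEYSPACE metric WITH replication =
        {'class': 'NetworkTopologyStrategy', 'dc1': '3', 'dc2': '3'};
    ```

    Remember that after changing replication settings you should run a repair so existing data is copied to its new replicas.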

  4. Install Leiningen. This is the build tool used by the Cyanite project. It’s very friendly and installs locally into your home directory. This allows you to build JARs and other distributable versions of the code.
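    The install amounts to fetching the lein script (the URL below is Leiningen’s documented stable location) and putting it on your PATH:

    ```shell
    # Fetch the lein bootstrap script into ~/bin and make it executable
    mkdir -p ~/bin
    curl -fLo ~/bin/lein \
        https://raw.githubusercontent.com/technomancy/leiningen/stable/bin/lein
    chmod +x ~/bin/lein
    ~/bin/lein version   # first run downloads Leiningen's own JAR
    ```
    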

  5. I need to distribute code as Debian packages for Ubuntu. Fortunately, we have a target to build just that.

    $ cd path/to/cyanite/repo
    $ lein fatdeb

    This should produce artifacts in the target/ directory.
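    The exact artifact name depends on the version Leiningen bakes in, so a glob is the safest way to install what fatdeb produced:

    ```shell
    # Install whatever Debian package the fatdeb target built
    sudo dpkg -i target/cyanite_*_all.deb
    ```
    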

  6. Install the Cyanite packages. Configure /etc/cyanite.yaml to match your storage schema file and to include the connection information for your Cassandra cluster.

    An example configuration with additional documentation can be found in the Cyanite repo.


    Here is a sanitized version of my config. This required some parsing of the source to find needed options.

    # Retention rules from storage-schema.conf
    engine:
      rules:
        '^1sec\.*': [ "1s:14d" ]
        '^1min\.*': [ "60s:760d" ]
        '^carbon\..*': [ "60s:30d", "15m:2y" ]
        default: [ "60s:30d" ]

    # IP and PORT where the Cyanite REST API will bind
    api:
      port: 8080
      host:

    # An input, carbon line protocol
    input:
      - type: carbon
        port: 2003
        host:

    # Store the metric index in Cassandra SASI indexes
    index:
      type: cassandra
      keyspace: 'metric'
      username: XXXXXX
      password: YYYYYY
      cluster:
        -
        -
        -

    # Time drift calculations.  I use / trust NTP.
    drift:
      type: no-op

    # Timeseries are stored in Cassandra
    store:
      keyspace: 'metric'
      username: XXXXXX
      password: YYYYYY
      cluster:
        -
        -
        -

    # Logging configuration
    logging:
      level: info
      console: true
      files:
        - "/var/log/cyanite/cyanite.log"
      overrides:
        io.cyanite: "debug"
  7. Cyanite should be startable at this point. You can test that it accepts carbon line protocol metrics and that they are returned by the Cyanite REST API.
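    A quick smoke test, assuming the ports from the config above and the REST endpoints as I read them from the source (/paths and /metrics; verify against your build):

    ```shell
    # Write one datapoint over the carbon line protocol
    echo "test.cyanite.one 42 $(date +%s)" | nc -q1 localhost 2003

    # Give Cyanite a moment to flush, then read the data back over HTTP
    sleep 5
    curl "http://localhost:8080/paths?query=test.cyanite.*"
    curl "http://localhost:8080/metrics?path=test.cyanite.one&from=$(( $(date +%s) - 300 ))"
    ```
    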

  8. Package and install Graphite-API along with the Cyanite Python module. Graphite-API is a stripped-down version of the Graphite web application, built as a Flask application, that uses pluggable finders to search different storage backends. Python’s pip can easily find these packages. This is a WSGI application, so deploy it however you normally deploy such applications; I use mod_wsgi with Apache to run this on port 80.
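    Both pieces install from PyPI. I believe the finder ships in a package named simply cyanite, matching the module name used in the finder configuration, but verify the package name before building your own packages:

    ```shell
    # PyPI package names assumed: graphite-api and cyanite
    pip install graphite-api cyanite
    ```
    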

    Here is a sample /etc/graphite-api.yaml that configures Graphite-API to use the Cyanite plugin and query the local Cyanite daemon.

    # Where the graphite-api search index is built
    search_index: /var/tmp/graphite-index

    # Plugins to use to find metrics
    finders:
      - cyanite.CyaniteFinder

    # Additional Graphite functions
    functions:
      - graphite_api.functions.SeriesFunctions
      - graphite_api.functions.PieFunctions

    # Cyanite Specific options
    cyanite:
      urls:
        -

    time_zone: UTC

    My plan here is to deploy many of these Cyanite / Graphite-API machines in a load-balanced fashion to support my query and write loads. They are completely stateless, like any good web application, so choose your favorite load balancing technique.

At this point you should have a basic Cyanite setup that is able to answer normal Graphite queries and ingest carbon metrics. You might want to use a tool like carbon-c-relay to route metrics into the Cyanite pool. You could point Grafana directly to the load balanced Graphite-API or use the normal Graphite web application (if you like the Graphite composer) and list the Graphite-API load balanced VIP as the single CLUSTER_SERVERS entry.
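As a sketch, a carbon-c-relay configuration that hashes the metric stream across a pool of Cyanite nodes might look like this (host names are hypothetical):

```
cluster cyanite
    fnv1a_ch
        cyanite01.example.com:2003
        cyanite02.example.com:2003
        cyanite03.example.com:2003
    ;
match *
    send to cyanite
    stop;
```

Since every Cyanite node writes to the same Cassandra cluster, consistent hashing isn’t strictly required here; an `any_of` cluster would also work and simply spreads load.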

This should at least get you going with Cyanite as a Graphite storage backend. There will be much tuning and testing to transform this into a scalable system depending on your exact setup. I am just starting down this path and may have more to share in the future. Or it may blow up on me. Time will tell.

Update 2016/07/19: There are several other Graphite storage backends that I’m aware of. All are Cassandra based.

What am I missing?
