LinuxCzar

Engineering Software, Linux, and Observability. The website of Jack Neely.    

Installing Cyanite: A Scalable Graphite Storage Backend

I’ve been experimenting with Cyanite to make my Graphite cluster more reliable. The main problem I face is that when a data node goes down, the Graphite web app more or less stops responding to requests. Cyanite is a daemon written in Clojure that runs on the JVM. The daemon is stateless and stores timeseries data in Cassandra.

I found the documentation a bit lacking, so here’s how to set up Cyanite to build a scalable Graphite storage backend.

  1. Acquire a Cassandra database cluster. You will need at least Cassandra 3.4. The Makefile tests use Cassandra 3.5. I used Cassandra 3.7, the current release as of this writing, in my experiments. (Note Cassandra’s new Tick-Tock release cycle.)

    Parts of the documentation indicated that Elasticsearch was required. That is no longer the case. Cyanite must store a searchable index of the metrics it has data points for so that it can resolve glob requests into a list of metrics. For example:

    carbon.agents.*.metricsReceived
    

    This is now done in Cassandra using SASI indexes, which allow CQL SELECT statements to use the LIKE operator. This feature is why Cyanite needs a more recent Cassandra version than you may be running in production.

  2. Clone the Cyanite Git repository. There are no tags or releases. However, the rumor at Monitorama 2016 was that Cyanite is a stable and scalable platform, so I just grabbed the master branch.

    git clone https://github.com/pyr/cyanite.git
    
  3. Create a Cassandra user according to your local policy. Then import the schema to create the keyspace you will use. The schema is in the repository:

    doc/schema.cql
    

    I altered the schema to set the replication factor I wanted, so I created my keyspace like this:

    CREATE KEYSPACE IF NOT EXISTS metric WITH replication =
    {'class': 'SimpleStrategy', 'replication_factor': '3'}
    AND durable_writes = true;
    

    I’m only replicating within a single data center, since my Cassandra cluster lives in one. No cross-data-center replication strategies here… yet.

  4. Install Leiningen, the build tool used by the Cyanite project. It’s very friendly and installs locally into your home directory. It lets you build JARs and other distributable versions of the code.

  5. I need to distribute code as Debian packages for Ubuntu. Fortunately, the project has a build target for just that:

    $ cd path/to/cyanite/repo
    $ lein fatdeb
    

    This should produce artifacts in the target/ directory.

  6. Install the Cyanite packages. Configure /etc/cyanite.yaml with retention rules that match your storage schema file (storage-schemas.conf from carbon-cache.py) and with the connection information for your Cassandra cluster.

    An example configuration with additional documentation can be found in the Cyanite repo.

    doc/cyanite.yaml
    

    Here is a sanitized version of my config. Finding the options I needed required some reading of the source.

    # Retention rules from storage-schemas.conf
    engine:
      rules:
        '^1sec\..*': [ "1s:14d" ]
        '^1min\..*': [ "60s:760d" ]
        '^carbon\..*': [ "60s:30d", "15m:2y" ]
        default: [ "60s:30d" ]

    # IP and PORT where the Cyanite REST API will bind
    api:
      port: 8080
      host: 0.0.0.0

    # An input, carbon line protocol
    input:
      - type: carbon
        port: 2003
        host: 0.0.0.0

    # Store the metric index in Cassandra SASI indexes
    index:
      type: cassandra
      keyspace: 'metric'
      username: XXXXXX
      password: YYYYYY
      cluster:
        - cas-000.foobar.com
        - cas-001.foobar.com
        - cas-002.foobar.com

    # Time drift calculations.  I use / trust NTP.
    drift:
      type: no-op

    # Timeseries are stored in Cassandra
    store:
      keyspace: 'metric'
      username: XXXXXX
      password: YYYYYY
      cluster:
        - cas-000.foobar.com
        - cas-001.foobar.com
        - cas-002.foobar.com

    # Logging configuration.  See: https://github.com/pyr/unilog
    logging:
      level: info
      console: true
      files:
        - "/var/log/cyanite/cyanite.log"
      overrides:
        io.cyanite: "debug"

  7. You should now be able to start Cyanite. Test that it accepts metrics over the carbon line protocol and that they are returned by the Cyanite REST API.

  8. Package and install Graphite-API along with the Cyanite Python module. Graphite-API is a stripped-down version of the Graphite web application, built on Flask, that uses pluggable finders to search different storage backends. Pip can easily find both packages. Graphite-API is a WSGI application, so deploy it with whatever you normally use for WSGI applications. I use mod_wsgi with Apache to run it on port 80.

    Here is a sample /etc/graphite-api.yaml that configures Graphite-API to use the Cyanite plugin and query the local Cyanite daemon:

    # Where the graphite-api search index is built
    search_index: /var/tmp/graphite-index

    # Plugins to use to find metrics
    finders:
      - cyanite.CyaniteFinder

    # Additional Graphite functions
    functions:
      - graphite_api.functions.SeriesFunctions
      - graphite_api.functions.PieFunctions

    # Cyanite Specific options
    cyanite:
      urls:
        - http://127.0.0.1:8080

    time_zone: UTC

    My plan is to deploy many of these Cyanite / Graphite-API machines behind a load balancer to support my query and write loads. They are completely stateless, like any good web application, so choose your favorite load-balancing technique.
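Step 1 mentions that Cyanite resolves glob requests such as carbon.agents.*.metricsReceived through SASI LIKE queries. The snippet below is a minimal Python sketch of that idea, not Cyanite's actual code. It assumes the index can be searched with a LIKE prefix built from the literal part of the glob, and that the candidate paths are then filtered client-side with a regular expression in which `*` matches within a single dot-separated segment. The helper names are hypothetical.

```python
from __future__ import annotations

import re


def glob_to_like_prefix(glob: str) -> str:
    """Build a LIKE pattern from the literal prefix of a Graphite glob.

    Hypothetical helper: everything before the first wildcard character
    is kept literally and '%' matches the rest.
    """
    match = re.search(r"[*?\[{]", glob)
    literal = glob if match is None else glob[: match.start()]
    return literal + "%"


def glob_to_regex(glob: str) -> re.Pattern:
    """Translate a Graphite glob into an anchored regular expression.

    Only '*' and '?' are handled in this sketch; neither may cross a
    '.' path separator.
    """
    parts = []
    for char in glob:
        if char == "*":
            parts.append(r"[^.]*")
        elif char == "?":
            parts.append(r"[^.]")
        else:
            parts.append(re.escape(char))
    return re.compile("^" + "".join(parts) + "$")


def resolve_glob(glob: str, candidates: list[str]) -> list[str]:
    """Keep the candidate paths (e.g. rows from the LIKE query) that match."""
    pattern = glob_to_regex(glob)
    return sorted(path for path in candidates if pattern.match(path))


if __name__ == "__main__":
    query = "carbon.agents.*.metricsReceived"
    print(glob_to_like_prefix(query))  # carbon.agents.%
    rows = [
        "carbon.agents.a-1.metricsReceived",
        "carbon.agents.b-2.metricsReceived",
        "carbon.agents.a-1.cpuUsage",
        "carbon.agents.x.y.metricsReceived",
    ]
    print(resolve_glob(query, rows))
    # ['carbon.agents.a-1.metricsReceived', 'carbon.agents.b-2.metricsReceived']
```

The two-phase shape (a coarse index lookup followed by an exact filter) is one plausible way to use a prefix-only LIKE; whether Cyanite uses prefix or contains matching is worth confirming in its source.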
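Step 3 describes editing doc/schema.cql to set the replication factor before importing it. If you script deployments, a small Python helper can make that edit repeatable. This is a hypothetical sketch: it assumes the schema contains exactly one `replication = {...}` map on its CREATE KEYSPACE statement, and it swaps in a SimpleStrategy map (suitable only for a single data center) with your chosen factor. Review the output before importing it.

```python
import re

# Matches the replication map of a CREATE KEYSPACE statement, e.g.
# "replication = {'class': 'SimpleStrategy', 'replication_factor': '1'}".
REPLICATION_MAP = re.compile(r"replication\s*=\s*\{[^}]*\}", re.IGNORECASE)


def set_replication_factor(schema: str, factor: int) -> str:
    """Return `schema` with its replication map replaced by a
    SimpleStrategy map using `factor` (single data center only)."""
    if factor < 1:
        raise ValueError("replication factor must be at least 1")
    new_map = (
        "replication = {'class': 'SimpleStrategy', "
        f"'replication_factor': '{factor}'}}"
    )
    result, count = REPLICATION_MAP.subn(new_map, schema)
    # Refuse to guess if the schema does not look the way this sketch assumes.
    if count != 1:
        raise ValueError(f"expected one replication map, found {count}")
    return result


if __name__ == "__main__":
    # Hypothetical input shaped like the keyspace statement in step 3.
    sample = (
        "CREATE KEYSPACE IF NOT EXISTS metric WITH replication =\n"
        "{'class': 'SimpleStrategy', 'replication_factor': '1'}\n"
        "AND durable_writes = true;"
    )
    print(set_replication_factor(sample, 3))
```

To apply it, read doc/schema.cql, pass the text through set_replication_factor(text, 3), write the result to a new file, and import that file with `cqlsh -f`.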
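The retention rules in step 6 use the same precision:duration strings as carbon's storage schema, so it helps to check how many points each rule keeps per series before sending real traffic to Cassandra. The sketch below is illustrative Python, not part of Cyanite. It assumes carbon's unit suffixes (s, m or min, h, d, w, y, with y = 365 days) and assumes, as carbon does for storage-schemas.conf, that the first matching pattern wins and `default` is the fallback. Whether Cyanite evaluates its rules in exactly this order is an assumption to verify against its source.

```python
from __future__ import annotations

import re

# Seconds per unit suffix, following carbon's conventions (y = 365 days).
UNITS = {"s": 1, "min": 60, "m": 60, "h": 3600, "d": 86400,
         "w": 604800, "y": 31536000}
TIMESPEC = re.compile(r"^(\d+)(s|min|m|h|d|w|y)$")


def to_seconds(spec: str) -> int:
    """Convert a spec such as '60s', '15m' or '2y' to seconds."""
    match = TIMESPEC.match(spec)
    if match is None:
        raise ValueError(f"unsupported time spec: {spec!r}")
    return int(match.group(1)) * UNITS[match.group(2)]


def points_per_series(retention: str) -> int:
    """Number of data points a 'precision:duration' archive holds."""
    precision, duration = retention.split(":")
    return to_seconds(duration) // to_seconds(precision)


def pick_rule(metric: str, rules: dict[str, list[str]]) -> list[str]:
    """Return the retentions of the first pattern matching `metric`.

    Assumes first-match-wins, with 'default' as the fallback.
    """
    for pattern, retentions in rules.items():
        if pattern != "default" and re.search(pattern, metric):
            return retentions
    return rules["default"]


if __name__ == "__main__":
    # Rules modeled on the sample cyanite.yaml in step 6.
    rules = {
        r"^1sec\..*": ["1s:14d"],
        r"^1min\..*": ["60s:760d"],
        r"^carbon\..*": ["60s:30d", "15m:2y"],
        "default": ["60s:30d"],
    }
    for retention in pick_rule("carbon.agents.a-1.metricsReceived", rules):
        print(retention, points_per_series(retention))
    # 60s:30d 43200
    # 15m:2y 70080
```

For comparison, the 1s:14d rule keeps 1,209,600 points per series, roughly 28 times as many as a 60s:30d series.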

At this point you should have a basic Cyanite setup that can answer normal Graphite queries and ingest carbon metrics. You might want to use a tool like carbon-c-relay to route metrics into the Cyanite pool. You can point Grafana directly at the load-balanced Graphite-API, or use the normal Graphite web application (if you like the Graphite composer) with the Graphite-API VIP as its single CLUSTER_SERVERS entry.
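A quick end-to-end smoke test confirms that the pool both ingests and serves data. Here is a minimal Python sketch using only the standard library. It assumes Cyanite's carbon input listens on port 2003 and Graphite-API answers on port 80 of the same host, as in the configurations above; the host name and metric path are hypothetical placeholders. It writes one data point in the carbon plaintext protocol (`path value timestamp` plus a newline) and then asks Graphite-API's `/render` endpoint for the series as JSON.

```python
import json
import socket
import time
import urllib.parse
import urllib.request

HOST = "cyanite-01.example.com"  # hypothetical Cyanite / Graphite-API host
CARBON_PORT = 2003               # carbon input port from cyanite.yaml
METRIC = "test.cyanite.smoke"    # hypothetical metric path


def carbon_line(path: str, value: float, timestamp: int) -> bytes:
    """Format one metric in the carbon plaintext protocol."""
    return f"{path} {value} {timestamp}\n".encode("ascii")


def render_url(base: str, target: str, since: str = "-5min") -> str:
    """Build a Graphite-API /render URL that returns JSON."""
    query = urllib.parse.urlencode(
        {"target": target, "from": since, "format": "json"})
    return f"{base}/render?{query}"


def smoke_test() -> None:
    """Send one point, wait briefly, then read it back via Graphite-API."""
    line = carbon_line(METRIC, 42, int(time.time()))
    with socket.create_connection((HOST, CARBON_PORT), timeout=5) as sock:
        sock.sendall(line)
    time.sleep(5)  # the delay is a guess; Cyanite may buffer writes
    url = render_url(f"http://{HOST}", METRIC)
    with urllib.request.urlopen(url, timeout=10) as response:
        series = json.load(response)
    # Graphite's JSON format: [{"target": ..., "datapoints": [[v, ts], ...]}]
    for entry in series:
        values = [v for v, _ in entry["datapoints"] if v is not None]
        print(entry["target"], values)
```

Call smoke_test() from a machine that can reach the pool, pointing HOST at a single node or at the load-balanced VIP. An empty value list usually means the point has not been written yet or the retention rules did not match the metric.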

This should at least get you going with Cyanite as a Graphite storage backend. Turning it into a scalable system will take a lot of tuning and testing, depending on your exact setup. I am just starting down this path and may have more to share in the future. Or it may blow up on me. Time will tell.

Update 2016/07/19: There are several other Graphite storage backends that I’m aware of. All are Cassandra-based.

What am I missing?
