Installing Cyanite: A Scalable Graphite Storage Backend

I’ve been experimenting with Cyanite to make my Graphite cluster more reliable. The main problem I face is that when a data node goes down, the Graphite web app more or less stops responding to requests. Cyanite is a daemon written in Clojure that runs on the JVM. The daemon is stateless and stores timeseries data in Cassandra.

I found the documentation a bit lacking, so here’s how to set up Cyanite to build a scalable Graphite storage backend.

Acquire a Cassandra database cluster. You will need at least Cassandra 3.4. The Makefile tests use Cassandra 3.5. I used Cassandra 3.7 in my experiments, which is the current release as of this writing. (Note Cassandra’s new Tick-Tock based release cycle.)

Parts of the documentation indicated that Elasticsearch was required. That is no longer the case. Cyanite must store a searchable index of the metrics it has data points for so that it can resolve glob requests into a list of metrics. Example:
```
carbon.agents.*.metricsReceived
```
This is now done in Cassandra using SASI indexes, which enable CQL SELECT statements to use the LIKE operator. This is the feature that requires a more recent Cassandra version than you may be running in production.
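
To make the glob resolution concrete, here is a minimal Python sketch of the general idea. It is not Cyanite's actual code, and the function names are hypothetical. It assumes the index is first narrowed with a prefix-style LIKE pattern and that the candidates are then filtered so a wildcard never crosses a `.` boundary. Only `*` and `?` are handled.

```python
import re
from typing import List


def like_prefix(glob: str) -> str:
    """Return a LIKE pattern built from the literal prefix of a glob."""
    # Everything before the first wildcard character is a literal prefix.
    match = re.search(r"[*?\[{]", glob)
    prefix = glob if match is None else glob[: match.start()]
    return prefix + "%"


def glob_to_regex(glob: str):
    """Translate '*' and '?' so they never match across a '.' separator."""
    parts = []
    for ch in glob:
        if ch == "*":
            parts.append(r"[^.]*")
        elif ch == "?":
            parts.append(r"[^.]")
        else:
            parts.append(re.escape(ch))
    return re.compile("^" + "".join(parts) + "$")


def resolve(glob: str, candidates: List[str]) -> List[str]:
    """Keep only the candidate metric names that the glob really matches."""
    pattern = glob_to_regex(glob)
    return [name for name in candidates if pattern.match(name)]


# Hypothetical metric names standing in for rows returned by the index.
candidates = [
    "carbon.agents.node-a.metricsReceived",
    "carbon.agents.node-b.metricsReceived",
    "carbon.agents.node-a.cpuUsage",
    "carbon.agents.node-a.cache.metricsReceived",
]
matches = resolve("carbon.agents.*.metricsReceived", candidates)
```

With the names above, the LIKE pattern would be `carbon.agents.%` and only the two `metricsReceived` entries directly under an agent survive the filter.
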
Clone the Cyanite Git repository. There are no tags or releases. However, the rumor at Monitorama 2016 is that Cyanite is a stable and scalable platform. So I just grabbed the master branch.
```
git clone https://github.com/pyr/cyanite.git
```
Create a Cassandra user depending on your local policy. Import the schema to initially create the keyspace you will use. The schema is found in the repository:
```
doc/schema.cql
```
Here, I altered the schema to set the replication factor I wanted. So I created my keyspace like this:
```
CREATE KEYSPACE IF NOT EXISTS metric WITH replication =
{'class': 'SimpleStrategy', 'replication_factor': '3'}
AND durable_writes = true;
```
I’m only replicating in a Cassandra database that lives in a single data center. No cross data center replication strategies here…yet.
Install Leiningen. This is the build tool used by the Cyanite project. It seems very friendly and installs locally into your home directory. This allows you to build JARs and other distributable versions of the code.
I need to distribute code as Debian packages for Ubuntu. Fortunately, we have a target to build just that.
```
$ cd path/to/cyanite/repo
$ lein fatdeb
```
This should produce artifacts in the target/ directory.

Install the Cyanite packages. Configure /etc/cyanite.yaml to match your storage schema file (from carbon-cache.py) and with the connection information about your Cassandra cluster.

An example configuration with additional documentation can be found in the Cyanite repo at doc/cyanite.yaml.

Here is a sanitized version of my config. I had to read through the source to find some of the options I needed.

```
# Retention rules from storage-schema.conf
engine:
  rules:
    '^1sec\.*': [ "1s:14d" ]
    '^1min\.*': [ "60s:760d" ]
    '^carbon\..*': [ "60s:30d", "15m:2y" ]
    default: [ "60s:30d" ]

# IP and PORT where the Cyanite REST API will bind
api:
  port: 8080
  host: 0.0.0.0

# An input, carbon line protocol
input:
  - type: carbon
    port: 2003
    host: 0.0.0.0

# Store the metric index in Cassandra SASI indexes
index:
  type: cassandra
  keyspace: 'metric'
  username: XXXXXX
  password: YYYYYY
  cluster:
    - cas-000.foobar.com
    - cas-001.foobar.com
    - cas-002.foobar.com

# Time drift calculations.  I use / trust NTP.
drift:
  type: no-op

# Timeseries are stored in Cassandra
store:
  keyspace: 'metric'
  username: XXXXXX
  password: YYYYYY
  cluster:
    - cas-000.foobar.com
    - cas-001.foobar.com
    - cas-002.foobar.com

# Logging configuration.  See: https://github.com/pyr/unilog
logging:
  level: info
  console: true
  files:
    - "/var/log/cyanite/cyanite.log"
  overrides:
    io.cyanite: "debug"
```

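The retention rules are easier to reason about once you work out how many points each one keeps. The Python sketch below assumes the usual Graphite `precision:period` convention and a 365-day year. How Cyanite itself parses these strings is not shown here, and the helper names are hypothetical.

```python
# Seconds per unit suffix (assumption: Graphite-style suffixes, 365-day year).
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}


def to_seconds(spec: str) -> int:
    """Convert a duration such as '60s', '15m' or '2y' to seconds."""
    number, unit = spec[:-1], spec[-1]
    return int(number) * UNIT_SECONDS[unit]


def parse_retention(rule: str) -> tuple:
    """Split 'precision:period' into (precision_seconds, period_seconds)."""
    precision, period = rule.split(":")
    return to_seconds(precision), to_seconds(period)


def points_kept(rule: str) -> int:
    """Number of data points one metric holds under a retention rule."""
    precision, period = parse_retention(rule)
    return period // precision


# The rules used in the config above.
for rule in ["1s:14d", "60s:760d", "60s:30d", "15m:2y"]:
    print(rule, points_kept(rule))
```

Under those assumptions, `1s:14d` keeps 1,209,600 points per metric while `60s:30d` keeps 43,200, which is worth knowing before you size the Cassandra cluster.
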
Cyanite should be startable at this point. You can test that it accepts carbon line protocol metrics and that they are returned by the Cyanite REST API.
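
For a quick smoke test of the carbon input, a few lines of Python are enough. This is a minimal sketch that assumes Cyanite is listening on 127.0.0.1:2003 as in the config above. The metric name `test.cyanite.smoke` and the helper names are made up for illustration. The carbon line protocol is one `path value timestamp` line per data point.

```python
import socket
import time
from typing import Optional


def format_carbon_line(path: str, value: float, timestamp: Optional[int] = None) -> bytes:
    """Build one carbon plaintext line: '<path> <value> <timestamp>\\n'."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {timestamp}\n".encode("ascii")


def send_metric(host: str, port: int, path: str, value: float) -> None:
    """Open a TCP connection to the carbon input and send a single point."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(format_carbon_line(path, value))


# Example call (needs a running Cyanite, so it is left commented out):
# send_metric("127.0.0.1", 2003, "test.cyanite.smoke", 42)
```

After sending a point, ask the Cyanite REST API on port 8080 for that metric to confirm it was stored.
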
Package and install Graphite-API along with the Cyanite Python module. Graphite-API is a stripped-down version of the Graphite web application, built as a Flask application, that uses pluggable finders to search different storage backends. pip can easily find these packages. This is a WSGI application, so deploy it however you normally deploy such applications. I use mod_wsgi with Apache to run it on port 80.

Here is a sample /etc/graphite-api.yaml that configures Graphite-API to use the Cyanite plugin and query the local Cyanite daemon.
```
# Where the graphite-api search index is built
search_index: /var/tmp/graphite-index

# Plugins to use to find metrics
finders:
  - cyanite.CyaniteFinder

# Additional Graphite functions
functions:
  - graphite_api.functions.SeriesFunctions
  - graphite_api.functions.PieFunctions

# Cyanite Specific options
cyanite:
  urls:
    - http://127.0.0.1:8080

time_zone: UTC
```
My plan here is to deploy many of these Cyanite / Graphite-API machines behind a load balancer to support my query and write loads. They are completely stateless, like any good web application, so choose your favorite load balancing technique.
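
To check that any one of these nodes answers queries, you can hit the Graphite-API render endpoint with `format=json`. The Python sketch below assumes Graphite-API is reachable at `http://127.0.0.1` (port 80, as in my mod_wsgi setup). The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def render_url(base_url: str, target: str, window: str = "-1h") -> str:
    """Build a Graphite render URL that asks for JSON output."""
    query = urlencode({"target": target, "from": window, "format": "json"})
    return f"{base_url.rstrip('/')}/render?{query}"


def fetch_series(base_url: str, target: str, window: str = "-1h"):
    """Fetch the decoded JSON list of series for the given target."""
    with urlopen(render_url(base_url, target, window), timeout=10) as response:
        return json.load(response)


# Example call (needs a running Graphite-API, so it is left commented out):
# series = fetch_series("http://127.0.0.1", "carbon.agents.*.metricsReceived")
```

Each element of the returned list carries a `target` name and its `datapoints`, so an empty list usually means the finder did not resolve the glob.
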

At this point you should have a basic Cyanite setup that is able to answer normal Graphite queries and ingest carbon metrics. You might want to use a tool like carbon-c-relay to route metrics into the Cyanite pool. You could point Grafana directly to the load balanced Graphite-API or use the normal Graphite web application (if you like the Graphite composer) and list the Graphite-API load balanced VIP as the single CLUSTER_SERVERS entry.
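
If you go the Graphite web application route, the setting lives in its `local_settings.py`. This fragment is only a sketch. The hostname is a placeholder for whatever your load balancer VIP is called.

```python
# local_settings.py for the regular Graphite web application.
# graphite-api-vip.example.com is a hypothetical name for the load balanced VIP.
CLUSTER_SERVERS = ["graphite-api-vip.example.com:80"]
```
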

This should at least get you going with Cyanite as a Graphite storage backend. There will be much tuning and testing to transform this into a scalable system depending on your exact setup. I am just starting down this path and may have more to share in the future. Or it may blow up on me. Time will tell.

Update 2016/07/19: There are several other Graphite storage backends that I’m aware of. All are Cassandra based.

https://github.com/EinsamHauer/disthene – A Cyanite compatible project written in Java aiming for performance.
https://github.com/raintank/metrictank – The friendly folks at RainTank appear to be cooking up a Go codebase for metric storage. A young project but lots of promise here.

What am I missing?

LinuxCzar
