Apache Cassandra 2.1 Features

With Cassandra 2.1 in beta, we thought it would be useful to put the latest and upcoming features under the microscope and see what stands out in the next version. There are several things we are really excited about; read on to find out more!

Some of the features below were introduced between versions; where they are already in a release version we have marked them accordingly. We already covered lightweight transactions in a previous article, so check that out.


Rapid Read Protection (introduced in 2.0.2) is, for me, one of those features that when it is described you say to yourself ‘Doh, it makes so much sense, why didn’t I think of that?’.

Let’s take a step back. Currently, when I make a read request, any node can act as coordinator (unless I am using a token-aware client). As node states are exchanged in the cluster via gossip every second, there is a very good chance that the last known node state is valid. So given a replication factor of 3, for example, if the coordinator has a report via gossip that all nodes are up and it attempts to read from a single node, it will usually get a response. However, if for some reason the node fails and its state changes during the request, the client will receive a timeout error. As in our case there are multiple replicas available which may still be able to satisfy the desired consistency level, why wouldn’t we try a different node for our data?

That is what rapid read protection does for us. If the node the coordinator queried has gone down and no reply arrives within the configured threshold, the coordinator retries the request against another node holding a replica of the requested data. This means the data can still be returned to the client. From the client’s perspective the request time will be higher, as the coordinator is making the request again to another node, but it should see fewer timeout errors should a node be lost mid-flight.

To enable it, do the following…

ALTER TABLE users WITH speculative_retry = '10ms';

(retry the request on another replica if no reply is forthcoming after 10ms)

ALTER TABLE users WITH speculative_retry = '99percentile';

(retry the request on another replica if no reply is forthcoming after the table's typical 99th-percentile request latency)
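For completeness, the docs list two further accepted values, shown below against the same users table:

ALTER TABLE users WITH speculative_retry = 'ALWAYS';

(send redundant read requests to other replicas on every read)

ALTER TABLE users WITH speculative_retry = 'NONE';

(disable speculative retries for the table)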


User-defined types are up next. Have you ever found yourself serializing and deserializing JSON blobs in Cassandra fields, or a blob of Protobuf? This approach has a couple of awkward facets, especially around object versioning and how changes in the model are managed. From the perspective of an ORM, blobs containing complex objects also add complexity to how they must be managed by the persistence layer. As more comprehensive data providers for Cassandra are developed (e.g. Spring Data), defining and managing the data model via entities / POJOs needs to be as simple as possible.

For example, a number of years ago I produced a glitch-events system for a television company. Any event had to be entered into the system with a number of comments pertaining to different facets of the interruption to the program.

Using user-defined types, I could model this like so…

cqlsh:test21> CREATE TYPE comment ( user varchar, description varchar, created timestamp);
cqlsh:test21> CREATE TABLE events (station_id varchar, event_time timestamp, comments map<int,comment>, primary key(station_id, event_time));

cqlsh:test21> insert into events (station_id, event_time, comments) values ('1', dateOf(now()), {1:{user:'niall', description:'lost sound', created:'2014-01-01 00:00:00'}});

cqlsh:test21> insert into events (station_id, event_time, comments) values ('1', dateOf(now()), {2:{user:'mick', description:'fuzzy picture', created:'2014-01-01 00:00:00'}});

cqlsh:test21> select * from events;
 station_id | event_time               | comments
------------+--------------------------+-----------------------------------------------------------------------------------------
          1 | 2014-04-16 13:21:14+0100 | {1: {user: 'niall', description: 'lost sound', created: '2014-01-01 00:00:00+0000'}}
          1 | 2014-04-16 13:21:50+0100 | {2: {user: 'mick', description: 'fuzzy picture', created: '2014-01-01 00:00:00+0000'}}

(2 rows)

The above is probably not the way to go unless you have a low number of comments per event, or unless they are immutable; remember the 64KB limit on collection data types. Note also that later 2.1 builds require user-defined types nested in collections to be declared frozen, e.g. map<int, frozen<comment>>.


Counters have been updated while maintaining backwards compatibility. The previous implementation, while useful in many cases, could not be considered failure-proof: because counter operations are not idempotent, edge cases such as replayed or retried increments could result in incorrect counts. The new counter implementation is simpler, repairable, and generates less internode traffic.
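Counters still live in dedicated counter columns and are modified only through UPDATE. A minimal sketch, using a hypothetical page_views table rather than anything from the post:

cqlsh:test21> CREATE TABLE page_views (page varchar PRIMARY KEY, views counter);
cqlsh:test21> UPDATE page_views SET views = views + 1 WHERE page = '/home';
cqlsh:test21> SELECT * FROM page_views;

The non-idempotency problem above is easy to see here: if that UPDATE times out and the client retries it, the increment may be applied twice.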


Another example of a simple but powerful I/O enhancement is the introduction of a flush directory. Previously, when Cassandra memtables were flushed to disk they contended for IOPS with background compaction: while sequential writes are efficient from a disk perspective, if there are ongoing compactions reading SSTables at the same time, both operations compete for I/O.

If you are writing into your cluster at high velocity and memtables are continuously being flushed to disk, you don’t want flushes slowed down by background compaction trying to keep up. Another interesting possibility is using a higher-performance disk for the flush directory. As the docs say, if you only have two disks, share the flush directory with the data directory, not with the commit log.
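In cassandra.yaml terms the layout could look something like the sketch below. Note that the flush directory key name here is our assumption rather than confirmed syntax, so check the yaml shipped with your 2.1 build:

# standard directories
data_file_directories:
    - /var/lib/cassandra/data
commitlog_directory: /var/lib/cassandra/commitlog
# assumed key name: a dedicated (ideally faster) device for memtable flushes,
# kept off the commit log's disk as the docs advise
memtable_flush_directory: /mnt/fast/cassandra/flush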


Other features introduced include HyperLogLog for efficiently estimating the overlap between SSTables, row cache improvements, and secondary indexes on collections. Find out more in the CHANGES.txt file in the source distribution and from this deck from Jonathan Ellis.
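As a taste of the last of those, here is a sketch of indexing a collection; the users table here is hypothetical:

cqlsh:test21> CREATE TABLE users (id uuid PRIMARY KEY, emails set<varchar>);
cqlsh:test21> CREATE INDEX ON users (emails);
cqlsh:test21> SELECT * FROM users WHERE emails CONTAINS 'niall@example.com';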