Apache Cassandra 1.2 brings a whole heap of compelling features that make this great NoSQL database more performant, easier to maintain and more cost-effective at scale. Read on for my top five features that 1.2 brings to the table.
Prior to 1.2, the recommended data volume per node in a cluster was between 300–500GB. To be fair, you could definitely go higher than that, but it wasn't recommended. In 1.2, bloom filters and compression metadata are now stored off-heap in native memory. This means you are no longer limited by the garbage collection problems that come with large heap configurations (Java 7 has the G1 collector, which might mitigate this; if anyone has tried it, get in touch!). The estimated upper bound for data per node is now 10TB. When you consider server and storage costs, this is great news for use cases where data volume is high.
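To get a feel for why moving bloom filters off-heap matters at these volumes, here's a back-of-the-envelope sizing calculation using the standard bloom filter formula (a sketch of the general maths, not Cassandra's exact implementation; the 2-billion-key node and 1% false-positive rate are illustrative assumptions):

```python
import math

def bloom_filter_bytes(n_keys, fp_rate):
    """Approximate bloom filter memory using the standard formula:
    bits = -n * ln(p) / (ln 2)^2."""
    bits = -n_keys * math.log(fp_rate) / (math.log(2) ** 2)
    return int(bits / 8)

# A hypothetical dense node holding 2 billion row keys at a 1%
# false-positive rate needs on the order of gigabytes of filter
# data -- memory that used to sit on the Java heap and now lives
# in native memory instead.
size = bloom_filter_bytes(2_000_000_000, 0.01)
print(f"~{size / 1024**3:.1f} GiB of bloom filter data")
```

At that scale, keeping multi-gigabyte filters out of the heap is exactly what lets you raise the per-node data ceiling without fighting the garbage collector.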
Anyone with experience adding new nodes to a Cassandra cluster, or bringing dead nodes back into one, will have seen the pressure these operations can put on a loaded cluster. When a node is brought online it must get replicas of the token ranges it has under management, which means communicating with one or more nodes to fetch those replicas. In certain circumstances this can lead to an unbalanced data distribution, as new nodes are assigned smaller token ranges than existing nodes. Re-balancing your cluster can therefore be an error-prone (it's manual), time-consuming and I/O-intensive operation.
Enter vnodes! Each node in the cluster holds many virtual token ranges (256 is the recommended setting), assigned randomly and non-contiguously throughout the cluster. To bring a new node into the cluster, it syncs replicas in much smaller chunks from many or all nodes, rather than putting I/O pressure on one or two nodes (depending on replication) to get fully in sync. This has the added bonus of keeping the cluster balanced at all times, as the vnodes themselves are distributed evenly across it. If you have heterogeneous nodes, you can assign more vnodes to a larger hardware footprint. Configuration is easy: in a new cluster deployment, just set the num_tokens parameter.
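As a sketch, enabling vnodes on a fresh cluster is a one-line change in cassandra.yaml (the value 256 is the recommended default mentioned above):

```yaml
# cassandra.yaml -- new clusters only; don't flip this on an existing
# node that already owns a single contiguous token range
num_tokens: 256
# leave initial_token unset when num_tokens is in use
```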
In short, vnodes overcome some of the manageability constraints of earlier versions of Cassandra. Having experienced first hand what a manual and often error-prone process it is to add nodes piecemeal, or to bring a node that died from a hardware failure back into the cluster, this is a great feature in my opinion. If you've ever looked at OpsCenter and seen three especially obese nodes compared to the rest, you will also realise how easy it is to end up with an unbalanced cluster if you aren't careful. Use it! For more information see here.
See the diagram for an illustration.
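The balancing effect is easy to demonstrate with a toy simulation (a sketch of the consistent-hash-ring idea only, not Cassandra's actual partitioner code): give each node either one random token or 256 random tokens, and measure how much of the ring each node ends up owning.

```python
import random

def ownership(num_nodes, tokens_per_node, ring_size=2**32, seed=42):
    """Assign random tokens to nodes and return the fraction of the
    ring each node owns (each token owns the range back to the
    previous token, as on a consistent-hash ring)."""
    rng = random.Random(seed)
    tokens = []
    for node in range(num_nodes):
        for _ in range(tokens_per_node):
            tokens.append((rng.randrange(ring_size), node))
    tokens.sort()
    owned = [0] * num_nodes
    prev = tokens[-1][0] - ring_size  # wrap around from the last token
    for tok, node in tokens:
        owned[node] += tok - prev
        prev = tok
    return [o / ring_size for o in owned]

# With one token per node, ownership varies wildly; with 256 vnodes
# per node it converges toward the ideal 1/6 each.
print(ownership(6, 1))
print(ownership(6, 256))
```

The many-small-ranges case comes out far closer to even, which is exactly why a new vnode-enabled node can join without manual token assignment or re-balancing.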
1.2 introduces new configuration options for reacting to disk failures. SSTables can be distributed over multiple disks on a single node, and how Cassandra reacts to individual disk failures is now tunable per node. The disk_failure_policy setting specifies what behaviour you want the node to have in the event of a disk failure: you can tell the node to stop serving requests but remain up for troubleshooting, tell it to ignore the failure (not recommended), or use the best_effort option, which tells Cassandra to attempt to write data to any remaining functioning storage.
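In cassandra.yaml the three behaviours described above map onto a single setting, roughly as follows:

```yaml
# cassandra.yaml -- reaction to a failed data disk
#   stop:        stop gossip and client transports, but keep the JVM
#                up so you can troubleshoot the node
#   best_effort: blacklist the bad disk and keep serving from the
#                remaining functioning storage
#   ignore:      respond as if nothing happened (not recommended)
disk_failure_policy: stop
```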
Ever have one of those situations where your cluster seems to be having a strop and your application tier is grinding to a halt? You've run cfhistograms and are digging through a myriad of stats to figure out what has gone wrong (nine times out of ten it's probably something you did or didn't do, RTFM). Enter query tracing, which allows you to profile individual queries or to probabilistically sample all queries sent to a particular node. This can be a great way to figure out exactly what is happening during your query. I predict this will save developers and administrators a tonne of pain in the future.
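In practice, the two modes look something like this (the users table and key are hypothetical, and the trace output is trimmed):

```
# trace an individual query interactively from cqlsh
cqlsh> TRACING ON;
cqlsh> SELECT * FROM users WHERE user_id = 42;
 ... results, followed by a per-step trace with timings ...
cqlsh> TRACING OFF;

# or probabilistically sample 1% of all requests hitting this node
$ nodetool settraceprobability 0.01
```

The trace breaks the query down step by step (coordinator, replicas contacted, time spent at each stage), which is usually enough to spot where the latency is coming from.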
OK, so CQL has been around since v0.8, but in more recent versions the syntax has matured and the client libraries are mature enough that it has surpassed the Thrift-based APIs in terms of language features, ease of use and understandability. 1.2 introduces a binary protocol for CQL requests instead of running CQL queries via Thrift. There are several advantages to this. Clients can register for push-type events from the cluster; these events convey critical information about cluster state changes or schema updates, meaning the client does not have to poll the cluster periodically for this information. The other main advantage of the new protocol is that it allows the use of non-blocking I/O. The Java driver supports futures, which are more resource-friendly from an application's perspective: while Thrift supported a single request per worker thread, the new binary interface allows many requests per thread.
In terms of language features, 1.2 introduces collections (sets, lists and maps), which will suit use cases where a column holds a small, bounded collection of related data.
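A quick sketch of what the three collection types look like in CQL 3 (the users table, its columns and the example key are all hypothetical):

```sql
-- a table using the three new collection types
CREATE TABLE users (
    user_id uuid PRIMARY KEY,
    emails  set<text>,
    places  list<text>,
    todo    map<timestamp, text>
);

-- collections can be updated in place, no read-modify-write needed
UPDATE users SET emails = emails + {'niall@example.com'}
WHERE user_id = 62c36092-82a1-3a00-93d1-46196ee77204;
```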
These recent developments in the CQL realm suggest that in a couple of versions, Thrift-based clients will be a thing of the past! This should mean less code to get up and running with Cassandra and, as convention takes over, a lower likelihood of getting it wrong from the client perspective.
This article only scratched the surface of some of the cool new features in 1.2. Personally, I'm excited to have the vnodes and off-heap bloom filter/metadata features available to me, as I have experienced first hand the problems that can arise without them. The great thing about Apache Cassandra is that the contributors really focus on delivering features that the industry is crying out for. For details on how to upgrade see here.
Niall Milton is CTO of DigBigData and a certified DataStax Cassandra Developer/Administrator