The 1.2 release of Apache Cassandra brought some great new features focused on making life easier for both developers and the operations staff running clusters. Check out our blog post about why it was so great! In 2.0, the contributors continue their roll with the introduction of lightweight transactions and some great improvements to CQL.
You won’t get a better description than Jonathan Ellis’ post here. I will add that it is worth reading CASSANDRA-5062 for the evolution of this feature. The Paxos Made Simple paper is also an excellent resource, see here. The addition of lightweight transactions has big ramifications for the adoption of Apache Cassandra in the enterprise. Adding a compare-and-set feature removes one of the remaining reasons companies run an RDBMS in tandem with Cassandra: while Cassandra is simple to scale compared to an RDBMS, previous versions did not satisfy use cases where more traditional transactional functionality was required.
In practice the new CQL syntax makes it easy to execute an insert or update while avoiding funky, complicated hacks on the client, or pulling in a third-party component such as ZooKeeper to coordinate updates. I hope to add more to this soon and do some benchmarking of the feature. Early estimates suggest there may be a 30% performance hit compared to regular (non-conditional) writes, but I haven’t found any further information on this. Consider it another tool in the box, but don’t migrate all your writes to it!
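As a rough sketch of that syntax (the table and column names here are made up for illustration), a conditional insert and a compare-and-set update look something like this:

```sql
-- Insert only if no row with this primary key already exists.
INSERT INTO users (username, email)
VALUES ('jsmith', 'jsmith@example.com')
IF NOT EXISTS;

-- Update only if the current value matches an expected one
-- (a classic compare-and-set).
UPDATE users
SET email = 'new@example.com'
WHERE username = 'jsmith'
IF email = 'jsmith@example.com';
```

Each conditional statement returns an [applied] column telling you whether the condition held. Under the hood each one runs a Paxos round between the replicas, which is where the extra latency over a plain write comes from.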
1. Conditional schema updates for keyspace or table creation, similar to MySQL’s IF NOT EXISTS. This simplifies data loading scripts, which otherwise have to control logical flow through exception handling, which is always messy.
2. Transactions as mentioned above.
3. Triggers interface. An interface has been exposed for implementing custom triggers on a table (column family). The example given is an automatic inverted-index update on insert, which in itself is rather handy. Other examples might be auditing updates into a replayable transaction log in a separate table, some form of state validation based on the current values in the table (albeit horribly inefficient in most cases), or some sort of automatic tombstoning of data based on the given mutation. I’d imagine this feature could be heavily abused, but as long as people benchmark their trigger code it will be a useful tool for moving some complex client interactions into the server in a flexible and more performant way. For smaller datasets this presents some nice options. In a recent talk, Jonathan Ellis stated that the triggers interface will almost certainly change in coming versions (slides available here).
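To make points 1 and 3 concrete, here is a sketch of the CQL involved (the keyspace, table, and trigger class names are hypothetical; the trigger class itself is a Java class implementing Cassandra’s trigger interface, deployed on the server):

```sql
-- Conditional schema updates: these become no-ops instead of errors
-- if the keyspace or table already exists.
CREATE KEYSPACE IF NOT EXISTS myapp
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

CREATE TABLE IF NOT EXISTS myapp.users (
  username text PRIMARY KEY,
  email    text
);

-- Attach a custom trigger class to the table; the class is invoked
-- server-side for each mutation and can return extra mutations
-- (e.g. maintaining an inverted index or an audit log).
CREATE TRIGGER users_audit ON myapp.users
  USING 'com.example.AuditTrigger';
```

The trigger JAR has to be installed on every node, so treat triggers as server-side infrastructure rather than something clients can register ad hoc.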
Over the past year or so, the levelled compaction strategy has been exercised under heavier workloads. The team at Blue Mountain Capital noticed that when many files are written to L0 faster than they can be promoted to L1, reads slow down. To avoid this they implemented a hybrid compaction strategy, which has since been adopted in Apache Cassandra 2.0: if the number of files in L0 grows too large, Cassandra performs a size-tiered compaction on some of the L0 files, creating a larger file which is more performant for reads because there are fewer files from which partition data must be merged. Read more about it here; if you are interested, the code is here. Anecdotally, using LeveledCompactionStrategy does keep many more file handles open on a node; this change should help keep that number a little more sane.
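For reference, opting a table into levelled compaction is just a schema property (the table name below is made up, and sstable_size_in_mb is optional); the hybrid size-tiered behaviour in L0 then kicks in automatically when that level backs up:

```sql
-- Switch an existing table to LeveledCompactionStrategy.
ALTER TABLE myapp.events
  WITH compaction = {'class': 'LeveledCompactionStrategy',
                     'sstable_size_in_mb': 160};
```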
Finally, a mention that Java 7 is now required for Cassandra: with Java 6 EOL’d a year ago, the team felt it was time to start taking advantage of some of Java 7’s features. More about the internal changes in 2.0 can be found here. I’m excited to see this technology evolving in such a compelling direction.