Pages

Thursday, 11 May 2017

The Rise and Rule of Cassandra

The world of database is dark and full of terrors. We spent decades working happily upon relational databases, until realizing one day that relations are not enough. This paved way for NoSQL databases, or alternatively, any database that do not use tables. These new databases were shiny and cool, but could not match the massive power that Oracle and SQLServer wielded. This changed with the arrival of Cassandra.

When two scientists in Facebook in 2008 decided to build a database different from other NoSQL databases, they only wanted to have their own database. But little did they know of the impact they would have on the industry.

Cassandra was introduced with a sole objective: to solve the crisis of scalability. And it managed that beautifully. In fact, it has consistently been cited as the only NoSQL database that can take as many machines as could be added to it, without breaking a sweat. In 2012, a group of researchers from University of Toronto declared that as far as scalability goes, there is no match for Cassandra. But this has not been the only reason behind the vast popularity of Cassandra.

Cassandra prides itself as having no "single-point of failure", which implies that there is no single component whose failure can shut down the whole database. In a world where transactions are carried out every second, this feature of robustness is of vital importance. Many talk about decentralization, but nobody does it better than Cassandra.

But having a couple of advantages does not make you better, not in a world of ruthless competition. MongoDB, Redis and others would not be amused by a database who would take away their market with a couple of features. This is why Cassandra tried to achieve perfection. Its fault-tolerant and decentralized nature makes it extremely durable, thus being the perfect choice for those organizations who cannot afford to lose even an ounce of data. The throughput increase is linear with respect to growth in size, which makes it extremely desirable for databases which are growing constantly. After providing all these features, it is not a surprise that Cassandra is trusted by some of the biggest names in the industry, including CERN, eBay, Instagram, GoDaddy, Netflix and Reddit. In fact, Apple's deployment of Cassandra stores a whopping 10 PB of data across 75000 (and growing) nodes. Cassandra can give lessons on scalability to every other non-relational database.

That said, Cassandra is still not the most popular database around; it is not even the most popular NoSQL database right now. There are few inherent flaws that Cassandra needs to fix, including simplified deployment, simplified operational maintenance and an improved web interface, among other things. There is still the issue of low predictability of performance (which was partially reduced, but never solved) and the complexity of APIs in the client libraries which is nothing but unnecessary. But Cassandra is growing strong, and the time is not far when it will be a common name among all DB designers.

No comments:

Post a Comment