The world of databases is dark and full of terrors.
We spent decades working happily with relational databases, until realizing one
day that relations are not enough. This paved the way for NoSQL databases, or,
loosely speaking, any database that does not rely on the relational table model.
These new databases were shiny and cool, but could not match the massive power
that Oracle and SQL Server wielded. This changed with the arrival of Cassandra.
When two scientists at Facebook decided in 2008 to
build a database different from other NoSQL databases, they only wanted to have
a database of their own. Little did they know of the impact they would have on
the industry.
Cassandra was introduced with a sole objective: to solve the crisis of
scalability. And it managed that beautifully. In fact, it has consistently been
cited as the only NoSQL database that can absorb as many machines as are added
to it without breaking a sweat. In 2012, a group of researchers from the
University of Toronto declared that, as far as scalability goes, there is no
match for Cassandra. But scalability has not been the only reason behind the
vast popularity of Cassandra.
Cassandra prides itself on having no "single point of failure", which means
that there is no single component whose failure can shut down the whole
database. In a world where transactions are carried out every second, this
kind of robustness is of vital importance. Many talk about decentralization,
but nobody does it better than Cassandra.
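
To make the "no single point of failure" idea concrete, here is a minimal toy sketch in Python. It is not Cassandra's actual code or API: the `ToyRing` class, the node names, the MD5-based `token` function and the replication factor of 3 are all assumptions made for illustration. It only shows the general principle of placing each key on several nodes of a ring, so that losing any one node still leaves live copies.

```python
import hashlib
from bisect import bisect_right


def token(value: str) -> int:
    """Map a string to a ring position (toy hash, not Cassandra's partitioner)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class ToyRing:
    """Hypothetical ring of peer nodes; every node plays the same role."""

    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = replication_factor
        # Sort nodes by their token so the list can be walked clockwise.
        self.ring = sorted((token(n), n) for n in nodes)
        self.down = set()

    def replicas(self, key):
        """Return the first node clockwise from the key, plus its successors."""
        tokens = [t for t, _ in self.ring]
        start = bisect_right(tokens, token(key)) % len(self.ring)
        count = min(self.replication_factor, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]

    def live_replicas(self, key):
        """Replicas of the key that are not marked as failed."""
        return [n for n in self.replicas(key) if n not in self.down]


ring = ToyRing(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
owners = ring.replicas("user:42")
ring.down.add(owners[0])  # simulate one replica failing
survivors = ring.live_replicas("user:42")
print("owners:", owners)
print("still serving after one failure:", survivors)
```

In this sketch no node is special: any of them can fail and the key is still held by the remaining replicas, which is the property the paragraph above describes.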
But having a couple of advantages does not make
you better, not in a world of ruthless competition. MongoDB, Redis and others
would not be amused by a database that takes away their market with a
couple of features. This is why Cassandra aims for more. Its fault-tolerant
and decentralized nature makes it extremely durable, and thus a strong choice
for organizations that cannot afford to lose even an ounce of data. Throughput
grows linearly with the size of the cluster, which makes it extremely desirable
for databases that are growing constantly. With all these features, it is no
surprise that Cassandra is trusted by some of the biggest names in the
industry, including CERN, eBay, Instagram, GoDaddy, Netflix and Reddit. In
fact, Apple's deployment of Cassandra stores a whopping 10 PB of data across
75,000 (and growing) nodes. Cassandra can give lessons on scalability to every
other non-relational database.
That said, Cassandra is still not the most popular
database around; it is not even the most popular NoSQL database right now.
There are a few weaknesses that Cassandra needs to address: deployment and
operational maintenance need to be simpler, and the web interface needs to
improve, among other things. There is still the issue of unpredictable
performance (which has been partially reduced, but never solved) and the
needless complexity of the APIs in the client libraries. But Cassandra is
growing strong, and the time is not far off when it will be a common name among
all DB designers.