Friday, December 10, 2010

An Intro to NoSQL

What is NoSQL


For a quarter of a century, the relational database (RDBMS) has been the dominant model for database management. In the past, relational databases were used for nearly everything. Because of their rich feature set, query capabilities and transaction management, they seemed fit for almost every task one could imagine doing with a database. But this feature richness is also their flaw, because it makes building distributed RDBMSs very complex. In particular, transactions and join operations are difficult to implement, and not very efficient, in a distributed system.

This is why there are now non-relational databases with limited feature sets and no full ACID support, which are better suited to a distributed environment. These databases are currently called NoSQL databases. The need to look at NoSQL systems arises from scalability issues with relational databases. Those issues stem from the fact that relational databases were not designed to be distributed (which is key to write scalability), and could thus afford to provide abstractions like ACID transactions and a rich high-level query model. NoSQL databases address the scalability issue in several ways: by being distributed, by providing a simpler data and query model, by relaxing consistency requirements, and so on.

At first glance, the name suggests that these databases do not support the SQL query language and are not relational. But it is also read as "Not Only SQL", which is less hostile to relational databases. This reading stands for a new paradigm: no single database technology fits every purpose. Instead, different kinds of databases are needed for different demands. Most NoSQL databases are developed to run on clusters of commodity computers and therefore have to be distributed and fault tolerant. To achieve this, they make different trade-offs regarding the ACID properties, transaction management, query capabilities and performance. They are usually designed to fit the requirements of web services, and most of them are schema-free and bring their own query languages.

Why NoSQL

RDBMSs have provided database users with the best mix of simplicity, robustness, flexibility, performance, scalability, and compatibility, even though their performance in each of these areas is not necessarily better than that of an alternative solution pursuing one of these benefits in isolation. Today, the situation is slightly different. For an increasing number of applications, one of these benefits is becoming more and more critical. Although this requirement is still considered a niche, it is rapidly becoming mainstream, so much so that for a growing number of database users it is beginning to eclipse the others in importance. That benefit is scalability.

Relational databases scale well, but usually only when that scaling happens on a single server node. When the capacity of that single node is reached, you need to scale out and distribute the load across multiple server nodes. This is when the complexity of relational databases starts to rub against their potential to scale. Try scaling to hundreds or thousands of nodes, rather than a few, and the complexities become overwhelming. The characteristics that make an RDBMS so appealing drastically reduce its viability as a platform for large distributed systems.
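To make the idea of scaling out a little more concrete, here is a minimal sketch of hash-based partitioning, one common way to spread keys across nodes. It is an illustration only, not how any particular NoSQL product works: the `ShardedStore` class, its methods and the sample keys are all hypothetical, and each "node" is just an in-memory dictionary.

```python
import hashlib


class ShardedStore:
    """Toy key-value store that spreads keys over several in-memory 'nodes'."""

    def __init__(self, node_count):
        # Assumption: each dict stands in for a separate server node.
        self.nodes = [dict() for _ in range(node_count)]

    def _node_for(self, key):
        # A stable hash, so the same key always maps to the same node.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

    def put(self, key, value):
        self._node_for(key)[key] = value

    def get(self, key):
        return self._node_for(key).get(key)


store = ShardedStore(node_count=4)
store.put("user:1", {"name": "Alice"})
store.put("order:17", {"user": "user:1", "total": 30})

# A single-key lookup touches exactly one node.
order = store.get("order:17")

# Combining two records (a "join") needs a second lookup,
# which may land on a different node.
user = store.get(order["user"])
```

Reads and writes by key stay cheap however many nodes there are, because each one goes to a single node. Anything that spans several keys, such as a join or a multi-row transaction, has to coordinate across nodes, and that is where the complexity described above comes from.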

Cloud computing has also placed new demands on the database. The economic vision for cloud computing is to provide computing resources on demand with a "pay-as-you-go" model. A pool of computing resources can exploit economies of scale and level out variable demand by adding or removing computing resources as the workload changes. The traditional RDBMS has been unable to provide these types of elastic services. For cloud services to be viable, vendors have had to address this limitation, because a cloud platform without a scalable data store is not much of a platform at all. So, to provide customers with a scalable place to store application data, vendors had only one real option. They had to implement a new type of database system that focuses on scalability, at the expense of the other benefits that come with relational databases.
