Anything, I like to write.: Classification of NoSQL Databases

NoSQL databases can be broadly classified as:

1. Distributed vs. Not-distributed databases

Distributed databases take responsibility for data partitioning (for scalability) and replication (for availability) rather than leaving them to the client. Non-distributed databases leave data partitioning and replication to the clients.

Table 1: Distributed and Non-distributed databases

Distributed	Not Distributed
Amazon Dynamo, Amazon S3, Scalaris, Voldemort, CouchDb (through Lounge), Riak, MongoDb, BigTable, Cassandra, HyperTable, HBase	Redis, Tokyo Tyrant, MemcacheDb, Amazon SimpleDb
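
To make the difference concrete, here is a minimal sketch of what "leaving partitioning and replication to the client" can look like. It is only an illustration: the class name `ClientShardedStore`, the helper `pick_nodes`, and the use of plain dicts as stand-ins for standalone servers are all assumptions, not the API of any database listed above.

```python
import hashlib


def pick_nodes(key, nodes, replicas=2):
    """Return the nodes that should hold `key`: a primary plus its replicas."""
    # A stable hash, so every client maps the same key to the same nodes.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    start = int(digest, 16) % len(nodes)
    # The replicas are simply the next nodes in the list, wrapping around.
    return [nodes[(start + i) % len(nodes)] for i in range(replicas)]


class ClientShardedStore:
    """Hypothetical client that partitions and replicates on its own."""

    def __init__(self, node_names, replicas=2):
        self.node_names = list(node_names)
        self.replicas = replicas
        # Each dict stands in for one independent, non-distributed server.
        self.servers = {name: {} for name in self.node_names}

    def put(self, key, value):
        # The client, not the database, writes every copy.
        for name in pick_nodes(key, self.node_names, self.replicas):
            self.servers[name][key] = value

    def get(self, key):
        # Try the primary first, then fall back to the replicas.
        for name in pick_nodes(key, self.node_names, self.replicas):
            if key in self.servers[name]:
                return self.servers[name][key]
        return None


store = ClientShardedStore(["node-a", "node-b", "node-c"])
store.put("user:1", "Ann")
print(store.get("user:1"))
```

In a distributed database this routing logic lives inside the database itself, so the application only ever calls the equivalent of `put` and `get`.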

2. Disk vs. Memory databases

A useful dimension is whether the database is memory-driven or disk-driven. This matters because a disk-driven database needs an explicit cache in front of it, while in a memory-driven database the data is not durable.

Table 2: Memory driven and disk driven databases

Memory	Configurable	Disk
Scalaris, Redis	BigTable, Cassandra, HBase, HyperTable	CouchDb, MongoDb, Riak, Voldemort

On one end of the spectrum is Scalaris, which is entirely memory-driven, and Redis, which is primarily memory-oriented. Cassandra, BigTable, HyperTable and HBase let you configure how large the Memtable can grow, which gives a lot of control over how much data is held in memory. CouchDb, MongoDb and Riak all use on-disk B+ trees, and Voldemort uses BDB and MySQL.
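
The "configurable" middle column can be pictured with a small sketch. This is a toy model under stated assumptions, not how Cassandra or HBase are implemented: the name `MemtableStore`, the `memtable_limit` parameter, and the in-memory list standing in for on-disk files are all hypothetical.

```python
class MemtableStore:
    """Toy store: writes go to an in-memory table that is flushed when full."""

    def __init__(self, memtable_limit=3):
        # The tunable knob: how many entries may sit in memory before a flush.
        self.memtable_limit = memtable_limit
        self.memtable = {}
        # Each flushed segment is a sorted list of (key, value) pairs.
        # A real system would write these to disk; a list stands in here.
        self.flushed = []

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        self.flushed.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        # Newest data first: the memtable, then segments from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for segment in reversed(self.flushed):
            for stored_key, value in segment:
                if stored_key == key:
                    return value
        return None


store = MemtableStore(memtable_limit=2)
store.put("a", 1)
store.put("b", 2)   # reaches the limit, so the memtable is flushed
store.put("a", 3)   # newer value stays in memory and shadows the flushed one
print(store.get("a"), store.get("b"))
```

A larger `memtable_limit` keeps more data in memory and behaves more like the left column of Table 2; a smaller one pushes data to disk sooner.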

3. Data Model richness

On the basis of their data model, the various NoSQL databases can be divided into the following three groups.

3.1 Key-value Stores

These systems store values and an index to find them, based on a programmer-defined key. These data stores use a data model similar to the popular memcached distributed in-memory cache, with a single key-value index for all the data. Like memcached, none of these systems offer secondary indices or keys.
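
As a rough illustration of this model, the sketch below keeps a single index from key to value and nothing else. The class `KeyValueStore` and its method names are assumptions made for the example, not the interface of any of the systems discussed here.

```python
class KeyValueStore:
    """Toy key-value store: one index, keyed by a programmer-defined key."""

    def __init__(self):
        self._index = {}

    def put(self, key, value):
        self._index[key] = value

    def get(self, key, default=None):
        return self._index.get(key, default)

    def delete(self, key):
        self._index.pop(key, None)

    def scan(self, predicate):
        # With no secondary index, looking up by value means visiting every entry.
        return [key for key, value in self._index.items() if predicate(value)]


kv = KeyValueStore()
kv.put("user:1", {"name": "Ann", "city": "Pune"})
kv.put("user:2", {"name": "Bob", "city": "Oslo"})
print(kv.get("user:1"))
print(kv.scan(lambda value: value["city"] == "Oslo"))
```

The `scan` method shows the cost of having no secondary index: any question not phrased in terms of the key turns into a full pass over the data.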

3.2 Document Stores

These systems store documents. The documents are indexed and a simple query mechanism may be provided. Document stores support more complex data than the key-value stores. The term “document store” is not ideal, because these systems store objects (generally objects without pointers, described in JSON notation), not necessarily documents. Unlike the key-value stores, they generally support multiple indexes and multiple types of documents (objects) per database, and they support complex values.
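
The contrast with a key-value store can be sketched as follows. This is a simplified model, assuming hypothetical names (`DocumentCollection`, `indexed_fields`) and indexing only top-level fields with hashable values; real document stores offer far richer query mechanisms.

```python
from collections import defaultdict


class DocumentCollection:
    """Toy document store with several secondary indexes."""

    def __init__(self, indexed_fields):
        self.docs = {}
        # One index per field: field value -> set of document ids.
        self.indexes = {field: defaultdict(set) for field in indexed_fields}

    def insert(self, doc_id, doc):
        self.remove(doc_id)  # drop stale index entries when overwriting
        self.docs[doc_id] = doc
        for field, index in self.indexes.items():
            if field in doc:  # documents need not share the same fields
                index[doc[field]].add(doc_id)

    def remove(self, doc_id):
        old = self.docs.pop(doc_id, None)
        if old is None:
            return
        for field, index in self.indexes.items():
            if field in old:
                index[old[field]].discard(doc_id)

    def find(self, field, value):
        ids = self.indexes[field].get(value, set())
        return [self.docs[doc_id] for doc_id in sorted(ids)]


db = DocumentCollection(indexed_fields=["type", "city"])
db.insert("u1", {"type": "user", "name": "Ann", "city": "Pune"})
db.insert("o1", {"type": "order", "total": 20, "city": "Pune"})
db.insert("u2", {"type": "user", "name": "Bob", "tags": ["a", "b"]})
print(db.find("type", "user"))
print(db.find("city", "Pune"))
```

Note how users and orders, two different types of object with different fields and nested values, live in the same collection and can each be found through more than one index.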

3.3 Column Stores

These systems store extensible records that can be partitioned across nodes. They are also referred to as “Extensible Record Stores”. Their basic data model is rows and columns, and their basic scalability model is splitting both rows and columns over multiple nodes. Rows are split across nodes through conventional sharding on the primary key. They are typically split by range rather than by a hash function, which means that queries on ranges of values do not have to go to every node. Columns of a table are distributed over multiple nodes by using “column groups”.
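
The advantage of range-based row splitting can be seen in a short sketch. The class `RangePartitioner`, the split points, and the node names below are made up for illustration; the sketch only shows the routing decision, not data storage.

```python
import bisect


class RangePartitioner:
    """Toy router that assigns primary keys to nodes by key range."""

    def __init__(self, split_points, nodes):
        # split_points must be sorted; N split points define N + 1 ranges.
        if len(nodes) != len(split_points) + 1:
            raise ValueError("need exactly one more node than split points")
        self.split_points = list(split_points)
        self.nodes = list(nodes)

    def node_for(self, key):
        # Keys below the first split point go to the first node, and so on.
        return self.nodes[bisect.bisect_right(self.split_points, key)]

    def nodes_for_range(self, low, high):
        # Only the nodes whose ranges overlap [low, high] need to be asked.
        first = bisect.bisect_right(self.split_points, low)
        last = bisect.bisect_right(self.split_points, high)
        return self.nodes[first:last + 1]


router = RangePartitioner(split_points=["h", "p"], nodes=["n0", "n1", "n2"])
print(router.node_for("apple"))
print(router.nodes_for_range("f", "k"))
```

With hash-based sharding, neighbouring keys land on unrelated nodes, so the same range query would have to be sent to every node.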

These may seem like a new complexity, but column groups are simply a way for the customer to indicate which columns are best grouped together. These two partitionings (horizontal and vertical) can be used simultaneously on the same table. The column groups must be pre-defined with the extensible record stores. However, that is not a big constraint, as new attributes can be defined at any time. Rows are not that dissimilar from documents: they can have a variable number of attributes (fields), the attribute names must be unique, rows are grouped into collections (tables), and an individual row’s attributes can be of any type.
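
A small sketch may help picture column groups. It assumes a hypothetical `ColumnGroupTable` class in which each column group is a separate dict that could, in principle, live on a different node; it is not the storage format of BigTable, HBase or any other system named here.

```python
class ColumnGroupTable:
    """Toy extensible-record table, vertically split into column groups."""

    def __init__(self, group_names):
        # Column groups are fixed up front.
        # Each group maps a row key to that row's columns within the group.
        self.groups = {name: {} for name in group_names}

    def put(self, row_key, group, column, value):
        if group not in self.groups:
            raise KeyError(f"unknown column group: {group}")
        # New columns (attributes) can be added to a group at any time.
        self.groups[group].setdefault(row_key, {})[column] = value

    def get_group(self, row_key, group):
        # Reads only one group, so only one vertical partition is touched.
        return dict(self.groups[group].get(row_key, {}))

    def get_row(self, row_key):
        # Reassembling the full row touches every group.
        row = {}
        for name, store in self.groups.items():
            for column, value in store.get(row_key, {}).items():
                row[f"{name}:{column}"] = value
        return row


table = ColumnGroupTable(["profile", "activity"])
table.put("u1", "profile", "name", "Ann")
table.put("u1", "activity", "last_login", "2011-01-25")
table.put("u1", "profile", "nickname", "A")  # a new attribute, added later
print(table.get_group("u1", "profile"))
print(table.get_row("u1"))
```

Grouping columns that are usually read together means a typical query touches one partition, while the set of attributes inside each group stays open-ended.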

Table 3: Classification of NoSQL databases based on data model

Key-value store	Document store	Column store
Amazon Dynamo, Amazon S3, Redis, Scalaris, Voldemort	SimpleDb, CouchDb, MongoDb, Riak	Cassandra, Google BigTable, HBase, HyperTable

Tuesday, January 25, 2011
