Tuesday, January 25, 2011

Classification of NoSQL Databases

NoSQL databases can be broadly classified along the following dimensions:

1. Distributed vs. Non-distributed databases

Distributed databases take responsibility for data partitioning (for scalability) and replication (for availability) themselves, rather than leaving them to the client. Non-distributed databases leave data partitioning and replication to the client.
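To make the difference concrete, here is a minimal Python sketch of the work a client has to do itself when the database is not distributed. It assumes each server is an independent store (a plain dict stands in for one here) and uses simple modulo hashing for placement; `ShardingClient` and its parameters are hypothetical names, not the API of any database listed in this post.

```python
import hashlib


class ShardingClient:
    """Hypothetical client that partitions and replicates data itself,
    as a client of a non-distributed store must."""

    def __init__(self, servers, replicas=2):
        # Each "server" is a plain dict standing in for one independent,
        # non-distributed database instance.
        if not 1 <= replicas <= len(servers):
            raise ValueError("replicas must be between 1 and the number of servers")
        self.servers = servers
        self.replicas = replicas

    def _home(self, key):
        # Partitioning: a stable hash of the key picks the primary server.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.servers)

    def _targets(self, key):
        # Replication: the primary plus the next (replicas - 1) servers hold copies.
        home = self._home(key)
        return [(home + i) % len(self.servers) for i in range(self.replicas)]

    def put(self, key, value):
        for i in self._targets(key):
            self.servers[i][key] = value

    def get(self, key):
        # Try each replica in turn, so the key is still found if one copy is missing.
        for i in self._targets(key):
            if key in self.servers[i]:
                return self.servers[i][key]
        return None
```

Modulo placement remaps most keys whenever a server is added or removed; consistent hashing, as used by Amazon Dynamo, is the usual remedy. A distributed database hides all of this logic behind its own interface.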

Table 1: Distributed and Non-distributed databases

Distributed: Amazon Dynamo, Amazon S3, Scalaris, Voldemort, CouchDb (through Lounge), Riak, MongoDb, BigTable, Cassandra, HyperTable, HBase
Not distributed: Redis, Tokyo Tyrant, MemcacheDb, Amazon SimpleDb

2. Disk vs. Memory databases

A useful dimension is whether the database is memory-driven or disk-driven. This matters because a disk-driven database needs an explicit cache to serve frequently used data quickly, while in a memory-driven database the data is not durable.

Table 2: Memory driven and disk driven databases

Memory: Scalaris, Redis
Configurable: BigTable, Cassandra, HBase, HyperTable
Disk: CouchDb, MongoDb, Riak, Voldemort

At one end of the spectrum are Scalaris, which is entirely memory-driven, and Redis, which is primarily memory-oriented. Cassandra, BigTable, HyperTable and HBase let you configure how large the memtable (the in-memory buffer of recent writes, flushed to disk when full) can grow, which gives a lot of control over the balance between memory and disk. CouchDb, MongoDb and Riak all use on-disk B+ trees, and Voldemort uses BDB (Berkeley DB) and MySQL as storage engines.
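The configurable memtable can be illustrated with a toy store in Python. This is a sketch under stated assumptions, not how Cassandra, BigTable, HyperTable or HBase actually implement it: the class name `MemtableStore`, the `max_entries` threshold (an entry count rather than a byte size), and the in-memory sorted runs that stand in for on-disk files are all hypothetical.

```python
import bisect


class MemtableStore:
    """Toy store: writes land in an in-memory memtable, which is flushed to an
    immutable sorted run once it holds max_entries keys."""

    def __init__(self, max_entries=4):
        self.max_entries = max_entries  # the configurable memtable size
        self.memtable = {}
        self.runs = []  # flushed (keys, values) runs, oldest first; stand-ins for disk files

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.max_entries:
            self.flush()

    def flush(self):
        if not self.memtable:
            return
        keys = sorted(self.memtable)
        values = [self.memtable[k] for k in keys]
        self.runs.append((keys, values))
        self.memtable = {}

    def get(self, key):
        # Recent writes are served from memory first.
        if key in self.memtable:
            return self.memtable[key]
        # Then search flushed runs newest-first, so newer values shadow older ones.
        for keys, values in reversed(self.runs):
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return values[i]
        return None
```

Raising `max_entries` keeps more recent writes in memory before each flush. This toy has no write-ahead log, so anything still in the memtable would be lost on a crash, which is exactly the durability trade-off of memory-driven designs.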

3. Data Model richness

Based on their data model, NoSQL databases can be grouped into the following three categories.

3.1 Key-value Stores

These systems store values and an index to find them, based on a programmer-defined key. These data stores use a data model similar to the popular memcached distributed in-memory cache, with a single key-value index for all the data. Like memcached, none of these systems offer secondary indices or keys.
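A minimal Python sketch of this data model, assuming a plain dict as the single key index; `KeyValueStore` and its methods are hypothetical names, not the API of memcached or of any system listed here. The last method shows the practical cost of having no secondary index: finding entries by their value means scanning everything.

```python
class KeyValueStore:
    """Minimal key-value store: one index (the key) over opaque values."""

    def __init__(self):
        self._data = {}  # the single key -> value index

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)

    def scan_for_value(self, predicate):
        # With no secondary index, finding entries by their value means
        # examining every entry (a full scan).
        return [k for k, v in self._data.items() if predicate(v)]
```

Applications typically work around the missing secondary indexes by encoding lookups into the key itself (for example `user:42`).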

3.2 Document Stores

These systems store documents. The documents are indexed and a simple query mechanism may be provided. Document stores support more complex data than the key-value stores. The term “document store” is not ideal, because these systems store objects (generally objects without pointers, described in JSON notation), not necessarily documents. Unlike the key-value stores, they generally support multiple indexes and multiple types of documents (objects) per database, and they support complex values.
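The contrast with key-value stores is the secondary index. Below is a toy Python sketch of a document collection, assuming documents are JSON-like dicts keyed by an id; `DocumentCollection`, `create_index` and `find` are hypothetical names, not the API of CouchDb, MongoDb or any other system named here.

```python
from collections import defaultdict


class DocumentCollection:
    """Toy collection of JSON-like documents with optional secondary indexes."""

    def __init__(self):
        self._docs = {}     # primary index: document id -> document (a dict)
        self._indexes = {}  # field name -> {field value -> {doc id: None}}

    def create_index(self, field):
        # Build a secondary index over the documents already stored.
        index = defaultdict(dict)
        for doc_id, doc in self._docs.items():
            if field in doc:
                index[doc[field]][doc_id] = None
        self._indexes[field] = index

    def insert(self, doc_id, doc):
        # Replacing a document first removes its old index entries.
        if doc_id in self._docs:
            old = self._docs[doc_id]
            for field, index in self._indexes.items():
                if field in old:
                    index[old[field]].pop(doc_id, None)
        self._docs[doc_id] = doc
        for field, index in self._indexes.items():
            if field in doc:
                index[doc[field]][doc_id] = None

    def find(self, field, value):
        index = self._indexes.get(field)
        if index is None:
            # No index on this field: fall back to scanning every document.
            return [d for d in self._docs.values() if d.get(field) == value]
        return [self._docs[i] for i in index.get(value, {})]
```

For example, after inserting `{"name": "Ann", "city": "Oslo", "tags": ["admin"]}` and calling `create_index("city")`, `find("city", "Oslo")` is answered from the index, while `find("name", "Ann")` falls back to a scan. Documents may hold complex values such as the `tags` list, but this sketch can only index hashable field values, so a list-valued field can be stored but not indexed.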

3.3 Column Stores

These systems store extensible records that can be partitioned across nodes. They are also referred to as “Extensible Record Stores”. Their basic data model is rows and columns, and their basic scalability model is splitting both rows and columns over multiple nodes. Rows are split across nodes by conventional sharding on the primary key. They are typically split by range rather than by a hash function, which means that queries on ranges of values do not have to go to every node. Columns of a table are distributed over multiple nodes by using “column groups”.

Column groups may seem like added complexity, but they are simply a way for the user to indicate which columns are best stored together. These two partitionings (horizontal and vertical) can be used simultaneously on the same table. Column groups must be predefined in extensible record stores, but that is not a big constraint, since new attributes can be defined at any time. Rows are not that dissimilar from documents: they can have a variable number of attributes (fields), attribute names must be unique, rows are grouped into collections (tables), and an individual row’s attributes can be of any type.
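Both partitionings can be sketched together in Python. The split points, the column group names, and the `group:qualifier` column naming (similar in spirit to the family:qualifier column keys of BigTable and HBase) are assumptions for illustration; real systems choose and adjust their split points as data grows. Each (row range, column group) pair is one fragment that could be placed on its own node.

```python
import bisect

# Hypothetical horizontal partitioning: row keys are split by range.
SPLIT_POINTS = ["g", "p"]  # range 0: key < "g", range 1: "g" <= key < "p", range 2: key >= "p"
# Hypothetical vertical partitioning: column groups are predefined, while the
# qualifier after "group:" can be any new attribute name.
COLUMN_GROUPS = ["profile", "activity"]


def row_range(row_key):
    # The number of split points <= row_key is the index of its row range.
    return bisect.bisect_right(SPLIT_POINTS, row_key)


def fragment_location(row_key, group):
    # One fragment per (row range, column group) pair.
    return (row_range(row_key), COLUMN_GROUPS.index(group))


def split_row(row_key, row):
    """Split one row's columns into fragments keyed by their location."""
    fragments = {}
    for column, value in row.items():
        group, _, qualifier = column.partition(":")
        if group not in COLUMN_GROUPS:
            raise KeyError(f"column group {group!r} is not predefined")
        fragments.setdefault(fragment_location(row_key, group), {})[qualifier] = value
    return fragments


def ranges_for_query(start_key, end_key):
    # A range scan only touches row ranges overlapping [start_key, end_key];
    # under hash partitioning, every node would have to be asked.
    return list(range(row_range(start_key), row_range(end_key) + 1))
```

For example, `split_row("hugo", {"profile:name": "Hugo", "activity:last_login": "2011-01-25"})` yields one fragment in row range 1 for each column group, and `ranges_for_query("h", "m")` touches only row range 1.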

Table 3: Classification of NoSQL databases based on data model

Key-value store: Amazon Dynamo, Amazon S3, Redis, Scalaris, Voldemort
Document store: SimpleDb, CouchDb, MongoDb, Riak
Column store: Cassandra, Google BigTable, HBase, HyperTable
