1. Distributed vs. Non-distributed databases
Distributed databases take responsibility for data partitioning (for scalability) and replication (for availability) rather than leaving it to the client. Non-distributed databases leave both partitioning and replication to the client.
Table 1: Distributed and Non-distributed databases
Distributed | Not Distributed |
Amazon Dynamo, Amazon S3, Scalaris, Voldemort, CouchDb (thru Lounge), Riak, MongoDb, BigTable, Cassandra, HyperTable, HBase | Redis, Tokyo Tyrant, MemcacheDb, Amazon SimpleDb |
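To make the client-side burden concrete, here is a minimal sketch of what a client has to do when the store itself is not distributed. Everything in it is an assumption for illustration: the class name `ClientSideShardedStore` is hypothetical, each "node" is a plain Python dict standing in for an independent server, and hash-modulo placement with copies on the following nodes is just one simple scheme, not what any particular product does.

```python
import hashlib


class ClientSideShardedStore:
    """Client-side partitioning and replication over independent nodes (illustrative only)."""

    def __init__(self, nodes, replicas=2):
        # Each node is a plain dict standing in for one non-distributed server.
        if not 1 <= replicas <= len(nodes):
            raise ValueError("replicas must be between 1 and the number of nodes")
        self.nodes = nodes
        self.replicas = replicas

    def _targets(self, key):
        # Partitioning: hash the key to pick a primary node.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        primary = int(digest, 16) % len(self.nodes)
        # Replication: also use the next nodes in order, wrapping around.
        return [(primary + i) % len(self.nodes) for i in range(self.replicas)]

    def put(self, key, value):
        # The client writes every copy itself.
        for index in self._targets(key):
            self.nodes[index][key] = value

    def get(self, key):
        # Read from the first replica that still has the key.
        for index in self._targets(key):
            if key in self.nodes[index]:
                return self.nodes[index][key]
        return None


nodes = [{}, {}, {}]
store = ClientSideShardedStore(nodes, replicas=2)
store.put("user:42", "alice")
print(store.get("user:42"))
```

With a distributed database, the routing and copying done in `_targets` and `put` would happen inside the database rather than in application code.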
2. Disk vs. Memory databases
A useful dimension is whether the database is memory-driven or disk-driven. The distinction matters because a disk-driven database would need an explicit cache in front of it, while a memory-driven database does not keep its data durable.
Table 2: Memory driven and disk driven databases
Memory | Configurable | Disk |
Scalaris, Redis | BigTable, Cassandra, HBase, HyperTable | CouchDb, MongoDb, Riak, Voldemort |
At one end of the spectrum are Scalaris, which is entirely memory-driven, and Redis, which is primarily memory-oriented. Cassandra, BigTable, HyperTable and HBase let you configure how large the Memtable (the in-memory write buffer) can get, which gives a lot of control over how much data is held in memory. CouchDb, MongoDb and Riak all use on-disk B+ trees, and Voldemort uses BDB and MySQL.
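The idea of a size-limited Memtable can be sketched in a few lines. This is a toy under stated assumptions, not how Cassandra, BigTable, HyperTable or HBase implement it: the class `TinyMemtableStore`, the `max_memtable_entries` parameter, and the JSON segment files are all hypothetical, and the limit is counted in entries rather than bytes to keep the example short.

```python
import json
import os
import tempfile


class TinyMemtableStore:
    """Buffers writes in memory and flushes them to disk once a limit is hit."""

    def __init__(self, data_dir, max_memtable_entries=3):
        self.data_dir = data_dir
        self.max_memtable_entries = max_memtable_entries  # the configurable knob
        self.memtable = {}        # in-memory buffer: fast, but not durable
        self.flushed_files = []   # on-disk segments, oldest first

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.max_memtable_entries:
            self._flush()

    def _flush(self):
        # Write the buffer to a new segment file in key order, then start a fresh buffer.
        path = os.path.join(self.data_dir, f"segment-{len(self.flushed_files)}.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump(dict(sorted(self.memtable.items())), f)
        self.flushed_files.append(path)
        self.memtable = {}

    def get(self, key):
        # Check memory first, then the on-disk segments from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for path in reversed(self.flushed_files):
            with open(path, encoding="utf-8") as f:
                segment = json.load(f)
            if key in segment:
                return segment[key]
        return None


data_dir = tempfile.mkdtemp()  # scratch directory for the example
store = TinyMemtableStore(data_dir, max_memtable_entries=2)
store.put("a", "1")
store.put("b", "2")  # second entry reaches the limit and triggers a flush
store.put("c", "3")  # stays in memory until the next flush
print(len(store.flushed_files), store.memtable)
```

A larger limit keeps more recent data in memory, at the cost of more data at risk before it reaches disk; a smaller limit does the opposite.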
3. Data Model richness
Based on their data model, NoSQL databases can be divided into the following three groups.
3.1 Key-value Stores
These systems store values and an index to find them, based on a programmer-defined key. These data stores use a data model similar to the popular memcached distributed in-memory cache, with a single key-value index for all the data. Like memcached, none of these systems offer secondary indices or keys.
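As a rough illustration of this model, the sketch below uses a single dictionary as the only index. The class `KeyValueStore` and its method names are hypothetical, and values are treated as opaque bytes. The point is that any lookup that does not go through the key has to scan every value.

```python
class KeyValueStore:
    """A single key-to-value index; values are opaque to the store."""

    def __init__(self):
        self._index = {}  # the one and only index

    def put(self, key, value):
        self._index[key] = value

    def get(self, key, default=None):
        return self._index.get(key, default)

    def delete(self, key):
        self._index.pop(key, None)

    def scan(self, predicate):
        # No secondary index exists, so every value must be inspected.
        return [key for key, value in self._index.items() if predicate(value)]


kv = KeyValueStore()
kv.put("user:1", b'{"name": "alice", "city": "Oslo"}')
kv.put("user:2", b'{"name": "bob", "city": "Lima"}')

print(kv.get("user:1"))                      # direct lookup by key
print(kv.scan(lambda v: b'"Oslo"' in v))     # full scan over all values
```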
3.2 Document Stores
These systems store documents. The documents are indexed and a simple query mechanism may be provided. Document stores support more complex data than the key-value stores. The term “document store” is not ideal, because these systems store objects (generally objects without pointers, described in JSON notation), not necessarily documents. Unlike the key-value stores, they generally support multiple indexes and multiple types of documents (objects) per database, and they support complex values.
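The contrast with key-value stores can be sketched as follows. This is a simplified illustration with assumed names (`DocumentStore`, `indexed_fields`, `find`); it indexes only hashable scalar fields and supports only equality lookups, whereas real document stores offer richer query mechanisms.

```python
from collections import defaultdict


class DocumentStore:
    """Stores JSON-like objects and maintains several secondary indexes."""

    def __init__(self, indexed_fields):
        self.documents = {}
        # One index per field: field value -> set of document ids.
        self.indexes = {field: defaultdict(set) for field in indexed_fields}

    def insert(self, doc_id, document):
        if doc_id in self.documents:
            self.delete(doc_id)  # drop stale index entries before overwriting
        self.documents[doc_id] = document
        for field, index in self.indexes.items():
            if field in document:
                index[document[field]].add(doc_id)

    def delete(self, doc_id):
        document = self.documents.pop(doc_id, None)
        if document is None:
            return
        for field, index in self.indexes.items():
            if field in document:
                index[document[field]].discard(doc_id)

    def find(self, field, value):
        # Equality lookup through a secondary index; no scan over all documents.
        if field not in self.indexes:
            raise KeyError(f"no index on field {field!r}")
        doc_ids = self.indexes[field].get(value, ())
        return [self.documents[doc_id] for doc_id in sorted(doc_ids)]


db = DocumentStore(indexed_fields=["type", "city"])
# Different types of objects, with complex values, live in the same database.
db.insert("u1", {"type": "user", "name": "alice", "city": "Oslo", "tags": ["admin"]})
db.insert("u2", {"type": "user", "name": "bob", "city": "Lima"})
db.insert("o1", {"type": "order", "total": 30, "city": "Oslo"})

print([doc["name"] for doc in db.find("type", "user")])
print(len(db.find("city", "Oslo")))
```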
3.3 Column Stores
These systems store extensible records that can be partitioned across nodes. They are also referred to as “Extensible Record Stores”. Their basic data model is rows and columns, and their basic scalability model is splitting both rows and columns over multiple nodes. Rows are split across nodes through conventional sharding on the primary key. They are typically split by range rather than by a hash function, which means that queries on ranges of values do not have to go to every node. Columns of a table are distributed over multiple nodes by using “column groups”.
Column groups may seem like a new complexity, but they are simply a way for the customer to indicate which columns are best grouped together. The two partitionings (horizontal and vertical) can be used simultaneously on the same table. In extensible record stores the column groups must be pre-defined. However, that is not a big constraint, as new attributes can be defined at any time. Rows are not that dissimilar from documents: they can have a variable number of attributes (fields), the attribute names must be unique, rows are grouped into collections (tables), and an individual row’s attributes can be of any type.
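The two partitionings can be sketched together. The example below is an illustration under stated assumptions, not any product's design: the class `RangePartitionedTable`, the `split_points` parameter, and the use of separate dicts for each column group are all hypothetical, and in a real system each (row range, column group) pair could be placed on a different server. Column groups are fixed at creation time, while columns inside a group can be added on any write.

```python
import bisect


class RangePartitionedTable:
    """Rows split by key range across nodes; columns split into pre-defined groups."""

    def __init__(self, split_points, column_groups):
        # Node 0 holds keys below split_points[0], node i holds keys from
        # split_points[i - 1] up to (not including) split_points[i], and the
        # last node holds everything from the last split point onwards.
        self.split_points = sorted(split_points)
        self.column_groups = list(column_groups)
        num_nodes = len(self.split_points) + 1
        # storage[node][group][row_key] -> {column: value}
        self.storage = [{group: {} for group in self.column_groups} for _ in range(num_nodes)]

    def node_for(self, row_key):
        # Horizontal partitioning: a binary search over the range boundaries.
        return bisect.bisect_right(self.split_points, row_key)

    def put(self, row_key, group, column, value):
        # Vertical partitioning: the group must already exist, the column need not.
        if group not in self.column_groups:
            raise KeyError(f"unknown column group {group!r}")
        rows = self.storage[self.node_for(row_key)][group]
        rows.setdefault(row_key, {})[column] = value

    def read_group(self, row_key, group):
        # Reads only the requested column group of one row.
        return dict(self.storage[self.node_for(row_key)][group].get(row_key, {}))

    def nodes_for_range(self, start_key, end_key):
        # A range query touches only the nodes whose ranges overlap [start_key, end_key].
        return list(range(self.node_for(start_key), self.node_for(end_key) + 1))


table = RangePartitionedTable(split_points=["h", "p"], column_groups=["profile", "activity"])
table.put("alice", "profile", "email", "alice@example.com")
table.put("mike", "profile", "email", "mike@example.com")
table.put("mike", "activity", "last_login", "2010-05-01")
table.put("zed", "profile", "nickname", "z")  # a new attribute, defined on write

print(table.nodes_for_range("a", "g"))   # only the first node is contacted
print(table.read_group("mike", "profile"))
```

Had the rows been placed by a hash of the key instead, the range query from "a" to "g" would have had to ask every node.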
Table 3: Classification of NoSQL databases based on data model
Key-Value store | Document store | Column-Store |
Amazon Dynamo, Amazon S3, Redis, Scalaris, Voldemort | SimpleDb, CouchDb, MongoDb, Riak | Cassandra, Google BigTable, HBase, HyperTable |