Tuesday, July 12, 2011

A few interesting quotes attributed to Einstein

  • Black holes are where God divided by zero.
  • Coincidence is God's way of remaining anonymous.
  • It is not that I'm so smart. But I stay with the questions much longer.
  • The hardest thing in the world to understand is the income tax.
  • Reality is merely an illusion, albeit a very persistent one.
  • Science is a wonderful thing if one does not have to earn one's living at it.
  • The only thing that interferes with my learning is my education.
  • The most incomprehensible thing about the world is that it is comprehensible.
  • We can't solve problems by using the same kind of thinking we used when we created them.
  • Education is what remains after one has forgotten everything he learned in school.
  • Gravitation is not responsible for people falling in love.
  • Any man who can drive safely while kissing a pretty girl is simply not giving the kiss the attention it deserves.
  • If A is a success in life, then A equals x plus y plus z. Work is x; y is play; and z is keeping your mouth shut.
  • Two things are infinite: the universe and human stupidity; and I'm not sure about the universe.
  • As far as the laws of mathematics refer to reality, they are not certain; as far as they are certain, they do not refer to reality.
  • I know not with what weapons World War III will be fought, but World War IV will be fought with sticks and stones.
  • In order to form an immaculate member of a flock of sheep one must, above all, be a sheep.
  • No, this trick won't work. How on earth are you ever going to explain in terms of chemistry and physics so important a biological phenomenon as first love?
  • Not everything that counts can be counted, and not everything that can be counted counts.
  • A person who never made a mistake never tried anything new.
  • An empty stomach is not a good political adviser.
  • Any intelligent fool can make things bigger and more complex. It takes a touch of genius, and a lot of courage, to move in the opposite direction.
  • Any man who reads too much and uses his own brain too little falls into lazy habits of thinking.
  • There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.
  • The difference between genius and stupidity is: genius has its limits.
  • I have no special talents. I am only passionately curious.
  • Everybody is a genius. But if you judge a fish by its ability to climb a tree, it will live its whole life believing that it is stupid.
  • With fame I become more and more stupid, which of course is a very common phenomenon.
  • A little knowledge is a dangerous thing. So is a lot.

source: http://www.goodreads.com/author/quotes/9810.Albert_Einstein

Tuesday, May 24, 2011

Logging Messages using Scribe and PHP

What is Scribe

Scribe was developed and open-sourced by Facebook. It is their internal logging system, capable of logging tens of billions of messages per day, including access logs, performance statistics, user actions, and many others. Scribe is a server for aggregating log data that is streamed in real time from clients. It is designed to be scalable and reliable: it scales to a very large number of nodes and is robust to network and node failures. A Scribe server runs on every node in the system, configured to aggregate messages and send them to a central Scribe server (or servers) in larger groups. If the central Scribe server isn't available, the local Scribe server writes the messages to a file on local disk and sends them when the central server recovers. The central Scribe server(s) can write the messages to the files that are their final destination, typically on an NFS filer or a distributed file system, or send them to another layer of Scribe servers.

To learn more about Scribe, visit Scribe on GitHub.

Install Scribe:

Prerequisite:

--libevent, Event Notification library
--boost, Boost C++ library (version 1.36 or later)
Note: For the latest Scribe version (2.2) you need to install a Boost version of 1.45 or lower;
Scribe 2.2 is not compatible with Boost 1.46 or higher.
You can download Boost 1.45 from here.
--thrift, version 0.5.0 or later
--fb303, Facebook Bassline (included in thrift/contrib/fb303/); r697294 or later is required.

You can download the latest version of Scribe from here.

Steps to install:

1. Install libevent if not already installed
2. Install Boost (any version between 1.36 and 1.45)
3. If boost is installed in a non-default location or there are multiple boost versions installed, you will need to set the Boost path and library names
export BOOST_ROOT=/opt/boost_1_45_1
export LD_LIBRARY_PATH=/opt/boost_1_45_1/stage/lib
4. Install thrift
5. Install fb303
6. Now install scribe
untar the source and run
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
./bootstrap.sh
make
make install

A sample scribe config file:

#This file configures Scribe to listen for messages on port 1463 and write them to /var/log/scribelogs

port=1463
max_msg_per_second=2000000
check_interval=3

# DEFAULT
<store>
category=default
type=buffer

target_write_size=20480
max_write_interval=1
buffer_send_rate=2
retry_interval=30
retry_interval_range=10

<primary>
type=file
fs_type=std
file_path=/var/log/scribelogs
base_filename=thisisoverwritten
max_size=1000000
add_newlines=1
</primary>

<secondary>
type=file
fs_type=std
file_path=/tmp
base_filename=thisisoverwritten
max_size=3000000
</secondary>
</store>
Please go through this README file to learn more advanced Scribe config options.

Running a scribe server:

Run the following command to start the Scribe server:

path_to_scribe_dir/src/scribed path_to_config_file/config.conf


Write a Scribe Client in PHP:

In order to write a client in PHP, you will need the following:

  • Thrift PHP library files
  • Client libraries for scribe
  • Client libraries for fb303

Step 0: Create a folder for your client files, for example /home/gaurav/scribephplibs

Step 1: Get Thrift PHP library files
Thrift php library files are bundled with thrift source, you can just copy them to your folder
cp -R path_to_thrift_source/lib/php/src /home/gaurav/scribephplibs

Step 2: Generate client library for scribe
thrift -o /home/gaurav/scribephplibs --gen php path_to_scribe_source/if/scribe.thrift

Step 3: Generate client library for fb303
thrift -o /home/gaurav/scribephplibs --gen php path_to_thrift_source/contrib/fb303/if/fb303.thrift

Step 4: By default, Thrift places the generated files in a folder named gen-php, but the Scribe client expects them in a folder named packages, so we need to rename that folder.

mv /home/gaurav/scribephplibs/gen-php /home/gaurav/scribephplibs/packages
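Steps 0 through 4 can be summarized as a short shell session. This is only a sketch using the example paths from this post; path_to_thrift_source and path_to_scribe_source stand for wherever you unpacked those sources:

```shell
LIBDIR=/home/gaurav/scribephplibs
mkdir -p "$LIBDIR"

# Step 1: copy the contents of the Thrift PHP src directory, so that
# transport/, protocol/, etc. sit directly under $LIBDIR
cp -R path_to_thrift_source/lib/php/src/. "$LIBDIR"

# Steps 2 and 3: generate the Scribe and fb303 client libraries
thrift -o "$LIBDIR" --gen php path_to_scribe_source/if/scribe.thrift
thrift -o "$LIBDIR" --gen php path_to_thrift_source/contrib/fb303/if/fb303.thrift

# Step 4: the generated code lands in gen-php; the client expects packages
mv "$LIBDIR/gen-php" "$LIBDIR/packages"
```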

A sample PHP client script (ScribeLogger.php):

<?php

class ScribeLogger {

    public static function Log($msg, $category) {

        // Set this to where you have the Scribe and Thrift library files.
        $GLOBALS['THRIFT_ROOT'] = '/home/gaurav/scribephplibs';

        // Include all of the lib files we need.
        require_once $GLOBALS['THRIFT_ROOT'].'/packages/scribe/scribe.php';
        require_once $GLOBALS['THRIFT_ROOT'].'/transport/TSocket.php';
        require_once $GLOBALS['THRIFT_ROOT'].'/transport/TFramedTransport.php';
        require_once $GLOBALS['THRIFT_ROOT'].'/protocol/TBinaryProtocol.php';

        // A message in Scribe is made up of two parts: the category (a key)
        // and the actual message.
        $message = array();
        $message['category'] = $category;
        $message['message'] = $msg;

        // Create a new LogEntry instance to hold our message, then add it to a
        // $messages array (we can submit multiple messages at the same time
        // if we want to).
        $entry = new LogEntry($message);
        $messages = array($entry);

        // Open a framed, binary-protocol Thrift connection to the local
        // Scribe server on port 1463.
        $socket = new TSocket('localhost', 1463, true);
        $transport = new TFramedTransport($socket);
        $protocol = new TBinaryProtocol($transport, false, false);

        $scribeClient = new scribeClient($protocol, $protocol);

        try {
            $transport->open();
            $scribeClient->Log($messages);
            $transport->close();
        } catch (TException $e) {
            echo $e->getMessage();
        }
    }
}

ScribeLogger::Log("This is a test message.", 'TEST');

?>

Running the script:

php ScribeLogger.php

Now go to the /var/log/scribelogs folder; there you will see a new folder named TEST. Scribe creates a new folder for each category. Each of these folders may contain multiple files, versioned like Category_0000, Category_0001, etc., and one symlink, Category_current, pointing to the file Scribe is currently writing to for that particular category.

Monday, May 23, 2011

Centralized Logging

We all know that logging is necessary, and we log many different types of data on a regular basis: user transactions, customer behaviour, machine behaviour, security threats, fraudulent activities, and so on. Historically, logs were mostly a tool for troubleshooting problems. More recently, they have become important for network and system performance optimization, recording user actions, providing helpful data for investigating suspicious activity, and assisting with proactive monitoring of the environment. In many cases, having log data easily available can provide early warnings about problems before they get out of control.

There are three broad areas where logging is helpful: troubleshooting, resource tracking, and security.

Troubleshooting:

Logging helps in troubleshooting: finding and fixing problems. The event logs are usually the best source of information for determining whether a system or network is experiencing problems. Events such as a disk filling to capacity, the failure of a necessary piece of equipment, a driver failing to load, or the detection of an IP address conflict can all be recorded in the event logs. Event logs also help in reporting diagnostic information for background processes.

Resource Tracking and Monitoring applications:

Logging helps in resource tracking, monitoring, and improving service levels, and provides real-time insight into the system. Information on the capacity and usage of system resources should be logged; any type of system metric that can change over time should be reported and logged. These metrics may include the frequency of users on the system, the maximum number of users, the duration of use of specific applications, the amount of available disk space crossing a threshold, memory usage, DB resources, the load on the system crossing a threshold, the number of processes running on the system at any given time, and so on. All this logged data will help you tune your systems before disaster strikes.

Security:

Logging is also a very important part of system security; it helps in mitigating security exposures and risks. It is impossible to make a system 100% secure: there are always some security flaws that can be exploited, and unfortunately the greatest risks to security are the human users themselves. If illegal access to a system cannot be completely prevented, then at least it should be recorded and tracked. These logs will help in discovering potential problems, or signs of problems, and in resolving them.


Why (benefits of) Centralized Logging:

Centralized logging (logging all data to a central log server or repository) provides a number of benefits over logging on local servers.

  • All of the logs are in one place, which makes searching through logs and analysis across multiple servers easier than bouncing around between boxes, greatly simplifying log analysis and correlation tasks.
  • It helps you get to the answer to "why" quickly and accurately: all your logs are in one location, and you can quickly access them and find the trouble.
  • Suppose your system is down or overloaded and unable to tell you what happened. If you have remote copies of all your system logs, you can look at exactly what has been going on on that system.
  • Local logs from the server may be lost in the event of an intrusion or system failure. But by having the logs elsewhere you at least have a chance of finding something useful about what happened.
  • It reduces disk space usage and disk I/O on core servers that should be busy doing something else.
  • Log processing and log rotation mechanisms, if any, can also be centralized.
  • Centralized logging can provide clues for making things better.

In my next post I'll explain the steps to install Scribe and write a Scribe client in PHP for logging messages to a central log server.

Thursday, March 10, 2011

Upgrade Samsung galaxy 3 from Android Eclair to Froyo

I bought a Samsung Galaxy 3 phone a few months back. It came with Android 2.1 (Eclair). Soon after I got my phone, I heard that Samsung was going to release a Galaxy 3 Froyo update and that it would reach users within 2 to 3 months. After waiting for more than 5 months, I finally decided to upgrade the OS on my phone myself. So I searched on the internet, found some steps to do so, followed them carefully, and was able to successfully upgrade my Galaxy 3 to Froyo. Three things that I really liked after upgrading my phone:

1. My phone became faster; I don't know by how much, but I can feel the difference.
2. Now I can move my apps to the SD card. This is really cool; now I have much more space available in my phone's memory.
3. The ability to play Flash content.

Here are the steps that I followed to install froyo on my Galaxy 3.

Prerequisites:

1. You will need Windows environment for this. I used Windows 7.
2. ODIN Multi Downloader, you can download it from http://www.multiupload.com/LFJRACWNQ2
3. OPS file, you can download it from http://www.multiupload.com/SOMN2EWF0J
4. Froyo 2.2 update, you can download it from http://www.multiupload.com/IVMSZNKNLT (password: samfirmware.com)

Now the steps:

1. Take a backup of your phone. BACKUP YOUR PHONE'S DATA, CONTACTS, MESSAGES and other things important for you. This step is necessary and you may need this data in case you encounter some problems while upgrading. You can use Samsung Kies for taking the backup.

2. Note down your phone's firmware information, in case you need to restore later. You can check your firmware by dialling *#1234# in your phone.

3. Install and run ODIN Multi-Downloader.
  • Select the One Package, Auto Reboot, and Protect OPS options.
  • Click the OPS button and select the OPS file you downloaded ("apollo_0531.ops").
  • Now go to the bottom section, "Select Integrate Package - Check One Package Option", click the 'One Package' button, and select the Froyo 2.2 tar file you downloaded (for example, I5800XXJPB.tar).
4. Factory reset your phone (optional at this stage; you can also do this later).

5. Switch off your phone. Remove the battery, so it doesn't boot up automatically.

6. Hold the Home + MENU + Volume Up + Power buttons simultaneously until the screen shows the Android icon, probably with the words "Force Upload by Button Pressing". (If that button combination doesn't work, try "Volume Down" or "Volume Up + Down" instead.)

7. When the Force Upload screen appears, connect your phone to the computer with the USB cable provided and let windows detect the drivers for the phone and install them automatically.

8. Once the drivers are loaded, ODIN will detect the phone by showing it in one of the boxes in COM Port Mapping section. Now click Start.

9. The phone will load data from the ODIN application and will reboot itself. It takes a while (5-10 minutes). Do not unplug the phone until you see the home screen and have successfully unlocked the phone.

10. After the reboot, run a factory reset on your phone and reboot it again. It is very important that you hard reset your phone; otherwise, the phone may have problems.

Tuesday, February 15, 2011

An Intro to Phing

What is Phing

Phing is a build system for PHP projects, like Ant is for Java; in fact, Phing is based on Apache Ant. As with any traditional build system, you can do a lot of things with Phing. Phing uses XML build files similar to Ant build files. Phing is useful in cases where you have to write custom scripts for testing, packaging, and deploying your application code. It provides a number of built-in tasks, or operational modules, and it also allows you to add custom tasks as per your needs.

In short Phing provides the following features:
  • Simple XML buildfiles
  • Rich set of provided tasks
  • Easily extendable via PHP classes
  • Platform-independent: works on UNIX, Windows, Mac OS X
  • No required external dependencies

How Phing Works

Phing uses XML buildfiles; these buildfiles contain a description of the things to do. A buildfile contains one or more targets, which contain the actual commands to perform (e.g. create a directory, delete a directory, copy a file from one directory to another, create an archive file, etc.). In order to use Phing, you first need to write a buildfile and then run phing, specifying the target in your buildfile that you want to execute.

$ phing [-f buildfile.xml] [target]

If a buildfile name is not given, Phing will search for a file named build.xml in the directory from which the phing command is run. You can also define a default target in your buildfile; in that case you can skip the target as well when executing the phing command.
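For example, if the buildfile is named build.xml and defines main as its default target (as the sample buildfile later in this post does), the following invocations are equivalent:

```shell
$ phing                     # runs the default target ("main") from ./build.xml
$ phing main                # same target, named explicitly
$ phing -f build.xml main   # buildfile and target both named explicitly
```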

Installation

Phing requires PHP version 5.2 or above, compiled with --with-libxml2 at a minimum. For advanced usage of Phing you may need other libraries and software. For a detailed list of dependencies you can visit this page.

The easiest way to install phing is by using the PEAR installer. Just run the following commands from the command line:
$ pear channel-discover pear.phing.info
$ pear install phing/phing

A sample Phing build file:

Given below is a small, simple build file that I used for one of my projects. I have added comments before each line or section to explain what it does.

<?xml version="1.0"  encoding="UTF-8" ?>

<!-- project is the root element; a buildfile should start with the project tag after the document prolog -->
<!-- name : name of the project -->
<!-- basedir : the base project directory -->
<!-- default : the default target to build if phing is executed without specifying any target -->

<project name="applyservice" basedir="." default="main">

    <!-- define some project properties like package name, build directory and source directory -->
    <property name="package"  value="${phing.project.name}" override="true" />
    <property name="builddir" value="./build/" override="true" />
    <property name="srcdir"   value="${project.basedir}" override="true" />

    <!-- You can define set of all files to include/exclude in the build, give fileset an id for later reference -->
    <fileset dir="${srcdir}" id="allfiles">
        <include name="**/*.php" />
        <include name="**/*.yml" />
        <exclude name="**/*.log" />
    </fileset>

    <!-- Target: prepare, this will create a build directory if not already there -->
    <target name="prepare" description="initial setup">
        <echo msg="Creating build directory....." />
        <mkdir dir="./build" />
    </target>

    <!-- Target: main, Its our default target as defined in project tag -->
    <!-- A target can depend on other targets. You can define dependencies using depends attribute -->
    <!-- Our main target is dependent on prepare target to complete first -->
    <!-- You can define multiple targets in depends attribute using a comma separated list like A, B, C -->

    <target name="main" description="main target" depends="prepare">
        <echo msg="Copying files to build directory..." />
        <!-- copy all source files from the fileset defined above to the build directory -->
        <copy todir="${builddir}">
            <fileset refid="allfiles" />
        </copy>
       
        <!-- create an archive file of all the files in the build directory using gzip compression -->
        <echo msg="Creating archive from build..." />
        <tar destfile="./build/build.tar.gz" compression="gzip">
            <fileset dir="./build">
                <include name="**" />
            </fileset>
        </tar>

        <echo msg="Build completed successfully........" />
    </target>

    <!-- Target: rebuild, this target is for rebuilding; it first deletes the previously built files and creates a new build by calling the main target. -->
    <target name="rebuild" description="rebuilds this package">
    <echo msg="Deleting old build directory........" />
        <delete dir="${builddir}" />
        <phingcall target="main" />
    </target>
</project>

For a detailed list of all available options, please go through the Phing User Guide.

Wednesday, February 9, 2011

Exploring timeout variables in Mysql

A few days back I was checking one of my MySQL servers' settings and found that there are a number of timeout settings. Since I have been using MySQL for the last few years, I am well aware of the wait_timeout and connect_timeout variables; if you use MySQL in production, in most cases you have to tune these two. But MySQL provides a number of other timeout variables as well.
 
If you run the query show variables like '%timeout' you will get a number of different timeout variables; in my case I got 9 such variables.

mysql> show variables like '%timeout';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| connect_timeout            | 10    |
| delayed_insert_timeout     | 300   |
| innodb_lock_wait_timeout   | 50    |
| interactive_timeout        | 28800 |
| net_read_timeout           | 30    |
| net_write_timeout          | 60    |
| slave_net_timeout          | 3600  |
| table_lock_wait_timeout    | 50    |
| wait_timeout               | 5     |
+----------------------------+-------+
9 rows in set (0.00 sec)

Why are there so many timeout variables, and what are their purpose?

MySQL uses different timeout variables at different stages. When a connection is just being established, connect_timeout is used. When the server waits for another query to be sent to it, wait_timeout is used (or interactive_timeout for applications that declared themselves interactive during connection). While a query is being read or a result set is being sent back, net_read_timeout and net_write_timeout are used. innodb_lock_wait_timeout is used with InnoDB tables when acquiring locks on table rows. delayed_insert_timeout is used when you are using delayed insert queries. slave_net_timeout is used in replication, when a slave is reading data from the master.
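Most of these variables can be inspected and changed at runtime. A sketch of the relevant statements (the values here are illustrative, not recommendations):

```sql
-- Inspect the current session values
SHOW SESSION VARIABLES LIKE '%timeout';

-- Raise the InnoDB lock wait for this session only (e.g. for a long batch job)
SET SESSION innodb_lock_wait_timeout = 120;

-- Change the server-wide default (needs the SUPER privilege;
-- existing sessions keep their old value)
SET GLOBAL wait_timeout = 600;
```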

Let's look at these variables in some more detail:

connect_timeout:
The number of seconds the mysqld server waits for a connect packet before responding with Bad handshake. As of MySQL 5.1.23 the default value is 10 seconds; before that it was 5 seconds. Increasing the connect_timeout value might help if clients frequently encounter errors of the form Lost connection to MySQL server.

delayed_insert_timeout:
You can delay insert queries from happening until the table is free by using the delayed hint in your SQL statement. For example:
INSERT DELAYED INTO table (id) VALUES (123);

The above SQL statement will return quickly, and the MySQL server will store the insert statement in a memory queue until the table you are inserting into is free from reads. The downside is that you don't really know how long it's going to take for your INSERT to happen. The INSERT DELAYED handler thread in the MySQL server will wait for delayed_insert_timeout seconds before terminating.

innodb_lock_wait_timeout:
The timeout in seconds an InnoDB transaction may wait for a row lock before giving up. innodb_lock_wait_timeout applies to InnoDB row locks only. A MySQL table lock does not happen inside InnoDB and this timeout does not apply to waits for table locks. InnoDB does detect transaction deadlocks in its own lock table immediately and rolls back one transaction. The default value is 50 seconds. A transaction that tries to access a row that is locked by another InnoDB transaction will hang for at most this many seconds before issuing the following error:
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

interactive_timeout:
The number of seconds the server waits for activity on an interactive connection before closing it. An interactive client is one that uses the CLIENT_INTERACTIVE option to mysql_real_connect(). In other words, interactive_timeout is the number of seconds of inactivity MySQL will allow before closing an interactive connection.

net_read_timeout:
The number of seconds to wait for more data from a connection before aborting the read. Before MySQL 5.1.41, this timeout applies only to TCP/IP connections, not to connections made through Unix socket files, named pipes, or shared memory. When the server is reading from the client, net_read_timeout is the timeout value controlling when to abort. net_read_timeout rarely becomes a problem unless you have an extremely poor network, because in most cases a query is generated and sent to the server as a single packet, and the application can't switch to doing something else, leaving the server with a partially received query.

net_write_timeout:
When the server is writing to the client, net_write_timeout is the timeout value controlling when to abort. It defines the number of seconds to wait for a block to be written to a connection before aborting the write. Before MySQL 5.1.41, this timeout applies only to TCP/IP connections, not to connections made using Unix socket files, named pipes, or shared memory. If you do not fetch data for long enough, the MySQL server may think the client is dead and close the connection. This may well happen if you need long processing for each row or have long periodic data flushes. Also, the result set comes back in multiple pieces, and if you're using mysql_use_result you can do other work between fetches, which could potentially take a lot of time.

slave_net_timeout:
The number of seconds to wait for more data from the master before the slave considers the connection broken, aborts the read, and tries to reconnect. The first retry occurs immediately after the timeout. The interval between retries is controlled by the MASTER_CONNECT_RETRY option for the CHANGE MASTER TO statement or --master-connect-retry option, and the number of reconnection attempts is limited by the --master-retry-count option. The default is 3600 seconds.

table_lock_wait_timeout:
As per the mysql manual this variable is not used.

wait_timeout:
The number of seconds the server waits for activity on a noninteractive connection before closing it. This timeout applies only to TCP/IP and Unix socket file connections, not to connections made using named pipes or shared memory. On thread startup, the session wait_timeout value is initialized from the global wait_timeout value or from the global interactive_timeout value, depending on the type of client (as defined by the CLIENT_INTERACTIVE connect option to mysql_real_connect()). Setting the value too low may cause connections to drop unexpectedly. Setting it too high may cause stale connections to remain open, preventing new access to the database. wait_timeout should be set as low as possible without affecting availability and performance.
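To make such settings survive a server restart, they can also be set in my.cnf. A minimal sketch, with illustrative values only:

```ini
# /etc/my.cnf (illustrative values; tune for your own workload)
[mysqld]
connect_timeout     = 10
wait_timeout        = 300
interactive_timeout = 28800
net_read_timeout    = 30
net_write_timeout   = 60
```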


Monday, February 7, 2011

Understanding MySQL Persistent Connections with mysql_pconnect()

What is mysql_pconnect():

The main purpose of the mysql_pconnect() function is to maintain a persistent connection to the MySQL server. A persistent connection is a connection that does not get closed even after the script that opened it finishes executing. Even the mysql_close() function can't close a persistent connection. In contrast, a normal connection opened using mysql_connect() gets closed either by mysql_close() or at the end of the script's execution.

Why mysql_pconnect():

In one word, the answer is "efficiency". Persistent connections are good when the overhead of creating a link to your DB server is high, which can be the case for various reasons; if the connection overhead is high, persistent connections can help you considerably.

How mysql_pconnect() works:

The most popular method to run PHP is to run it as a module in a multiprocess web server like Apache. A multiprocess server typically has one parent process which coordinates with a set of child processes. These child processes actually do the work of serving up web pages. When a request comes in from a client, it is handed over to one of the children that is free. This means that when the same client makes a second request to the server, it may be served by a different child process than the first time.

When opening a persistent connection, the function first tries to find a persistent link that is already open with the same host, username, and password combination. If one is found, an identifier for it is returned instead of a new connection being opened. This causes the child process to connect only once over its entire lifespan, instead of every time it processes a page that requires connecting to the same DB server. Every child that opened a persistent connection will have its own open persistent connection to the DB server. For example, if you had 20 different child processes that ran a script that made a persistent connection to your DB server, you would have 20 different connections to the DB server, one from each child.
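A minimal sketch of the API described above, using the legacy mysql extension this post covers (removed in PHP 7); the host, credentials, and database name are placeholders:

```php
<?php
// Opens a persistent link, or reuses an existing one with the same
// host + username + password combination in this child process.
$link = mysql_pconnect('localhost', 'dbuser', 'dbpass');
if (!$link) {
    die('Could not connect: ' . mysql_error());
}
mysql_select_db('mydb', $link);

$result = mysql_query('SELECT 1', $link);

// A no-op on persistent links: the connection stays open in this
// child process for reuse by later requests.
mysql_close($link);
?>
```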

Issues with mysql_pconnect():
  • Persistent database connections don't necessarily reflect subsequent privilege changes.
  • Be very careful when using persistent connections and temporary tables on MySQL. With normal connections temporary tables are visible only to the current connection, but if you have a persistent connection the temporary tables will be visible to everybody sharing the same persistent connection. This can lead to major trouble.
  • Don't use mysql_pconnect() in situations where multiple MySQL servers are running on multiple ports of the same host. The connection pooling algo in php apparently only checks for the host, username and password combination but not the port. Therefore, if you use the same host, username and password but a different port, you might get a connection that is connected to a different port than the one you asked for.
  • Do not use transactions with persistent connections.  If your script stops or exits for any reason, your transaction will be left open and your locks will be left on.  You have to reset MySQL to release them. They won't rollback automatically on error, like they ought to. When you restart the script, you'll get a new connection, so you can't rollback or commit for the previous script.
  • You should be very careful when using LOCK TABLES with persistent connections. If the script terminates before UNLOCK TABLES is executed, the table(s) will stay locked and will very likely hang future scripts.

Things to remember:
  • Persistent connections were designed to have one-to-one mapping to regular connections. That means that you should always be able to replace persistent connections with non-persistent connections, and it won't change the way your script behaves. It may change the efficiency of the script, but not its behavior.
  • Any script with a start transaction, rollback, or commit SQL statement should use regular, not persistent connections.
  • Use totally random temporary table names when using persistent connections to avoid major problems.
  • Make damn sure that the max connections limit in your my.cnf allows a few more connections than the number of Apache children in your httpd.conf. Fewer MySQL connections than Apache children means some Apache children will be starved for a DB connection.
  • Leave a few extra MySQL connections so that, in case of a problem, you still have the ability to log in from a shell to diagnose and fix it. Otherwise, you will have to bring down all of Apache to get into your database.
  • You can also use register_shutdown_function() to register a simple cleanup function to unlock your tables or roll back your transactions. But it's better to avoid the problem entirely by not using persistent connections in scripts which use table locks or transactions.
  • Instead of use wait_timeout, you can set interactive_timeout to short period of time (for ex. 20 sec.) this is a lot better solution in apache + mysql environment than wait_timeout.
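The register_shutdown_function() idea above can be sketched like this. It's a Python/sqlite3 analogue of the PHP/MySQL setup, with a hand-rolled cleanup() standing in for the registered shutdown handler:

```python
import sqlite3

# Simulated "persistent" connection (Python + sqlite3 standing in for
# PHP + MySQL, since mysql_pconnect() has no direct Python equivalent).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100)")
conn.commit()

def cleanup(connection):
    # The shutdown handler: if the script died mid-transaction, undo its
    # partial work instead of leaving locks behind for the next request.
    if connection.in_transaction:
        connection.rollback()

# A script starts a transaction and then "crashes" before committing.
conn.execute("UPDATE accounts SET balance = 0 WHERE id = 1")
cleanup(conn)  # the registered handler fires at shutdown

balance = conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
print(balance)  # → 100: the partial update was rolled back
```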

Sunday, February 6, 2011

Height (ऊँचाई)

On a high mountain,
trees do not grow,
plants do not take root,
nor does grass settle there.

Only snow settles there,
white as a shroud and
cold as death.
The playful, laughing river,
taking on snow's form,
weeps drop by drop over its fate.

Such a height,
whose touch
turns water to stone,
such a height,
whose sight fills one with a sense of smallness,
deserves our salutation;
it is an invitation to climbers,
and flags can be planted upon it,

but no sparrow
can build a nest there,
nor can any weary traveller
close his eyes for a moment in its shade.

The truth is that
height alone is not enough.
To stand apart from everyone,
severed from one's surroundings,
cut off from one's own,
alone in the void,
is not a mountain's greatness
but its helplessness.
Between height and depth
lies the distance of sky and netherworld.

The higher one stands,
the lonelier one is;
one carries every burden oneself,
and with a smile pasted on the face,
weeps deep within.

What matters is that
height be joined with breadth,
so that a man
does not stand like a bare stump,
but mingles with others,
takes someone along,
walks by someone's side.

To lose oneself in the crowd,
to be immersed in memories,
to forget oneself:
this gives meaning to existence
and fragrance to life.

The earth needs not dwarfs
but people of tall stature,
tall enough to touch the sky
and sow the seeds of talent among new stars,

yet not so tall
that no grass grows beneath their feet,
no thorn pricks them,
no bud blooms.

Neither spring nor autumn,
only the gale of height,
only the silence of loneliness.

My Lord!
Never give me a height so great
that I cannot embrace strangers;
never give me such coldness.

-Atal Bihari Vajpayee

Thursday, February 3, 2011

Interesting GTalk status messages

If electricity comes from electrons, does morality come from morons?

Save earth, this is the only planet where you get girls :)

The problem doesn't lie with me.... the problem lies with your expectations.

Be nice to nerds, chances are you'll end up working with one.

SEX is not the answer. SEX is the question and YES is the answer!!

Heaven won't take me and hell's afraid I'll take over.

A man in love is incomplete until he has married. Then he’s finished.

I can only please one person per day. Today isn't your day...and tomorrow don't look good either.

I believe in looking reality straight in the eye and denying it.

I was the best man at the wedding. If I'm the best man, why is she marrying him?

Great people like us work on the principle of rockets: not that we aim for the skies, but we don't start performing unless our ass is on fire.

Why do couples hold hands during their wedding? It's a formality just like two boxers shaking hands before the fight begins!

REASONS behind the REASONS are the REASONS due to REASONS.

Chuck Norris can parse HTML with regex.

Linux: the operating system with a CLUE... Command Line User Environment.

Reducing the number of atheists in India since 1989 - Sachin Tendulkar!

People who don't like their beliefs being laughed at shouldn't have such funny beliefs in the first place!

I am not a HANDSOME guy, but I can give my HAND to SOME guy who needs my help. - ABDUL KALAM

Prediction is very difficult, especially about the future.

To kiss a miss is strategy and miss a kiss is tragedy.

Surely it’s no coincidence that the word "listen" is an anagram of the word "silent".

I was depressed last night so I called Lifeline. Got a call centre in Pakistan. Told them I felt suicidal. They got all excited and asked if I could drive a truck.

There are two types of people: those who divide people into two types, and those who don't.

This thing, is thing, a thing, good thing, way thing, to thing, keep thing, an thing, idiot thing, busy thing, for thing, 20 thing, seconds thing! … Now read without
the word thing.

Both the pessimist and the optimist contributed to the world: the optimist invented the aeroplane, and the pessimist the parachute.

People are made to be Loved & Things are made to be Used. The confusion in the World is, People are being Used & Things are being Loved.

I want to change the world but God won't give me the Source Code ... :(

If you are not living life on the edge.. You are taking too much space.


You can also view some interesting quotes here.

Some useful settings and plugins for VI Editor : Part 5 - Comment a code block

In part 5 of this series, I am going to explain how to comment and uncomment a code block in the VIM editor.

Method 1

Add the following lines to your ~/.vimrc file

func! PhpUnComment() range
    " Save the 'paste' option and disable it so mappings behave normally
    let l:paste = &g:paste
    let &g:paste = 0

    let l:line        = a:firstline
    let l:endline     = a:lastline

    " Toggle '//' line comments on every line of the range
    while l:line <= l:endline
        if getline (l:line) =~ '^\s*\/\/.*$'
            " Line is commented: strip the leading '// '
            let l:newline = substitute (getline (l:line), '^\(\s*\)\/\/ \(.*\)$', '\1\2', '')
        else
            " Line is uncommented: prepend '// ' after the indentation
            let l:newline = substitute (getline (l:line), '^\(\s*\)\(.*\)$', '\1// \2', '')
        endif
        call setline (l:line, l:newline)
        let l:line = l:line + 1
    endwhile

    let &g:paste = l:paste
endfunc

vnoremap <buffer> <C-c> :call PhpUnComment()<CR>

Now, in visual mode, select the lines to comment or uncomment and press <CTRL-c>. This will comment all the uncommented lines and uncomment all the commented lines.


Method 2

Using blockwise visual mode (CTRL-V) select the block to be commented.
Press I (capital i) and type the text you want to prepend to each line of the selected block (e.g. // or #). Then press ESC, and the text will be inserted to the left of each line of the selected block.


For other posts related to VIM settings and plugins you can also visit

Some useful settings and plugins for VI Editor : Part 1 - General settings
Some useful settings and plugins for VI Editor : Part 2 - Autocompletion
Some useful settings and plugins for VI Editor : Part 3 - PHP documentor
Some useful settings and plugins for VI Editor : Part 4 - CodeSniffer Integration

Wednesday, February 2, 2011

Some good Quotes

Here are some of my favourite quotes. I collected them over a period of time; they are not in any particular order.

Not all chemicals are bad. Without chemicals such as hydrogen and oxygen, for example, there would be no way to make water, a vital ingredient in beer. -Dave Barry

Imagination was given to man to compensate him for what he is not; a sense of humor to console him for what he is. -Francis Bacon

It takes a lot of experience for a girl to kiss like a beginner. -Ladies Home Journal

Seven days without laughter make one weak. - Joel Goodman

Give a man a fish and he will eat for a day. Teach a man to fish and he will sit in a boat all day drinking beer.

If raising children was going to be easy, it never would have started with something called labour!

Women will never be equal to men until they can walk down the street with a bald head and a beer gut, and still think they are sexy.

10 Terrorists came by Boat, 539 terrorists will come by your vote, vote Carefully

You shouldn't say it is not good. You should say, you do not like it; and then, you know, you're perfectly safe.

Always and never are two words you should always remember never to use.

Children: You spend the first 2 years of their life teaching them to walk and talk. Then you spend the next 16 years telling them to sit down and shut-up.

The main purpose of holding children's parties is to remind yourself that there are children more awful than your own.

I learned law so well, the day I graduated I sued the college, won the case, and got my tuition back.  ~Fred Allen

We judge others by their behavior. We judge ourselves by our intentions.

If we were meant to talk more & listen less, we’d have two mouths & one ear.

The problem is never how to get new, innovative thoughts into your mind but how to get the old ones out.

People ask you for criticism but they only want praise.

Brains are like mouths; when empty they blather, when full they digest. - Pete Harrison

If you make people think they're thinking, they'll love you, but if you really make them think, they'll hate you.

Common sense is the most widely shared commodity in the world, for every man is convinced that he is well supplied with it. - René Descartes

Learn from the mistakes of others. You can't live long enough to make them all yourself.

Being successful is like being pregnant: everyone congratulates you, but no one knows how much you got screwed to get there.

A clever person solves a problem. A wise person avoids it. ~ Einstein

God gives and forgives. Man gets and forgets.

Men are like bank accounts. Without a lot of money they don't generate a lot of interest.

Good judgment comes from bad experience and a lot of that comes from bad judgment.

The severity of the itch is inversely proportional to the ability to reach it.

Time is what keeps everything from happening at once.

Theory is when you know something, but it doesn't work. Practice is when something works, but you don't know why. Programmers combine theory and practice: Nothing works and they don't know why.

To err is human. To keep erring is inhuman.

Inside every large program, there is a small program trying to get out.

Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.

There are two ways to write error-free programs; only the third works.

If debugging is the process of removing bugs, then programming must be the process of putting them in.

If a million monkeys were typing on computers, one of them will eventually write a Java program. The rest of them will write Perl programs.

The key to understanding recursion is to begin by understanding recursion. The rest is easy.

In software, the chain isn't as strong as its weakest link; it's as weak as all the weak links multiplied together.

I'm so poor that I can't afford to pay attention.

There are two rules for success: 1.) Don't tell all you know.

Art is work, to sell it is art.

God made man before woman to give him time to think of an answer for her first question.

A fine is a tax for doing wrong. A tax is a fine for doing well.

All desirable things in life are either-- illegal, banned, expensive or married to someone else!

Bad is never good until worse happens.

It is the woman who chooses the man who will choose her.

Time you enjoy wasting, is not wasted.

I find television very educational. Every time someone switches it on I go into another room and read a good book.

Intelligence is like underwear, everyone has it but you don't have to show it off.

The politicians divide the country by Words, while the terrorists unite it with bullets!!!
       
Radar spelled backwards is radar. They get you coming and going.

When you ASSUME, it makes an ASS out of U and ME.

The closest to perfection a person ever comes is when he fills out a job application.

API design is like sex: make one mistake and support it for the rest of your life.

You can also view some interesting Gtalk status messages here.

Some useful settings and plugins for VI Editor : Part 4 - CodeSniffer Integration

In part 4 of this series, I am going to discuss CodeSniffer and its integration with the VIM editor.

What is CodeSniffer

CodeSniffer is a code analysis tool that checks your PHP code against a coding standard, i.e. it applies a set of rules (a standard) to your source code. These rules can be used to detect common programming errors, and they can also define a set of coding standards for your project.

A coding standard in CodeSniffer is a collection of sniff files. Each sniff file checks one part of the coding standard only. CodeSniffer comes with a set of coding standards already defined. These are:

  • MySource
  • PEAR
  • PHPCS
  • Squiz
  • Zend

The default coding standard used by CodeSniffer is the PEAR coding standard. By integrating CodeSniffer into vim you can get the list of violations in a separate error window.

How to install CodeSniffer

You can install CodeSniffer using pear.

$ sudo pear install PHP_CodeSniffer

This will download and install the CodeSniffer from the PEAR repository.

How to run CodeSniffer

To run CodeSniffer execute

$ phpcs --standard=<standard> <path to file or directory>

The output will be a list of errors and warnings in your code as per your coding standard. 

CodeSniffer Integration with VIM editor

Add the following lines to your ~/.vimrc file

function! RunPhpcs()
    " Run phpcs on the file in the current buffer; CSV output matches
    " the errorformat defined below
    let l:filename=@%
    let l:phpcs_output=system('phpcs --report=csv --standard=PEAR '.l:filename)
    let l:phpcs_list=split(l:phpcs_output, "\n")
    " Drop the report header line before loading the quickfix list
    unlet l:phpcs_list[0]
    cexpr l:phpcs_list
    cwindow
endfunction

set errorformat+=\"%f\"\\,%l\\,%c\\,%t%*[a-zA-Z]\\,\"%m\"
command! Phpcs call RunPhpcs()

Now you can run CodeSniffer for the current file using command

:Phpcs

After that run

:cope

This will open a window with a list of all the errors and warnings in your code as per your coding standard. You can also use the quickfix commands to navigate through the error list; see :help quickfix in the vim help.


For other posts related to VIM settings and plugins you can also visit

Some useful settings and plugins for VI Editor : Part 1 - General settings
Some useful settings and plugins for VI Editor : Part 2 - Autocompletion
Some useful settings and plugins for VI Editor : Part 3 - PHP documentor
Some useful settings and plugins for VI Editor : Part 5 - Comment a code block

Tuesday, February 1, 2011

Some useful settings and plugins for VI Editor : Part 3 - PHP documentor

In part 1 and part 2 I talked about some useful vim settings and how to enable autocompletion in vim. In this part I am going to talk about a very useful plugin, written by Tobias Schlitt, for generating comment blocks (docblocks) for PHP scripts.

We should all document our code with proper inline comments. To make this task easier, Tobias Schlitt wrote a VIM plugin which automatically looks up some characteristics of the item you want to document and creates a docblock skeleton for it. This plugin provides functions to generate documentation blocks for your PHP code. The script currently documents:

- Classes
- Methods/Functions
- Attributes

This plugin supports PHP 4 and 5 syntax elements. It also allows you to define default values for phpDocumentor tags like @version, @author, @license and so on. For function/method parameters and attributes, the script tries to guess the type as well as possible from PHP 5 type hints or default values (array, bool, int, string, etc.).

Steps to install this plugin

  • Download the plugin file (php-doc.vim) from here.
  • After downloading just place the php-doc.vim file in ~/.vim/plugin/ folder
  • Add the following lines to your ~/.vimrc file

        source ~/.vim/plugin/php-doc.vim
        inoremap <c-p> <esc>:call PhpDocSingle()<cr>i
        nnoremap <c-p> :call PhpDocSingle()<cr>
        vnoremap <c-p> :call PhpDocRange()<cr>

This includes the script and maps the combination <ctrl> + p to the doc functions.

How to use

Just hit <ctrl>+p on the line where the element to document resides and the doc block will be created directly above that line.

For other posts related to VIM settings and plugins you can also visit

Some useful settings and plugins for VI Editor : Part 1 - General settings
Some useful settings and plugins for VI Editor : Part 2 - Autocompletion
Some useful settings and plugins for VI Editor : Part 4 - CodeSniffer Integration
Some useful settings and plugins for VI Editor : Part 5 - Comment a code block

Some useful settings and plugins for VI Editor : Part 2 - Autocompletion

In the first part I talked about some useful settings. In this part I am going to explain how to use autocompletion with the VI editor.

To turn autocompletion on, add the following setting to the .vimrc file in your home directory, or alternatively to the /etc/vim/vimrc file, which will enable it for all users on the system.
   
set ofu=syntaxcomplete#Complete

Vim has autocomplete functionality for all common web development contexts. If you are editing a file in vim which ends with .php, .html, .css, .js, .sql, .rb, or .py, Vim's "omnifunc" feature combined with its built-in autocomplete will show completion options specific to that language. Type a few characters and press (in insert mode) one of the following commands to autocomplete words from the corresponding context; vim shows a box below the cursor containing the options, with the first entry highlighted:

CTRL-X_CTRL-O - To search matching words in coding language manual
CTRL-X_CTRL-L - To search matching words in whole lines
CTRL-X_CTRL-N - To search matching words in the current file
CTRL-X_CTRL-K - To search matching words in dictionary
CTRL-X_CTRL-I - To search matching words in the current and included files
CTRL-X_CTRL-F - To search matching words in file names
CTRL-X_CTRL-] - To search matching words in tags
CTRL-N        - To search matching words in all of above

Autocompletion using the TAB key

The above commands are a little difficult to use. For convenience we can remap them to the TAB key. Add the following function to your vimrc file. It determines whether we are at the start of the line (then TAB indents) or whether we want to try autocompletion.

func! InsertTabWrapper()
    let col = col('.') - 1
    " At the start of a line, or after a non-keyword character: insert a real tab
    if !col || getline('.')[col - 1] !~ '\k'
        return "\<tab>"
    else
        " Otherwise trigger keyword completion
        return "\<c-p>"
    endif
endfunction

To remap the TAB key to InsertTabWrapper, add the following line to your vimrc file

inoremap <buffer> <tab> <c-r>=InsertTabWrapper()<cr>

Now, when you press the TAB key, vim checks whether you are at the start of a line: if yes, it indents your code; otherwise it tries to show the autocompletion window.


For other posts related to VIM settings and plugins you can also visit

Some useful settings and plugins for VI Editor : Part 1 - General settings
Some useful settings and plugins for VI Editor : Part 3 - PHP documentor
Some useful settings and plugins for VI Editor : Part 4 - CodeSniffer Integration
Some useful settings and plugins for VI Editor : Part 5 - Comment a code block

Monday, January 31, 2011

Some useful settings and plugins for VI Editor : Part 1 - General settings

Here are some useful settings, commands, and plugins to make the VI editor more developer friendly (some settings are only for PHP users).

Add the following settings to the .vimrc file in your home directory, or alternatively to the /etc/vim/vimrc file, which will enable them for all users on the system.

"For highlighting the code add
 syntax on

"Show a ruler at the bottom of screen
  set ruler
  set laststatus=2

"Show matching brackets.
  set showmatch

"To do a case insensitive search.
  set ignorecase

"To replace a TAB with 4 spaces (and use a shift width of 4).
  set tabstop=4
  set shiftwidth=4
  set expandtab

"Show line numbers.
 set number

"Jump 5 lines when running out of the screen
 set scrolljump=5

"Indicate jump out of the screen when 3 lines before end of the screen
 set scrolloff=3

"Set indentation rules
 setlocal autoindent
 setlocal smartindent

"Correct indentation after opening a docblock and automatic * on every line
  setlocal formatoptions=qroct

"Append the closing bracket whenever a bracket is opened
 inoremap [ []<Left>
 inoremap ( ()<Left>

"Spell Checking
 set spell spelllang=en_us

 Now you can move to the next or previous misspelled word using the ]s and [s commands, and you can also use the following
 zg   Add the word under the cursor as a good word
 z=   Suggest corrections for the word under the cursor

For other posts related to VIM settings and plugins you can also visit

Some useful settings and plugins for VI Editor : Part 2 - Autocompletion
Some useful settings and plugins for VI Editor : Part 3 - PHP documentor
Some useful settings and plugins for VI Editor : Part 4 - CodeSniffer Integration
Some useful settings and plugins for VI Editor : Part 5 - Comment a code block

Wednesday, January 26, 2011

MongoDB vs MySQL: speed test part 2, Select queries

In the first part of this post I compared the performance of insert operations for MongoDB and MySQL. In this part I compare the performance of different select operations. The test setup is the same as in part 1.

1. Selects on an indexed column with different limit clauses

To check the performance of selects on an indexed column with limit clause, a number of queries with different limit clauses were executed on both the databases.

sample mysql query:
SELECT ID, NAME, BIRTH_DT, CONTACT_ADDRESS, CITY, TOTAL_EXP, ENTRY_DT, PROFILE, SUMMARY from db.resume where ID > 1000 limit 100000, 1000

sample MongoDB query:
$collection->find(array('ID' => array('$gt' => 1000)))->skip(100000)->limit(1000);

Start Limit | Total Records Fetched | MySQL    | MongoDB
0           | 1000                  | 0.846 ms | 0.0710 ms
100000      | 1000                  | 0.903 ms | 0.0391 ms
200000      | 1000                  | 0.969 ms | 0.0209 ms
300000      | 1000                  | 1.029 ms | 0.0889 ms
400000      | 1000                  | 1.058 ms | 0.0488 ms
500000      | 1000                  | 1.149 ms | 0.0482 ms
600000      | 1000                  | 1.214 ms | 0.0460 ms
700000      | 1000                  | 1.170 ms | 0.0469 ms
800000      | 1000                  | 1.196 ms | 0.0450 ms
900000      | 1000                  | 1.216 ms | 0.0460 ms

2. Selects on a non indexed column with different limit clauses

sample mysql query:
SELECT ID, NAME, BIRTH_DT, CONTACT_ADDRESS, CITY, TOTAL_EXP, ENTRY_DT, PROFILE, SUMMARY from db.resume where TOTAL_EXP > 5 limit 100000, 1000

sample MongoDB query:
$collection->find(array('TOTAL_EXP' => array('$gt' => 5)))->skip(100000)->limit(1000);

Start Limit | Total Records Fetched | MySQL    | MongoDB
0           | 1000                  | 1.133 ms | 0.0679 ms
100000      | 1000                  | 1.166 ms | 0.0469 ms
200000      | 1000                  | 1.334 ms | 0.0469 ms
300000      | 1000                  | 1.293 ms | 0.0438 ms
400000      | 1000                  | 2.047 ms | 0.0450 ms


3. Selects on an indexed column with sorting

sample mysql query:

SELECT ID, NAME, BIRTH_DT, CONTACT_ADDRESS, CITY, TOTAL_EXP, ENTRY_DT, PROFILE, SUMMARY from db.resume where ID > 1000 order by USERNAME asc

sample MongoDB query:

$collection->find(array('ID' => array('$gt' => 1000)))->sort(array("USERNAME" => 1));

Avg time in Mysql : 1.973 sec
Avg time in MongoDB : 0.138 ms

4. Selects with IN clause on an Indexed Column

To check the performance of select queries with an IN clause on an indexed key, a number of queries were executed on both databases and an average was taken. Each query had 100 random ID values in the IN clause.

Avg time in Mysql : 4.865 ms
Avg time in MongoDB : 1.570 ms

It is clear from the above results that MongoDB outperformed MySQL in every case by a large margin.


MongoDB vs MySQL: speed test part 1, Insert queries

Recently I started exploring NoSQL databases as an alternative for some of our high-traffic MySQL tables. After going through a number of articles on the net, I decided to explore MongoDB. I compared the performance of different database operations (inserts and different types of selects) in MongoDB and in MySQL. The performance comparison of insert operations is given here.

Test setup:
For testing I used a 3.16 GHz, Intel Xeon CPU with 2 GB of memory and 350 GB of disk.

MySQL:

key_buffer = 128M
sort_buffer_size = 512K
read_buffer_size = 256K
max_allowed_packet = 1M

Table schema:

ID int(11)
NAME varchar(35)
BIRTH_DT date
CONTACT_ADDRESS varchar(150)
CITY int(11)
TOTAL_EXP varchar(5)
ENTRY_DT date
PROFILE varchar(250)
SUMMARY varchar(250)

MongoDB:

For MongoDB, two shard servers, one config server, and one mongos were started on the same machine, with the chunk size set to 10.

Sample document:

{"_id" : ObjectId("4ca6cca6a87305c90b000000"),
"ID" : "5839427",
"NAME" : "Gaurav Asthana",
"BIRTH_DT" : "1981-06-29",
"CONTACT_ADDRESS" : "Noida, India",
"CITY" : "19",
"TOTAL_EXP" : "06.10",
"ENTRY_DT" : "2010-11-26",
"PROFILE" : "zxzzzzz zzzzzzzzzz zzzzzzzzzzzzzzzzzz zzzzzzzz",
"SUMMARY" : "abcfsf fsdfs gdgdfg gdfgdh dfghdh dfhdh" }

An index was also created on the ID field.

I created a simple PHP script to perform the benchmark. The script inserted a total of 15 lakh (1.5 million) records into both MySQL and MongoDB, and I recorded the time for each batch of 100 records inserted, so in total I recorded 15,000 readings. The average time taken by each database is given below.

Average time per batch of 100 records :

Mysql : 18.77 ms
MongoDB : 5.53 ms

Size on disk:

Mysql : 390 MB
MongoDB : 1.6 GB

In my benchmark, MongoDB came out three times faster than MySQL for insert queries, but it occupied four times more disk space than MySQL.
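The batch-timing harness described above can be sketched like this. It is a minimal Python/sqlite3 analogue (the original was a PHP script against MySQL and MongoDB; table and column names follow the schema above, the batch count is scaled down):

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE resume (ID INTEGER, NAME TEXT, CITY INTEGER)")

BATCH, TOTAL = 100, 1000  # the original run used batches of 100 up to 1,500,000 rows
timings = []
for start in range(0, TOTAL, BATCH):
    t0 = time.perf_counter()
    conn.executemany(
        "INSERT INTO resume VALUES (?, ?, ?)",
        [(i, "name-%d" % i, 19) for i in range(start, start + BATCH)],
    )
    conn.commit()
    timings.append(time.perf_counter() - t0)  # one reading per batch

avg_ms = 1000 * sum(timings) / len(timings)
print(len(timings))  # → 10 readings recorded
```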

Tuesday, January 25, 2011

Classification of NoSQL Databases

NoSQL databases can be broadly classified as:

1. Distributed vs. Not-distributed databases

Distributed databases take on the responsibility of data partitioning (for scalability) and replication (for availability) and do not leave it to the client. Non-distributed databases leave the responsibility for data partitioning and replication to the clients.

Table 1: Distributed and Non-distributed databases

Distributed: Amazon Dynamo, Amazon S3, Scalaris, Voldemort, CouchDb (thru Lounge), Riak, MongoDb, BigTable, Cassandra, HyperTable, HBase
Not Distributed: Redis, Tokyo Tyrant, MemcacheDb, Amazon SimpleDb
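The split in Table 1 comes down to who does the routing. A minimal sketch of the client-side partitioning that non-distributed stores require (the node names are hypothetical, and real clients typically use consistent hashing rather than a plain modulo):

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical server names

def node_for(key: str) -> str:
    # A stable hash, so every client maps the same key to the same node.
    digest = hashlib.md5(key.encode()).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

# Every client routes reads and writes for a key to the same node.
print(node_for("user:42") == node_for("user:42"))  # → True
```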

2. Disk vs. Memory databases

A useful dimension is whether the database is memory-driven or disk-driven. This is important since in the latter case an explicit cache would be required, while in the former case data is not durable.

Table 2: Memory driven and disk driven databases

Memory: Scalaris, Redis
Configurable: BigTable, Cassandra, Hbase, HyperTable
Disk: CouchDb, MongoDb, Riak, Voldemort

On one end of the spectrum is Scalaris which is entirely memory-driven, and Redis which is primarily memory oriented. Cassandra, BigTable, Hypertable, Hbase allow configuring how large the Memtable can get, so that provides a lot of control. CouchDb, MongoDb and Riak all use on-disk B+ trees, and Voldemort uses BDB and MySQL.

3. Data Model richness

On the basis of the data model, the various NoSQL databases can be grouped into the following three groups.

3.1 Key-value Stores


These systems store values and an index to find them, based on a programmer-defined key. These data stores use a data model similar to the popular memcached distributed in-memory cache, with a single key-value index for all the data. Like memcached, none of these systems offer secondary indices or keys.
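The key-value model above can be captured in a few lines. This is an illustrative Python sketch, not any particular store's API:

```python
# A single primary-key index, opaque values, no secondary indexes.
class KVStore:
    def __init__(self):
        self._index = {}  # the one and only index: key -> value

    def put(self, key, value):
        self._index[key] = value

    def get(self, key):
        return self._index.get(key)

store = KVStore()
store.put("user:1", '{"name": "Gaurav", "city": "Noida"}')
print(store.get("user:1"))
# Looking up records by "city" would require scanning every value --
# exactly the limitation of memcached-style stores noted above.
```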

3.2 Document Stores


These systems store documents. The documents are indexed and a simple query mechanism may be provided. Document stores support more complex data than the key-value stores. The term “document store” is not ideal, because these systems store objects (generally objects without pointers, described in JSON notation), not necessarily documents. Unlike the key-value stores, they generally support multiple indexes and multiple types of documents (objects) per database, and they support complex values.

3.3 Column Stores

These systems store extensible records that can be partitioned across nodes. They are also referred to as “Extensible Record Stores”. Their basic data model is rows and columns, and their basic scalability model is splitting both rows and columns over multiple nodes. Rows are split across nodes through conventional sharding, on the primary key. They typically split by range rather than a hash function (this means that queries on ranges of values do not have to go to every node). Columns of a table are distributed over multiple nodes by using “column groups”.

Column groups may seem like a new complexity, but they are simply a way for the customer to indicate which columns are best grouped together. These two partitionings (horizontal and vertical) can be used simultaneously on the same table. The column groups must be pre-defined with the extensible record stores; however, that is not a big constraint, as new attributes can be defined at any time. Rows are not that dissimilar from documents: they can have a variable number of attributes (fields), the attribute names must be unique, rows are grouped into collections (tables), and an individual row’s attributes can be of any type.
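The two partitionings described above can be sketched together. All node and group names here are hypothetical, and the fixed range boundary stands in for the dynamic range splitting a real store would do:

```python
# Vertical partitioning: each column belongs to a pre-defined column group.
COLUMN_GROUPS = {
    "identity": {"name", "city"},        # columns usually read together
    "career": {"profile", "total_exp"},
}

def shard_for(row_key: int) -> str:
    # Horizontal partitioning: contiguous primary-key ranges map to nodes,
    # so a range query only touches the nodes owning those ranges.
    return "node-1" if row_key < 500000 else "node-2"

def group_for(column: str) -> str:
    for group, columns in COLUMN_GROUPS.items():
        if column in columns:
            return group
    raise KeyError(column)

# A cell is located by its shard (from the row key) and its column group.
print(shard_for(5839427), group_for("city"))  # → node-2 identity
```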

Table 3: Classification of NoSQL databases based on data model

Key-Value store: Amazon Dynamo, Amazon S3, Redis, Scalaris, Voldemort, SimpleDb
Document store: CouchDb, MongoDb, Riak
Column store: Cassandra, Google BigTable, HBase, HyperTable