Tuesday, December 21, 2010

Using Thrift with Java and PHP

Thrift is a software framework for scalable cross-language services development. Thrift allows you to define data types and service interfaces in a simple definition file. Taking that file as input, the compiler generates code to be used to easily build RPC clients and servers that communicate seamlessly across programming languages.

This post provides a step-by-step guide to installing Thrift and writing a server (in Java) and a client (in PHP) with it.

1. Download Thrift

Basic requirements
Please go through this link for a list of prerequisites or basic requirements for the Thrift compiler.

Download the latest stable release from here and extract it, or do an SVN checkout:

$ svn co http://svn.apache.org/repos/asf/thrift/trunk thrift

2. Build and Install

Now go to the thrift directory and run

$ ./bootstrap.sh
$ ./configure
$ make
$ make install

This will install Thrift on your system.
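
If everything went well, the Thrift compiler should now be on your PATH. As a quick sanity check you can ask it for its version (the exact output depends on the release you built):

$ thrift -version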

3. Writing a Thrift file

The next step is to write a Thrift definition (.thrift) file. This file describes the data structures and functions available to your remote service. For this post I am going to write a simple service for fetching a user profile.

profileservice.thrift

namespace php ProfileService #client
namespace java test.services.profile.thrift #server

enum JobType {
    P, //Permanent
    T //Temporary
}

enum EmploymentStatus {
    F, //Full Time
    P, //Part Time
}

exception ProfileServiceException {
    1: i32 code,
    2: string message
}

struct Profile {
    1: i32 profileId,
    2: string name,
    3: string birthDate,
    4: string contactAddress,
    5: i32 cityId,
    6: double totalExperience,
    7: JobType jobType,
    8: EmploymentStatus employmentStatus,
    9: string summary,
}

service ProfileService {
    Profile getProfileById(1: i32 profileId) throws (1: ProfileServiceException e),
    Profile getProfileByName(1: string name) throws (1: ProfileServiceException e),
}

4. Using the Thrift Compiler

Now it's time to generate the Thrift code for the server and the client. For the Java server, run the command

thrift --gen java profileservice.thrift

After you run the Thrift generation for Java, it creates a directory called gen-java/. Under this you can find the files and classes generated from your Thrift definition. For my Thrift file it generated the following files under the directory gen-java/test/services/profile/thrift/ (the path is based on the package name or namespace provided in the .thrift file):

$ ls gen-java/test/services/profile/thrift/
EmploymentStatus.java
JobType.java
Profile.java
ProfileServiceException.java
ProfileService.java

For the PHP client, run

thrift --gen php profileservice.thrift

For PHP, it creates a directory called gen-php/. For my Thrift file it generated the following files under the directory gen-php/profileservice/ (again based on the package name or namespace provided in the .thrift file):

$ ls gen-php/profileservice/
ProfileService.php
profileservice_types.php

5. Creating a Thrift Server using Java

The next step is to create a Java source file implementing the interface, i.e. the functions we defined in the profileservice.thrift file. In our case the interface is ProfileService.Iface, and the class implementing it is named ProfileServiceImpl. You will also need the Thrift Java library for this; you can get the lib/java/libthrift.jar file from your Thrift source directory.

ProfileServiceImpl.java

package server;

import java.util.*;
import org.apache.thrift.*;
import test.services.profile.thrift.*;

class ProfileServiceImpl implements ProfileService.Iface
{
    public Profile getProfileById(int profileId) throws ProfileServiceException, TException {
        Profile profile = new Profile();
        // your code goes here: look up the profile by id and populate its fields
        return profile;
    }

    public Profile getProfileByName(String name) throws ProfileServiceException, TException {
        Profile profile = new Profile();
        // your code goes here: look up the profile by name and populate its fields
        return profile;
    }
}

Now write a Java server for this service.

Server.java

package server;

import java.io.*;
import org.apache.thrift.protocol.*;
import org.apache.thrift.protocol.TBinaryProtocol.*;
import org.apache.thrift.server.*;
import org.apache.thrift.transport.*;
import test.services.profile.thrift.*;

public class Server
{
    private void start()
    {
        try
        {
            TServerSocket serverTransport = new TServerSocket(7911);
            ProfileService.Processor processor = new ProfileService.Processor(new ProfileServiceImpl());
            Factory protFactory = new TBinaryProtocol.Factory(true, true);
            TServer server = new TThreadPoolServer(processor, serverTransport, protFactory);
            System.out.println("Starting server on port 7911 ...");
            server.serve();
        }
        catch (TTransportException e)
        {
            e.printStackTrace();
        }
    }

    public static void main(String[] args)
    {
        Server srv = new Server();
        srv.start();
    }
}

This program simply has a main method that binds the service to a particular port and makes the server ready to accept connections and serve responses. This code will generally remain constant unless you want to provide additional functionality at the server level.

Compile all the files and start the server.
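
For reference, here is a minimal sketch of how that might look from the shell, assuming ProfileServiceImpl.java and Server.java live in a server/ directory, the generated code is still under gen-java/, and libthrift.jar has been copied into the working directory (depending on your Thrift version you may also need its dependencies, such as the SLF4J jars, on the classpath):

$ javac -cp libthrift.jar -d . gen-java/test/services/profile/thrift/*.java
$ javac -cp libthrift.jar:. -d . server/ProfileServiceImpl.java server/Server.java
$ java -cp libthrift.jar:. server.Server
Starting server on port 7911 ...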

6. Creating a Thrift Client using PHP

Now it's time to write a Thrift client in PHP to use this service. You'll need to include the language-specific libraries to access Thrift from PHP. Look for the folder ./lib/php/src/ in your Thrift source directory; it contains the library files you will need.

For this tutorial I have created a folder testclient in my home directory. In it, create a subfolder named src-php and copy all the library files into it. You will also need to move or copy the autogenerated Thrift files for this project (from the gen-php folder) into the packages folder of these library files. Here is my directory structure for this project (a rough sketch of the copy commands follows the listing).

testclient
..src-php
....autoload.php
....ext
....packages
......profileservice
........ProfileService.php
........profileservice_types.php
....protocol
....server
....Thrift.php
....transport
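
The copying itself is just a couple of shell commands. Here is a rough sketch, assuming the Thrift source tree was checked out to ~/thrift and the gen-php/ output from step 4 sits in the current directory (adjust the paths to match your setup):

$ mkdir ~/testclient
$ cp -r ~/thrift/lib/php/src ~/testclient/src-php
$ mkdir -p ~/testclient/src-php/packages
$ cp -r gen-php/profileservice ~/testclient/src-php/packages/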

Write a PHP client script to connect to the Thrift ProfileService server.

ProfileServiceClient.php

<?php
// Set up the path to the thrift library folder (the src-php folder created above, relative to this script)
$GLOBALS['THRIFT_ROOT'] = 'src-php';
// Load up all the thrift stuff
require_once $GLOBALS['THRIFT_ROOT'].'/Thrift.php';
require_once $GLOBALS['THRIFT_ROOT'].'/protocol/TBinaryProtocol.php';
require_once $GLOBALS['THRIFT_ROOT'].'/transport/TSocket.php';
require_once $GLOBALS['THRIFT_ROOT'].'/transport/TBufferedTransport.php';

// Load the package that we autogenerated for this tutorial
require_once $GLOBALS['THRIFT_ROOT'].'/packages/profileservice/ProfileService.php';

try {
    // Create a thrift connection (the server from step 5 listens on port 7911)
    $socket = new TSocket('localhost', 7911);
    $transport = new TBufferedTransport($socket);
    $protocol = new TBinaryProtocol($transport);

    // Create a profile service client
    $client = new ProfileServiceClient($protocol);

    // Open up the connection, make the call and close it again
    $transport->open();
    $data = $client->getProfileById(123);
    $transport->close();
    print_r($data);
}
catch (TException $tx) {
    // a general thrift exception
    echo "ThriftException: ".$tx->getMessage()."\r\n";
}
?>

To run the client, execute

php ProfileServiceClient.php

Friday, December 10, 2010

An Intro to NoSQL

What is NoSQL


For a quarter of a century, the relational database (RDBMS) has been the dominant model for database management. In the past, relational databases were used for nearly everything: because of their rich set of features, query capabilities and transaction management, they seemed fit for almost every task one could imagine doing with a database. But their feature richness is also their flaw, because it makes building distributed RDBMSs very complex. In particular, it is difficult and inefficient to perform transactions and join operations in a distributed system.

This is why there are now non-relational databases with limited feature sets and no full ACID support, which are better suited for use in a distributed environment. These databases are currently called NoSQL databases. The need to look at non-SQL systems arises out of scalability issues with relational databases, which stem from the fact that relational databases were not designed to be distributed (which is key to write scalability) and could therefore afford to provide abstractions like ACID transactions and a rich high-level query model. NoSQL databases address the scalability issue in a variety of ways – by being distributed, by providing a simpler data/query model, by relaxing consistency requirements, and so on.

At first the name suggests that these databases do not support the SQL query language and are not relational. But it is also read as "Not Only SQL", which is less of a rejection of relational databases and stands for a new paradigm: one database technology alone is not fit for everything; instead, different kinds of databases are needed for different demands. Most NoSQL databases are developed to run on clusters of commodity computers and therefore have to be distributed and failure tolerant. To achieve this, they make different trade-offs regarding the ACID properties, transaction management, query capabilities and performance. They are usually designed to fit the requirements of most web services, and most of them are schema-free and bring their own query languages.

Why NoSQL

Even though RDBMSs have provided database users with the best mix of simplicity, robustness, flexibility, performance, scalability, and compatibility, their performance in each of these areas is not necessarily better than that of an alternative solution pursuing one of these benefits in isolation. Today the situation is slightly different. For an increasing number of applications, one of these benefits is becoming more and more critical, and while still considered a niche requirement, it is rapidly becoming mainstream, so much so that for an increasing number of database users it is beginning to eclipse the others in importance. That benefit is scalability.

Relational databases scale well, but usually only when that scaling happens on a single server node. When the capacity of that single node is reached, you need to scale out and distribute that load across multiple server nodes. This is when the complexity of relational databases starts to rub against their potential to scale. Try scaling to hundreds or thousands of nodes, rather than a few, and the complexities become overwhelming, and the characteristics that make RDBMS so appealing drastically reduce their viability as platforms for large distributed systems.

Cloud computing has also placed new challenges on the database. The economic vision for cloud computing is to provide computing resources on demand with a "pay-as-you-go" model. A pool of computing resources can exploit economies of scale and a levelling of variable demand by adding or subtracting computing resources as workload demand changes. The traditional RDBMS has been unable to provide these types of elastic services. For cloud services to be viable, vendors have had to address this limitation, because a cloud platform without a scalable data store is not much of a platform at all. So, to provide customers with a scalable place to store application data, vendors had only one real option: they had to implement a new type of database system that focuses on scalability, at the expense of the other benefits that come with relational databases.