Tuesday, May 24, 2011

Logging Messages using Scribe and PHP

What's Scribe

Scribe was developed and open sourced by Facebook. It is their internal logging system, capable of logging tens of billions of messages per day. These messages include access logs, performance statistics, actions and many others.

Scribe is a server for aggregating log data that is streamed in real time from clients. It is designed to scale to a very large number of nodes and to be robust to network and node failures. A scribe server runs on every node in the system, configured to aggregate messages and send them in larger groups to a central scribe server (or servers). If the central scribe server isn't available, the local scribe server writes the messages to a file on local disk and sends them when the central server recovers. The central scribe server(s) can write the messages to the files that are their final destination, typically on an NFS filer or a distributed file system, or send them to another layer of scribe servers.
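To make the "write to local disk, resend on recovery" behaviour concrete, here is a minimal PHP sketch of the store-and-forward idea. It is only an illustration and not how Scribe itself is implemented: the class name BufferedForwarder, the $send callback and the JSON-lines buffer file are all assumptions made up for this example.

```php
<?php
// Hypothetical illustration of store-and-forward; not Scribe's actual code.
class BufferedForwarder
{
    private $send;       // callable(string $category, string $message): bool
    private $bufferFile; // local file used while the central server is down

    public function __construct(callable $send, string $bufferFile)
    {
        $this->send = $send;
        $this->bufferFile = $bufferFile;
    }

    // Replay any backlog first, then send; buffer the message if either step fails.
    public function log(string $category, string $message): void
    {
        if (!$this->flush() || !call_user_func($this->send, $category, $message)) {
            $line = json_encode([$category, $message]) . "\n";
            file_put_contents($this->bufferFile, $line, FILE_APPEND);
        }
    }

    // Resend buffered messages in order; returns true once the buffer is empty.
    public function flush(): bool
    {
        if (!is_file($this->bufferFile)) {
            return true;
        }
        $lines = file($this->bufferFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
        foreach ($lines as $i => $line) {
            list($category, $message) = json_decode($line, true);
            if (!call_user_func($this->send, $category, $message)) {
                // Keep the unsent remainder for the next attempt.
                $rest = implode("\n", array_slice($lines, $i)) . "\n";
                file_put_contents($this->bufferFile, $rest);
                return false;
            }
        }
        unlink($this->bufferFile);
        return true;
    }
}
```

The callback stands in for "send to the central server" and returns false when that server is unreachable, so messages pile up in the buffer file and are replayed in their original order once sending succeeds again.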

To learn more about Scribe, visit the Scribe project on GitHub.

Install Scribe:

Prerequisites:

--libevent, event notification library
--boost, Boost C++ library (version 1.36 or later)
Note: The latest Scribe version, 2.2, requires Boost 1.45 or lower. It is not compatible with Boost 1.46 or higher.
You can download Boost 1.45 from here.
--thrift, version 0.5.0 or later
--fb303, Facebook Bassline (included in thrift/contrib/fb303/), r697294 or later

You can download the latest version of Scribe from here.

Steps to install:

1. Install libevent if not already installed
2. Install a Boost version between 1.36 and 1.45
3. If boost is installed in a non-default location or there are multiple boost versions installed, you will need to set the Boost path and library names
export BOOST_ROOT=/opt/boost_1_45_1
export LD_LIBRARY_PATH=/opt/boost_1_45_1/stage/lib
4. Install thrift
5. Install fb303
6. Now install Scribe
Untar the source and, from the source directory, run:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
./bootstrap.sh
make
make install

A sample scribe config file:

#This file configures Scribe to listen for messages on port 1463 and write them to /var/log/scribelogs

port=1463
max_msg_per_second=2000000
check_interval=3

# DEFAULT
<store>
category=default
type=buffer

target_write_size=20480
max_write_interval=1
buffer_send_rate=2
retry_interval=30
retry_interval_range=10

<primary>
type=file
fs_type=std
file_path=/var/log/scribelogs
base_filename=thisisoverwritten
max_size=1000000
add_newlines=1
</primary>

<secondary>
type=file
fs_type=std
file_path=/tmp
base_filename=thisisoverwritten
max_size=3000000
</secondary>
</store>
Please go through this README file to learn about more advanced Scribe config options.

Running a scribe server:

Run the following command to start the scribe server:

path_to_scribe_dir/src/scribed path_to_config_file/config.conf
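Before pointing a client at the server, it can help to confirm that something is actually listening on the configured port. The following is a small sketch using PHP's built-in fsockopen; the function name isScribePortOpen is made up for this example, and the defaults assume the server runs on localhost with port=1463 as in the sample config. A successful connection only shows that the port is open, not that the listener is Scribe.

```php
<?php
// Hypothetical helper: returns true if a TCP connection to $host:$port
// succeeds within $timeout seconds. It does not speak the Scribe protocol.
function isScribePortOpen(string $host = 'localhost', int $port = 1463, float $timeout = 1.0): bool
{
    $errno = 0;
    $errstr = '';
    // Suppress the warning fsockopen emits on failure; we only need the result.
    $conn = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if ($conn === false) {
        return false;
    }
    fclose($conn);
    return true;
}
```

Calling isScribePortOpen() with no arguments checks localhost:1463.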


Write a Scribe Client in PHP:

In order to write a client in PHP, you will need the following:

  • Thrift PHP library files
  • Client libraries for scribe
  • Client libraries for fb303

Step 0: Create a folder for your client files, for example /home/gaurav/scribephplibs

Step 1: Get Thrift PHP library files
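The Thrift PHP library files are bundled with the Thrift source, so you can simply copy them into your folder. Copy the contents of src (note the trailing /.) so that the transport and protocol folders end up directly under your client folder:
cp -R path_to_thrift_source/lib/php/src/. /home/gaurav/scribephplibs/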

Step 2: Generate client library for scribe
thrift -o /home/gaurav/scribephplibs --gen php path_to_scribe_source/if/scribe.thrift

Step 3: Generate client library for fb303
thrift -o /home/gaurav/scribephplibs --gen php path_to_thrift_source/contrib/fb303/if/fb303.thrift

Step 4: By default, Thrift places the generated files in a folder named gen-php, but the Scribe client expects them in a folder named packages, so we need to rename that folder.

mv /home/gaurav/scribephplibs/gen-php /home/gaurav/scribephplibs/packages
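After these steps, a quick sanity check of the folder layout can save some debugging. The sketch below looks for the four files that the sample client script in this post includes. The function name missingScribeClientFiles is made up for this example, and the list covers only the script's direct includes, not everything the generated code may load in turn.

```php
<?php
// Hypothetical helper: returns the relative paths (as required by the sample
// client script) that do not exist under $thriftRoot.
function missingScribeClientFiles(string $thriftRoot): array
{
    $required = [
        '/packages/scribe/scribe.php',
        '/transport/TSocket.php',
        '/transport/TFramedTransport.php',
        '/protocol/TBinaryProtocol.php',
    ];
    $missing = [];
    foreach ($required as $relative) {
        if (!is_file($thriftRoot . $relative)) {
            $missing[] = $relative;
        }
    }
    return $missing;
}
```

For example, print_r(missingScribeClientFiles('/home/gaurav/scribephplibs')); prints an empty array when all four files are in place.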

A sample PHP client script (ScribeLogger.php):

<?php

class ScribeLogger {

    public static function Log($msg, $category) {

        // Set this to where you have the Scribe and Thrift library files.
        $GLOBALS['THRIFT_ROOT'] = '/home/gaurav/scribephplibs';

        // Include all of the lib files we need.
        require_once $GLOBALS['THRIFT_ROOT'].'/packages/scribe/scribe.php';
        require_once $GLOBALS['THRIFT_ROOT'].'/transport/TSocket.php';
        require_once $GLOBALS['THRIFT_ROOT'].'/transport/TFramedTransport.php';
        require_once $GLOBALS['THRIFT_ROOT'].'/protocol/TBinaryProtocol.php';

        // A message in Scribe is made up of two parts: the category (a key) and the actual message.
        $message = array();
        $message['category'] = $category;
        $message['message'] = $msg;

        // Create a new LogEntry instance to hold our message, then add it to a new $messages array
        // (we can submit multiple messages at the same time if we want to).
        $entry = new LogEntry($message);
        $messages = array($entry);

        // Connect to the local scribe server on the port set in the config file (1463).
        $socket = new TSocket('localhost', 1463, true);
        $transport = new TFramedTransport($socket);
        $protocol = new TBinaryProtocol($transport, false, false);

        $scribeClient = new scribeClient($protocol, $protocol);

        try {
            $transport->open();
            $scribeClient->Log($messages);
            $transport->close();
        } catch (TException $e) {
            echo $e->getMessage();
        }
    }
}

ScribeLogger::Log("This is a test message.", 'TEST');

?>
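The comment in the script mentions that several messages can be submitted in one call. As a rough sketch of how that might look, the helper below builds one category/message array per line, in the same shape the script passes to the LogEntry constructor. The function name buildScribePayloads is made up for this example and is not part of Scribe or Thrift.

```php
<?php
// Hypothetical helper: turns a list of message strings into the
// category/message arrays that the sample script feeds to LogEntry.
function buildScribePayloads(array $lines, string $category): array
{
    $payloads = [];
    foreach ($lines as $line) {
        $payloads[] = ['category' => $category, 'message' => $line];
    }
    return $payloads;
}
```

Inside ScribeLogger::Log, once the generated classes are loaded, each payload could be wrapped with new LogEntry($payload) and the resulting array passed to $scribeClient->Log() in place of the single-element $messages array.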

Running the script:

php ScribeLogger.php

Now go to the /var/log/scribelogs folder. There you will see a new folder named TEST, because Scribe creates a new folder for each category. Each of these folders may contain multiple numbered files such as Category_0000, Category_0001 and so on, plus one symlink, Category_current, pointing to the file Scribe is currently writing to for that category.
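If you want to find the file currently being written from PHP, for example to tail it from a script, something like the following sketch could work. It assumes the layout described above (a folder per category containing a Category_current symlink); the function name currentScribeLogFile is made up for this example.

```php
<?php
// Hypothetical helper: resolves <baseDir>/<category>/<category>_current to the
// real file it points to, or returns null if the symlink is missing or broken.
function currentScribeLogFile(string $baseDir, string $category): ?string
{
    $link = $baseDir . '/' . $category . '/' . $category . '_current';
    if (!is_link($link)) {
        return null;
    }
    $target = realpath($link);
    return $target === false ? null : $target;
}
```

For the test message above, the call would be currentScribeLogFile('/var/log/scribelogs', 'TEST').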

3 comments:

  1. hey, good tutorial, we are also looking for using scribe, this tutorial will surely going to help me. Thanks.

  2. Hey, Thanks for the tutorial. I had a quick question. Is there a way we can somehow tweak the message which we had received from the PHP file while we put it in aggregated log, something like say timestamp etc???

  3. Hey, Thanks for the tutorial. I had a quick question. Is there a way somehow in the scribe that we can tweak the message sent across by our PHP code to the scribe server like say appending time-stamp or something before we store it in the aggregated log.
