Tuesday, May 24, 2011

Logging Messages using Scribe and PHP

What is Scribe?

Scribe is a server for aggregating log data that is streamed in real time from clients. It was developed and open sourced by Facebook, where it is the internal logging system, capable of logging tens of billions of messages per day. These messages include access logs, performance statistics, actions and many others. Scribe is designed to scale to a very large number of nodes and to be robust to network and node failures.

A Scribe server runs on every node in the system, configured to aggregate messages and send them to a central Scribe server (or servers) in larger groups. If the central Scribe server isn't available, the local Scribe server writes the messages to a file on local disk and sends them when the central server recovers. The central Scribe server(s) can write the messages to the files that are their final destination, typically on an NFS filer or a distributed file system, or send them to another layer of Scribe servers.
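
The fallback to local disk can be illustrated with a toy shell model. This is only a sketch under stated assumptions, not how Scribe is implemented: send_to_central, log_message, flush_spool and the CENTRAL_UP, SPOOL_FILE and CENTRAL_FILE variables are all made-up names, and a local file stands in for the central server.

```shell
#!/bin/sh
# Toy model of store-and-forward buffering; NOT Scribe's actual code.
# All names below are hypothetical and exist only for this sketch.

SPOOL_FILE="${SPOOL_FILE:-/tmp/scribe_sketch_spool.log}"      # local disk buffer
CENTRAL_FILE="${CENTRAL_FILE:-/tmp/scribe_sketch_central.log}" # stand-in for the central server
CENTRAL_UP="${CENTRAL_UP:-1}"                                  # 1 = reachable, 0 = unreachable

# Stand-in for a network send: succeeds only while CENTRAL_UP is 1.
send_to_central() {
    [ "$CENTRAL_UP" = "1" ] || return 1
    printf '%s\n' "$1" >> "$CENTRAL_FILE"
}

# Try the central server first; on failure keep the message on local disk.
log_message() {
    send_to_central "$1" || printf '%s\n' "$1" >> "$SPOOL_FILE"
}

# Replay spooled messages in order once the central server is back.
# If a send fails midway the spool is left untouched, so some messages
# may be delivered twice on the next attempt.
flush_spool() {
    [ -s "$SPOOL_FILE" ] || return 0
    while IFS= read -r line; do
        send_to_central "$line" || return 1
    done < "$SPOOL_FILE"
    : > "$SPOOL_FILE"
}
```

With CENTRAL_UP=0, calls to log_message pile up in the spool file; setting CENTRAL_UP=1 and calling flush_spool moves them to the central file and empties the spool.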

To learn more about Scribe, visit the Scribe project on GitHub.

Install Scribe:

Prerequisites:

--libevent, Event Notification library
--boost, Boost C++ library (version 1.36 or later)
Note: The latest Scribe version, 2.2, is not compatible with Boost 1.46 or higher, so you need to install Boost version 1.45 or lower.
You can download Boost 1.45 from here.
--thrift, version 0.5.0 or later
--fb303, Facebook Bassline (included in thrift/contrib/fb303/), r697294 or later

You can download the latest version of Scribe from here.

Steps to install:

1. Install libevent if not already installed
2. Install boost, any version from 1.36 to 1.45
3. If boost is installed in a non-default location or there are multiple boost versions installed, you will need to set the Boost path and library names:
export BOOST_ROOT=/opt/boost_1_45_1
export LD_LIBRARY_PATH=/opt/boost_1_45_1/stage/lib
4. Install thrift
5. Install fb303
6. Now install scribe
Untar the source and run:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
./bootstrap.sh
make
make install
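
Since the Boost version range matters for Scribe 2.2, a quick check before building may save a failed compile. The sketch below is only an illustration: boost_version_ok is a hypothetical helper, and it assumes its argument is the directory that contains the boost/ header folder (for an unpacked Boost source tree that is the tree root itself). It reads the BOOST_VERSION macro from boost/version.hpp.

```shell
# Sketch: succeed only if the Boost headers under "$1" are version 1.36 to 1.45.
# boost_version_ok is a hypothetical helper name, not part of Boost or Scribe.
boost_version_ok() {
    header="$1/boost/version.hpp"
    [ -f "$header" ] || return 2
    # BOOST_VERSION encodes major*100000 + minor*100 + patch, e.g. 104500 for 1.45.0.
    version=$(sed -n 's/^#define BOOST_VERSION \([0-9][0-9]*\).*/\1/p' "$header")
    [ -n "$version" ] || return 2
    major=$(( version / 100000 ))
    minor=$(( version / 100 % 1000 ))
    [ "$major" -eq 1 ] && [ "$minor" -ge 36 ] && [ "$minor" -le 45 ]
}
```

For example, boost_version_ok "$BOOST_ROOT" && echo "Boost version is in range" could be run just before ./bootstrap.sh.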

A sample scribe config file:

#This file configures Scribe to listen for messages on port 1463 and write them to /var/log/scribelogs

port=1463
max_msg_per_second=2000000
check_interval=3

# DEFAULT
<store>
category=default
type=buffer

target_write_size=20480
max_write_interval=1
buffer_send_rate=2
retry_interval=30
retry_interval_range=10

<primary>
type=file
fs_type=std
file_path=/var/log/scribelogs
base_filename=thisisoverwritten
max_size=1000000
add_newlines=1
</primary>

<secondary>
type=file
fs_type=std
file_path=/tmp
base_filename=thisisoverwritten
max_size=3000000
</secondary>
</store>

Please go through this README file to learn about more advanced Scribe config options.

Running a scribe server:

Run the following command to start the Scribe server:

path_to_scribe_dir/src/scribed path_to_config_file/config.conf


Write a Scribe Client in PHP:

In order to write a client in PHP, you will need the following:

  • Thrift PHP library files
  • Client libraries for scribe
  • Client libraries for fb303

Step 0: Create a folder for your client files, for example /home/gaurav/scribephplibs

Step 1: Get Thrift PHP library files
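Thrift PHP library files are bundled with the Thrift source, so you can just copy them to your folder. Copy the contents of src rather than the src directory itself, so that transport/ and protocol/ sit directly under your folder, where the client script expects them:
cp -R path_to_thrift_source/lib/php/src/. /home/gaurav/scribephplibs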

Step 2: Generate client library for scribe
thrift -o /home/gaurav/scribephplibs --gen php path_to_scribe_source/if/scribe.thrift

Step 3: Generate client library for fb303
thrift -o /home/gaurav/scribephplibs --gen php path_to_thrift_source/contrib/fb303/if/fb303.thrift

Step 4: By default Thrift places the generated files in a folder named gen-php, but the Scribe client expects them in a folder named packages, so we need to rename that folder.

mv /home/gaurav/scribephplibs/gen-php /home/gaurav/scribephplibs/packages
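
Before writing the client, it may help to confirm that the library folder has the layout the sample client's require_once lines rely on. The following is a minimal sketch: check_scribe_php_libs is a hypothetical helper, and its file list simply mirrors the four files the sample script includes.

```shell
# Sketch: report any file the sample PHP client requires that is not in place.
# check_scribe_php_libs is a hypothetical helper name used only for this example.
check_scribe_php_libs() {
    root="$1"
    missing=0
    for f in packages/scribe/scribe.php \
             transport/TSocket.php \
             transport/TFramedTransport.php \
             protocol/TBinaryProtocol.php; do
        if [ ! -f "$root/$f" ]; then
            echo "missing: $root/$f"
            missing=1
        fi
    done
    # Exit status 0 means every file was found, 1 means at least one is missing.
    return "$missing"
}
```

For example: check_scribe_php_libs /home/gaurav/scribephplibs && echo "layout looks fine"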

A sample PHP client script (ScribeLogger.php):

<?php

class ScribeLogger {

    public static function Log($msg, $category) {

        // Set this to where you have the Scribe and Thrift library files.
        $GLOBALS['THRIFT_ROOT'] = '/home/gaurav/scribephplibs';

        // Include all of the lib files we need.
        require_once $GLOBALS['THRIFT_ROOT'].'/packages/scribe/scribe.php';
        require_once $GLOBALS['THRIFT_ROOT'].'/transport/TSocket.php';
        require_once $GLOBALS['THRIFT_ROOT'].'/transport/TFramedTransport.php';
        require_once $GLOBALS['THRIFT_ROOT'].'/protocol/TBinaryProtocol.php';

        // A message in Scribe is made up of two parts: the category (a key) and the actual message.
        $message = array();
        $message['category'] = $category;
        $message['message'] = $msg;

        // Create a new LogEntry instance to hold our message, then add it to a new $messages array
        // (we can submit multiple messages at the same time if we want to).
        $entry = new LogEntry($message);
        $messages = array($entry);

        // Connect to the Scribe server from the sample config (localhost, port 1463).
        // The third TSocket argument asks for a persistent connection.
        $socket = new TSocket('localhost', 1463, true);
        $transport = new TFramedTransport($socket);
        // Strict read and strict write are both turned off here.
        $protocol = new TBinaryProtocol($transport, false, false);

        $scribeClient = new scribeClient($protocol, $protocol);

        try {
            $transport->open();
            $scribeClient->Log($messages);
            $transport->close();
        } catch (TException $e) {
            echo $e->getMessage();
        }
    }
}

ScribeLogger::Log("This is a test message.", 'TEST');

?>

Running the script:

php ScribeLogger.php

Now go to the /var/log/scribelogs folder. There you will see a new folder named TEST. Scribe creates a new folder for each category. Each of these folders may contain multiple files with numbered names like Category_0000, Category_0001 and so on, plus one symlink, Category_current, pointing to the file Scribe is currently writing to for that category.
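
To follow a category as messages arrive, the Category_current symlink is the handy entry point. Here is a small sketch that builds its path from the layout described above; scribe_current_file is a hypothetical helper name, and it assumes the symlink is named after the category folder.

```shell
# Sketch: print the path of the "current" symlink for a Scribe category.
# scribe_current_file is a hypothetical helper; it assumes the layout
# <root>/<category>/<category>_current described in the text.
scribe_current_file() {
    root="$1"
    category="$2"
    link="$root/$category/${category}_current"
    # Fail if the link (or the file it points to) does not exist yet.
    [ -e "$link" ] || return 1
    printf '%s\n' "$link"
}
```

For example: tail -f "$(scribe_current_file /var/log/scribelogs TEST)"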

Monday, May 23, 2011

Centralized Logging

We all know that logging is necessary, and we log many different types of data on a regular basis: user transactions, customer behaviour, machine behaviour, security threats, fraudulent activities and so on. Historically, logs were mostly a tool for troubleshooting problems. But more recently, they have become important for network and system performance optimization, recording user actions, providing helpful data for investigating suspicious activity, and assisting with proactive monitoring of the environment. In many cases, having log data easily available can provide early warnings about problems before they get out of control.

There are three broad areas where logging is helpful: troubleshooting, resource tracking, and security.

Troubleshooting:

Logging helps in troubleshooting, that is, finding and fixing problems. The event logs are usually the best source of information for determining whether a system or network is experiencing problems. Events such as a disk filling to capacity, the failure of a necessary piece of equipment, the failure of a driver to load, or the detection of an IP address conflict can be recorded in the event logs. Event logs also help in reporting diagnostic information for background processes.

Resource Tracking and Monitoring applications:

Logging helps in resource tracking, monitoring and improving service levels. It provides real-time insight into the system. Information on the capacity and usage of system resources should be logged. Any type of system metric that can change over time should be reported and logged. These metrics may include the frequency of users on the system, the maximum number of users, the duration of the use of specific applications, the amount of available disk space crossing a threshold, memory usage, DB resources, the load on the system crossing a threshold, the number of processes running on the system at any given time and so on. All of this logged data will help you tune your systems before disaster strikes.
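
As one illustration of logging a metric when it crosses a threshold, here is a minimal shell sketch for disk usage. The function names disk_usage_percent and log_if_disk_over and the message format are made up for this example; it relies only on the POSIX df -P output format and awk.

```shell
# Sketch: print a log line when disk usage on a mount point reaches a limit.
# Both function names are hypothetical and exist only for this example.

# Print the used-space percentage (digits only) for the filesystem holding "$1".
disk_usage_percent() {
    # df -P prints one line per filesystem; column 5 is the capacity, e.g. "42%".
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# log_if_disk_over MOUNT LIMIT: emit a timestamped line if usage >= LIMIT percent.
log_if_disk_over() {
    mount="$1"
    limit="$2"
    used=$(disk_usage_percent "$mount")
    [ -n "$used" ] || return 1
    if [ "$used" -ge "$limit" ]; then
        echo "$(date '+%Y-%m-%d %H:%M:%S') disk usage on $mount is ${used}% (limit ${limit}%)"
    fi
}
```

Run from cron with something like log_if_disk_over / 90, the output could be appended to a log file or handed to whatever logging client you use.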

Security:

Logging is also a very important part of the security of the system. It helps in mitigating security exposures and risks. It is impossible to make a system 100% secure. There are always some security flaws that can be exploited, and unfortunately the greatest risks to security are the human users themselves. If illegal access to a system cannot be completely prevented, then it should at least be recorded and tracked. These logs will help in discovering potential problems or signs of problems and will help in resolving them.


Why (benefits of) Centralized Logging:

Centralized logging (logging all data in a central log server or repository) provides a number of benefits over logging on local servers.

  • All of the logs are in one place. This makes things like searching through logs and analysis across multiple servers easier than bouncing around between boxes, and it greatly simplifies log analysis and correlation tasks.
  • It helps you answer "why" quickly and accurately. All your logs are in one location, so you can quickly access them and find the trouble.
  • Suppose your system is down or overloaded and unable to tell you what happened. If you have remote copies of all your system logs you can look at exactly what's been going on on that system.
  • Local logs from the server may be lost in the event of an intrusion or system failure. But by having the logs elsewhere you at least have a chance of finding something useful about what happened.
  • It reduces disk space usage and disk I/O on core servers that should be busy doing something else.
  • Log processing and log rotation mechanisms, if any, can also be centralized.
  • Centralized logging can provide clues for making things better.

In my next post I'll try to explain the steps to install Scribe and write a Scribe client in PHP for logging messages to a central log server.