Wednesday, December 24, 2008

Stop bots from spamming : Old style CAPTCHA - Alternatives or replacements

Old-style CAPTCHAs are pretty much broken, so what else can we use to replace them? Given below are some alternatives to the old-style CAPTCHA.

1. reCAPTCHA - A free CAPTCHA service that helps to digitize books, newspapers and old-time radio shows. It delivers CAPTCHAs that have proven unreadable by OCR and donates the human processing to a charitable cause: preserving out-of-copyright books for future generations.

2. CSS hidden field - Add a text input field to your form and give it a name that makes sense, then use some CSS to hide the table row or div that the input field is in. Bots will typically fill it in anyway, so add some code that checks whether the hidden field was filled in; if it was, you can stop processing right there (see the sketch after this list). Make sure to label this field so that people using screen readers understand that they should not fill it in.

3. Work out the time it took for the form to be submitted - In your form, add a hidden variable and set its value to the time stamp of when the form was loaded. Then, once the form has been submitted, get a new time stamp and compare the two values. If the difference is less than, say, about 5 seconds (or however long you estimate it will take a human to fill in your form, remembering that spam bots will do it almost instantaneously), you can return to the form with an error message stating that it was submitted in too short a time period.

4. Give the user simple challenge questions, like: what is the total number of syllables in the American President's full name? Or put a simple math equation at the bottom of the form, like (2 + 4 - 1 =). Remember that you will need to make the questions random.

5. Use music. Play the music and give the user multiple choice answers.

6. After filling in the form, have the user go to another link and copy the value shown in a constantly changing image into a box. This would be similar to an RSA token, but without the hardware.
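
Ideas 2, 3 and 4 are cheap to combine in the script that receives the form submission. Here is a minimal PHP sketch; the field and session names (website, form_ts, captcha_answer, captcha_expected) are made up for this example and should be adapted to your own form.

<?php
// Minimal sketch of ideas 2, 3 and 4 above; all names are illustrative only.
session_start();

// 2. Honeypot: the "website" field is hidden with CSS, so a human leaves it empty.
if (!empty($_POST['website'])) {
    exit; // almost certainly a bot
}

// 3. Timing: the form page put its load time into a hidden "form_ts" field.
//    (Keeping the timestamp in the session instead would be harder to fake.)
$loaded = isset($_POST['form_ts']) ? (int) $_POST['form_ts'] : 0;
if (time() - $loaded < 5) {
    die("The form was submitted in too short a time period. Please try again.");
}

// 4. Challenge question: the expected answer was stored in the session when
//    the (randomly generated) question was rendered on the form page.
$answer = isset($_POST['captcha_answer']) ? trim($_POST['captcha_answer']) : '';
if (!isset($_SESSION['captcha_expected']) || (int) $answer !== (int) $_SESSION['captcha_expected']) {
    die("Wrong answer to the challenge question. Please go back and try again.");
}

// All checks passed -- handle the form as usual from here.
?>

On the form page itself you would pick a random question, store the expected answer in $_SESSION['captcha_expected'], and output the hidden form_ts field along with the CSS-hidden website field.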

In the end, nothing is perfect, but the result is something that is accessible and will help keep your site safe.

Thursday, December 4, 2008

PHP scripts : problem with whitespace

Recently, while working on a project, I ran into a strange problem. In one of our apps we were trying to implement and call a service using Hessian, but somehow we were getting malformed replies for all the requests. We checked all the scripts and there were no problems in any of them. Finally, after a lot of effort, we found the cause. It was very simple: one of the scripts had some blank lines before the PHP start tag (<?php).

Leaving whitespace (spaces, line breaks, etc.) before or after the PHP tags can be problematic and can result in unexpected or undesirable behaviour. Since this whitespace is echoed to the browser along with the normal output, it can break any script attempting to send headers (e.g. initializing a session, sending the content type, etc.) and can distort the web page layout.
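
To see the failure mode concretely, here is a small hypothetical script whose very first line is blank (marked "(blank line)" below, since a literal blank line would be invisible here). That blank line is sent to the browser as output before any PHP code runs, so the header-related calls can only fail:

(blank line)
<?php
session_start();                     // warning: session cookie cannot be sent,
                                     // headers were already sent by line 1
header("Content-Type: text/html");   // warning: cannot modify header information
echo "Hello";
?>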

Here is a simple PHP script that will scan the PHP files in a given directory (and its subdirectories) and strip any whitespace before the opening <?php tag and after the closing ?> tag.

<?php
$maindir = "/path_to_the_project_dir";

// Regexes matching whitespace before the opening tag and after the closing tag.
define("PRE", '/^\s+<\?php/');
define("POST", '/\?>\s+$/');

clearstatcache();

if(scan_dir( $maindir, "removeWSpace", true ) === false)
{
    echo "'{$maindir}' is not a valid directory\n";
}

// Walk a directory (recursively by default) and call $callback on every entry.
function scan_dir( $maindir, $callback, $recursive = true )
{
    $dh = @opendir( $maindir );
    if( $dh === false)
        return false;

    while( ($file = readdir( $dh )) !== false )
    {
        if( "." == $file || ".." == $file )
        {
            continue;
        }
        call_user_func( $callback, "{$maindir}/{$file}" );
        if( $recursive !== false && is_dir( "{$maindir}/{$file}" ))
        {
            scan_dir( "{$maindir}/{$file}", $callback, $recursive );
        }
    }
    closedir( $dh );
    return true;
}

// Strip whitespace around the PHP open/close tags in .php files and save in place.
function removeWSpace( $path )
{
    if( !is_dir( $path ) && substr($path, -4) == ".php")
    {
        $fh = file_get_contents($path);
        $fh = preg_replace(PRE, '<?php', $fh, -1, $c1);
        $fh = preg_replace(POST, '?>', $fh, -1, $c2);
        if($c1 > 0 || $c2 > 0)
        {
            if(file_put_contents($path, $fh))
                echo $path . " -- modified \n";
        }
    }
}

?>
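
Assuming you save the script above as, say, strip_whitespace.php (the name is arbitrary) and point $maindir at your project root, run it once from the command line:
$ php strip_whitespace.php
It prints the path of every file it changed, so the modifications are easy to review afterwards.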

Wednesday, December 3, 2008

Convert word/doc files to html using openoffice macros

As many of you know, OpenOffice.org has powerful support for plugins and macros. These macros allow you to add a lot of additional functionality to the application. You can also use this feature to write a command-line tool that converts a .doc or .odt file to a .html file. Writing this tool is a two-step process:

Step 1: Write a macro to convert a .doc file to .html

Start up the OpenOffice.org word processor. Then from the Tools menu, select Macros, Organize Macros, OpenOffice.org Basic. A window will pop up. Navigate to My Macros, Standard, Module1. Edit the module to include the following code:
REM  *****  BASIC  *****

Sub ConvertWordToHTML(cFile)
    cURL = ConvertToURL(cFile)
    dim args(0) as new com.sun.star.beans.PropertyValue

    ' Open the document invisibly; OpenOffice picks the import filter itself.
    oDoc = StarDesktop.loadComponentFromURL(cURL, "_blank", 0, _
        Array(MakePropertyValue("Hidden", True)))

    ' Build the output file name by swapping the extension for .html.
    cFile = Left(cFile, Len(cFile) - 4) + ".html"
    cURL = ConvertToURL(cFile)

    ' Save the document using the HTML export filter.
    args(0).Name = "FilterName"
    args(0).Value = "HTML (StarWriter)"
    oDoc.storeToURL(cURL, args())
    oDoc.close(True)
End Sub

Function MakePropertyValue( Optional cName As String, Optional uValue ) _
        As com.sun.star.beans.PropertyValue
    Dim oPropertyValue As New com.sun.star.beans.PropertyValue
    If Not IsMissing( cName ) Then
        oPropertyValue.Name = cName
    EndIf
    If Not IsMissing( uValue ) Then
        oPropertyValue.Value = uValue
    EndIf
    MakePropertyValue() = oPropertyValue
End Function

Now save it and exit OpenOffice.org.

Step 2: Create a shell script to execute this macro from the command line

Create a shell script called doc2htm in /usr/local/bin (and make it executable with chmod +x) with the following code:
#!/bin/sh
DOC=$1
/usr/bin/oowriter -invisible "macro:///Standard.Module1.ConvertWordToHTML($DOC)"

That's it. Now you can run the script from the command line like:
$ doc2htm /path_to_the_doc_file/file_name.doc
and you will get file_name.html in the same directory where the original .doc file resides.

Saturday, November 29, 2008

PostgreSQL vs MySQL

We have been using MySQL for the last several years, but for some time now we have been facing scalability issues with it: our tables are growing fast and the number of queries on these tables is increasing day by day. While looking at possible solutions, many people suggested using PostgreSQL. So the first thing that came to my mind was: in what ways is PostgreSQL better than MySQL? I searched the web, asked the question around my network, and here are some of the findings...

PostgreSQL is a much more feature-complete SQL engine.

PostgreSQL does not have an unsigned integer data type, but it has much richer data type support (including BOOLEAN, IP addresses, UUIDs, and such), a mechanism for user-defined data types, and many built-in and contributed types.

Both PostgreSQL and MySQL support NOT NULL, UNIQUE, PRIMARY KEY and FOREIGN KEY constraints. MySQL doesn't support the CHECK constraint, while PostgreSQL has supported it for a long time. PostgreSQL's base data types are also much more consistent about enforcing data integrity, even in the absence of constraints and foreign keys. Thus, "NOT NULL" really means that NULLs are forbidden, you can't have "February 29th" in anything other than a leap year, "blank" isn't automagically transmogrified into a zero, and so on.

PostgreSQL can compress and decompress its data on the fly with a fast compression scheme to fit more data in an allotted disk space. MySQL's high-performance storage engines do not support on-the-fly compression as of 5.1; MySQL 6.0 is expected to support it with the Falcon storage engine.

MySQL's MyISAM engine performs faster than PostgreSQL on simple queries and when concurrency is low. MyISAM's speed comes at the cost of not supporting transactions or foreign keys, and of not offering guaranteed data durability.

MySQL's count(*) on MyISAM tables is really fast, because the engine keeps an exact row count. PostgreSQL's count(*) is very slow because, instead of counting rows using an index scan, it goes through the entire table sequentially.

MySQL supports INSERT IGNORE and REPLACE statements. PostgreSQL supports neither of these statements and suggests using stored procedures to get around the lack of these statements.

PostgreSQL's speed advantage over MySQL shows up most dramatically in large multi-core/multi-processor environments. PostgreSQL scales much better, both in terms of making use of scale-up hardware and in dealing with concurrency. MySQL, on the other hand, focuses on scale-out technologies and the use of off-the-shelf commodity hardware.

PostgreSQL is fully ACID-compliant, while MySQL's InnoDB storage engine provides engine-level ACID-compliance.

PostgreSQL supports partial and bitmap indices. MySQL has no bitmap indices (but achieves similar functionality using its "index_merge" feature) and no partial indices (MySQL supports partial indexing with the InnoDB engine, but not with the MyISAM engine).

A PostgreSQL trigger can execute any user-defined function from any of its procedural languages, not just PL/pgSQL. PostgreSQL also supports "rules", which operate on the query syntax tree and can do some operations, traditionally handled by triggers, more simply.

MySQL has built-in replication; PostgreSQL is modular by design, and replication is not in the core. There are several packages that add replication to PostgreSQL.

PostgreSQL makes a much better impression from an administration perspective. Its backup and replication features are more advanced, and many features that MySQL is (supposedly) going to have, like point-in-time recovery, multiple replication slaves, better support for foreign keys, cursors and stored procedures, are already available in PostgreSQL.

MySQL is an open-source product; Postgres is an open-source project. The MySQL community is more active and enthusiastic than PostgreSQL's, and MySQL documentation (books, blogs, etc.) is far more plentiful and up to date.