Wednesday, December 24, 2008

Stop bots from spamming : Old style CAPTCHA - Alternatives or replacements

Old-style CAPTCHAs are pretty much broken, so what else can we use to replace them? Given below are some alternatives to old-style CAPTCHAs.

1. reCAPTCHA - A free CAPTCHA service that helps to digitize books, newspapers and old-time radio shows. It delivers CAPTCHAs that have proved unreadable by OCR and donates the human processing to a charitable cause: preserving out-of-copyright books for future generations.

2. CSS hidden field - Add a text input field to your form and give it a name that makes sense, so that it looks like a real field to a bot. Then use some CSS to hide the table row or div that the input field is in. Bots will usually fill the field in, while human visitors never see it. Add some code that checks whether the hidden field was filled in; if it was, you can quit execution right there. Make sure to label the field so that people using screen readers understand not to fill it in.

3. Work out the time that it took for the form to be submitted - In your form, add a hidden variable and set its value to the timestamp of when the form was loaded. Then, once the form has been submitted, get a new timestamp and compare the two values. If the difference between them is less than, say, about 5 seconds (or the time you estimate it will take a human to fill in your form, remembering that spam bots will do it almost instantaneously), you can return to the form with an error message stating that the form was submitted in too short a time period.

4. Give the user simple challenge questions, such as "What is the total number of syllables in the American President's full name?", or put a simple math equation at the bottom of the form, like (2 + 4 - 1 =). Remember that you need to make the questions random.

5. Use music. Play the music and give the user multiple choice answers.

6. After filling in the form, have the users go to another link and copy and paste a constantly changing image into a box. This would be similar to an RSA token but without the hardware.
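The server-side checks for alternatives 2 and 3 could be sketched roughly as follows. This is only an illustration, assuming PHP 7.1 or later: the function name bot_check, the field names website and form_loaded_at, and the 5-second threshold are all made up for the example and should be adapted to your own form.

```php
<?php
// Sketch only: 'website', 'form_loaded_at' and the threshold are
// illustrative assumptions, not part of any library or framework.

const MIN_FILL_SECONDS = 5;

/**
 * Returns a reason string if the submission looks automated,
 * or null if it passes both checks.
 */
function bot_check(array $post, int $now): ?string
{
    // Honeypot: humans never see this field, so it should arrive empty.
    if (isset($post['website']) && trim((string) $post['website']) !== '') {
        return 'honeypot field was filled in';
    }

    // Timing: compare the load time stored in the hidden field with "now".
    // A missing or non-numeric timestamp is treated like a too-fast submission.
    $loadedAt = isset($post['form_loaded_at']) ? (int) $post['form_loaded_at'] : 0;
    if ($loadedAt <= 0 || $now - $loadedAt < MIN_FILL_SECONDS) {
        return 'form was submitted too quickly';
    }

    return null;
}
```

In the form handler this would be called as bot_check($_POST, time()). Keep in mind that a bot can forge a timestamp sent in a plain hidden field, so storing the load time in the session instead is probably the more robust variant.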

In the end, nothing is perfect, but the end result is something that is accessible and will keep most bots off your site.
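For the math question from alternative 4, a minimal sketch might look like this. It assumes PHP 7 or later (for random_int), and both function names are hypothetical. The expected answer should be kept on the server, for example in the session, and never sent to the browser.

```php
<?php
// Sketch only: both function names are made up for this example.

/** Builds a random question such as "2 + 4 - 1 =" together with its answer. */
function make_math_challenge(): array
{
    $a = random_int(1, 9);
    $b = random_int(1, 9);
    $c = random_int(1, 9);

    return [
        'question' => "$a + $b - $c =",
        'answer'   => $a + $b - $c, // may be negative, e.g. 1 + 1 - 9
    ];
}

/** Compares the user's reply with the answer that was stored server-side. */
function check_math_challenge(int $expected, string $reply): bool
{
    $reply = trim($reply);

    // Accept only an optional minus sign followed by digits.
    return preg_match('/^-?\d+$/', $reply) === 1 && (int) $reply === $expected;
}
```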

Thursday, December 4, 2008

PHP scripts : problem with whitespace

Recently, while working on a project, I faced a strange problem. In one of our apps we were trying to implement and call a service using Hessian, but somehow we were getting a malformed reply for every request. We checked all the scripts and could not find a problem in any of them. Finally, after a lot of effort, we found the cause, and it was very simple: in one of the scripts there were some blank lines before the PHP start tag (<?php), and they were being sent out ahead of every reply.

Leaving whitespace (spaces, line breaks, etc.) before the opening tag or after the closing tag of a PHP script can be problematic and can result in unexpected or undesirable behaviour. Since this whitespace is echoed to the browser along with the normal output, it can break any script that later tries to send headers (e.g. starting a session, sending a content type, etc.) and can distort the web page layout.

Here is a simple PHP script that scans the PHP files in a given directory (and its subdirectories) and removes any whitespace before the opening tag at the start of each file and after the closing tag at the end.

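<?php
$maindir = "/path_to_the_project_dir";

// PRE:  whitespace before the opening tag at the very start of a file.
// POST: whitespace after the closing tag at the very end of a file.
define("PRE", '/^\s+<\?php/');
define("POST", '/\?>\s+$/');

clearstatcache();

if( scan_dir( $maindir, "removeWSpace", true ) === false )
{
    echo "'{$maindir}' is not a valid directory\n";
}

// Walk $maindir and call $callback with the path of every entry found.
function scan_dir( $maindir, $callback, $recursive = true )
{
    $dh = @opendir( $maindir );
    if( $dh === false )
        return false;

    // Compare with false explicitly so an entry named "0" does not end the loop.
    while( ( $file = readdir( $dh ) ) !== false )
    {
        if( "." == $file || ".." == $file )
        {
            continue;
        }
        call_user_func( $callback, "{$maindir}/{$file}" );
        if( $recursive !== false && is_dir( "{$maindir}/{$file}" ) )
        {
            scan_dir( "{$maindir}/{$file}", $callback, $recursive );
        }
    }
    closedir( $dh );
    return true;
}

// Strip the stray whitespace from a single .php file and report if it changed.
function removeWSpace( $path )
{
    if( !is_dir( $path ) && substr( $path, -4 ) == ".php" )
    {
        $fh = file_get_contents( $path );
        // $c1 and $c2 receive the number of replacements made.
        $fh = preg_replace( PRE, '<?php', $fh, -1, $c1 );
        $fh = preg_replace( POST, '?>', $fh, -1, $c2 );
        if( $c1 > 0 || $c2 > 0 )
        {
            if( file_put_contents( $path, $fh ) !== false )
                echo $path . " -- modified \n";
        }
    }
}

?>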
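
Before letting a script rewrite files in place, it may be worth doing a dry run that only reports the problem. The following is a minimal sketch of that idea, assuming PHP 7 or later. The function name stray_whitespace is made up for this example, and it performs the same two checks as the script above without touching any file.

```php
<?php
// Sketch only: stray_whitespace() is a hypothetical helper.

/**
 * Returns 'leading' and/or 'trailing' depending on where the given
 * PHP source has whitespace outside its opening and closing tags.
 */
function stray_whitespace(string $source): array
{
    $problems = [];

    // Whitespace before the opening tag at the very start of the file.
    if (preg_match('/^\s+<\?php/', $source) === 1) {
        $problems[] = 'leading';
    }

    /* Whitespace after the closing tag at the very end of the file.
       Like the script above, this also flags a single final newline. */
    if (preg_match('/\?>\s+$/', $source) === 1) {
        $problems[] = 'trailing';
    }

    return $problems;
}
```

Calling stray_whitespace(file_get_contents($path)) for each file and printing the paths with a non-empty result shows which files the clean-up script would modify.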

Wednesday, December 3, 2008

Convert Word/doc files to HTML using OpenOffice macros

As many of you know, OpenOffice.org has powerful support for plugins, or macros. These macros allow you to add a lot of additional functionality to the application. You can also use this feature to write a command line tool that converts a .doc or .odt file to a .html file. Writing this tool is a two-step process:

Step 1: Write a macro to convert a .doc file to .html

Start up the OpenOffice.org Word Processor. Then from the Tools menu, select Macros, Organize Macros, OpenOffice.org Basic. A window will pop up. Navigate to My Macros, Standard, Module1. Edit the module to include the following code:
REM  *****  BASIC  *****

Sub ConvertWordToHTML(cFile)
    cURL = ConvertToURL(cFile)
    Dim args(0) As New com.sun.star.beans.PropertyValue

    ' Open the document hidden, letting OpenOffice.org detect the file type.
    oDoc = StarDesktop.loadComponentFromURL(cURL, "_blank", 0, _
        Array(MakePropertyValue("Hidden", True)))

    ' Swap the extension for .html (assumes a three-letter extension such as .doc or .odt).
    cFile = Left(cFile, Len(cFile) - 4) + ".html"
    cURL = ConvertToURL(cFile)

    ' Save the document using a filter.
    args(0).Name = "FilterName"
    args(0).Value = "HTML (StarWriter)"
    oDoc.storeToURL(cURL, args())
    oDoc.close(True)
End Sub

' Helper that builds a PropertyValue from an optional name and value.
Function MakePropertyValue( Optional cName As String, Optional uValue ) _
    As com.sun.star.beans.PropertyValue
    Dim oPropertyValue As New com.sun.star.beans.PropertyValue
    If Not IsMissing( cName ) Then
        oPropertyValue.Name = cName
    EndIf
    If Not IsMissing( uValue ) Then
        oPropertyValue.Value = uValue
    EndIf
    MakePropertyValue() = oPropertyValue
End Function

Now save it and exit from OpenOffice.org.

Step 2: Create a shell script to execute this macro from the command line

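Create a shell script called doc2htm in /usr/local/bin with the following code, and make it executable (chmod +x /usr/local/bin/doc2htm):
#!/bin/sh
# The path of the document to convert is passed as the first argument.
DOC="$1"
/usr/bin/oowriter -invisible "macro:///Standard.Module1.ConvertWordToHTML($DOC)"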

That's it. Now you can run the script from the command line like this:
$ doc2htm /path_to_the_doc_file/file_name.doc
and you will get file_name.html in the same directory as the original doc file.
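
If you want to call the tool from a PHP script, something along these lines might work. It is only a sketch, assuming PHP 7 or later, and both function names are made up for this example. Note that escapeshellarg protects the path only on its way through the shell; paths containing spaces or parentheses may still confuse the macro URL inside doc2htm.

```php
<?php
// Sketch only: both helpers are hypothetical; doc2htm is the shell script from Step 2.

/** Mirrors the macro: drop the last four characters (".doc", ".odt") and append ".html". */
function expected_html_path(string $docPath): string
{
    return substr($docPath, 0, -4) . '.html';
}

/** Builds the command line, quoting the path so the shell passes it as one argument. */
function doc2htm_command(string $docPath): string
{
    return 'doc2htm ' . escapeshellarg($docPath);
}
```

A caller would run exec(doc2htm_command($path)) and then check with is_file(expected_html_path($path)) whether the HTML file was actually produced.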