Wednesday, December 3, 2008

Convert word/doc files to html using openoffice macros

As many of you know, OpenOffice.org has powerful support for plugins, or macros, which let you add a lot of functionality to the application. You can also use this feature to write a command-line tool that converts a .doc or .odt file to a .html file. Writing this tool is a two-step process:

Step 1: Write a macro to convert a .doc file to .html

Start OpenOffice.org Writer, the word processor. From the Tools menu, select Macros > Organize Macros > OpenOffice.org Basic. In the window that pops up, navigate to My Macros > Standard > Module1 and edit the module to include the following code:
REM  *****  BASIC  *****

' Convert the file at cFile (a system path such as /home/user/file.doc)
' to HTML, writing the result next to the original.
Sub ConvertWordToHTML(cFile)
    Dim args(0) As New com.sun.star.beans.PropertyValue
    Dim cURL As String
    Dim oDoc As Object

    ' Open the document hidden, letting OpenOffice.org detect its format.
    cURL = ConvertToURL(cFile)
    oDoc = StarDesktop.loadComponentFromURL(cURL, "_blank", 0, Array(MakePropertyValue("Hidden", True)))
    If IsNull(oDoc) Then Exit Sub

    ' Replace the 4-character extension (.doc or .odt) with .html.
    cFile = Left(cFile, Len(cFile) - 4) + ".html"
    cURL = ConvertToURL(cFile)

    ' Save the document using the Writer HTML export filter.
    args(0).Name = "FilterName"
    args(0).Value = "HTML (StarWriter)"
    oDoc.storeToURL(cURL, args())
    oDoc.close(True)
End Sub

' Build a PropertyValue from an optional name and value.
Function MakePropertyValue(Optional cName As String, Optional uValue) As com.sun.star.beans.PropertyValue
    Dim oPropertyValue As New com.sun.star.beans.PropertyValue
    If Not IsMissing(cName) Then
        oPropertyValue.Name = cName
    EndIf
    If Not IsMissing(uValue) Then
        oPropertyValue.Value = uValue
    EndIf
    MakePropertyValue = oPropertyValue
End Function

Now save the module and exit OpenOffice.org.

Step 2: Create a shell script to run this macro from the command line

Create a shell script called doc2htm in /usr/local/bin with the following code, and make it executable with chmod +x /usr/local/bin/doc2htm:
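#!/bin/sh
# Usage: doc2htm /path/to/file.doc
[ -n "$1" ] || { echo "usage: doc2htm file.doc" >&2; exit 1; }
# Pass an absolute path so the macro does not depend on the office
# process's working directory.
DOC="$(cd "$(dirname "$1")" && pwd)/$(basename "$1")"
/usr/bin/oowriter -invisible "macro:///Standard.Module1.ConvertWordToHTML($DOC)"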
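The script places the file path directly inside the macro URL. Assuming, as the URL syntax suggests, that commas separate macro arguments and the closing parenthesis ends the list, a path containing a comma, a parenthesis, or a double quote would likely reach the macro cut short or split. A minimal sketch of a guard, using a hypothetical helper build_macro_url that is not part of the original script, refuses such paths before they are handed to oowriter:

```shell
#!/bin/sh
# Hypothetical helper (not part of the original doc2htm script): build the
# macro URL for ConvertWordToHTML, refusing paths with characters that are
# assumed to break the macro argument list (comma, parentheses, double quote).
build_macro_url() {
    case "$1" in
        *,* | *\(* | *\)* | *\"*)
            echo "doc2htm: unsupported character in path: $1" >&2
            return 1
            ;;
    esac
    printf 'macro:///Standard.Module1.ConvertWordToHTML(%s)\n' "$1"
}

# Example: print the URL that would be passed to oowriter.
build_macro_url "/home/user/docs/report.doc"
```

In doc2htm, the oowriter line could then use URL=$(build_macro_url "$DOC") || exit 1 and pass "$URL" instead of building the string inline.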

That's it. Now you can run the script from the command line like this:
$ doc2htm /path_to_the_doc_file/file_name.doc
and you will get file_name.html in the same directory as the original .doc file.
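The macro names the output by dropping the last four characters of the input path and appending .html. That works for .doc and .odt, but a five-character extension such as .docx would produce file_name..html. A minimal shell sketch of the same rule, using a hypothetical helper html_name, makes it easy to predict where the output will land:

```shell
#!/bin/sh
# Hypothetical helper mirroring the macro's naming rule,
# Left(cFile, Len(cFile) - 4) + ".html": drop exactly four characters.
html_name() {
    printf '%s.html\n' "${1%????}"
}

html_name /home/user/docs/report.doc   # prints /home/user/docs/report.html
html_name /home/user/docs/report.docx  # prints /home/user/docs/report..html
```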
