
Reading Word documents using PHP?

Hi guys. I was wondering if this was possible... I want to use PHP to read a Word document file, decode it and display it as HTML. Does anyone here know whether this can be achieved using PHP?
Not really. The Word document (DOC) format is binary, obscure, changes with each version of Word, and for versions other than Word 97, the specification is not publicly available - MS try to keep it a secret. Some have tried to reverse engineer it, but I haven't heard of anyone producing a complete layout of the format yet.

You can do a real conversion by having Office installed on the server (which requires a Windows server, obviously), letting Word itself handle the document access, and communicating with it through PHP's COM extension. It's still cumbersome, though, and as mentioned, won't work on a non-Windows server.
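
For what it's worth, the COM route could look roughly like the sketch below. It assumes a Windows server with Word installed and PHP's COM extension enabled. The function name and the paths are made up for illustration, and the value 8 is Word's wdFormatHTML constant. Treat it as an outline rather than tested production code.

```php
<?php
// Sketch only: needs Windows, an installed copy of Word, and PHP's COM extension.
// convertDocToHtmlViaWord() is a hypothetical helper name, not part of any library.
function convertDocToHtmlViaWord(string $docPath, string $htmlPath): void
{
    if (!class_exists('COM')) {
        throw new RuntimeException('PHP COM extension not available (Windows only).');
    }

    $word = new COM('Word.Application');   // start Word through COM automation
    try {
        $word->Visible = false;            // keep the Word window hidden
        $doc = $word->Documents->Open($docPath);
        $doc->SaveAs($htmlPath, 8);        // 8 = wdFormatHTML in Word's object model
        $doc->Close(false);                // close without saving changes to the source
    } finally {
        $word->Quit();                     // always shut Word down again
    }
}
```

Both paths would have to be absolute Windows paths that the account running the web server can read and write.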

With Word 2003, however, Microsoft are starting to phase out the DOC format in favour of an XML implementation. The specification of their XML format is "Open", and it would be relatively easy to convert such a document into HTML, also using PHP.
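
To give an idea of how simple the XML case can be, here is a minimal sketch using PHP's DOM extension. It assumes the file was saved as Word 2003 XML (WordprocessingML, namespace http://schemas.microsoft.com/office/word/2003/wordml) and only handles paragraphs, text runs and bold; tables, images, styles and everything else are ignored. The function name wordMlToHtml is just an illustrative choice.

```php
<?php
// Sketch: convert the paragraphs of a Word 2003 XML document into simple HTML.
// Only <w:p>, <w:r>, <w:t> and the bold flag <w:b/> are looked at.
function wordMlToHtml(string $xml): string
{
    $dom = new DOMDocument();
    if (!$dom->loadXML($xml)) {
        throw new InvalidArgumentException('Not well-formed XML.');
    }

    $xp = new DOMXPath($dom);
    $xp->registerNamespace('w', 'http://schemas.microsoft.com/office/word/2003/wordml');

    $html = '';
    foreach ($xp->query('//w:body//w:p') as $p) {          // every paragraph in the body
        $line = '';
        foreach ($xp->query('w:r', $p) as $run) {          // every run inside the paragraph
            $text = '';
            foreach ($xp->query('w:t', $run) as $t) {
                $text .= $t->textContent;
            }
            $text = htmlspecialchars($text);
            // Simplification: any <w:b> element counts as bold, even with w:val="off".
            $isBold = $xp->query('w:rPr/w:b', $run)->length > 0;
            $line .= $isBold ? "<b>$text</b>" : $text;
        }
        $html .= "<p>$line</p>\n";
    }
    return $html;
}

$sample = '<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">'
        . '<w:body><w:p><w:r><w:t>Hello </w:t></w:r>'
        . '<w:r><w:rPr><w:b/></w:rPr><w:t>world</w:t></w:r></w:p></w:body>'
        . '</w:wordDocument>';

echo wordMlToHtml($sample);   // prints: <p>Hello <b>world</b></p>
```

A real document would need many more cases (italics, lists, tables, hyperlinks), but the approach stays the same: walk the XML and emit the matching HTML element.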

Another alternative, if you have control over the format, would be to use RTF instead of the binary DOC format. The RTF specification is open, and I believe there are RTF reading and writing libraries for PHP. RTF also supports practically every feature of Word; it was actually developed as a cross-application alternative to the DOC format.

But if you mean taking any user-supplied document and converting it into HTML, I think you're out of luck. It may be possible to extract the text and some basic formatting (bold etc.), but I wouldn't even count on that :P
My ISP is using Horde as their web-based e-mail client. If I get an attachment like a Word, Excel or PowerPoint file, I can either download it or view it. If I view it, Horde will convert it to a web page.

If I'm not mistaken, Horde is open source and written in PHP, so maybe you can look at their source.

here's their website
Hi Kaneda. Thanks for your reply. I know that Microsoft's .doc format is binary, whereas RTF documents are plain text files. I was hoping I would be able to find a PHP class or something which I could use to convert .doc files to .htm and display them in the viewer's browser, but to my surprise, forget .doc converters, I am not even able to find a class which can convert a .rtf document into HTML. All I came across was a package which used COM to convert MS Word documents into HTML, but that won't be of any use to me as Frihost runs on a *nix based server. I have been googling for the past one and a half hours. I came across one parser, but I think it's buggy because it's not working correctly. Any help will be appreciated.

-- Naif

© 2005-2011 Frihost, forums powered by phpBB.