[NTLUG:Discuss] utility to convert doc and xls to text, xml, or html

Steve Baker sjbaker1 at airmail.net
Tue Feb 3 17:26:20 CST 2004


robert apodaca wrote:
> Does anyone know of a Linux utility, Perl module, or anything else that can convert Microsoft Word files into plain text, XML, HTML, or some other easily parsed format?
> I also need the same for Excel files, converting to CSV, XML, HTML, or anything else.

There is a tool called 'catdoc' out there.  It does a reasonable job of getting
plain text out of .DOC files:

     http://www.45.free.net/~vitus/ice/catdoc/

> I know there are programs like AbiWord and OpenOffice which can convert these, but
> I'm looking for something I can call from a script.

Yes - catdoc is just a filter.  Use it like this:

   catdoc < file.doc > file.txt
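
Since the goal is to call this from a script, here is a minimal sketch of a
batch wrapper around that same stdin/stdout usage. The function name
convert_docs and its arguments are made up for illustration; the only thing
assumed about catdoc is the filter behaviour shown above.

```shell
#!/bin/sh
# Hypothetical helper: convert every .doc file in a directory to .txt by
# running each one through a filter command (stdin in, stdout out).
# $1 = directory to scan
# $2 = filter command to run (optional, defaults to catdoc)
convert_docs() {
    dir=$1
    converter=${2:-catdoc}
    for doc in "$dir"/*.doc; do
        # If nothing matched, the glob is left as a literal string; skip it.
        [ -e "$doc" ] || continue
        txt=${doc%.doc}.txt
        if "$converter" < "$doc" > "$txt"; then
            echo "converted: $doc -> $txt"
        else
            echo "failed: $doc" >&2
            rm -f "$txt"    # do not leave a partial output file behind
        fi
    done
}
```

Calling it as "convert_docs /some/dir" (the path is a placeholder) would write
a .txt file next to each .doc file. The filter command is a parameter so that
another tool that reads stdin and writes stdout can be swapped in.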

I also saw xls2csv, which takes Excel files and generates comma-separated values,
and word2x, which allegedly does a similar job to catdoc.  I haven't used either
of those, though.

---------------------------- Steve Baker -------------------------
HomeEmail: <sjbaker1 at airmail.net>    WorkEmail: <sjbaker at link.com>
HomePage : http://www.sjbaker.org
Projects : http://plib.sf.net    http://tuxaqfh.sf.net
            http://tuxkart.sf.net http://prettypoly.sf.net
-----BEGIN GEEK CODE BLOCK-----
GCS d-- s:+ a+ C++++$ UL+++$ P--- L++++$ E--- W+++ N o+ K? w--- !O M-
V-- PS++ PE- Y-- PGP-- t+ 5 X R+++ tv b++ DI++ D G+ e++ h--(-) r+++ y++++
-----END GEEK CODE BLOCK-----



