[NTLUG:Discuss] utility to convert doc and xls to text, xml, or html
Steve Baker
sjbaker1 at airmail.net
Tue Feb 3 17:26:20 CST 2004
robert apodaca wrote:
> Does anyone know of a Linux utility, Perl module, anything that can convert Microsoft Word files into either plain text, XML, HTML or even some other easily parsed format?
> Also need the same for Excel files to CSV, XML, HTML, or anything else.
There is a tool called 'catdoc' out there. It does a reasonable job of getting
plain text out of .DOC files:
http://www.45.free.net/~vitus/ice/catdoc/
> I know there are programs like AbiWord and OpenOffice which can convert these, but
> I'm looking for something I can call from a script.
Yes - catdoc is just a filter. Use it like this:
catdoc < file.doc > file.txt
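Since it is only a filter, converting a whole directory from a script is just a loop around that same command. The following is a rough sketch, not a tested recipe: the function name convert_docs and the CONVERTER variable are made up for illustration, and it assumes catdoc is on your PATH and reads the document from standard input as shown above.

```shell
# Convert every .doc file in a directory to a .txt file beside it.
# convert_docs and CONVERTER are hypothetical names used for this sketch;
# CONVERTER defaults to catdoc, the filter mentioned in this message.
convert_docs() {
    dir=$1
    for doc in "$dir"/*.doc; do
        # If nothing matched, the glob stays literal; skip it.
        [ -e "$doc" ] || continue
        # Same redirection as the one-liner: file in on stdin, text out on stdout.
        "${CONVERTER:-catdoc}" < "$doc" > "${doc%.doc}.txt"
    done
}
```

Called as `convert_docs /path/to/docs`, this would leave file.txt next to each file.doc. Quoting "$doc" keeps file names with spaces intact.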
I have also seen xls2csv, which takes Excel files and generates comma-separated
values, and word2x, which allegedly does a similar job to catdoc. I haven't used
either of those, though.
---------------------------- Steve Baker -------------------------
HomeEmail: <sjbaker1 at airmail.net> WorkEmail: <sjbaker at link.com>
HomePage : http://www.sjbaker.org
Projects : http://plib.sf.net http://tuxaqfh.sf.net
http://tuxkart.sf.net http://prettypoly.sf.net
-----BEGIN GEEK CODE BLOCK-----
GCS d-- s:+ a+ C++++$ UL+++$ P--- L++++$ E--- W+++ N o+ K? w--- !O M-
V-- PS++ PE- Y-- PGP-- t+ 5 X R+++ tv b++ DI++ D G+ e++ h--(-) r+++ y++++
-----END GEEK CODE BLOCK-----