[NTLUG:Discuss] pulling tables out of web pages.
Greg Edwards
greg at nas-inet.com
Wed Sep 15 17:45:06 CDT 2004
David Camm wrote:
> bobby wrote:
> > I have tried some html2txt tools and have had no success.
> >
> > I need to convert a web page into a tab delimited file (preferably
> > keeping only the data table). My goal is to do several of these pages
> > and cat them into a big table and delete duplicates.
> >
> > I think I can handle most of the problem if I can just convert the html
> > to a tab delimited text file.
> >
> > Anyone know of a reliable tool?
> >
>
> unless someone on the list knows of a tool that parses html and returns
> the contents of specific structures, i'm afraid you're in for some
> custom programming.
>
> david camm
> advanced web systems
>
Bobby,
You might get hold of the OpenOffice source and see whether you can pull
out the routines it uses to import HTML files.
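
If it does come down to custom programming, as David suggests, it may not
need to be much. Below is a minimal sketch, not a tested tool, assuming
Python with its standard-library html.parser module and pages whose data
sits in plain, non-nested tables. The names TableToTSV and html_to_tsv are
made up for illustration. It keeps only the text found inside table cells
and writes one tab-delimited line per table row.

```python
from html.parser import HTMLParser


class TableToTSV(HTMLParser):
    """Collect the text of each <td>/<th> cell, one list per <tr> row."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row being built, or None when outside a row
        self._cell = None  # text pieces of the open cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            # Real-world pages often omit </tr>, so a new row closes the old.
            self._flush_row()
            self._row = []
        elif tag in ("td", "th"):
            # Likewise, a new cell closes any cell left open.
            self._flush_cell()
            if self._row is None:
                self._row = []
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._flush_cell()
        elif tag in ("tr", "table"):
            self._flush_row()

    def handle_data(self, data):
        # Text outside a table cell is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def _flush_cell(self):
        if self._cell is not None:
            # Collapse whitespace so stray tabs or newlines inside a cell
            # cannot break the tab-delimited layout.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def _flush_row(self):
        self._flush_cell()
        if self._row:
            self.rows.append(self._row)
        self._row = None


def html_to_tsv(html):
    """Return the table cells found in html as tab-delimited lines."""
    parser = TableToTSV()
    parser.feed(html)
    parser.close()
    return "\n".join("\t".join(row) for row in parser.rows)
```

Run over each saved page, the output can be concatenated with cat, and
something like sort -u should take care of the duplicates. Pages that use
nested tables for layout would need extra handling that this sketch does
not attempt.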
--
Greg Edwards
Software Engineering Services - http://consult.nas-inet.com
Custom Hosted Websites - http://www.nas-inet.com