[NTLUG:Discuss] pulling tables out of web pages.

Greg Edwards greg at nas-inet.com
Wed Sep 15 17:45:06 CDT 2004


David Camm wrote:
> bobby wrote:
>  >I have tried some html2txt tools and have had no success.
>  >
>  > I need to convert a web page into a tab delimited file (preferably
>  > keeping only the data table). My goal is to do several of these pages
>  > and cat them into a big table and delete duplicates.
>  >
>  > I think I can handle most of the problem if I can just convert the html
>  > to a tab delimited text file.
>  >
>  > Anyone know of a reliable tool?
>  >
> 
> unless someone on the list knows of a tool that parses html and returns 
> the contents of specific structures, i'm afraid you're in for some 
> custom programming.
> 
> david camm
> advanced web systems
> 

Bobby,

You might get hold of the OpenOffice source and see whether you can pull out 
the routines it uses to import HTML files.
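
If it does come to custom programming, as David suggests, the job may be 
small enough for a short script. Below is a minimal sketch in Python that 
uses only the standard library's html.parser module. The names TableToTSV 
and html_to_tsv are made up for illustration, and the sketch assumes simple, 
non-nested tables; it keeps only the text inside <td>/<th> cells and 
tolerates missing closing tags, but it has not been tried on your pages.

```python
from html.parser import HTMLParser


class TableToTSV(HTMLParser):
    """Collect the text of each <td>/<th> cell, one list per table row.

    Hypothetical helper for illustration; assumes tables are not nested.
    """

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row being built, or None when outside a row
        self._cell = None   # text fragments of the open cell, or None

    def _end_cell(self):
        # Close the open cell, collapsing whitespace so that a cell can
        # never contain a tab or newline of its own.
        if self._cell is not None and self._row is not None:
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _end_row(self):
        # Close the open row (and any open cell); skip rows with no cells.
        self._end_cell()
        if self._row:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._end_row()          # previous </tr> may be missing
            self._row = []
        elif tag in ("td", "th"):
            self._end_cell()         # previous </td> may be missing
            if self._row is None:    # cell without an enclosing <tr>
                self._row = []
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._end_cell()
        elif tag in ("tr", "table"):
            self._end_row()

    def handle_data(self, data):
        # Text outside table cells is ignored.
        if self._cell is not None:
            self._cell.append(data)


def html_to_tsv(html):
    """Return the table cells found in `html` as tab-delimited lines."""
    parser = TableToTSV()
    parser.feed(html)
    parser.close()
    parser._end_row()                # flush a row left open at end of input
    return "\n".join("\t".join(row) for row in parser.rows)


if __name__ == "__main__":
    import sys
    # Read one HTML page on stdin, write tab-delimited text to stdout.
    print(html_to_tsv(sys.stdin.read()))
```

Run once per saved page, the output files can then be concatenated with cat 
and the duplicate lines removed with sort -u.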

-- 
Greg Edwards

Software Engineering Services - http://consult.nas-inet.com
Custom Hosted Websites        - http://www.nas-inet.com



