[NTLUG:Discuss] pulling tables out of web pages.

Rob Apodaca rapodaca at raacc.com
Wed Sep 15 20:52:55 CDT 2004


>  > I have tried some html2txt tools and have had no success.
>  >
>  > I need to convert a web page into a tab-delimited file (preferably
>  > keeping only the data table). My goal is to do several of these pages,
>  > cat them into one big table, and delete duplicates.
>  >
>  > I think I can handle most of the problem if I can just convert the HTML
>  > to a tab-delimited text file.
>  >
>  > Anyone know of a reliable tool?

I think the Perl module HTML-TableContentParser is what you want:
http://search.cpan.org/~sdrabble/HTML-TableContentParser-0.13/TableContentParser.pm

Or check CPAN for other modules.
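
If Perl is not a hard requirement, the same idea can be sketched in Python
with nothing but the standard library's html.parser. To be clear, this is
not HTML-TableContentParser and does not mirror its API; the names
TableToRows and html_tables_to_tsv are made up for illustration. It assumes
the tables are not nested and that dropping exact duplicate rows is what is
wanted, so treat it as a starting point rather than a finished tool.

```python
from html.parser import HTMLParser


class TableToRows(HTMLParser):
    """Collect the cell text of every table row as a list of strings.

    Hypothetical helper for illustration; assumes tables are not nested.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row being built, or None outside a row
        self._cell = None   # text fragments of the open cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()          # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th"):
            self._close_cell()         # tolerate a missing </td> or </th>
            if self._row is None:      # cell without an enclosing <tr>
                self._row = []
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()

    def handle_data(self, data):
        # Text outside a table cell is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def _close_cell(self):
        if self._cell is not None:
            # Collapse whitespace so a cell can never contain a tab or newline.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row:
            self.rows.append(self._row)
        self._row = None


def html_tables_to_tsv(html):
    """Return the table rows found in html as tab-delimited text.

    Exact duplicate rows are written once, in first-seen order.
    """
    parser = TableToRows()
    parser.feed(html)
    parser.close()
    parser._close_row()                # flush a row left open at end of input

    seen = set()
    lines = []
    for row in parser.rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            lines.append("\t".join(row))
    return "".join(line + "\n" for line in lines)
```

To combine several pages, read each saved page into one string, join them,
and pass the result to html_tables_to_tsv; repeated header rows and repeated
data rows then collapse to a single copy. Layout tables elsewhere on the
page will also be picked up, so some filtering by column count may still be
needed.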

Cheers,
-Rob




