[NTLUG:Discuss] pulling tables out of web pages.
Rob Apodaca
rapodaca at raacc.com
Wed Sep 15 20:52:55 CDT 2004
> > I have tried some html2txt tools and have had no success.
> >
> > I need to convert a web page into a tab delimited file (preferably
> > keeping only the data table). My goal is to do several of these pages
> > and cat them into a big table and delete duplicates.
> >
> > I think I can handle most of the problem if I can just convert the html
> > to a tab delimited text file.
> >
> > Anyone know of a reliable tool?
I think the Perl module HTML-TableContentParser is what you want:
http://search.cpan.org/~sdrabble/HTML-TableContentParser-0.13/TableContentParser.pm
Or check CPAN for other modules.
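
For anyone who would rather not install a module, here is a rough sketch of the same idea (pull the cells out of each table row, join them with tabs, then merge pages and drop duplicate rows) using only Python's standard html.parser. This is not the HTML-TableContentParser API; the names TableExtractor, html_to_tsv_rows and merge_unique are made up for illustration. It assumes the data table is a plain, non-nested table and that you know which table on the page you want (the first one by default).

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by row and table.

    Hypothetical helper for illustration; nested tables are not handled.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tables = []   # each table is a list of rows; each row a list of cells
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None

    def _close_cell(self):
        if self._cell is not None:
            # Collapse whitespace so stray tabs/newlines cannot break the TSV.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr":
            self._close_row()          # tolerate a missing </tr>
            if self.tables:
                self._row = []
        elif tag in ("td", "th"):
            self._close_cell()         # tolerate a missing </td> or </th>
            if self._row is not None:
                self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()


def html_to_tsv_rows(html, table_index=0):
    """Return the rows of one table as tab-delimited strings."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return ["\t".join(row) for row in parser.tables[table_index]]


def merge_unique(pages, table_index=0):
    """Concatenate the rows from several pages, keeping first occurrences only."""
    seen = set()
    merged = []
    for html in pages:
        for line in html_to_tsv_rows(html, table_index):
            if line not in seen:
                seen.add(line)
                merged.append(line)
    return merged
```

To use it, read each saved page into a string, pass the list of strings to merge_unique, and write the returned lines to a file, one per line. Repeated header rows are dropped along with duplicate data rows, since they are identical lines.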
Cheers,
-Rob