[NTLUG:Discuss] pulling tables out of web pages.

Bobby Wrenn bjwrenn at augustmail.com
Thu Apr 8 16:09:08 CDT 2004


Greg Edwards wrote:
> Bobby Wrenn wrote:
> 
>> I have tried some html2txt tools and have had no success.
>>
>> I need to convert a web page into a tab delimited file (preferably 
>> keeping only the data table). My goal is to do several of these pages 
>> and cat them into a big table and delete duplicates.
>>
>> I think I can handle most of the problem if I can just convert the 
>> html to a tab delimited text file.
>>
>> Anyone know of a reliable tool?
>>
>> Here is a sample of the web pages I am working on:
>> http://partsurfer.hp.com/cgi-bin/spi/main?sel_flg=partlist&model=KAYAK+XU+6%2F266MT&HP_model=&modname=Kayak+XU+6%2F266MT&template=secondary&plist_sval=ALL&plist_styp=flag&dealer_id=&callingsite=&keysel=X&catsel=X&ptypsel=X&strsrch=&pictype=I&picture=X&uniqpic= 
>>
>>
>> TIA
>> Bobby
> 
> 
> If this is a one-time deal, read the file in with StarOffice Calc, then 
> save it as a comma-delimited file (text CSV).  Some of the other 
> spreadsheet programs can do this as well.
> 
> HTH
I have been using OOo with good results for the one-offs. But now I have 
24 files to process and just wanted a way to do them all at once.

Ultimately, I would like to build a tool that goes to the website, pulls 
down the page, converts it, and saves the result to a file. This is (as 
usual) going to be a learning process for me.
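
A first cut at such a tool might look roughly like the sketch below. It 
uses only the Python standard library and assumes that every <tr> on the 
page is a data row worth keeping; on a page that also uses tables for 
layout, the rows would need filtering afterwards. The names 
TableExtractor, html_to_rows, merge_unique, rows_to_tsv and fetch are 
made up for illustration, and the latin-1 fallback in fetch is a guess, 
not something the site is known to use.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TableExtractor(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row being built, or None outside a <tr>
        self._cell = None  # text pieces of the open cell, or None

    def _end_cell(self):
        if self._cell is not None:
            # Collapse runs of whitespace so each cell is one clean field.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def _end_row(self):
        self._end_cell()
        if self._row:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        # Older pages often omit </td> and </tr>, so a new tag also
        # closes whatever cell or row is still open.
        if tag == "tr":
            self._end_row()
            self._row = []
        elif tag in ("td", "th"):
            self._end_cell()
            if self._row is not None:
                self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._end_cell()
        elif tag in ("tr", "table"):
            self._end_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def html_to_rows(html):
    """Return the table rows found in one HTML page."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


def merge_unique(pages):
    """Combine the rows of several pages, dropping exact duplicates."""
    seen = set()
    merged = []
    for html in pages:
        for row in html_to_rows(html):
            key = tuple(row)
            if key not in seen:
                seen.add(key)
                merged.append(row)
    return merged


def rows_to_tsv(rows):
    """Format rows as tab-delimited text, one row per line."""
    return "\n".join("\t".join(row) for row in rows)


def fetch(url):
    """Download a page; the latin-1 fallback is an assumption."""
    with urlopen(url) as resp:
        charset = resp.headers.get_content_charset() or "latin-1"
        return resp.read().decode(charset)
```

For the 24 saved files, something like 
rows_to_tsv(merge_unique(open(name).read() for name in names)) would give 
the combined table with exact duplicate rows removed; fetch(url) could 
replace the file reads once the rest works.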

Thanks,
Bobby



