[NTLUG:Discuss] copying web documents

Fred fredstevens at yahoo.com
Fri May 19 15:53:15 CDT 2006


Although wget may indeed be The Tool to use, I have
my doubts. If it truly is the best available for Linux, 
then something else needs to be developed as wget is a 
pain in the butt to use for this purpose. It may be great
for copying web sites but it sucks at what I wanted to do.

Look, what I wanted was to go to the index page of a
multi-paged html document and download all the relevant 
pages and nothing else. So far, wget has not been able to
do just that. My take on any tool is that if it takes more
work to use the tool than to not use it, why use it at all?
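For what it's worth, a sketch of a GNU wget invocation that comes close to "index page plus the pages it links to, nothing else." The URL is a placeholder, and the exact accept list will depend on the site:

```shell
# -r -l 1       recurse exactly one level: the index plus pages it links to
# -np           never ascend to the parent directory
# -p            also fetch images/CSS needed to render each page
# -k            convert links so the local copy browses offline
# -A html,htm   accept only HTML files, discard everything else
# -U ...        send a browser-like User-Agent string
# -e robots=off make wget ignore robots.txt (it obeys it by default)
url="http://example.com/doc/index.html"    # placeholder URL
opts="-r -l 1 -np -p -k -A html,htm -U Mozilla/5.0 -e robots=off"
echo "wget $opts $url"    # printed here for illustration; drop the echo to fetch
```

The `-l 1` is the key part for this use case: it stops wget after the pages the index links to directly, instead of crawling the whole site.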

As for the robots.txt file, it's actually a list of path rules addressed to
particular user agents, and wget only honors it voluntarily. So a -U "Mozilla"
only gets past rules that single out wget's own user-agent string; to have wget
ignore the file entirely, the switch is -e robots=off.
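For reference, a robots.txt is just a plain-text file of Disallow rules grouped under User-agent lines; a made-up example:

```
User-agent: *
Disallow: /private/
```

A rule block headed "User-agent: *" applies regardless of what -U sends, which is why spoofing the agent string doesn't always help.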

Again, thanks to everyone for their help. 

Fred 



