[NTLUG:Discuss] copying web documents
Fred
fredstevens at yahoo.com
Fri May 19 15:53:15 CDT 2006
Although wget may indeed be The Tool to use, I have
my doubts. If it truly is the best available for Linux,
then something else needs to be developed as wget is a
pain in the butt to use for this purpose. It may be great
for copying web sites but it sucks at what I wanted to do.
Look, what I wanted was to go to the index page of a
multi-page HTML document and download all the relevant
pages and nothing else. So far, wget has not been able to
do just that. My take on any tool is that if it takes more
work to use the tool than to do the job without it, why use it at all?
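For what it's worth, wget can be coaxed into roughly that behavior with a
one-hop recursive fetch. This is only a sketch, and the URL is a placeholder:

```shell
# Sketch of the one-level fetch described above; the URL is hypothetical.
# -r -l 1     : recurse, but follow links only one hop from the index page
# -np         : never ascend into the parent directory
# -A html,htm : keep only files with these suffixes
# -k          : convert links so the local copy browses offline
wget -r -l 1 -np -A 'html,htm' -k http://example.com/book/index.html
```

If the document's pages link to each other rather than all hanging off the
index, the depth (-l) would need to be raised, which is exactly where the
"more work than it's worth" complaint starts to bite.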
As for the robots.txt file, it is actually a list of per-user-agent
rules saying which paths a crawler may fetch. wget honors it by
default no matter what agent string you send, so -U "Mozilla" alone
won't get past it; you also have to tell wget to ignore the file
with -e robots=off.
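Combining the two, a sketch of the full invocation might look like this
(again a placeholder URL, and the agent string is just an example):

```shell
# Sketch: one-level fetch that ignores robots.txt and reports a
# browser-like user agent. URL and agent string are placeholders.
wget -r -l 1 -np -A 'html,htm' -k \
     -e robots=off \
     -U "Mozilla/5.0" \
     http://example.com/book/index.html
```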
Again, thanks to everyone for their help.
Fred