[NTLUG:Discuss] copying web documents
Terry Henderson
trryhend at gmail.com
Fri May 19 16:59:22 CDT 2006
On 5/19/06, Fred <fredstevens at yahoo.com> wrote:
> Although wget may indeed be The Tool to use, I have
> my doubts. If it truly is the best available for Linux,
> then something else needs to be developed as wget is a
> pain in the butt to use for this purpose. It may be great
> for copying web sites but it sucks at what I wanted to do.
>
> Look, what I wanted was to go to the index page of a
> multi-paged html document and download all the relevant
> pages and nothing else. So far, wget has not been able to
> do just that. My take on any tool is that if it takes more
> work to use the tool than to not use it, why use it at all?
>
> As for the robots.txt file, that appears to be a listing of
> the user agents to exclude, so a -U "Mozilla" will get past
> that.
>
> Again, thanks to everyone for their help.
>
> Fred
>
It's not the "tool", it's the webserver:
ERROR 403: Forbidden
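
For the index-page case, something along these lines usually does it
(the URL below is only a placeholder, put in the real index page):

  wget -r -l 1 -np -p -k http://example.com/manual/index.html

-r with -l 1 follows just the links on the index page itself, -np keeps
wget from wandering above the starting directory, -p pulls in the images
and stylesheets each page needs, and -k rewrites the links so the copy
is browsable locally.

On the robots.txt point: robots.txt is a list of paths the site asks
robots not to fetch (per user agent). wget honors it by default, and
-U only changes the agent string wget announces. A 403 is the server
itself refusing the request, and that is often keyed on the agent
string. If the site allows it, something like

  wget -r -l 1 -np -p -k -e robots=off -U "Mozilla/5.0" -w 1 \
       http://example.com/manual/index.html

tells wget to ignore robots.txt (-e robots=off), identify itself as a
browser (-U), and pause a second between requests (-w 1). No guarantee
the server accepts that; it depends on what it is filtering on.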
--
<><