[NTLUG:Discuss] copying web documents
David Stanaway
david at stanaway.net
Thu May 18 09:30:30 CDT 2006
wget is still your tool; I suggest you dig into its options some more. -r
recurses the whole site by default, but you can limit the recursion depth
(-l), keep it from ascending above the starting directory (--no-parent),
or restrict which URL paths it follows (-I). I think you can also tell it
to send a different user agent string (-U) if the webserver has a user
agent filter.
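Something along these lines should work, assuming the document's table of
contents lives at a URL like the hypothetical one below (adjust the depth
and path to taste):

```shell
# Start at the table of contents and fetch linked pages recursively.
# -r            recursive retrieval
# -l 3          follow links at most 3 levels deep (assumed depth; tune it)
# --no-parent   don't ascend above /manual/ to the rest of the site
# -k            convert links so the local copy browses offline
# -p            also grab images/CSS the pages need
# -U "..."      send a browser-like user agent in case the server filters wget
wget -r -l 3 --no-parent -k -p \
     -U "Mozilla/5.0" \
     http://example.com/manual/index.html
```

The URL and depth here are placeholders; the flags themselves are standard
GNU wget options.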
Fred wrote:
> How does one copy a many-paged online HTML document? I tried wget, but it
> tries to do the whole website (and is told to buzz off by the server). If
> the document were available in PDF form it would be moot, but someone stuck
> it on their web site in HTML. Y'know, link after bloody link... God only
> knows how many pages.
> I need something like wget which is able to start at the table of contents
> and retrieve all the pages.
>
> Fred
>
>
> _______________________________________________
> http://ntlug.pmichaud.com/mailman/listinfo/discuss
>