[NTLUG:Discuss] copying web documents

David Stanaway david at stanaway.net
Thu May 18 09:30:30 CDT 2006


wget is still your tool; I suggest you check its options some more. -r
by default recurses the whole site, but you can cap the recursion depth
(-l) and restrict which URL paths are followed (--no-parent,
--include-directories). You can also tell it to send a different user
agent string (--user-agent) if the webserver has a user agent filter.
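
For example, something along these lines is usually a good starting
point (the URL below is just a placeholder; point it at the document's
table of contents and adjust the depth to taste):

    # Fetch the document tree, at most 5 links deep, without climbing
    # above the starting directory. Rewrite links for offline reading,
    # grab images/CSS, and claim to be a browser in case wget's default
    # User-Agent is what the server is rejecting.
    wget -r -l 5 --no-parent --convert-links --page-requisites \
         --wait=1 --user-agent="Mozilla/5.0" \
         http://example.com/docs/book/index.html

The --wait=1 adds a one-second pause between requests, which keeps the
server from getting grumpy about rapid-fire fetches.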


Fred wrote:
> How does one copy a many-paged online HTML document? I tried wget, but it
> tries to grab the whole website (and is told to buzz off by the server). If
> the document were available in PDF form the question would be moot, but
> someone stuck it on their web site in HTML. Y'know, link after bloody
> link... God only knows how many pages.
> I need something like wget that can start at the table of contents and
> retrieve all the pages.
> 
> Fred



