[NTLUG:Discuss] copying web documents

steve sjbaker1 at airmail.net
Thu May 18 17:48:43 CDT 2006


I agree - I'd be quite surprised if you couldn't find a set of options
to wget that does exactly what you want.

The one I use most often is '-m', which does whatever is necessary to
"mirror" the site - so it gets you all the pages from that server - but
not things it links to outside of that server.
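Something along these lines should work (the URL here is just a
placeholder for the actual site):

   wget -m http://www.example.com/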

The '-l' option allows you to restrict the 'depth' of recursive
retrieval.
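For example, to follow links no more than two levels deep from the
starting page (again, a placeholder URL):

   wget -r -l 2 http://www.example.com/docs/index.html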

But I think you may want '-p', which downloads the page you request AND
everything necessary to display it properly (images it references,
etc).
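For a multi-page document you'd probably combine these; something like
the following ought to be close (untested, and the path is just a
stand-in). --no-parent keeps wget from wandering up out of the
document's directory, and -k rewrites the links so the local copy is
browsable offline:

   wget -r -l 1 -p -k --no-parent http://www.example.com/docs/contents.html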

David Stanaway wrote:
> wget is still your tool, I suggest you check the options some more. -r
> by default recurses the whole site, but you can restrict the number of
> links followed, or restrict the URL path matched. I think you can also
> tell it to use a different user agent if the webserver has a user agent
> filter.
> 
> 
> Fred wrote:
> 
>>How does one copy a many paged online html document? I tried wget but it tries
>>to do the whole website (and is told to buzz off by the server). If the
>>document was available in pdf form it would be moot, but someone stuck it on
>>their web site in html. Y'know, link after bloody link... God only knows how
>>many pages.
>>Something like wget which is able to start at the table of contents and
>>retrieve all pages.
>>
>>Fred
>>
> 
> 
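As for David's point about user agent filters: if the server is
rejecting wget outright, you can make it identify itself as an ordinary
browser. The string below is just an example - any plausible browser
string should do:

   wget -r -l 1 -p -k --no-parent \
        --user-agent="Mozilla/5.0" \
        http://www.example.com/docs/contents.html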



