[NTLUG:Discuss] copying web documents
steve
sjbaker1 at airmail.net
Thu May 18 17:48:43 CDT 2006
I agree - I'd be quite surprised if you couldn't find a set of options
to wget that did exactly what you wanted.
The one I use most often is '-m', which does whatever is necessary to
"mirror" the site - so it gets you all the pages from that server, but
not things it links to outside of that server.
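Something like this (with example.com standing in for the real server
and /somedoc/ for the document's path - both are just placeholders):

  wget -m http://example.com/somedoc/

'-m' is shorthand for recursive retrieval with unlimited depth plus
timestamping, so re-running it later only fetches what has changed.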
The '-l' option allows you to restrict the 'depth' of recursive
retrieval.
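For example, to follow links at most two levels deep from the starting
page (placeholder URL again):

  wget -r -l 2 http://example.com/somedoc/index.html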
But I think you may want '-p', which fetches the page you request AND
everything needed to display it properly (the images it references,
stylesheets, etc.).
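Roughly (placeholder URL once more, and adding '-k' so the links in the
saved copy get rewritten to point at the local files):

  wget -p -k http://example.com/somedoc/chapter1.html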
David Stanaway wrote:
> wget is still your tool; I suggest you check the options some more. -r
> by default recurses the whole site, but you can restrict the number of
> links followed, or restrict the URL path matched. I think you can also
> tell it to use a different user agent if the web server has a
> user-agent filter.
>
>
> Fred wrote:
>
>>How does one copy a many-paged online HTML document? I tried wget, but it
>>tries to do the whole website (and is told to buzz off by the server). If the
>>document were available in PDF form it would be moot, but someone stuck it on
>>their web site in HTML. Y'know, link after bloody link... God only knows how
>>many pages.
>>What I need is something like wget that can start at the table of contents
>>and retrieve all the pages.
>>
>>Fred
>>
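Putting David's suggestions and Fred's original problem together, something
along these lines ought to do it - the URL, the user-agent string and the
one-second pause are just guesses you'd adjust for the real site:

  wget -r -np -k --wait=1 \
       --user-agent="Mozilla/5.0" \
       http://example.com/somedoc/toc.html

'-np' (--no-parent) keeps wget from wandering above the document's own
directory, and --user-agent gets you past servers that refuse the default
"Wget/..." identification.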