[NTLUG:Discuss] Offline browsing/mirroring utility
Kenneth Loafman
kenneth at loafman.com
Fri May 26 09:35:32 CDT 2006
Here's my last attempt, and it behaves the same as with the --no-span-hosts option.
wget --mirror -Dstartrek.com -w 1 -b http://www.startrek.com
I looked at the log, and every page fetched from outside the startrek.com
domain was the result of a redirect like the one below. It could be
argued that a redirected page is 'in' the same domain. wget did not try
to traverse anything on the redirected pages, since they were outside the
domain given in the -D option.
--09:15:55-- http://www.startrek.com/startrek/page/redirect/external?id=15306
=> `www.startrek.com/startrek/page/redirect/external?id=15306'
Reusing existing connection to www.startrek.com:80.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: http://www.planetxpo.com/40th/ [following]
--09:15:56-- http://www.planetxpo.com/40th/
=> `www.planetxpo.com/40th/index.html'
Resolving www.planetxpo.com... 216.69.176.90
Connecting to www.planetxpo.com|216.69.176.90|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5,355 (5.2K) [text/html]
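
For anyone who wants to check their own log the same way, here is a rough Python sketch that lists the redirects leaving a given domain. It assumes the log looks like the excerpt above (a request URL line, then a "Location: ... [following]" line). The function names are made up for illustration, and the host test is a simple suffix match on a dot boundary, which may not be exactly how wget compares hosts for -D.

```python
import re
from urllib.parse import urlsplit

# "Location: <url> [following]" is what wget prints when it follows a redirect.
LOCATION_RE = re.compile(r"^Location:\s+(\S+)\s+\[following\]")
# A request line: a URL, optionally preceded by the "--HH:MM:SS--" timestamp.
REQUEST_RE = re.compile(r"^(?:--\d\d:\d\d:\d\d--\s*)?(https?://\S+)$")


def in_domain(host, domain):
    """True if host is the domain itself or a subdomain of it.

    Assumption: suffix match on a dot boundary, not wget's own rule.
    """
    host, domain = host.lower(), domain.lower()
    return host == domain or host.endswith("." + domain)


def offsite_redirects(log_lines, domain):
    """Return (requested_url, redirect_target) pairs that leave `domain`."""
    found = []
    current = None  # most recent request URL seen in the log
    for raw in log_lines:
        line = raw.strip()
        request = REQUEST_RE.match(line)
        if request:
            current = request.group(1)
            continue
        location = LOCATION_RE.match(line)
        if location:
            target = location.group(1)
            host = urlsplit(target).hostname or ""
            if not in_domain(host, domain):
                found.append((current, target))
    return found


if __name__ == "__main__":
    # Lines copied from the log excerpt above.
    sample = [
        "--09:15:55--",
        "http://www.startrek.com/startrek/page/redirect/external?id=15306",
        "HTTP request sent, awaiting response... 302 Moved Temporarily",
        "Location: http://www.planetxpo.com/40th/ [following]",
        "--09:15:56-- http://www.planetxpo.com/40th/",
        "HTTP request sent, awaiting response... 200 OK",
    ]
    for src, dst in offsite_redirects(sample, "startrek.com"):
        print(f"{src} -> {dst}")
```

To run it against a real log, pass the lines of the file that -b writes (wget-log in the current directory by default) instead of the sample list.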
...Ken
Kenneth Loafman wrote:
> Neil,
>
> I can't seem to get it to stop either, even with the --no-span-hosts
> option turned on. Thankfully, it only retrieves the pages that allow it
> to display the page and does not try to traverse Amazon or the others.
> Maybe that's intentional, but it does seem there should be a filter that
> says to stay on this site only.
>
> I must admit the man page for wget is rather obscure, even for an old
> UNIX hacker like me. There's probably something to do what we want, but
> I just don't see it now. Googling is somewhat useless since all of the
> sites seem to plagiarize each other, all with the same bad info.
>
> ...Ken
>
> Neil Aggarwal wrote:
>> Ken:
>>
>> wget -m http://www.startrek.com
>>
>> still pulls pages from amazon.com and other domains when I run it.
>>
>> Any ideas?
>>
>> Thanks,
>> Neil
>>
>> --
>> Neil Aggarwal, JAMM Consulting, (214)986-3533, www.JAMMConsulting.com
>> FREE! Valuable info on how your business can reduce operating costs by
>> 17% or more in 6 months or less! http://newsletter.JAMMConsulting.com
>>
>> -----Original Message-----
>> From: Discuss-bounces at ntlug.org [mailto:Discuss-bounces at ntlug.org] On Behalf Of Kenneth Loafman
>> Sent: Thursday, May 25, 2006 1:50 PM
>> To: NTLUG Discussion List
>> Subject: Re: [NTLUG:Discuss] Offline browsing/mirroring utility
>>
>> Rev. wRy wrote:
>>> On Thu, 2006-05-25 at 10:43, . Daniel wrote:
>>>> I've started seeking a utility that will save a website for reference and
>>>> offline browsing and so far, I'm coming up empty-handed.
>>>>
>>>> Specifically, I'm trying to get http://bigyellowbox.tripod.com/. The site
>>>> hasn't been updated since some time in 2002 and I'd like to get it onto a
>>>> local system as I fear it may disappear after some time. (Attempts to
>>>> contact the author/owner result in an "unknown recipient" bounce.)
>>> wget --convert-links -r http://bigyellowbox.tripod.com/
>> wget -m http://bigyellowbox.tripod.com
>>
>> works for me
>>
>> ...Ken
>>
>> _______________________________________________
>> http://ntlug.pmichaud.com/mailman/listinfo/discuss