[NTLUG:Discuss] Offline browsing/mirroring utility

Neil Aggarwal neil at JAMMConsulting.com
Thu May 25 16:58:16 CDT 2006


RW:

That is the example I tried to follow.

According to that page:

Limit spanning to certain domains---`-D' 

The `-D' option allows you to specify the domains that will be followed,
thus limiting the recursion only to the hosts that belong to these domains.
Obviously, this makes sense only in conjunction with `-H'.
A typical example would be downloading the contents of `www.server.com',
but allowing downloads from `images.server.com', etc.:
  wget -rH -Dserver.com http://www.server.com/
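
The rule quoted above reads like a suffix check on the host name. Here is a minimal shell sketch of that idea. `host_in_domains` is a hypothetical helper written only for illustration, not wget's actual code, and it assumes a plain case-insensitive suffix comparison:

```shell
# Hypothetical helper (not part of wget): succeed if HOST ends with one of
# the comma-separated DOMAINS, ignoring case.
host_in_domains() {
    host=$(printf '%s' "$1" | tr 'A-Z' 'a-z')
    domains=$(printf '%s' "$2" | tr 'A-Z' 'a-z')
    old_ifs=$IFS
    IFS=,
    for d in $domains; do
        # Skip empty entries such as those left by a trailing comma.
        [ -n "$d" ] || continue
        case "$host" in
            *"$d") IFS=$old_ifs; return 0 ;;
        esac
    done
    IFS=$old_ifs
    return 1
}

host_in_domains images.server.com server.com && echo "followed"
host_in_domains www.amazon.com server.com || echo "skipped"
```

Under that assumption, `images.server.com` is accepted for `server.com` and `www.amazon.com` is not. Note that a bare suffix test like this one would also accept a host such as `notserver.com`.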


Following their example, I tried this:

wget -rH -Dstartrek.com http://www.startrek.com 

and it is still pulling pages from other domains, including amazon.com.
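
One way to see exactly which hosts are being contacted would be to write the run to a log with `-o` and list the host names that appear in it. A rough sketch, assuming the log contains the requested URLs in full; `list_hosts` and `wget.log` are names made up for illustration:

```shell
# Hypothetical helper: read a wget log on stdin and print each distinct
# host name found in an http:// or https:// URL.
list_hosts() {
    grep -Eo 'https?://[^/ ]+' | sed 's#^[a-z]*://##' | sort -u
}

# Possible usage (limits recursion to one level to keep the run short):
#   wget -rH -Dstartrek.com -l1 -o wget.log http://www.startrek.com
#   list_hosts < wget.log
```

Any host in that list that does not end in startrek.com would be one that got past the `-D' filter.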

Any ideas?

	Neil

--
Neil Aggarwal, JAMM Consulting, (214)986-3533, www.JAMMConsulting.com
FREE! Valuable info on how your business can reduce operating costs by
17% or more in 6 months or less! http://newsletter.JAMMConsulting.com
-----Original Message-----
From: Discuss-bounces at ntlug.org [mailto:Discuss-bounces at ntlug.org] On Behalf Of Rev. wRy
Sent: Thursday, May 25, 2006 1:02 PM
To: NTLUG Discussion List
Subject: Re: [NTLUG:Discuss] Offline browsing/mirroring utility

On Thu, 2006-05-25 at 12:42, Neil Aggarwal wrote:
> RW:
> 
> This looked interesting to me, so I tried doing this for grins:
> 
> wget --convert-links --domains=startrek.com --exclude-domains amazon.com -H
> -r http://www.startrek.com
> 
> It keeps going off and pulling down pages from other sites, including
> Amazon.com.
> 
> Any ideas why this is happening?

I'm thinking it has to do with -H.  See 
http://www.delorie.com/gnu/docs/wget/wget_15.html for the details about
host spanning with wget.

RW


_______________________________________________
http://ntlug.pmichaud.com/mailman/listinfo/discuss



