[NTLUG:Discuss] copying web documents

Stuart Johnston saj at thecommune.net
Thu May 18 15:10:20 CDT 2006


Fred wrote:
> I may be trying Jay's suggestion about a Windoze prog since wget has resisted
> my puny efforts to make it work. Here's a thought: y'all try to get something
> to copy the manual at the following URL and tell me how you did it. That way we
> are on the (no pun intended) same page.
> 
> http://www.globalsecurity.org/military/library/policy/army/fm/3-19-40/

Apparently wget respects the site's robots.txt file, which causes problems 
with this site.  Here's what I did to get around it.

First, download the index file:

wget -U "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)" \
  http://www.globalsecurity.org/military/library/policy/army/fm/3-19-40/index.html
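
To preview which pages the downloaded index.html points to before fetching 
anything else, something like the sketch below should do. It is only a rough 
approximation of the link handling wget does itself: list_links is a made-up 
helper name, it assumes double-quoted href attributes, and it needs a grep 
that supports -o (GNU and BSD grep do).

```shell
#!/bin/sh
# Sketch only: list the .htm links found in a local HTML file as absolute URLs.
# list_links is a hypothetical helper, not part of wget.

BASE="http://www.globalsecurity.org/military/library/policy/army/fm/3-19-40/"

list_links() {
    # $1 = local HTML file, $2 = base URL to prepend to relative links
    grep -o -i 'href="[^"]*\.htm"' "$1" |
        sed -e 's/^[Hh][Rr][Ee][Ff]="//' -e 's/"$//' |
        while IFS= read -r link; do
            case "$link" in
                http://*|https://*) printf '%s\n' "$link" ;;  # already absolute
                /*) ;;  # root-relative links are skipped to keep the sketch short
                *) printf '%s%s\n' "$2" "$link" ;;            # relative to the base
            esac
        done
}

# Only runs if index.html is present in the current directory.
if [ -f index.html ]; then
    list_links index.html "$BASE"
fi
```

Each line of output is a URL that a recursive fetch starting from that index 
would be expected to visit, give or take whatever this simple pattern misses.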

Then, use that file as the input:

wget -U "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)" \
  -nd -F -L -r \
  -B http://www.globalsecurity.org/military/library/policy/army/fm/3-19-40/ \
  -D www.globalsecurity.org -A .htm -i index.html
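
An untested alternative: wget can be told to ignore robots.txt with 
-e robots=off, which may make the two-step approach unnecessary. The sketch 
below is a guess at how that would look, not something verified against this 
site. fetch_manual and the WGET override are names picked for illustration, 
and -l 1 with -np are assumed limits to keep the crawl inside the manual's 
directory.

```shell
#!/bin/sh
# Sketch only: single-step recursive fetch that ignores robots.txt.
# fetch_manual and the WGET variable are hypothetical conveniences.

UA="Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
URL="http://www.globalsecurity.org/military/library/policy/army/fm/3-19-40/"

fetch_manual() {
    # -e robots=off  ignore robots.txt
    # -r -l 1        recurse, but only one level below the start page
    # -np            never ascend to the parent directory
    # -nd            save everything in the current directory
    # -A .htm        keep only files whose names end in .htm
    "${WGET:-wget}" -U "$UA" -e robots=off -r -l 1 -np -nd -A .htm "$URL"
}

# Dry run (prints the arguments instead of downloading):  WGET=echo fetch_manual
# Real run:                                               fetch_manual
```

Ignoring robots.txt goes against the site operator's stated wishes, so keep 
the depth shallow and the request rate low if you try it.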



