[NTLUG:Discuss] copying web documents
Stuart Johnston
saj at thecommune.net
Thu May 18 15:10:20 CDT 2006
Fred wrote:
> I may be trying Jay's suggestion about a Windoze prog since wget has resisted
> my puny efforts to make it work. Here's a thought: y'all try to get something
> to copy the manual at the following URL and tell me how you did it. That way we
> are on the (no pun intended) same page.
>
> http://www.globalsecurity.org/military/library/policy/army/fm/3-19-40/
Apparently, wget respects the robots.txt file, which causes problems with
this site. But here's what I did.
First, download the index file:
wget -U "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)" \
  http://www.globalsecurity.org/military/library/policy/army/fm/3-19-40/index.html
Then, use that file as the input:
wget -U "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)" -nd -F \
  -L -r -B http://www.globalsecurity.org/military/library/policy/army/fm/3-19-40/ \
  -D www.globalsecurity.org -A .htm -i index.html
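For anyone who wants to see what each flag is doing, here is the same two-step fetch as a small annotated script. It is only a sketch: the helper names (run, fetch_index, fetch_pages) and the DRY_RUN variable are made up for illustration, the flags are the same ones used above (just reordered so related ones sit together), and by default it only prints the wget commands rather than running them.

```shell
#!/bin/sh
# Sketch of the two-step fetch from this post.
# Prints the commands by default; run with DRY_RUN=0 to actually call wget.

UA="Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
BASE="http://www.globalsecurity.org/military/library/policy/army/fm/3-19-40/"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Step 1: fetch the index page with a browser-like User-Agent (-U).
fetch_index() {
    run wget -U "$UA" "${BASE}index.html"
}

# Step 2: use the saved index.html as the list of links to fetch.
#   -nd  do not recreate the remote directory tree locally
#   -F   treat the input file as HTML
#   -i   read the URLs to fetch from this file
#   -B   resolve relative links in the input file against this base URL
#   -L   follow relative links only
#   -r   recursive retrieval
#   -D   only follow links on these domains
#   -A   only keep files whose names match this suffix
fetch_pages() {
    run wget -U "$UA" \
        -nd \
        -F -i index.html \
        -B "$BASE" \
        -L -r \
        -D www.globalsecurity.org \
        -A .htm
}

fetch_index
fetch_pages
```

Save it as, say, fetch-manual.sh (name is arbitrary), check the printed commands, then run it with DRY_RUN=0 to do the real download. If robots.txt really is the obstacle, wget also accepts -e robots=off for recursive fetches; that variation is untested against this site.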