[NTLUG:Discuss] Scripting help

Tue Apr 28 12:37:43 CDT 2009

Dennis Kaptain wrote:
> 
> 
> 
>> I am trying to show my hardware students some of the things that can be 
>> done from the command line once they get Linux installed on their 
>> hardware.  I was trying to show how to write a basic bash script that 
>> would grep 'the' from a text file, run each instance of 'the' through wc 
>> and redirect that output to another file called thecount.txt.  I 
>> originally thought that:
>>
>> #!/bin/bash
>>
>> grep 'the' > thecount.txt | wc
>>
>> would work, but that does not seem to do the trick.  So I changed the 
>> order a bit and tried:
>>
>> #!/bin/bash
>>
>> grep 'the' | wc > thecount.txt
>>
>> This does not seem to work either.  I remember doing something like this 
>> in school (10 years and 3 moves back) but I cannot seem to locate my 
>> notes.  What am I missing here, besides the notes, I mean?  What would 
>> be the best way to write this script?  Thanks, Dennis in Victoria
>>
>>
>> _______________________________________________
>> http://www.ntlug.org/mailman/listinfo/discuss
> 
> You are missing the file name you want to look in.
> 
> grep 'the' FileToLookIn |  wc > thecount.txt
> 
> DK
> 

How does this account for multiple instances of the word "the" on the
same line?  And what about words that merely contain "the", such as
"these", "them", "theirs", "theory", and the like?
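A small experiment makes both problems concrete.  This is a sketch
with made-up sample text, assuming mktemp is available (it is on GNU
and BSD systems).  Note that grep -o is a GNU/BSD extension, not POSIX.

```shell
#!/bin/bash
# Sketch: grep | wc counts matching LINES, not occurrences of "the".
# The sample text is made up for illustration; mktemp is assumed available.
sample=$(mktemp)
printf '%s\n' 'the cat, the dog and the mat' 'these are theirs' 'no match here' > "$sample"

# Line 1 holds three real "the"s; line 2 holds none, only "these"/"theirs".
# grep prints each matching line once, so this reports 2 (lines 1 and 2).
lines=$(( $(grep 'the' "$sample" | wc -l) ))

# grep -o prints every match on its own line, but it also matches "the"
# inside "these" and "theirs", so this reports 5 instead of 3.
matches=$(( $(grep -o 'the' "$sample" | wc -l) ))

echo "lines with 'the': $lines, substring matches: $matches"
rm -f "$sample"
```

Neither number is the real answer of 3.  The $(( )) wrapper only strips
the leading spaces some versions of wc print.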

The first thing that should be done is to split every line so that
each word sits on a line of its own.
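tr can do the splitting on its own.  This is a minimal sketch on a
made-up input string.  Adding -s squeezes runs of newlines, so double
spaces and tabs do not leave blank lines behind.  Punctuation stays
attached to the word.

```shell
#!/bin/bash
# Sketch: put every word on a line of its own.
# tr turns each whitespace character into a newline; -s then squeezes
# runs of newlines, so double spaces and tabs do not leave blank lines.
words=$(printf 'the cat,  the\tdog\n' | tr -s '[:space:]' '\n')
printf '%s\n' "$words"
# prints "the", "cat,", "the", "dog" on four lines (the comma stays put)
```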

Case sensitivity is another question that arises.  Adding "-i" makes
grep ignore case.
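A quick way to see the difference is the sketch below, which uses
made-up sample lines.  grep -c prints the number of matching lines,
which is the same count that grep | wc -l would give.

```shell
#!/bin/bash
# Sketch: grep is case-sensitive unless -i is given.
# The sample lines are made up; -c prints the number of matching lines.
sample=$(mktemp)
printf '%s\n' 'The end.' 'the start' 'THE MIDDLE' > "$sample"

exact=$(grep -c 'the' "$sample")     # only "the start" matches
ignore=$(grep -ci 'the' "$sample")   # all three lines match

echo "case-sensitive: $exact, case-insensitive: $ignore"
rm -f "$sample"
```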

I came up with this off the top of my head:

tr '[:space:]' '\n' < yro_quote.txt \
    | gawk '{ print "%%%" $1 "%%%" }' \
    | grep -i "%%%the%%%" | wc -l

I added three percent signs before and after each word to make it
easier to tell "the" apart from "there".  The "tr" portion is simple
yet effective: it turns every whitespace character (spaces, tabs, and
existing line breaks) into a newline.  It does not filter out
punctuation, and perhaps it should.  For example, if a line ended in
"the." for whatever reason, that occurrence would be missed.  Adding
more "tr" calls to handle other characters would be trivial.  I am sure
there are other ways of saying "everything that is not A-Z and not a-z
is to be made into \n", but this is a top-of-the-head kind of response.
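One way to say "everything that is not A-Z and not a-z becomes \n" is
tr's -c (complement) option.  Combined with grep -x, which only matches
whole lines, it also removes the need for the %%% markers.  This is a
sketch assuming GNU or BSD tools.  count_the and the sample text are
hypothetical, and the count is written to thecount.txt as in the
original question.

```shell
#!/bin/bash
# Sketch: count whole-word, case-insensitive occurrences of "the".
# count_the is a hypothetical helper; the file to search is passed as $1.
count_the() {
    # tr -c complements the set, so every character that is NOT A-Z or a-z
    # becomes a newline; -s squeezes the runs, so "the." ends up as "the".
    # grep -x only accepts whole lines, so "there" and "theory" are skipped;
    # -i ignores case and -c prints the number of matching lines.
    tr -cs 'A-Za-z' '\n' < "$1" | grep -cix 'the'
}

# Usage with made-up sample text: 3 hits on line 1, 1 on line 2.
sample=$(mktemp)
printf '%s\n' 'The cat, the dog and THE mat.' 'Is there theory in the.' > "$sample"
count_the "$sample" > thecount.txt
cat thecount.txt    # prints 4
rm -f "$sample"
```

If there are no matches, grep -c still prints 0 but exits with status
1.  That matters if the script runs under "set -e".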