[NTLUG:Discuss] Scripting help

Tue Apr 28 13:55:58 CDT 2009

Daniel Hauck wrote:

> Dennis Kaptain wrote:
>
>>
>>
>>> I am trying to show my hardware students some of the things that
>>> can be
>>> done from the command line once they get Linux installed on their
>>> hardware. I was trying to show how to write a basic bash script that
>>> would grep 'the' from a text file, run each instance of 'the'
>>> through wc
>>> and redirect that output to another file called thecount.txt. I
>>> originally thought that:
>>>
>>> #!/bin/bash
>>>
>>> grep 'the' > thecount.txt | wc
>>>
>>> would work, but that does not seem to do the trick. So I changed the
>>> order a bit and tried:
>>>
>>> #!/bin/bash
>>>
>>> grep 'the' | wc > thecount.txt
>>>
>>> This does not seem to work either. I remember doing something like this
>>> in school (10 years and 3 moves back) but I cannot seem to locate my
>>> notes. What am I missing here, besides the notes, I mean? What would
>>> be the best way to write this script? Thanks, Dennis in Victoria
>>>
>>>
>> You are missing the file name you want to look in.
>>
>> grep 'the' FileToLookIn | wc > thecount.txt
>>
>> DK
>>
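A minimal sketch of why the corrected command works, assuming a made-up sample file (the thread never shows the real input). With no file name, grep reads standard input and simply waits. With "> thecount.txt" placed before the pipe, grep's output goes to the file and wc receives nothing. Note that "wc -l" here counts matching lines, not occurrences of the word.

```shell
#!/bin/bash
# Hypothetical sample input; the original thread does not show the file's contents.
sample=$(mktemp)
printf '%s\n' 'the cat saw the dog' 'there is no match here' 'nothing' > "$sample"

# Name the file for grep, and put the redirection at the END of the
# pipeline so that it captures wc's output rather than grep's.
out=$(mktemp)
grep 'the' "$sample" | wc -l > "$out"

# Some wc implementations pad the number with spaces, so strip them.
line_count=$(tr -d ' ' < "$out")
echo "$line_count"

rm -f "$sample" "$out"
```

Two lines contain "the" (the second one only inside "there"), so the count is per line and includes partial matches.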
>
> How does this account for multiple instances of the word "the" on the
> same line? How about the importance of words that contain "the" in them
> such as "these" "them" "theirs" "theory" and the like?
>
> The first thing that should be done is to split every line into
> multiple lines, each containing a single word.
>
> Other questions that arise are matters of case sensitivity. Adding
> "-i" makes grep ignore case.
>
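One possible way to address both concerns inside grep itself, offered as a sketch rather than as what the poster had in mind: it assumes a grep that supports "-o" (print each match on its own line), which GNU grep does, together with "-w" (whole words only) and "-i" (ignore case). The sample text is invented for illustration.

```shell
#!/bin/bash
# Hypothetical sample input, invented for this example.
sample=$(mktemp)
printf '%s\n' 'The cat saw the dog' 'then there were these' > "$sample"

# -o prints every match on its own line, so two hits on one line count twice.
# -w matches whole words only, so "then", "there" and "these" are skipped.
# -i ignores case, so "The" counts as well.
word_count=$(grep -o -w -i 'the' "$sample" | wc -l | tr -d ' ')
echo "$word_count"

rm -f "$sample"
```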
> I came up with this off the top of my head:
>
> cat yro_quote.txt | tr '[:space:]' '\n' \
>   | gawk '{ print "%%%" $1 "%%%" }' | grep -i "%%%the%%%" | wc
>
>
> I added the three percent signs before and after each word to make it
> easy to tell "the" apart from "there". The "tr" portion is simple yet
> effective: it changes every whitespace character into a newline. It
> does not filter out punctuation, and perhaps it should. For example, if
> a sentence ended in "the." for whatever reason, that instance would be
> missed. Adding further "tr" stages to account for other characters
> would be trivial. I am sure there are other ways of saying "everything
> that is not A-Z and not a-z is to be made into \n", but this is a
> "top of the head" kind of response.
>
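The "everything that is not A-Z and not a-z becomes \n" idea can be sketched with tr's complement option. This is one possible reading, not the poster's own command: "-c" complements the set, "-s" squeezes runs of the replacement character, and grep's "-x" (match the whole line) with "-c" (count) stands in for the percent-sign markers. The sample sentence is made up.

```shell
#!/bin/bash
# Hypothetical sample input, invented for this example.
sample=$(mktemp)
printf '%s\n' 'The end of the. Then the, and THE!' > "$sample"

# tr -c: act on every character NOT in A-Za-z; -s: squeeze repeated newlines.
# grep -x: the whole line must be "the"; -i: ignore case; -c: print the count.
count=$(tr -cs 'A-Za-z' '\n' < "$sample" | grep -i -x -c 'the')
echo "$count"

rm -f "$sample"
```

Because punctuation is turned into a line break, "the." and "the," are counted, while "Then" is not.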
Adding one more 'tr' (tr '[:punct:]' '\n') might help ... as in ...

cat yro_quote.txt | tr '[:punct:]' '\n' | tr '[:space:]' '\n' (etc)
... hope that helps
Regards
Fred James
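For reference, the combined pipeline from this thread might look like the sketch below once the "(etc)" is filled in with the earlier stages. Two assumptions: the sample text is invented, and plain "awk" is used in place of "gawk" so that it runs where gawk is not installed. "wc -l" replaces the bare "wc" so that only the line count is produced.

```shell
#!/bin/bash
# Hypothetical sample input, invented for this example.
sample=$(mktemp)
printf '%s\n' 'He saw the. Then the theory, the' > "$sample"

# 1. punctuation  -> newline   (so "the." becomes "the")
# 2. whitespace   -> newline   (one word per line, plus some empty lines)
# 3. wrap each word in %%% markers so "the" cannot match inside "theory"
# 4. keep only the marked word "the", ignoring case, and count the lines
full_count=$(cat "$sample" \
  | tr '[:punct:]' '\n' \
  | tr '[:space:]' '\n' \
  | awk '{ print "%%%" $1 "%%%" }' \
  | grep -i "%%%the%%%" \
  | wc -l | tr -d ' ')
echo "$full_count"

rm -f "$sample"
```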