[NTLUG:Discuss] xargs guide
Christopher Cox
cjcox at acm.org
Sat Aug 1 10:42:08 CDT 2015
On 07/31/2015 07:33 PM, Steve Litt wrote:
> On Thu, 30 Jul 2015 07:24:26 -0500
> Pesto <dawjer at gmail.com> wrote:
>
>> As someone who has used xargs since the mid '90s I gotta say this is
>> well done. I learned stuff. Thanks.
>>
>>
>> pesto
>
> Thanks pesto!
>
> I just got finished incorporating about 20 improvements suggested by
> various people, including typos, omissions, and failure to properly
> identify the document's scope.
>
> Anyway, it's still at http://www.troubleshooters.com/linux/xargs.htm .
>
> Thanks,
>
> SteveT
>
Ok... some history from the "old guy" (and former sith lord)...
If you need to supply a ton of arguments to a command for processing, your
command line "blows up". Let's say you have a directory containing 1 million
files, most of them ending in .txt (extreme, just to illustrate).
The command:
$ ls *.txt
..is going to blow up. Why? Because the shell globbing pattern, *.txt, likely
expands to an argument list bigger than the kernel will pass to a single command
(the dreaded "Argument list too long"). In the older days, this problem happened
pretty early on. Today with Linux it's not nearly the problem it once was
(getconf ARG_MAX, it's pretty big now). One way to tell just how much your
system can handle is:
$ xargs --show-limits </dev/null
Your environment variables take up 3446 bytes
POSIX upper limit on argument length (this system): 2091658
POSIX smallest allowable upper limit on argument length (all systems): 4096
Maximum length of command we could actually use: 2088212
Size of command buffer we are actually using: 131072
(other things start breaking with a million arguments btw... not just the
command line length, but also the number of arguments... that part is OK to
ignore for this discussion).
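If you want to see the blow-up for yourself, something like the following works
(a throwaway directory name; creating a million files is slow and eats a
million inodes, and the exact error wording varies by shell... and yes, I'm
using xargs to set up the demo, more on that below):

$ mkdir /tmp/blowup && cd /tmp/blowup
$ seq 1 1000000 | sed 's/$/.txt/' | xargs touch
$ ls *.txt
bash: /bin/ls: Argument list too long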
So... what if you need to run a command with a huge number of arguments, or a
huge command line?
xargs is BORN!
The idea is that xargs will execute a command the number of times required,
each time using the maximum number of arguments allowed by your system.
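You can actually watch the batching with a harmless command like echo (each
echo invocation prints exactly one line, so counting output lines counts
invocations):

$ seq 1 1000000 | xargs echo | wc -l

With the 131072-byte command buffer shown above, that comes out to a few dozen
lines, i.e. a few dozen echo runs, each packing thousands of arguments.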
The case Steve mentioned of using -n 1 (or, GNUified, --max-args=1) is usually
for specific cases, for example a command that can only take one argument.
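A trivial illustration (drop the -n 1 and you'd get all three words on a single
line):

$ printf '%s\n' one two three | xargs -n 1 echo
one
two
three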
But let's say the command I want to use is "grep". If my input (the list of
files to grep) is really, really large, I would need to break things into
multiple greps.
$ find . -type f -name '*.txt' -print
The above finds all the .txt files under the current directory.
If you start that from the root it might fail if you try to pass the output as
arguments to grep:
$ grep 'my string' -- $(find . -type f -print)
(given the crazy filenames we have now, this will likely fail for other reasons)
So, xargs to the rescue! (I'm avoiding find's -exec here to make a point)
$ find . -type f -print0 | xargs -0 grep 'my string' --
That command will run grep with the maximum allowed number of arguments
possible, as many times as needed, until the list of files is exhausted.
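In case the -print0/-0 pairing looks like noise: it makes find emit
NUL-terminated names and makes xargs split on NULs, so whitespace (even
newlines) in filenames can't wreck the argument splitting. A quick
demonstration with a made-up file:

$ touch 'my file.txt'
$ find . -type f -print | xargs grep 'my string' --     # breaks: "./my file.txt" splits in two
$ find . -type f -print0 | xargs -0 grep 'my string' -- # safe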
Many of you have seen my famous pipelining one-liner for finding all text files
and searching them:
$ find . -type f -print0 | xargs -0 file | grep -i text | \
    cut -f1 -d: | tr '\012' '\000' | xargs -0 grep -n 'mystring' --
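Reading it stage by stage (the same pipeline, just split up and annotated):

find . -type f -print0 |            # every file, NUL-terminated
    xargs -0 file |                 # classify each file with file(1), in batches
    grep -i text |                  # keep files whose type mentions "text"
    cut -f1 -d: |                   # strip the ": type" part, leaving the name
    tr '\012' '\000' |              # turn the newlines back into NULs
    xargs -0 grep -n 'mystring' --  # grep the text files, again in batches

One known soft spot: a filename with a colon in it will confuse the cut stage,
which is part of the "crazy filenames" caveat above.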
Now, when using Linux, all of what was done in that command might not be
necessary, but the above will likely work on really, really old Linux as well
as on any system whose "find" and "xargs" handle the "-0" argument.
(on typical old Unix, the one-liner gets really complex to handle the lack of
"-0", since you have to handle special file names via quoting and escaping)
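For what it's worth, the portable escape hatch on such systems is the -exec I
was avoiding earlier; with the POSIX "+" terminator, find itself batches
arguments much like xargs does (truly ancient finds may only have ";", which
is one exec per file):

$ find . -type f -exec grep -n 'mystring' -- {} +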
Anyway, obviously you can use "xargs" for many, many things... I just wanted
folks to know one of the major historical reasons it exists.