[NTLUG:Discuss] bash question

Steve Baker sjbaker1 at airmail.net
Wed Sep 12 14:49:30 CDT 2001


"Wrenn, Bobby J." wrote:
> 
> I have been unable to get anything with spaces to work at all. Output is
> truncated file name .txt with zero length files.
> 
> This seems to start to work. But it doesn't change the file name on disk
> only in memory. Then it tries to work on the renamed file which doesn't
> exist.
> 
> for i in ` ls *.pdf | tr ' ' '_'`; do
>         DONAME=`basename $i .pdf`
>         pdftotext $DONAME.pdf $DONAME.txt
> done

Nah - that's not going to work - but since we are *TEACHING* you stuff
here, I won't give you the solution until you've learned how this works
(or doesn't in this case).

Let's break this down into smaller, digestible chunks.

Firstly, any command that's enclosed in reverse quotes is executed, then
whatever it outputs is inserted into the command line in place of the
back-quoted string.  (The backquote is the key you rarely use, up by the
Escape key - not the ordinary single quote just left of the Enter key.)
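
If you want to see that substitution happen by itself, here's a tiny sketch.
The variable names (out, greeting, same) are made up for the example, and
'echo hello' just stands in for any command that prints something:

```shell
# The backquoted command runs first; its output replaces the backquoted text.
out=`echo hello`

# Substitution also happens inside double quotes.
greeting="The command printed: `echo hello`"

# $(...) is an equivalent spelling of the same thing.
same=$(echo hello)

echo "$out"
echo "$greeting"
echo "$same"
```

All three variables end up holding text that 'echo' printed, not the
command itself.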

So the backquoted part of the first line is:

  ls *.pdf | tr ' ' '_'

...list all the files that end with the string '.pdf' - pass that list of
filenames into the 'tr' program.  'tr' ('translate') converts all characters
that are found in the first string into the corresponding characters in the
second...so the output of tr (in this case) should be a list of filenames
with the spaces replaced by underscores.  You can run just that command by
itself and verify that it works.
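
You can also try 'tr' without any PDF files lying around. In this sketch,
printf stands in for the output of 'ls *.pdf', and the two filenames are
invented for the example:

```shell
# printf prints one made-up filename per line, like 'ls *.pdf' would.
translated=$(printf '%s\n' 'a b c.pdf' 'd e f.pdf' | tr ' ' '_')

# Every space has become an underscore; the newlines are left alone.
echo "$translated"
```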

Now, because that command is enclosed in backquotes, it'll be run and then
replaced by whatever it generated as its output...which is a list of filenames
with underscores instead of spaces.

So now, we have something like:

 for i in a_b_c.pdf d_e_f.pdf h_i_j.pdf ; do
         DONAME=`basename $i .pdf`
         pdftotext $DONAME.pdf $DONAME.txt
 done

which is a loop in which the shell variable '$i' takes
a different filename each time around the loop.
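
It's worth knowing that the shell hands the loop one whitespace-separated
word at a time, which is why names with spaces caused trouble to begin
with. A small sketch, assuming the default IFS; the variable 'names' and the
two counters are made up for the example:

```shell
names='a b c.pdf'

# Unquoted: the shell splits the value at each space, so the loop runs 3 times.
unquoted=0
for i in $names; do
    unquoted=$((unquoted + 1))
done

# Quoted: the value stays as one word, so the loop runs once.
quoted=0
for i in "$names"; do
    quoted=$((quoted + 1))
done

echo "unquoted: $unquoted  quoted: $quoted"
```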

So, onto the next line:

DONAME=`basename $i .pdf`

...OK, more back-quotes - so 'basename' is a program that
takes its first argument (the filename that's currently
stored in $i) and a suffix (the second argument '.pdf') and
prints the filename to stdout with the suffix removed (the
"base" filename).  It also deletes any leading directory
path...which shouldn't matter in this case.
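
Here's 'basename' run by hand as a quick sketch. It only manipulates the
string, so the files don't have to exist; '/some/dir' is a made-up path:

```shell
# Strip the '.pdf' suffix.
base=$(basename "a_b_c.pdf" .pdf)

# Strip a leading directory path as well as the suffix.
nodir=$(basename "/some/dir/d_e_f.pdf" .pdf)

echo "$base"
echo "$nodir"
```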

So, 'basename' is running in backquotes - so it's removed and
replaced with its output - which is the filename minus the suffix.
Hence, this command is transformed into:

DONAME=a_b_c

...or whatever $i is without the '.pdf' part. This sets the shell
variable '$DONAME' equal to the filename with underscores and without
'.pdf'.

Finally (and here is where the bug is), it runs:

pdftotext $DONAME.pdf $DONAME.txt

...which translates to:

pdftotext a_b_c.pdf a_b_c.txt

...which is **WRONG** because there is no such file as a_b_c.pdf,
we only have "a b c.pdf".
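
You can convince yourself of this in a scratch directory. This is only a
sketch: the temporary directory and the empty stand-in file are assumptions
for the demonstration, not real PDFs:

```shell
tmpdir=$(mktemp -d)

# Create an empty file whose name contains spaces.
: > "$tmpdir/a b c.pdf"

# The underscore name was only ever text in the pipeline, never a file.
if [ -e "$tmpdir/a_b_c.pdf" ]; then renamed_exists=yes; else renamed_exists=no; fi
if [ -e "$tmpdir/a b c.pdf" ]; then original_exists=yes; else original_exists=no; fi

echo "a_b_c.pdf exists:   $renamed_exists"
echo "'a b c.pdf' exists: $original_exists"

rm -rf "$tmpdir"
```

Running 'tr' on a list of names changes the text of the list, not the
files on disk.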

So a better way to do this is:

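 for i in *.pdf ; do
   DONAME=`basename "$i" .pdf | tr ' ' '_'`
   pdftotext "$i" "$DONAME.txt"
 done

...which is also un-tested - but should work (famous last words!). The double
quotes around $i matter: without them the shell splits "a b c.pdf" into three
separate arguments before basename or pdftotext ever sees it.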
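
Since that loop is untested, here's a sketch of one way to try the idea
without real PDFs. Everything around the loop is an assumption for the
test: 'fake_pdftotext' is a hypothetical stand-in that just copies its
input to its output, and the files live in a throwaway directory. The
sketch puts double quotes around $i so that a name with spaces stays a
single argument:

```shell
# Hypothetical stand-in for pdftotext: copy the input file to the output file.
fake_pdftotext() { cp "$1" "$2"; }

workdir=$(mktemp -d)
printf 'one\n' > "$workdir/a b c.pdf"
printf 'two\n' > "$workdir/plain.pdf"

# Run the loop in a subshell so the caller's directory is left alone.
(
    cd "$workdir" || exit 1
    for i in *.pdf; do
        DONAME=`basename "$i" .pdf | tr ' ' '_'`
        fake_pdftotext "$i" "$DONAME.txt"
    done
)

# Collect the results before cleaning up.
txt_files=$(cd "$workdir" && echo *.txt)
first=$(cat "$workdir/a_b_c.txt")
echo "$txt_files"

rm -rf "$workdir"
```

Swap 'fake_pdftotext' for the real 'pdftotext' once the filenames come out
the way you want.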

----------------------------- Steve Baker -------------------------------
Mail : <sjbaker1 at airmail.net>   WorkMail: <sjbaker at link.com>
URLs : http://web2.airmail.net/sjbaker1
       http://plib.sf.net http://tuxaqfh.sf.net http://tuxkart.sf.net
       http://prettypoly.sf.net http://freeglut.sf.net
       http://toobular.sf.net   http://lodestone.sf.net


