[NTLUG:Discuss] bash question

Fred James fredjame at concentric.net
Wed Sep 12 22:11:06 CDT 2001


A tip of the hat, and a "from the heart" thank you.
This thread brought forth such cool stuff - I may smile for a week.


Richard Cobbe wrote:

> Lo, on Wednesday, September 12, Wrenn, Bobby J. did write:
> 
> 
>>If I can get an answer to this I will finally be able to use Linux at
>>work.
>>
>>I need to take 209 pdf files with spaces in the file names and convert
>>them into text.  I am very new to scripting and know nothing about regular
>>expressions.
>>
> 
> While the available documentation on regular expressions tends to be pretty
> opaque, I'd highly recommend taking the time to read up on them and figure
> out how they work.  They show up in lots of different contexts and are very
> useful.  I think you will find that it's time well spent.
> 
> 
>>Is there an easy way to remove the spaces from the file names?  Then how
>>do I recursively submit the files to pdftotext with the same name except
>>for the .pdf changed to .txt?
>>
> 
> Well, I'm sure all of the solutions that have been posted are quite nice,
> but they're also *way* overcomplicated.  tr?  sed?  awk?  Oy!  You can do
> this all in the shell, except of course for the pdftotext bit.
> 
> First, as many people have suggested, you don't necessarily have to get rid
> of the spaces in your filenames; you can either surround the entire
> filename with quotes or backslash each of the space characters.  If,
> however, you think that this is a real headache, you can get rid of the
> spaces pretty easily using a bash parameter expansion goodie:
> 
> (This assumes that you want to process all of the .pdf files in the current
> directory.)
> 
> for file in *.pdf ; do
>     mv -- "$file" "${file// /-}"
>     # quote both names so each reaches mv as a single word; "--" stops mv
>     # from reading a new name that starts with a hyphen as an option
>     # inside the curly braces, that's f i l e slash slash space slash hyphen
> done
> 
> In English, this means:
>     For each file in the current directory which matches *.pdf:
>         set $file to the filename
>         mv $file to $file-with-all-spaces-replaced-by-hyphens
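Below is a slightly more defensive version of the rename loop above. It is a sketch, not a drop-in replacement, and the function name hyphenate_pdf_names is made up for this example. It skips names that contain no spaces (plain mv would complain that source and target are the same file). It refuses to overwrite a file that already has the hyphenated name, and it does nothing if no .pdf files match.

```shell
#!/usr/bin/env bash
# hyphenate_pdf_names is a hypothetical helper, not a standard command.
# It renames each *.pdf in the current directory, replacing every space
# with a hyphen. Names with no spaces are left alone, and an existing file
# with the new name is never overwritten.
hyphenate_pdf_names() {
    local file new
    for file in *.pdf; do
        [ -e "$file" ] || continue          # no match: skip the literal "*.pdf"
        new=${file// /-}                    # every space becomes a hyphen
        [ "$file" = "$new" ] && continue    # already space-free
        if [ -e "$new" ]; then
            printf "skipping '%s': '%s' already exists\n" "$file" "$new" >&2
            continue
        fi
        mv -- "$file" "$new"                # -- in case the new name starts with "-"
    done
}
```

To use it, load the definition into your shell, cd to the directory holding the PDFs, and type hyphenate_pdf_names.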
> 
> To run them through pdftotext, the following will work nicely (even if
> you've still got spaces in your filenames):
> 
> for file in *.pdf ; do
>     pdftotext "$file" "${file/%.pdf/.txt}"
>     # pdftotext writes to the file named by its second argument rather
>     # than to standard output; given only the PDF, it names the output
>     # after the PDF with .txt in place of .pdf.
> done
> 
> In English:
>     for each file in the current directory which matches *.pdf:
>         set $file to the filename
>         run pdftotext on $file (quoted, so spaces in the filename are safe),
>             writing its output to $file-with-a-final-.pdf-replaced-with-.txt
>             (again quoted).
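If "recursively" in the original question means descending into subdirectories as well, the glob *.pdf will not do that, because it only matches names in the current directory. One way to cover a whole tree is to have find feed a bash read loop, as sketched below. The function name pdfs_to_text is hypothetical. The -print0 option comes from GNU and BSD find and is not part of the traditional POSIX find. The pdftotext call uses its documented form, pdftotext PDF-file text-file.

```shell
#!/usr/bin/env bash
# pdfs_to_text is a hypothetical helper name. It walks a directory tree
# (default: the current directory) and runs pdftotext on every *.pdf it
# finds, writing the text next to the PDF with .pdf replaced by .txt.
# find -print0 and read -d '' pass names as NUL-terminated strings, so
# spaces (and even newlines) in file names survive intact.
pdfs_to_text() {
    local top=${1:-.} pdf
    find "$top" -type f -name '*.pdf' -print0 |
        while IFS= read -r -d '' pdf; do
            # usage: pdftotext PDF-file text-file
            pdftotext "$pdf" "${pdf%.pdf}.txt" ||
                printf 'pdftotext failed on %s\n' "$pdf" >&2
        done
}
```

For example, pdfs_to_text ~/pdfs would convert every PDF under ~/pdfs. With no argument, it starts in the current directory. Note that -name '*.pdf' is case-sensitive, so files ending in .PDF are not matched.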
> 
> See the `Parameter Expansion' section of the bash man page---and in fact
> the bash man page in general---for more information.  I think there's also
> an O'Reilly book out on the various shells that you may want to look into.
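To see what these expansions do before touching real files, you can echo them against a sample name. The file name below is invented for illustration. The single-slash form replaces only the first match. The last line shows ${file%.pdf}, a related form that strips a trailing .pdf; it is also described in the `Parameter Expansion' section.

```shell
#!/usr/bin/env bash
# Sample name only; no file is created or renamed.
file='Quarterly Report 2001.pdf'

echo "${file// /-}"          # every space -> hyphen:  Quarterly-Report-2001.pdf
echo "${file/ /-}"           # first space only:       Quarterly-Report 2001.pdf
echo "${file/%.pdf/.txt}"    # .pdf only at the end:   Quarterly Report 2001.txt
echo "${file%.pdf}.txt"      # strip .pdf, add .txt:   Quarterly Report 2001.txt
```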
> 
> 
>>Just getting that much done will be a big help. The next step may be
>>trickier. I need to extract a name, address, and equipment list from each of
>>the files and get it into some kind of database where I can query for total
>>by item or item by location.
>>
> 
> This is almost certainly possible, but as another poster said, it depends
> heavily on the format of the text files.  He suggested awk; I'd go with
> Perl, but that's primarily because I know it better.  You can most likely
> do it with either.
> 
> (And, to those of you who recall my rants several months back about why I
> don't like Perl, no, I still don't like Perl.  <grin>  This is, however,
> one spot where it's most likely the best tool for the job.)
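As a rough illustration of the awk route, here is a sketch built on a layout that nobody in this thread has actually seen. It assumes each .txt file contains lines beginning Name:, Address:, and Item:, with one Item: line per piece of equipment. Real pdftotext output will almost certainly look different, so the patterns would need adjusting. The function names extract_equipment, total_by_item and items_at are hypothetical. The "database" here is just a tab-separated table with one row per item, which sort and awk can query.

```shell
#!/usr/bin/env bash
# ASSUMPTION: each .txt file holds lines of this made-up form:
#   Name: <customer>
#   Address: <location>
#   Item: <equipment>     (one line per piece of equipment)

# extract_equipment (hypothetical): print one tab-separated row per Item
# line with the fields name, address, item.
extract_equipment() {
    awk -F': *' '
        FNR == 1        { name = ""; addr = "" }   # new file: reset fields
        $1 == "Name"    { name = $2 }
        $1 == "Address" { addr = $2 }
        $1 == "Item"    { print name "\t" addr "\t" $2 }
    ' "$@"
}

# total_by_item (hypothetical): read rows on stdin, print "item<TAB>count".
total_by_item() {
    awk -F'\t' '{ n[$3]++ } END { for (i in n) print i "\t" n[i] }' | sort
}

# items_at (hypothetical): read rows on stdin, list items at one address.
items_at() {
    awk -F'\t' -v loc="$1" '$2 == loc { print $3 }'
}
```

For example, extract_equipment *.txt > equipment.tsv builds the table. After that, total_by_item < equipment.tsv gives totals by item, and items_at '123 Main St' < equipment.tsv lists the items at one address.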
> 
> Back to the point: Bobby, while my code above will do what you need, you
> will get a lot more out of this in the long run if you sit down with the
> bash man page or the O'Reilly book and figure out exactly why and how it
> works.  I'd highly recommend investing the time and effort; it will pay off
> bigtime down the road.  You may also want to do the same for the other
> posters' suggestions.
> 
> Richard
> _______________________________________________
> http://www.ntlug.org/mailman/listinfo/discuss


-- 
...make every program a filter...



