[NTLUG:Discuss] bash question

Thu Sep 13 15:17:27 CDT 2001

Please let me give you permission, and encouragement, to test this:
(1) First make a safe place
     mkdir dir_name_of_your_choice
     cd dir_name_of_your_choice
(2) make a bunch of dummy files
     touch file_name_of_your_choice{1,2,3,4,5,6}.pdf
     (this will make 6 empty files with names like
       file_name_of_your_choice1.pdf)
(3) test your script to convert them to files with names like
       filenameofyourchoice1.txt
(4) test your script to convert them back.
You won't hurt anything no matter what happens.
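
Here is a rough sketch of that test as one bash script, in case it helps.
The sandbox location and the "dummy file" names are my own invention, and
I gave the names spaces so the round trip has something to do. As far as
I can tell, the attempt quoted below fails because in ${file/-// } the
pattern is just "-" and the replacement is "/ ", so only the first hyphen
is touched and it turns into a slash and a space. Treat this as a sketch,
not the final word:

```shell
#!/usr/bin/env bash
# Sketch only: the sandbox directory and file names are made up for testing.
set -eu

sandbox=$(mktemp -d)      # (1) a safe, throwaway place
cd "$sandbox"

# (2) dummy files with spaces in their names, like the real ones
touch "dummy file "{1,2,3}.pdf

# (3) forward: replace every space with a hyphen
for file in *.pdf; do
    mv -- "$file" "${file// /-}"
done
ls

# (4) back: replace every hyphen with a space.
# The leading double slash means "all matches"; a single slash then
# separates the pattern (-) from the replacement (a space).
for file in *.pdf; do
    mv -- "$file" "${file//-/ }"
done
ls
```

One caveat: step (4) turns every hyphen into a space, including any hyphen
that was in the original name before the spaces were removed.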

Wrenn, Bobby J. wrote:

> So, I'll ask again. 
> 
> Will the converse work? If I want to put the spaces back in, would:
> 
>     mv "$file" ${file/-// }
> 
> be the correct expression?
> 
> This didn't work; I know it is my lack of knowledge of regular expressions.
> And as you might guess, now that I have the files back on the Window$
> machine I need the spaces back in the file names. So, how do I put the
> spaces back in?
> 
> Bobby
> -----Original Message-----
> From: Richard Cobbe [mailto:cobbe at airmail.net]
> Sent: Wednesday, September 12, 2001 6:01 PM
> To: discuss at ntlug.org
> Subject: Re: [NTLUG:Discuss] bash question
> 
> 
> Lo, on Wednesday, September 12, Wrenn, Bobby J. did write:
> 
> 
>>If I can get an answer to this I will finally be able to use Linux at
>>work.
>>
>>I need to take 209 pdf files with spaces in the file names and convert
>>them into text.  I am very new to scripting and know nothing about regular
>>expressions.
>>
> 
> While the available documentation on regular expressions tends to be pretty
> opaque, I'd highly recommend taking the time to read up on them and figure
> out how they work.  They show up in lots of different contexts and are very
> useful.  I think you will find that it's time well spent.
> 
> 
>>Is there an easy way to remove the spaces from the file names?  Then how
>>do I recursively submit the files to pdftotext with the same name except
>>for the .pdf changed to .txt?
>>
> 
> Well, I'm sure all of the solutions that have been posted are quite nice,
> but they're also *way* overcomplicated.  tr?  sed?  awk?  Oy!  You can do
> this all in the shell, except of course for the pdftotext bit.
> 
> First, as many people have suggested, you don't necessarily have to get rid
> of the spaces in your filenames; you can either surround the entire
> filename with quotes or backslash each of the space characters.  If,
> however, you think that this is a real headache, you can get rid of the
> spaces pretty easily using a bash parameter expansion goodie:
> 
> (This assumes that you want to process all of the .pdf files in the current
> directory.)
> 
> for file in *.pdf ; do
>     mv "$file" "${file// /-}"
>     # quotes around the first file are necessary to handle spaces correctly
>     # inside the curly braces, that's f i l e slash slash space slash hyphen
> done
> 
> In English, this means:
>     For each file in the current directory which matches *.pdf:
>         set $file to the filename
>         mv $file to $file-with-all-spaces-replaced-by-hyphens
> 
> To run them through pdftotext, the following will work nicely (even if
> you've still got spaces in your filenames):
> 
> for file in *.pdf ; do
>     pdftotext "$file" "${file/%.pdf/.txt}"
>     # pdftotext takes the output file as an optional second argument, so no
>     # redirection is needed; I don't have it installed, so check the manpage.
> done
> 
> In English:
>     for each file in the current directory which matches *.pdf:
>         set $file to the filename
>         run pdftotext on $file (quoting any spaces in the filename),
>             writing output to $file-with-a-final-.pdf-replaced-with-.txt
>             (again quoting any spaces in the filename).
> 
> See the `Parameter Expansion' section of the bash man page---and in fact
> the bash man page in general---for more information.  I think there's also
> an O'Reilly book out on the various shells that you may want to look into.
> 
> 
>>Just getting that much done will be a big help. The next step may be
>>trickier. I need to extract a name, address, and equipment list from each
>>of the files and get it into some kind of database where I can query for
>>total by item or item by location.
>>
> 
> This is almost certainly possible, but as another poster said, it depends
> heavily on the format of the text files.  He suggested awk; I'd go with
> Perl, but that's primarily because I know it better.  You can most likely
> do it with either.
> 
> (And, to those of you who recall my rants several months back about why I
> don't like Perl, no, I still don't like Perl.  <grin>  This is, however,
> one spot where it's most likely the best tool for the job.)
> 
> Back to the point: Bobby, while my code above will do what you need, you
> will get a lot more out of this in the long run if you sit down with the
> bash man page or the O'Reilly book and figure out exactly why and how it
> works.  I'd highly recommend investing the time and effort; it will pay off
> bigtime down the road.  You may also want to do the same for the other
> posters' suggestions.
> 
> Richard
> _______________________________________________
> http://www.ntlug.org/mailman/listinfo/discuss

-- 
...make every program a filter...