[NTLUG:Discuss] bash question
Steve Baker
sjbaker1 at airmail.net
Wed Sep 12 12:50:44 CDT 2001
"Wrenn, Bobby J." wrote:
>
> If I can get an answer to this I will finally be able to use Linux at work.
>
> I need to take 209 pdf files with spaces in the file names and convert them
> into text. I am very new to scripting and know nothing about regular
> expressions. Is there an easy way to remove the spaces from the file names?
> Then how do I recursively submit the files to pdftotext with the same name
> except for the .pdf changed to .txt?
Well, as someone already said, you don't have to get rid of the spaces.
UNIX (and Linux) allows almost any character in a filename (including
spaces), so you just need to enclose the filename in quotes.
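For example, with a made-up file called "name with spaces.pdf":

  pdftotext name with spaces.pdf     # the shell passes three separate arguments
  pdftotext "name with spaces.pdf"   # the shell passes one argument - the whole name

Only the second form does what you want.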
For one-time hacks like this, I generally suggest that you do this:
ls -1 *pdf >temp
This puts all your filenames into a list (one to a line) in 'temp'.
Next, I'd use a text editor to change each line:
name with spaces.pdf
...to:
pdftotext "name with spaces.pdf" name_with_spaces.txt
...then just "chmod a+x temp" and run it as a script: "./temp"
If you are good with your text editor, this should be an easy
job that'll take a lot less effort than figuring out the bashology
needed to "do it right". (Although I'm sure it *can* be done using
bash commands).
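For the record, a rough sketch of that bash version - untested, and it
assumes every file really does end in ".pdf" and that pdftotext is on
your PATH:

  for f in *.pdf ; do
    pdftotext "$f" "${f%.pdf}.txt"   # ${f%.pdf} is the name minus ".pdf"
  done

The double quotes around "$f" and the output name are what stop the
spaces from splitting things apart.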
> Just getting that much done will be a big help. The next step may be
> trickier. I need to extract a name, address, and equipment list from each of
> the files and get it into some kind of database where I can query for total
> by item or item by location.
I have no idea what to do here. I'd generally write a short C program to do
it - but then I'm pretty fast at writing C.
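Purely as a guess - I haven't seen these files - if the text that
pdftotext spits out has labelled lines like "Name:" and "Address:",
a bit of shell might get you most of the way before you need C or a
real database. Something along these lines (the labels and the
"summary.tsv" name are made up):

  for f in *.txt ; do
    name=$(sed -n 's/^Name: *//p' "$f" | head -1)
    addr=$(sed -n 's/^Address: *//p' "$f" | head -1)
    printf '%s\t%s\t%s\n' "$f" "$name" "$addr"
  done >summary.tsv

That gives you one tab-separated line per file, which most spreadsheet
and database programs will import directly.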
----------------------------- Steve Baker -------------------------------
Mail : <sjbaker1 at airmail.net> WorkMail: <sjbaker at link.com>
URLs : http://web2.airmail.net/sjbaker1
http://plib.sf.net http://tuxaqfh.sf.net http://tuxkart.sf.net
http://prettypoly.sf.net http://freeglut.sf.net
http://toobular.sf.net http://lodestone.sf.net