[NTLUG:Discuss] bash question

Greg Edwards greg at nas-inet.com
Wed Sep 12 17:01:01 CDT 2001


"Wrenn, Bobby J." wrote:
> 
> I found a perl script on the Web that did the trick on the file name part.
> Then I pumped the files through:
> 
> for i in *.pdf ; do
>   DONAME=`basename $i .pdf`
>   pdftotext $i $DONAME.txt
> done
> 
> Success!
> 
> Now all I have to do is get the data out of the text files.
> 
> First major hurdle passed.
> 
> Thanks again,
> Bobby
> 

If the data in the text files has a label on the same line as the
data, you can use grep to extract just those lines.

e.g., if "Address:" were a label

grep Address *.txt

would get something like

file1.txt: Address:  123 Somestreet
file2.txt: Address:  456 Anotherstreet

Redirect this to a file and use your DB load process to concatenate and
save the fields after the second field, using space as your delimiter.
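
If you'd rather do the field splitting before the DB load, here is a
rough sketch of one way to go about it. It swaps awk in for the grep
step so the file name and the value can be separated in one pass. The
function name extract_label and the tab-separated output are my own
choices for illustration, and it assumes the label sits at the very
start of the line.

```shell
# Hypothetical helper (not from the original thread): for every line that
# begins with the given label, print "filename<TAB>value".
extract_label() {
  label=$1
  shift
  awk -v label="$label" '
    # index() == 1 means the line starts with the label (plain string match).
    index($0, label) == 1 {
      value = substr($0, length(label) + 1)   # text after the label
      sub(/^[ \t]+/, "", value)               # drop leading blanks
      printf "%s\t%s\n", FILENAME, value
    }
  ' "$@"
}
```

Something like

extract_label "Address:" *.txt > addresses.tsv

would then give one line per match, with the file name and the address
separated by a tab instead of spaces, so addresses that contain spaces
stay in one field.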

-- 
Greg Edwards
New Age Software, Inc.
http://www.nas-inet.com


