[NTLUG:Discuss] bash question
Greg Edwards
greg at nas-inet.com
Wed Sep 12 17:01:01 CDT 2001
"Wrenn, Bobby J." wrote:
>
> I found a perl script on the Web that did the trick on the file name part.
> Then I pumped the files through:
>
> for i in *.pdf ; do
> DONAME=`basename "$i" .pdf`
> pdftotext "$i" "$DONAME.txt"
> done
>
> Success!
>
> Now all I have to do is get the data out of the text files.
>
> First major hurdle passed.
>
> Thanks again,
> Bobby
>
If each piece of data in the text files has a label on the same line,
you can use grep to extract just those lines.
For example, if "Address:" were a label,
grep Address *.txt
would get something like
file1.txt: Address: 123 Somestreet
file2.txt: Address: 456 Anotherstreet
Redirect this to a file and have your DB load process concatenate and
save the fields after the second field, using space as your delimiter.
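
As a rough sketch of that idea, assuming the label really is "Address:" and
the text files sit in the current directory: the sample files, the output
names (address_lines.out, addresses.tsv) and the awk step are made up for
illustration and are not part of the original setup. Note that grep itself
prints "file1.txt:Address: ..." with no space after the file name, so this
version splits on the colons instead of counting space-separated fields.

```shell
# Hypothetical sample data standing in for the pdftotext output.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'Name: Alice\nAddress: 123 Somestreet\n' > file1.txt
printf 'Name: Bob\nAddress: 456 Anotherstreet\n' > file2.txt

# Collect every labelled line. With more than one input file, grep
# prefixes each match with "filename:". The output name deliberately
# does not end in .txt, so a rerun will not grep its own output.
grep 'Address:' *.txt > address_lines.out

# Split each line on a colon followed by optional spaces:
#   $1 = file name, $2 = label, $3 = data
# Caveat: data that itself contains a colon would be cut short here.
awk -F': *' '{ print $1 "\t" $3 }' address_lines.out > addresses.tsv

cat addresses.tsv
```

The tab-separated file keeps the source file name next to each value, which
may or may not be what your DB loader wants; drop $1 from the print if only
the data is needed.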
--
Greg Edwards
New Age Software, Inc.
http://www.nas-inet.com