[NTLUG:Discuss] OT Perl vs C question
Patrick R. Michaud
pmichaud at pobox.com
Wed Apr 27 01:35:08 CDT 2005
On Tue, Apr 26, 2005 at 03:30:10PM -0500, Fred James wrote:
> So, my question: In a moderate volume data processing project, say
> reading 7 flat files of 3 to 7 fields each, and > 500,000 records each,
> and doing something like steps 2 and 3 (above), how does Perl compare to
> C in terms of speed?
Others have answered other aspects of your question -- I'll just
add a slightly different data point. Often the question isn't "how
fast is Perl vs. C?", but rather "will Perl (or C) be fast enough to
meet my needs?" If the job can be done fast enough in Perl, then
Perl is really the way you want to go.
That said, at A&M-CC I directed a project that collects environmental
data from a network of stations along the Gulf coast. Each station
records 5-10 channels of data at hourly intervals, and we did all of
our major processing and reporting using Perl on flat file formats
similar to what you describe. Several of our Perl applications
were regularly used to graph or otherwise process long time series
of hourly records (i.e., five years or more of hourly records with
ten data points per record). That easily translates into millions
of records, all processed in under 30 minutes, and the
processing/graphing was *much* more involved than computing simple
sums.
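For flavor, the core of such a pass is only a few lines of Perl.
Here's a rough sketch (the file name and field layout are made up
for illustration; ours differed):

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Hypothetical layout: one hourly record per line, a timestamp
    # followed by ten whitespace-separated channel values.
    open my $fh, '<', 'station.dat' or die "can't open station.dat: $!";

    my (@sum, $n);
    while (my $line = <$fh>) {
        my ($stamp, @channels) = split ' ', $line;
        for my $i (0 .. $#channels) {
            # // quiets the undef on the first record
            $sum[$i] = ($sum[$i] // 0) + $channels[$i];
        }
        $n++;
    }
    close $fh;

    die "no records found\n" unless $n;
    printf "channel %d mean: %.3f\n", $_, $sum[$_] / $n for 0 .. $#sum;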
All of this was done on a mid-range Linux PC with a 1 GHz processor,
which was also running a somewhat heavily loaded webserver (serving
Perl cgi-bin scripts as well).
So, I'd say that unless you need lightning-fast results, Perl can
handle what you described. Even if you do need lightning-fast
processing, there are some fast mechanisms in Perl for loading and
processing the data that can give you the speed advantages of C while
keeping Perl's flexibility. (We had another application in Perl/PDL
that loaded and solved linear systems over multiple 8760 x 62 element
matrices of sines, cosines, and lookups into external tables. It
actually took a lot longer to load the data than to solve the
equations, but the total runtime for the application was only a few
seconds.)
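For the curious, here's a minimal sketch of the kind of thing I mean,
using rcols() from PDL::IO::Misc for fast columnar loads and PDL's
matrix operations for the solve. The file name and the little 2x2
system are placeholders for illustration, not our actual application:

    use strict;
    use warnings;
    use PDL;
    use PDL::IO::Misc;    # rcols(): fast columnar reads from flat files
    use PDL::MatrixOps;   # inv(): matrix inverse

    # Pull two numeric columns out of a flat file in one call
    # ('observations.dat' is a placeholder name).
    my ($t, $v) = rcols('observations.dat', 0, 1);

    # Solve a toy A x = b system; the real matrices were 8760 x 62.
    my $A = pdl([[4, 2],
                 [1, 3]]);
    my $b = pdl([[1],
                 [2]]);              # right-hand side as a column vector
    my $x = inv($A) x $b;            # 'x' is PDL's matrix-multiply operator
    print $x;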
In the scenario you propose, the expensive part will be scanning the
files and extracting the data. But rather than trying to figure out
a priori whether Perl is fast enough, I'd just write a quick Perl
program to see how long it takes to scan the files and extract the
data you need from each. That will give you a rough idea of the
overall time required without having to develop the full application.
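Something like this would do for a first timing pass (the field
separator is assumed to be whitespace; adjust the split to match
your files):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Time::HiRes qw(time);

    # Scan the flat files named on the command line, splitting each
    # record into fields, and report how long the pass took.
    my $start   = time;
    my $records = 0;
    for my $file (@ARGV) {
        open my $fh, '<', $file or die "can't open $file: $!";
        while (my $line = <$fh>) {
            my @fields = split ' ', $line;   # assumes whitespace-separated fields
            $records++;
        }
        close $fh;
    }
    printf "scanned %d records in %.2f seconds\n", $records, time - $start;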
Pm