[NTLUG:Discuss] OT Perl vs C question

Tom Hayden tom.hayden.iii at mail.airmail.net
Wed Apr 27 10:40:31 CDT 2005


From what you have told me, it does not SOUND like a coordinated query, 
but I could be wrong. The query you are describing may be going about it 
backwards: it sounds like you are reading the entire central table once 
for each employee and/or date. It might help to sort the central table 
by employee/date (or date/employee) before you run the report, then 
write the query/program so it reads from the first record, calculating 
and totaling as it goes until the employee/date changes, writes the 
results for that employee/date, and then proceeds to the next 
employee/date. That way the query/program reads through the central 
table only ONCE. Depending upon your hardware, sorting 500,000 records 
should take only 15 to 30 minutes, and reading through the entire sorted 
table once, processing as you go, should go fairly quickly (possibly 
under an hour). Doing this may require that the query be rewritten as 
part of a program using something like Perl's DBI or Pro*C. In any case, 
I think it has become clear that the issue is not C vs. Perl, but a 
problem with the query itself.
Hope this helps.
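
P.S. In case it is useful, here is a bare-bones sketch of that
single-pass approach in Perl with DBI. The connect string, the table and
column names (timelog, employee_id, work_date, category, elapsed), the
date format, and the idea that categories are numbered 1-16 are all
placeholders, since I do not know your schema; the point is the ORDER BY
plus the running totals that get written out whenever the employee/date
key changes.

  use strict;
  use warnings;
  use DBI;

  # Connect to the database (DSN, user, password are placeholders).
  my $dbh = DBI->connect('dbi:Oracle:yourdb', 'user', 'pass',
                         { RaiseError => 1 });

  # Let the database do the one big sort, then read the rows once.
  my $sth = $dbh->prepare(q{
      SELECT employee_id, work_date, category, elapsed
        FROM timelog
       WHERE work_date BETWEEN ? AND ?
       ORDER BY employee_id, work_date
  });
  # Date format/handling depends on your database settings.
  $sth->execute('2004-05-01', '2005-04-30');

  my ($cur_key, %totals);
  while (my ($emp, $date, $cat, $elapsed) = $sth->fetchrow_array) {
      my $key = "$emp,$date";
      if (defined $cur_key && $key ne $cur_key) {
          flush($cur_key, \%totals);     # employee/date changed: write it out
          %totals = ();
      }
      $cur_key = $key;
      $totals{$cat} += $elapsed;         # accumulate one of the 16 categories
  }
  flush($cur_key, \%totals) if defined $cur_key;   # last group
  $dbh->disconnect;

  # Write one CSV line per employee/date (categories 1..16 as columns).
  sub flush {
      my ($key, $t) = @_;
      print join(',', $key, map { $t->{$_} || 0 } 1 .. 16), "\n";
  }

The cross-checks against the other 7 tables would either be folded into
the SELECT as joins or done inside the loop; either way the central
table is walked only once. The same structure works in Pro*C.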
Fred James wrote:
> The "query" in question (the one that runs for 24 hours to get a 30 day 
> range of data)
> (1) for each employee, for each day within the range
> (1.1) calls 16 different functions one after another, each of which
> (1.1.1) traverses about 42,000 records in one table (cross checking 7 
> other tables as it goes) to calculate and total one of 16 specific 
> categories of elapsed time.
> (1.1.1.1) A one year date range would make that > 500,000 records in the 
> central table, currently.
> 
> Is that a coordinated or non coordinated query, or neither?  I plead 
> ignorance on the subject, but would like to know.
> 
> In any case, 1 simple query against each of the 8 tables for a date 
> range of one year (8 queries total) will complete in about 10 minutes.  
>  From there even AWK could finish it up fairly quick, parsing it out as 
> substrings of fixed width data.
> 
> I didn't mention, and I doubt it makes any difference, but the final 
> output is a CSV file so the numbers people can put it in Excel - that 
> isn't a problem to do.
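
(Regarding the AWK step above: purely as an illustration, the fixed-width
parsing is just as short in Perl with unpack. The 'A10 A8 A6 A12' field
widths, the column order, and the file names are invented here; plug in
the real record layout from your extract files.)

  #!/usr/bin/perl
  use strict;
  use warnings;

  # Turn fixed-width extract records into CSV lines.
  # The unpack template (field widths) below is made up; replace it
  # with the actual column widths of the extract files.
  while (my $line = <>) {
      chomp $line;
      my ($emp, $date, $cat, $elapsed) = unpack 'A10 A8 A6 A12', $line;
      s/^\s+|\s+$//g for ($emp, $date, $cat, $elapsed);   # strip padding
      print join(',', $emp, $date, $cat, $elapsed), "\n";
  }

Run as something like "perl fixed2csv.pl extract.dat > report.csv" and
the numbers people get their CSV directly.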
> 
> The data set has to be run for the full year each time because the start 
> and stop times are continually audited and corrected.
> 
> I do want to say thank you for all the thoughts you people have been 
> willing to share - I very much appreciate any help you may be able to 
> offer.
> Regards
> Fred James
> 
> Chris J Albertson wrote:
> 
>> Yup. Wow is the word I came up with as well.
>> I don't know your structure and don't want to get into a tiff over
>> what-ifs. Your original email made this sound like it was a small, 
>> once in
>> a while type thing. I'm sorry if I oversimplified your architecture. It
>> really sounds to me like you've got much larger problems than what
>> language to write your reports in.
>> If your system's design indeed requires 24 hours of processing to
>> summarize your data, then you may want to think about storing the
>> resultant summary information in a database structure so that many people
>> in your org can get at it. Those flat files cost a lot of $ to produce
>> each month. If you're willing to spend 24 hours producing the data, it
>> must be important. Make that data usable.
>> Your company may benefit from looking into implementing a data cube
>> (OLAP) design in order to feed massive amounts of data into a
>> high-performing trend-analysis engine.
>> Just trying to help.
>> Good luck
>>
>> Chris
>>
>>  
>>
>>> WOW! You could definitely benefit from some optimization, or it is
>>> time to invest in new hardware.  I've run a database with millions of
>>> monthly financial transactions; queries with multiple joins, calculating
>>> reversals, chargebacks, splits, interchange, surcharge, and totals for
>>> all the transactions took less than 30 seconds on an AMD XP 1800 with
>>> 512MB RAM.  That was running yearly totals too.  The monthly settlements
>>> were much quicker.
>>>
>>> BTW, sometimes it isn't the query that needs optimization; make sure you
>>> are indexing the fields used in the queries, and it should improve your
>>> life drastically.  Think about it: if Google can index the entire
>>> "visible internet" and return results to your queries quicker than your
>>> browser can render the page, you should be able to run your queries in
>>> much less than 24 hours.
>>>
>>> -----Original Message-----
>>> From: discuss-bounces at ntlug.org [mailto:discuss-bounces at ntlug.org]On
>>> Behalf Of Fred James
>>> Sent: Tuesday, April 26, 2005 7:44 PM
>>> To: alb at chrisalbertson.com; NTLUG Discussion List
>>> Subject: Re: [NTLUG:Discuss] OT Perl vs C question
>>>
>>> Chris Albertson
>>> Admitting that I should not be considered a prize SQL programmer, I have
>>> relied upon our SQL developers to produce as efficient a query as
>>> possible.  That said ...
>>>
>>> Evil flat files notwithstanding, the SQL has been tested and it runs
>>> for over 24 hours to acquire a 30-day range of data (that test run
>>> being on a weekend, which is non-peak hours for us).  Our final run must
>>> collect a 365-day range of data - estimated running time for the SQL
>>> would therefore be about 304 hours (12.67 days).  We were kind of hoping
>>> to run this query at least once a week, if not daily.
>>>
>>> All sad, but true.
>>> Regards
>>> Fred James
>>>
>>> -- 
>>> Compassion alone stands apart from the continuous traffic between 
>>> good and
>>> evil proceeding within us.  "Om Mani Padme Hum"
>>>
> 


-- 
Tom Hayden III

Coherent solutions for chaotic situations

tom.hayden.iii at mail.airmail.net
214-435-4174

1531 San Antone ln.
Lewisville Texas 75077



