[NTLUG:Discuss] pthreads on Linux

Sat Aug 11 12:59:40 CDT 2001

"Peter A. Koren" wrote:
> 
> Steve Baker wrote:
> >
> My system is a dual CPU machine running an SMP kernel (2.4.3).
> 
> > HOWEVER, that's not a magical guarantee of speedup if (as I suspect
> > here) you are main memory bandwidth bound...the only solution then
> > is to restructure your code so that you are no longer memory bound
> > - or to go to a Beowulf style solution where you are running on
> > multiple computers.  If you *can* restructure your application to
> > run on a Beowulf cluster (and it's NOT a trivial matter to do that)
> > then you have multiple RAM banks and memory busses and thereby
> > increase net RAM bandwidth over the entire distributed system.
> 
> I suspect you are correct about being main memory bandwidth bound. I
> have a few isolated global variables accessed in the thread, but also
> two global floating point arrays each comprised of 17,500 doubles and
> some smaller arrays. Anything which is written to by another thread is
> protected by a mutex or semaphore. One of the large arrays is read only
> and the other is write only. No array element is written to by more
> than one thread.
> 
> The answers given by both the threaded and non threaded versions are
> consistent. They are not identical, because the algorithm is calculation
> order sensitive because of the calls to the random number generator use
> by the algorithm.
> 
> Unfortunately, from a threading standpoint, each thread must randomly
> access memory in the large global arrays, so that there seems to be no
> way to arrange memory optimally for this algorithm.
> 
> The only approach that might have some payoff would be to reduce the
> number of simultaneous threads. This might improve things if the problem
> is related to the thread manager thrashing. This would break the natural
> structure of the code which uses NP parallel threads where NP is the
> population size of each generation. But then threads are expected to be
> painful.
> 
> Thanks Steve,
> 
> Peter Koren
> _______________________________________________
> http://www.ntlug.org/mailman/listinfo/discuss

Peter,

>From the discussion I'd hazzard to say that your probably not going to
be able to squeeze out a lot more performance by tweaking the threads. 
You might be able to gain more use of the dual cpus by running 2 single
thread processes and use shared memory.

I have only been partially following the thread so if you mentioned much
about your algorithm I missed it.  You mention the use of a random
number generator?  Are you using this to select your next cell index? 
If so I assume your using an already hit flag to prevent repeats?  I've
run into a similar problem before.  It was an NP that took huge amounts
of cpu and human time to run.

I changed the index selection so that I generated an array of flags
(0..n-1) from which I generated a random array of indexes during the
init process to control cell selection.  Turned the problem from an NP
to near N*2 solution.

  for i = 0..n-1
    if flagarray[rand()]
      then set indexarray[i] = rand and set flagarray[rand] = false
      else use next unused flagarray[]

If your interested, post some info on your algorithm and maybe someone
will have an idea that could help.

-- 
Greg Edwards
New Age Software, Inc.
http://www.nas-inet.com