[NTLUG:Discuss] pthreads on Linux

Sat Aug 11 11:21:22 CDT 2001

Steve Baker wrote:
> 
> Chris Cox wrote:
> >
> > I know previously Linus and others claimed that threads were not
> > needed in Linux because they wouldn't buy that much in performance.
> 
> For once, Linus was wrong.  We've used threads at work and they certainly
> do help THE RIGHT KIND OF APPLICATION.
> 
> However, it's not that this is a kind of magic wand.  If you have only
> one CPU and your application is just doing heavy CPU/memory work, then
> splitting it into two threads can only hurt performance.
> 
> But if your application mixes calculations with I/O, then of course
> threading helps - while one thread is doing I/O, the other can use
> the CPU for doing calculations - and vice-versa.
> 

My code is mainly floating point number crunching with only enough I/O
to keep the user aware of how the run is proceeding.

> Then if you have multiple CPU's, a single threaded application can't
> possibly gain any benefit from that second CPU - so you pretty much
> HAVE to have threads to take advantage of it.
> 

My system is a dual CPU machine running an SMP kernel (2.4.3).

> HOWEVER, that's not a magical guarantee of speedup if (as I suspect
> here) you are main memory bandwidth bound...the only solution then
> is to restructure your code so that you are no longer memory bound
> - or to go to a Beowulf style solution where you are running on
> multiple computers.  If you *can* restructure your application to
> run on a Beowulf cluster (and it's NOT a trivial matter to do that)
> then you have multiple RAM banks and memory busses and thereby
> increase net RAM bandwidth over the entire distributed system.

I suspect you are correct about being main memory bandwidth bound. I
have a few isolated global variables accessed in the thread, but also
two global floating point arrays, each consisting of 17,500 doubles, and
some smaller arrays. Anything which is written to by another thread is
protected by a mutex or semaphore. One of the large arrays is read only
and the other is write only. No array element is written to by more
than one thread.

The answers given by the threaded and non-threaded versions are
consistent. They are not identical, because the algorithm is sensitive
to calculation order through its calls to the random number generator.
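
One way to remove the order sensitivity would be to give each
individual its own random number stream, so that a result no longer
depends on which thread runs first. This is only a sketch under
assumptions: evaluate_individual(), base_seed and the "seed plus index"
scheme are made up for illustration and are not from my code, and
rand_r() is a weak generator that a real run would want to replace.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* ask for the rand_r() declaration */
#endif
#include <stdlib.h>

/* Uniform double in [0, 1) drawn from a caller-owned rand_r() state. */
static double uniform01(unsigned int *state)
{
    return rand_r(state) / ((double)RAND_MAX + 1.0);
}

/*
 * Hypothetical per-individual work: the sum of n random draws.
 * The generator state lives on the caller's stack, so no other thread
 * can advance it, and the result depends only on (base_seed, i, n).
 */
double evaluate_individual(unsigned int base_seed, int i, int n)
{
    unsigned int state = base_seed + (unsigned int)i;  /* private stream */
    double sum = 0.0;

    for (int k = 0; k < n; k++)
        sum += uniform01(&state);
    return sum;
}
```

With this arrangement a threaded run and a serial run that use the same
base_seed should give the same value for each individual, whatever the
scheduling order.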

Unfortunately, from a threading standpoint, each thread must randomly
access memory in the large global arrays, so that there seems to be no
way to arrange memory optimally for this algorithm.

The only approach that might have some payoff would be to reduce the
number of simultaneous threads. This might improve things if the problem
is related to the thread manager thrashing. It would break the natural
structure of the code, which uses NP parallel threads, where NP is the
population size of each generation. But then threads are expected to be
painful.
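
Reducing the thread count could be sketched roughly as follows: a fixed
number of workers, each handling a contiguous slice of the population,
instead of one thread per individual. The names and values here (NP of
64, NWORKERS, fitness[], evaluate()) are placeholders for illustration,
not my actual code.

```c
#include <pthread.h>

#define NP        64   /* hypothetical population size */
#define NWORKERS   2   /* one worker per CPU on a dual-CPU machine */

static double fitness[NP];   /* write-only results, one slot each */

struct slice { int first; int last; };   /* half-open range [first, last) */

/* Placeholder for the real per-individual calculation. */
static double evaluate(int i)
{
    return (double)i * 0.5;
}

static void *worker(void *arg)
{
    const struct slice *s = arg;

    for (int i = s->first; i < s->last; i++)
        fitness[i] = evaluate(i);   /* disjoint slots, so no mutex here */
    return NULL;
}

/* Evaluate one generation with NWORKERS threads; returns 0 on success. */
int evaluate_generation(void)
{
    pthread_t tid[NWORKERS];
    struct slice sl[NWORKERS];
    int started = 0, rc = 0;

    for (int w = 0; w < NWORKERS; w++) {
        sl[w].first = w * NP / NWORKERS;
        sl[w].last  = (w + 1) * NP / NWORKERS;
        if (pthread_create(&tid[w], NULL, worker, &sl[w]) != 0) {
            rc = -1;                /* stop creating, join what started */
            break;
        }
        started++;
    }
    for (int w = 0; w < started; w++)
        pthread_join(tid[w], NULL);
    return rc;
}
```

This builds with something like "gcc -pthread". Whether it helps at all
depends on whether the slowdown comes from thread management or from
memory bandwidth; if it is the latter, fewer threads would not be
expected to change much.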

Thanks Steve,

Peter Koren