[NTLUG:Discuss] Distributed processing
cbbrowne@hex.net
cbbrowne at hex.net
Tue Jul 17 22:07:09 CDT 2001
On Tue, 17 Jul 2001 20:07:03 CDT, the world broke into rejoicing as
Greg Edwards <greg at nas-inet.com> said:
> I've been searching for tools that will do distributed processing at the
> function level and haven't had much luck. There are plenty of
> distributed load managers and parallel processing managers (such as
> Beowulf). The load managers work at the program level and parallel
> process managers at the calculation level. Neither of these solutions
> answers my needs, and multi-threading has too many drawbacks for a solution
> here. What I need is a distribution manager that will pass the load at
> the procedural level.
>
> What I'm trying to do is run an application farm for interactive web
> applications. This solution would be usable well beyond web
> applications. The idea is that during the processing of an application
> rather than a single program using a set of libraries on a single box an
> API would allow the function request to be distributed among N boxes
> that support that function. Not every box would support every function
> available throughout the farm but every box would have knowledge of
> every function in the farm.
>
> For example, say the application needs to search a database for all
> employees that have 20 years of service and then return that list sorted
> by age of the employee. The entry application would reach a point of
> needing the data and call the API which in turn would determine the best
> machine in the farm to process the request based on current load and
> data availability. The request would then be passed to that machine and
> the entry application would go on about its business until the results
> were returned. During the processing of that request the search for
> employees and the sort may be split among multiple machines as well.
>
> I want to eliminate the issues of connection counts, task counts, user
> counts, etc. that a high count of concurrent users can cause. This will
> also maximize process heuristics such as cache usage, repeated dataset
> processing, heavy math processing, graphic generators, database access,
> etc.
>
> The basic topology of the farm would be a web server that handles the
> web connections and static pages. The farm would handle the processing
> and pass dynamic pages back to the web server for delivery as static
> pages. The web server would determine which entry point in the farm
> should receive the initial request.
>
> I hope this makes sense? Has anyone seen anything along these lines in
> the Linux world? My target language (initially) is C for performance
> reasons.
This sounds just like a "message queueing" application.
The entry application reaches the point where it needs data; it then
submits a request for data, throwing it into the "Data Request Queue."
It proceeds with something else.
The machines that are processing Data Requests will then grab requests from
the Data Request Queue, do processing, and send a message back to the
requestor [perhaps putting it in the Requestor's Queue] with the answer.
If you've got different kinds of requests, perhaps with substantially
different urgencies, you might very well have a bunch of queues.
That should make the issue of managing connection counts go away;
supposing server "A" can process five of those requests in a second,
and server "B" only processes 2 per second, they'll happily share the
queue in a rather communistic "From Each According To His Abilities"
manner; they grab queue entries when they're ready, which should let
them chew CPU time nicely. [It may be unAmerican, but that doesn't
prevent IBM from selling licenses to this sort of software :-).]
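To make the pattern concrete, here is a minimal sketch of that request/reply
queue arrangement. It's in Python rather than C purely for brevity, and an
in-process queue.Queue stands in for the real message broker (MQSeries,
Isect, or whatever you pick); the names ("web1", the worker speeds) are
invented for illustration. The point is that a fast server and a slow server
share the same queue and each grabs work only when it's ready:

```python
import queue
import threading
import time

data_request_queue = queue.Queue()   # the shared "Data Request Queue"
reply_queues = {"web1": queue.Queue()}  # one reply queue per requestor

def worker(name, per_request_cost):
    """A server that pulls requests whenever it's ready; a faster server
    (smaller per_request_cost) naturally drains more queue entries."""
    while True:
        req = data_request_queue.get()
        if req is None:                  # sentinel: shut down
            data_request_queue.task_done()
            return
        requestor, payload = req
        time.sleep(per_request_cost)     # simulate doing the work
        reply_queues[requestor].put((name, payload.upper()))
        data_request_queue.task_done()

# The entry application submits requests, then goes about its business.
for item in ["employees>20yrs", "sort-by-age", "render-page"]:
    data_request_queue.put(("web1", item))

# Server "A" is faster than server "B"; both feed from the same queue.
threads = [threading.Thread(target=worker, args=("A", 0.01)),
           threading.Thread(target=worker, args=("B", 0.05))]
for t in threads:
    t.start()

data_request_queue.join()            # block until every request is answered
for _ in threads:
    data_request_queue.put(None)     # one shutdown sentinel per worker
for t in threads:
    t.join()

# Later, the entry application drains its reply queue for the answers.
results = []
while not reply_queues["web1"].empty():
    results.append(reply_queues["web1"].get())
print(sorted(r[1] for r in results))
```

Different request kinds with different urgencies would just be additional
queue.Queue instances (or priority queues) that workers subscribe to.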
The canonical example of a message queueing system is IBM's MQSeries
framework; it is rather expensive, and perhaps heavier weight than you
want, so you might instead look at the
available-free-for-many-platforms package called Isect.
<http://pweb.netcom.com/~tgagne/index.html>
Note that if the messages are rather large, communications costs
may be correspondingly high, perhaps prohibitively so. Building
parallel applications sufficiently intelligently to keep that from
happening is the usual crux of the problem of making parallelism
scale...
--
(concatenate 'string "aa454" "@freenet.carleton.ca")
http://vip.hex.net/~cbbrowne/linux.html
"Real concurrency---in which one program actually continues to
function while you call up and use another---is more amazing but of
small use to the average person. How many programs do you have that
take more than a few seconds to perform any task?"
-- New York Times, 4/25/89