[NTLUG:Discuss] Clustering

Mike just_mike_y at yahoo.com
Wed Mar 26 20:01:52 CST 2003


> The one thing that the cluster
> cannot share is executing programs. 

Not quite.  A single program can be parceled out to several 
boxes on a cluster under certain conditions.  The majority 
of "clusters" are designed to do just that kind of thing: 
run a single program on X boxes to speed up the result. 

Some conditions required to spread a program over several 
boxes:

o must be designed as a multithreaded program (that is, 
multiple tasks are set up to run, then all the tasks run, 
then the program comes to a wait state)

o the tasks must be non-specific to local hardware (that 
is... you can't spread a bunch of tasks that draw polygons 
on one monitor over 16 boxes.)

o other things... it's been several years since I researched 
clusters. 
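
To make the "set up the tasks, run them all, then wait" pattern 
concrete, here is a minimal sketch in Python. It is only an 
illustration: the function names are made up for this example, and a 
local thread pool stands in for the boxes on a cluster. Each task 
needs nothing but its own chunk of input, so it doesn't care which 
box it runs on.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    """One independent task: uses only its own input, no local hardware."""
    return sum(x * x for x in chunk)

def split(data, parts):
    """Set up the tasks: cut the input into roughly equal chunks."""
    size = max(1, (len(data) + parts - 1) // parts)
    return [data[i:i + size] for i in range(0, len(data), size)]

def sum_of_squares(data, workers=4):
    chunks = split(data, workers)                 # 1. set up the tasks
    # The pool is a stand-in for cluster nodes (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(partial_sum, chunks))  # 2. run all the tasks
    return sum(results)                           # 3. wait state: combine results

if __name__ == "__main__":
    print(sum_of_squares(list(range(10))))
```

A task that draws on one particular monitor would fail the second 
condition: it can't be handed to an arbitrary worker the way 
partial_sum can.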


Clustering Under Linux:

Beowulf: the poor man's supercomputer.

http://www.beowulf.org

For large scale clustering, Beowulf is the granddaddy of 
architectures under Linux. Beowulf is a star type cluster: 
processes are started from a single controlling box and 
are farmed out straight from the controller to the boxes on 
the cluster, which only listen to the master box. 
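
The star layout can be sketched roughly like this. This is a toy 
model, not Beowulf code: real Beowulf programs are usually written 
against a message passing library, and here plain Python threads and 
queues stand in for the boxes and the network. The names worker and 
run_master are made up for the example.

```python
import queue
import threading

def worker(inbox, outbox):
    """A compute box: takes jobs only from the master, never from peers."""
    while True:
        job = inbox.get()
        if job is None:              # master says: shut down
            break
        job_id, n = job
        outbox.put((job_id, n * n))  # stand-in for real number crunching

def run_master(jobs, n_workers=3):
    """The controlling box: starts every job and collects every result."""
    outbox = queue.Queue()
    inboxes = [queue.Queue() for _ in range(n_workers)]
    threads = [threading.Thread(target=worker, args=(ib, outbox))
               for ib in inboxes]
    for t in threads:
        t.start()
    # Farm the jobs out round-robin, straight from master to each worker.
    for job_id, n in enumerate(jobs):
        inboxes[job_id % n_workers].put((job_id, n))
    results = {}
    for _ in jobs:
        job_id, value = outbox.get()
        results[job_id] = value
    for ib in inboxes:
        ib.put(None)
    for t in threads:
        t.join()
    return [results[i] for i in range(len(jobs))]

if __name__ == "__main__":
    print(run_master([1, 2, 3, 4, 5]))
```

Note that the workers never talk to each other; everything goes 
through the master, which is what makes it a star.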

You effectively have one big computer that has as many boxes 
as you want to add.

Only certain kinds of problems are big enough to use very 
many boxes for long periods of time (crash simulators, 
astronomical body simulators, etc.). Unless you have problems 
like that to solve, building a Beowulf is just an 
experiment. 

Beowulf based clusters have been rated within the top 25 
fastest computers in the world at times, and the cost per 
gigaflop for a Beowulf cluster is way, way below any of the 
commercial solutions (like 1/10,000th the cost to install, 
and 1/1,000th to support). The downside of Beowulf is that 
programs must be written specifically for the architecture. 

MOSIX: A "Timeshare" Cluster 

http://www.mosix.org

For a smaller cluster under Linux there's MOSIX, which 
allows sharing of CPU, storage, etc. among medium sized 
clusters.  MOSIX is a web type cluster.  That is, every box 
on the system can start processes. 

You effectively have as many computers as you have boxes; 
they just run faster because each box can use CPU time from 
any box that's idle. 95% of your CPU resources are idle on 
a desktop system, but there are certain times that you fill 
the stack up and go into major latency (like starting 
Mozilla or OpenOffice). MOSIX offloads the CPU overflow to 
other boxes that are currently idle. 

When local processes stack up, the box looks for other boxes 
on the cluster with low utilization and farms processes out 
as needed. It differs from Beowulf because processes can 
start from any box on the cluster, so you have a 
multi-access system.  MOSIX is also designed to sit below 
the application... it works with many apps in Linux without 
any recompiling at all. 
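
The "look for an idle box and farm the process out" idea can be 
sketched as below. This is NOT the actual MOSIX algorithm (that 
lives in the kernel and weighs more than CPU load); it's just a toy 
decision rule. The function name, the 0.8 threshold, and the load 
numbers are all assumptions made up for the example.

```python
def pick_node(local, loads, threshold=0.8):
    """Decide where a new process should run.

    local     -- name of the box where the process was started
    loads     -- dict mapping box name to CPU utilization (0.0 to 1.0)
    threshold -- hypothetical load above which the local box is "stacked up"
    """
    # Plenty of headroom locally: don't bother migrating.
    if loads[local] < threshold:
        return local
    # Otherwise pick the least utilized box on the cluster.
    candidate = min(loads, key=loads.get)
    # Only migrate if that box is actually less busy than we are.
    return candidate if loads[candidate] < loads[local] else local

if __name__ == "__main__":
    cluster = {"box1": 0.95, "box2": 0.10, "box3": 0.50}
    print(pick_node("box1", cluster))
```

Because any box can call something like pick_node for its own 
processes, there is no single master, which is the difference from 
the Beowulf layout.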

With MOSIX, you can effectively double the system speed of 
a group of 4-16 boxes with relatively low implementation 
costs.

The stuff I read on MOSIX back in the summer of 1999 said that 
up to 64 boxes made sense.  Somewhere around that number of 
boxes, keeping the process lists up to date starts being a 
time hog, and more boxes produce no more effective output. 

In 2000, I had a 3 box MOSIX cluster working (well, sort of. 
When I added number 3 I kept setting something wrong and it 
hung quite frequently. Uptime with 3 boxes was pretty low, 
and I lost interest.) It did improve the startup speed of 
multithreaded apps by quite a bit (Star Office 5.1 came up 
in 1/10th the time with all 3 processors doing nothing else, 
compared to a single processor). However, it was a pain to 
boot up the extra boxes just to start Star Office, and 
leaving them all on 100% of the time made a very effective 
5000 BTU heater.  I definitely noticed an increased 
electricity bill from leaving them on.  Leaving all 3 Cyrix 
M2 300 MHz boxes on cost an extra 25-50 dollars a month to 
keep the AC running (and the AC wasn't keeping up). This is 
Texas... a home MOSIX cluster might be a better idea in an 
arctic situation, where system heat is a plus, not a minus. 

For a business environment running commercial apps that can 
run natively under Linux (Oracle, Sybase, an OpenOffice.org 
server), you could significantly lower costs by investing 
some time in MOSIX and clustering several desktops instead 
of buying a beefy server. It really does run pretty much 
invisibly once you dig through the setup. 
 
 


