Distributed compiling and clustering
Compiling large amounts of code takes time, sometimes massive amounts of it. Some of us see the test build as a well-deserved coffee break, but after a while it gets old. Having to wait a long time, in some cases hours, for a compile to finish is not only counter-productive but in many cases makes debugging a lot harder (e.g. you can't just make a quick change in your code, run it, and see what it does). The solution: gather those old computers you have lying around (and I'm sure many of us do), and set up a little compile farm to help your computer crunch some code.
The tools: Finding the right tool for the job can be a bit tricky, and reading up on clustering might make your head explode initially, but there are truly some great tools out there that make this all too easy, and that's what this review is about.
In my short adventure through the distributed compiling and clustering world, I have run into quite a few options:
The first was building a Beowulf cluster with tools such as heartbeat (http://www.linux-ha.org/).
From Wikipedia (http://en.wikipedia.org/wiki/Beowulf_cluster):
A Beowulf cluster is a group of usually identical PC computers running a FOSS Unix-like operating system, such as GNU/Linux or BSD. They are networked into a small TCP/IP LAN, and have libraries and programs installed which allow processing to be shared among them.
Unfortunately, not all my computers are identical, so this was not an option.
Then there is openMosix, which is a kernel patch for Linux that lets you share CPU power and memory over any number of machines, or as Wikipedia (http://en.wikipedia.org/wiki/OpenMosix) describes it:
openMosix is a free cluster management system that provides single-system image (SSI) capabilities, e.g. automatic work distribution among nodes. It allows program processes (not threads) to migrate to machines in the node's network that would be able to run that process faster. It is particularly useful for running parallel and intensive input/output (I/O) applications. It is released as a Linux kernel patch, but is also available on specialized LiveCDs and as a Gentoo Linux kernel choice.
And last but not least there is distcc. Of the three, distcc is the only one that will work with Windows. It differs from the options above in that it focuses on distributed compiling rather than general clustering, and it requires very little setup. You can use it together with ccache, which caches earlier compile results and makes rebuilds even faster. From Wikipedia (http://en.wikipedia.org/wiki/Distcc):
distcc works as an agent for the compiler. A distcc daemon has to run on each of the participating machines. The originating machine invokes a preprocessor to handle source files and sends the preprocessed source to other machines over the network via TCP. Remote machines compile those source files without any local dependencies (such as header files or macro definitions) to object files and send them back to the originator for further compilation.
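To make the quoted description a bit more concrete, here is a minimal Python sketch of the two stages: preprocessing on the originating machine, then compiling the self-contained result somewhere else. It only builds the gcc command lines and does not run them or touch the network. The file name hello.c and the helper function names are made up for illustration, and distcc does all of this internally, so treat it as a picture of the idea rather than how distcc is implemented.

```python
from __future__ import annotations

from pathlib import Path


def preprocess_command(source: str, compiler: str = "gcc") -> list[str]:
    """Command for the originating machine: expand headers and macros (-E)."""
    preprocessed = str(Path(source).with_suffix(".i"))
    return [compiler, "-E", source, "-o", preprocessed]


def compile_command(preprocessed: str, compiler: str = "gcc") -> list[str]:
    """Command a helper machine could run: the .i file needs no local headers."""
    obj = str(Path(preprocessed).with_suffix(".o"))
    return [compiler, "-c", preprocessed, "-o", obj]


if __name__ == "__main__":
    # hello.c is a hypothetical source file used only for illustration.
    print(" ".join(preprocess_command("hello.c")))
    print(" ".join(compile_command("hello.i")))
```

Because the preprocessed file already contains every header and macro expansion, the helper machine only needs a compatible compiler, not a copy of your source tree.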
Note that none of the above requires any tampering with makefiles or creating complex build scripts.
The results: openMosix had a fairly easy setup (just configure and install the kernel, then run the daemon) and did seem to do a good job with applications that are CPU-intensive (such as a compile job). However, I ran into segmentation faults now and then. I assume it's my fault, but after playing with it for a day I was ready to try something new.
Distcc was VERY impressive. It seems like the perfect tool for the job. Setting it up was very easy: install distcc, set it as your default compiler, use its configuration tool to set the participating hosts, and start the distccd daemon, specifying which IPs to allow. It worked right away. It does require that the build environments on all machines have the same versions of things (the same version of gcc, ld, etc.), but that's quite easy to deal with. I must say the speedup was significant. distcc comes with a monitoring tool (openMosix does too) that shows the running jobs on the farm. Now I can finally compile things on my slow computer, taking advantage of the speed of my faster computer.
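For anyone who wants to see roughly what that setup amounts to, here is a small Python sketch that assembles the pieces: a DISTCC_HOSTS value, a distccd command line for the helper machines, and a make invocation that routes compiles through distcc. The host names, the 192.168.0.0/24 network, and the job counts are assumptions picked for illustration, not my actual farm, and the helper functions are hypothetical. The sketch only builds the strings; it does not start anything.

```python
from __future__ import annotations


def distcc_hosts(hosts: dict[str, int]) -> str:
    """Build a DISTCC_HOSTS value in HOST/LIMIT form, one entry per machine."""
    return " ".join(f"{host}/{jobs}" for host, jobs in hosts.items())


def daemon_command(allowed_network: str) -> list[str]:
    """Command to start distccd on a helper, accepting only the given clients."""
    return ["distccd", "--daemon", "--allow", allowed_network]


def build_command(hosts: dict[str, int], compiler: str = "gcc") -> list[str]:
    """make invocation that sends compiles through distcc, one job per slot."""
    total_jobs = sum(hosts.values())
    return ["make", f"-j{total_jobs}", f"CC=distcc {compiler}"]


if __name__ == "__main__":
    # Hypothetical farm: the local machine plus one faster helper.
    farm = {"localhost": 2, "192.168.0.10": 4}
    print("DISTCC_HOSTS=" + distcc_hosts(farm))
    print(" ".join(daemon_command("192.168.0.0/24")))
    print(build_command(farm))
```

Passing CC on the make command line is one way to use distcc without editing the makefile; the parallel job count matters because distcc can only spread work that make is willing to run at the same time.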
I tested distcc with one machine running Windows (distcc running in Cygwin) and the other running Gentoo Linux (http://www.gentoo.org). Because the two platforms were different, I had to set up a cross-compiling environment (binutils comes in handy), which worked out just fine. Later I tried it with one machine running Gentoo and the other running Gentoo in VMware on a Windows XP host. I must say this was the easiest of all, as no additional setup was needed for cross-compilation.
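A rough way to tell whether you are in the cross-compiling situation is to compare what `gcc -dumpmachine` prints on each machine. The Python sketch below assumes you have already collected those strings by hand; the two triplets shown are illustrative guesses, not the exact values from my machines, and the function names are hypothetical. It also shows the common convention of calling the compiler by its target-prefixed name so that every machine builds for the same target.

```python
def needs_cross_compiler(local_target: str, remote_target: str) -> bool:
    """True when the two machines' native compilers target different platforms."""
    return local_target != remote_target


def qualified_compiler(target: str, tool: str = "gcc") -> str:
    """Target-prefixed tool name, e.g. for a cross toolchain installed on a helper."""
    return f"{target}-{tool}"


if __name__ == "__main__":
    # Illustrative triplets only; check `gcc -dumpmachine` on your own machines.
    windows_box = "i686-pc-cygwin"
    gentoo_box = "i686-pc-linux-gnu"
    if needs_cross_compiler(windows_box, gentoo_box):
        print("cross toolchain needed; invoke", qualified_compiler(gentoo_box))
```

When both machines report the same triplet, as in the Gentoo plus Gentoo-on-VMware case, none of this is needed.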
I was also reading that you can set up distcc to run on openMosix, but I did not get into that. (I'm curious what the difference in benchmark results would be between distcc alone and distcc+openMosix.)
Conclusion: Distcc seems to be the best tool for the job, and to save yourself some cross-compiling trouble, the easiest approach by far is to set it up on machines running the same platform.
Screenshots: http://images.google.com/images?q=openmosix&svnum=10&hl=en&lr=&client=firefox-a&rls=org.mozilla:en-US:official&sa=N&imgsz=xxlarge