So after finding out about the mini-ITX format and reading a bit about clustering, I thought that the VIA platform would make a great solution for a load-balanced set of web servers at an extremely low price point.
I haven't seen any real benchmarks, but I have seen many people complain about the performance that even the 1.2GHz Eden processors offer. The upside to the poor performance is that they draw very little power and can run diskless. There are a lot of Eden-based thin client systems out there, but the relatively small number of server solutions leads me to conclude that they simply are not appropriate for mainstream server-side implementations.
Outside of the mainstream, however, you can put these puppies to good use.
For example, the Wayback Machine actually runs on a set of 600 VIA mini-ITX boards that manage roughly 2500 disks. It draws about 50kW of power and is managed by 1 full-time and 1 part-time employee. You can actually buy a rack yourself for about $2/GB, in 40 or 60 terabyte configurations.
In reading a bit about their setup, I found support for something I've long believed but never had much information to back up: RAID does not scale well. In a highly scalable system like this, RAID will actually introduce more failures than it gets you out of, and if you have several terabytes of storage at your disposal, you are better off keeping live backups of your data than relying on a recovery solution. Along the same lines, hot-swappable storage is also problematic, and for the cost you are much better off turning off a box temporarily when a drive fails. JBOD (Just a Bunch of Disks) is a much more scalable solution and doesn't require you to keep buying drives from a single manufacturer in a single size for recovery. You might find a great deal on 100 Hitachi drives this month, but when 20 of them fail 2 years from now, do you really want to have to find the same model to replace them? I think not.
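To make the JBOD-plus-live-copies idea concrete, here is a minimal Python sketch of my own. It is not how the Wayback Machine actually places data; the `Disk` class, the `place` and `survivors` helpers, and the disk and file names are all made up for illustration. Each file is stored whole on two disks in two different boxes, so losing a drive (or powering a box down to swap one) never takes the only copy offline, and the disks can be any mix of sizes.

```python
from dataclasses import dataclass, field


@dataclass
class Disk:
    name: str
    node: str            # the box this disk lives in
    capacity_gb: int
    used_gb: int = 0
    files: list = field(default_factory=list)

    @property
    def free_gb(self):
        return self.capacity_gb - self.used_gb


def place(disks, filename, size_gb, copies=2):
    """Store `copies` whole copies of a file, each on a different node.

    Disks with the most free space are tried first. There is no striping
    and no parity: every copy is a complete, independently readable file.
    """
    chosen = []
    used_nodes = set()
    for disk in sorted(disks, key=lambda d: d.free_gb, reverse=True):
        if disk.node in used_nodes or disk.free_gb < size_gb:
            continue
        chosen.append(disk)
        used_nodes.add(disk.node)
        if len(chosen) == copies:
            break
    if len(chosen) < copies:
        raise RuntimeError("not enough independent disks with free space")
    for disk in chosen:
        disk.used_gb += size_gb
        disk.files.append(filename)
    return [d.name for d in chosen]


def survivors(disks, filename, failed_disk):
    """Names of the disks that still hold the file after one disk dies."""
    return [d.name for d in disks
            if filename in d.files and d.name != failed_disk]


# Mixed sizes and vendors are fine: each disk is addressed on its own.
disks = [
    Disk("a1", "node-a", 250),
    Disk("b1", "node-b", 160),
    Disk("c1", "node-c", 400),
]
placed = place(disks, "crawl-0001.arc", 100)
remaining = survivors(disks, "crawl-0001.arc", failed_disk="c1")
```

When a drive dies, the replacement can be whatever is cheap that month; you just copy the affected files onto it from the surviving disks.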
I did some pricing and found that I could reasonably set a rack up; taking the cost of an Intel CPU out of the mix makes things a lot more reasonable. If nothing else, you could set up several servers to run lighttpd for static files, memcached for fast, scalable caching, and firewall/router duties. Push it a little further, and you could use them for network attached storage pretty easily and inexpensively.
I also decided to start inspecting commodity hardware availability, which gave me a couple of important impressions. First and foremost, disk drives can be had relatively cheap, but a new drive is still going to cost upwards of 29 cents a gigabyte. Lots of used drives in varying capacities are all over the place, but then you have questions about reliability and performance that wouldn't be acceptable in a server environment.
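As a rough illustration of what the caching boxes would be doing, here is a small Python sketch of the cache-aside pattern. Everything in it is an assumption for the sake of the example: `TinyCache` is an in-process dictionary standing in for a memcached node (a real setup would talk to memcached over the network through a client library), and `render_page` is a made-up placeholder for whatever slow backend builds the page.

```python
import time


class TinyCache:
    """In-process stand-in for a memcached node: get/set with expiry."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._items[key]      # expired, treat as a miss
            return None
        return value

    def set(self, key, value, ttl=None):
        # A ttl of None or 0 means the entry never expires.
        expires_at = time.monotonic() + ttl if ttl else None
        self._items[key] = (value, expires_at)


backend_hits = 0


def render_page(path):
    """Hypothetical slow backend call (database, templating, etc.)."""
    global backend_hits
    backend_hits += 1
    return f"<html>page for {path}</html>"


def fetch(cache, path, ttl=60):
    """Cache-aside: try the cache first, fall back to the backend on a miss."""
    page = cache.get(path)
    if page is None:
        page = render_page(path)
        cache.set(path, page, ttl)
    return page


cache = TinyCache()
first = fetch(cache, "/index")    # miss: goes to the backend
second = fetch(cache, "/index")   # hit: served from the cache
```

The point for slow CPUs is that the second request never touches the expensive code path, which is exactly the kind of work a low-power box can keep up with.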
I have found a few places to look for commodity level memory chips. I've found that in 1GB capacities even old RAM sells for $50+ retail, but can be had for as little as $20 in large quantities from shifty guys in trench coats.
While searching around, I spotted a lot of 100 3.2GHz Pentium CPUs and a lot of 200 2.4GHz CPUs that I'm just going to keep bidding on. If I can grab them for under $10/CPU, it would be well worth building a rack out, forgetting the whole mini-ITX form factor, and finding a set of low cost mini-ATX motherboards to run with. There's also the idea of finding old motherboard/CPU combos on clearance; there's a lot of sub-$100 Celeron and Duron stuff sitting on shelves.
The idea of building a scalable set of nodes truly intrigues me, especially the idea that I could set up an extremely capable system for under $10K. The real problem is hard disks and their reliability: 20 nodes means 20 disks if you want local storage. You could get away with 40GB disks, but that seems a waste of electricity to me. You could also run things off of remote boot and serve files from a NAS, but the idea is a highly scalable architecture on a micro budget.
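Here is the back-of-the-envelope math as a small Python sketch. The CPU, RAM, and disk figures are the ones mentioned in this post ($10 per CPU if the auction goes well, $20 per 1GB stick in bulk, 29 cents a gigabyte for a new 40GB disk); the $60 motherboard and the $50 for a power supply and other odds and ends are pure guesses on my part. Prices are kept in whole cents to avoid floating point rounding.

```python
def node_cost_cents(cpu, board, ram, disk_gb, disk_cents_per_gb, extras=0):
    """Cost of one node in cents; every argument except disk_gb is in cents."""
    return cpu + board + ram + disk_gb * disk_cents_per_gb + extras


def cluster_cost_cents(nodes, per_node_cents):
    """Total hardware cost for a cluster of identical nodes, in cents."""
    return nodes * per_node_cents


BUDGET_CENTS = 10_000 * 100       # the $10K ceiling

per_node = node_cost_cents(
    cpu=10 * 100,                 # auction target from the post
    board=60 * 100,               # guess: cheap board for that CPU
    ram=20 * 100,                 # 1GB at the bulk price
    disk_gb=40,
    disk_cents_per_gb=29,         # new-drive price per gigabyte
    extras=50 * 100,              # guess: power supply, cabling, shelf space
)
total = cluster_cost_cents(20, per_node)

print(f"per node: ${per_node / 100:.2f}")
print(f"20 nodes: ${total / 100:.2f}")
print(f"left in budget: ${(BUDGET_CENTS - total) / 100:.2f}")
```

With these guesses the disks are a small slice of each node's price, so the worry is less the up-front cost than replacing them as they fail.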
Oh well, just a few thoughts. I'll worry about the whole disk pricing thing if I actually pick up 100 or 200 cpus.
