Architecture
From KallestadWiki
My current web server architecture leaves a lot to be desired. I frequently ponder what the ideal low-cost architecture would be for me.
I think in terms of the potential for expansion, including load-balanced web servers and database clusters. I also think in terms of separation of responsibilities: the fewer responsibilities a particular machine has, the easier that machine becomes to manage.
Right now, I'm also thinking about dedicated storage and how useful it would be. A network-attached storage device alleviates a lot of concern about future expansion, but dedicated NAS boxes are expensive. Why not roll my own? It would certainly be useful in my own home, but in the world of web servers and databases I wonder how network-attached storage performs compared with local storage. Even over a Fibre Channel or GigE connection, I'm introducing unnecessary latency by pushing towards an architecture that stores data outside the system responsible for it.
And then there's Amazon's network-based storage. I would never think of storing a database on an Amazon machine, but it has promise for backups, and for storing and serving large, infrequently accessed files: for instance, user attachments in a web forum for threads that have not received traffic in X amount of time.
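To make the "no traffic in X amount of time" idea concrete, here is a minimal sketch of how candidates for off-site storage might be found. It assumes the attachments are plain files under one directory and that the filesystem records access times (a mount with noatime would not). The function name `stale_files` and its parameters are made up for illustration, and it only lists candidates; it does not upload anything.

```python
import os
import time


def stale_files(root, max_idle_days, now=None):
    """Yield paths under root whose last access is older than max_idle_days."""
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 86400  # seconds in a day
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # st_atime is the last access time reported by the filesystem.
            # Assumption: the mount actually updates it (not noatime).
            if os.stat(path).st_atime < cutoff:
                yield path
```

Something like `list(stale_files("/path/to/attachments", 90))` would then give the files untouched for 90 days; the path and the 90-day figure are placeholders, since X is still an open question.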
So just to start off - let's think about three storage mechanisms:
- Local Disk
- NAS Disk
- Amazon Disk
Local Disk
What do we keep on the local disk? I would keep any very frequently accessed data there, including pretty much all static web elements that are part of a site's template. I would also like to store production-level data in a local database: user profiles and authentication (LDAP and SQL), plus the small, central data sets from applications that count as current production information.
NAS Disk
The line between what we would store on a local disk and a NAS disk is somewhat arbitrary; it is really a question of performance. I would think that all of your information should be backed up to the NAS. Databases should also be segmented so that large data elements live on the NAS disk, where they would take advantage of read-ahead and write-level caching most effectively. I also wonder about placing large static elements on the NAS: PDF files and downloads that aren't necessarily part of a static site template. I wonder whether there should be more than one NAS device: one dedicated to high-speed database access and another for less-used files. That way database performance would not be affected by other areas of the site.
Amazon Disk
The Amazon disk would be used for very rarely accessed large files and backups. Amazon takes a bit of worry out of the equation because their disks are spread across, and replicated to, multiple geographically separate machines. You pay for the total amount stored and for data transfer. Since I can't depend on Amazon for throughput, I don't want any files there that would be considered core to a web project, but I wouldn't mind a bit if I had to wait a while to download a backup of a photo archive or, say, an install disk image.
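The three-way split above could be written down as a small placement rule. This is only a sketch of my thinking, not a finished policy: the thresholds (`HOT_READS_PER_DAY`, `COLD_READS_PER_DAY`, `LARGE_BYTES`) are invented numbers, and `choose_tier` and the `is_core` flag are hypothetical names. Backups are left out, since everything gets backed up to the NAS regardless.

```python
from enum import Enum


class Tier(Enum):
    LOCAL = "local disk"
    NAS = "NAS disk"
    AMAZON = "Amazon disk"


# Hypothetical thresholds; the real cut-offs would have to be measured.
HOT_READS_PER_DAY = 100
COLD_READS_PER_DAY = 1
LARGE_BYTES = 50 * 1024 * 1024  # 50 MiB


def choose_tier(reads_per_day, size_bytes, is_core=False):
    """Pick a storage tier for one file or data set."""
    if reads_per_day >= HOT_READS_PER_DAY:
        # Very frequently accessed data stays on the local disk.
        return Tier.LOCAL
    if size_bytes >= LARGE_BYTES:
        # Large, very rarely read, non-core files can tolerate slow throughput.
        if reads_per_day < COLD_READS_PER_DAY and not is_core:
            return Tier.AMAZON
        # Other large elements go to the NAS.
        return Tier.NAS
    # Small data sets stay local by default.
    return Tier.LOCAL
```

For example, with these made-up thresholds a 700 MiB install image read about once every few months would land on the Amazon disk, while the same file flagged `is_core=True` would stay on the NAS.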
