Problems Unseen
One of the things that makes the Internet so amazing is how well it works. Sure, it has its moments where things stop working, but these events are usually very isolated. Mother Nature might take out a critical fiber optic cable and cut off access to a particular country or region, or one provider might suffer a major outage that affects tens or even hundreds of thousands of customers, but I can't think of a situation in which the entire Internet broke down. It's worldwide, decentralized, and not under any one organization's control (you could make an argument for ICANN, but even their authority only goes so far). The best thing about the Internet is how it is designed. A commonly quoted line says that it is designed to route around failure (or censorship, or atomic blasts, or whatever). It's redundant. It has multiple paths, so when one of them breaks, traffic takes another and most people never notice. That is where the title of this post comes from.
Parts of the Internet, large and small, break all the time, and nobody ever knows. They are unseen problems. One of my responsibilities as a network and server administrator is to help ensure that problems on my little chunk of the Internet remain unseen. We take extraordinary precautions to ensure that when someone needs to access a service we manage, be it a website or an e-mail inbox, it will be there for them all the time. We try to eliminate single points of failure. Many people think of a firewall and envision a single box that filters traffic. We think of two boxes that monitor one another, each connected to its own network switch, which in turn has its own dedicated connection back to our provider. People think of a hard disk for storage, and we think in RAID arrays that mirror data or stripe it with parity across multiple drives. If one drive fails, the system notifies us and keeps right on working. When the replacement arrives, we pull the bad drive and swap in the new one without ever turning off the server.
Data centers need to be kept cool. They are, after all, just converting electricity into heat 24/7, so air handling is a major part of data center operations. When the A/C goes out, there is temperature monitoring to alert us and people available 24/7 to vent the server room and ensure it stays cool enough for the servers to keep running. Our clients don't care if the air is hot, cold, clear or filled with a purple haze; just that the servers stay online. They don't care if a hard drive fails, or if a firewall goes down. The systems just have to keep working, no matter what.
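For the curious, here is a minimal sketch of how a disk array can keep right on working after a drive dies. It assumes a simple XOR parity scheme with one parity block per stripe. It is a toy in Python, not how any particular RAID controller is implemented, and the block contents and the `xor_blocks` helper are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(
        reduce(lambda a, b: a ^ b, column)  # XOR the bytes at this position
        for column in zip(*blocks)          # walk all blocks in lockstep
    )


# Three hypothetical data drives, each holding one 4-byte block of a stripe.
data = [b"mail", b"web!", b"logs"]

# A fourth drive stores the parity: the XOR of all the data blocks.
parity = xor_blocks(data)

# Pretend drive 1 has failed. Only the other data blocks and parity survive.
survivors = [data[0], data[2], parity]

# XOR-ing everything that survived reproduces the missing block,
# because every surviving block cancels itself out of the parity.
recovered = xor_blocks(survivors)
assert recovered == data[1]
```

The array can serve reads for the dead drive this way until the replacement shows up, which is why nobody outside the server room has to know anything happened.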
So it is to all of the people (like me) who keep the systems running that I raise my glass of mango iced tea and say "Thank You!" Don't forget, System Administrator Appreciation Day is Friday, July 30, 2010 (the last Friday in July each year). Give your favorite sysadmin a nod and know that even when everything appears to be working just fine, they may be in the trenches working on some obscure problem that nobody else will ever know about.