Problems Unseen

Posted on July 27, 2010 by: Justin Scott 0 Comments

One of the things that makes the Internet so amazing is how well it works.  Sure, it has its moments where things stop working, but these events are usually very isolated.  Mother Nature might take out some critical fiber optic cable and cut off access to a particular country or region, or one provider might suffer a major outage that impacts tens of thousands (or hundreds of thousands) of customers, but I can't think of a situation in which the entire Internet broke down.  It's worldwide, decentralized, and not under any one organization's control (you could make an argument for ICANN, but even their authority only goes so far).  The best thing about the Internet is how it is designed.  A commonly quoted line says that it is designed to route around failure (or censorship, or atomic blasts, or whatever).  It's redundant.  It has multiple paths, and that is where the title of this post comes from.

Parts of the Internet, large and small, break all the time, and nobody ever knows.  They are unseen problems.  One of my responsibilities as a network and server administrator is to help ensure that problems on my little chunk of the Internet remain unseen.  We take extraordinary precautions to ensure that when someone needs to access a service we manage, be it a website or an e-mail inbox, it will be there for them all the time.  We try to eliminate single points of failure.  Many people think of a firewall and envision a single box that filters traffic.  We think of two boxes that monitor one another, each connected to its own network switch, which in turn has its own dedicated connection back to our provider.  People think of a hard disk for storage and we think in RAID arrays that spread data over multiple drives with mirroring or parity, so no single disk holds the only copy.  If one fails, the system notifies us and keeps right on working.  When the replacement arrives, we pull the bad drive and replace it without ever turning off the server.  Data centers need to be kept cool.  They are, after all, just converting electricity into heat 24/7, so air handling is a major part of data center operations.  When the A/C goes out, there is temperature monitoring to alert us and people available 24/7 to vent the server room and ensure it stays cool enough for the servers to keep running.  Our clients don't care if the air is hot, cold, clear or filled with a purple haze; they care only that the servers stay online.  They don't care if a hard drive failed, or if a firewall goes down.  The systems just have to keep working, no matter what.
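For the curious, here is a rough sketch of how an array can keep working after losing a drive.  It assumes a parity-based layout, which is only one of several RAID schemes and not necessarily the one on any particular server; the drive contents and the xor_blocks helper are made up for illustration.  The parity block is the XOR of the data blocks, so any one missing block can be rebuilt from the survivors.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data drives, each holding one block of the same stripe.
data_drives = [b"mail", b"site", b"logs"]

# The parity block is the XOR of every data block in the stripe.
parity = xor_blocks(data_drives)

# Simulate losing drive 1, then rebuild its block from the survivors plus parity.
failed = 1
survivors = [block for i, block in enumerate(data_drives) if i != failed]
rebuilt = xor_blocks(survivors + [parity])

print(rebuilt)  # prints b'site'
```

A real controller does this per stripe across the whole disk and rotates where the parity lives, but the arithmetic is the same idea: lose any one block and the rest can still answer for it.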

So it is to all of the people (like me) who keep the systems running that I raise my glass of mango iced tea and say "Thank You!"  Don't forget, System Administrator Appreciation Day is Friday, July 30, 2010 (the last Friday in July each year).  Give your favorite sysadmin a nod and know that even when everything appears to be working just fine, they may be in the trenches working on some obscure problem that nobody else will ever know about.

Pro Webmasters

Posted on July 8, 2010 by: Justin Scott 1 Comments

As I've mentioned here before, I'm a regular contributor to ServerFault, a site for system and network administrators run by the same crew that runs Stack Overflow (which is for programmers).  They've recently begun to spawn some additional sites, and the latest one to go into private beta is sure to be a winner as well.  A week from now, the Pro Webmasters site will go into public beta.  This new site is meant for professional webmasters and those whose lives revolve around HTML and managing websites.  I'm jumping into the private beta and helping to seed the site with questions and answers.  For anyone who works on websites for a living, this site is sure to be a hit.  Come check it out next week when the public beta opens.

Time for New Batteries

Posted on July 6, 2010 by: Justin Scott 0 Comments

Summer is here, and it's brought daily rain showers and thunderstorms with it.  One of the joys of living in parts of Florida is the constant glitches on the power grid.  Last week our power was out for a few hours due to a problem with something on the pole right outside our house.  FPL was out fairly quickly to diagnose the problem, and when they fixed it and reconnected the line there was a heck of a spark.  Here's the FPL repair technician working on the line.

FPL working on the lines outside our house.

For a couple of days after that we had random brownouts where the power would drop off for a few seconds and then come back on.  This isn't a big deal unless you're doing something that can be interrupted (like watching a DVD, as we were at home yesterday evening).  Even at work this morning the power flickered for a few minutes shortly after I arrived.

To combat these issues, I generally put battery backup units on the electronics.  Our Verizon FiOS box has a battery built in, but it only powers the telephone portion of the box (which we don't use), so I have my own battery backup on it to keep the Internet alive.  I have another in the office to run the cable modem and computer monitor, and yet another on the TV, cable box, and DVD player in the living room.  Unfortunately, some of my boxes are several years old and the batteries are worn out (as we found out yesterday).  So, it's time to make a trek out to Batteries Plus and acquire replacement batteries for my backup systems.  Hopefully the rain and storms will give us a short reprieve so the electronics won't get too stressed from all the interruptions (the lawn could use a few days to dry out as well).