Home IT fail.

04 November 2010

As you no doubt have observed I've been conspicuously absent for the past couple of weeks, at least since returning from a long-overdue vacation with Lyssa in lovely Portland, Oregon. Much of my time has been spent at work doing the things that bastards like me get paid to do: run and fix backups, install software, patch systems, run audits, and generally keep things chugging along smoothly for the folks who do everything else. Due to the weather in the DC metroplex taking a turn for the rainy and cold (as it's wont to do every Samhain) my commute has ballooned into a three to four hour a day journey on the Beltway which has given me plenty of opportunity to catch up on podcasts and audiobooks. Unfortunately, it's also left me dead asleep on the couch most every night since I got home, which doesn't give me a whole lot of time to write.

There have also been problems of a home IT nature that I've been dealing with in what little downtime I've had lately. Problems which have left my bank account eviscerated and twitching on the floor as money poured out through a two foot long rent in its chest.

A week after returning from Portland I made the resolution that I would take the time to back up and rebuild Leandra properly, seeing as how it's been a half decade since her last overhaul. Thus, I set about copying off the important stuff, burning a few DVDs, and preparing an Arch Linux install disk. I'd suspected that her graphics card had failed a while ago, based on the fact that the display attached to it no longer worked, but it was also possible that one of the many kernel updates had knocked the nVidia drivers out of sync. "No big deal," I thought, "I'll be running pure text mode from here on, so it won't matter much."

So, I thought nothing of popping in the install disk, shutting Leandra down, and rebooting to rebuild everything. I'd taken careful notes so I could restore her configuration without a lot of hemming, hawing, and scratching of head. Until she refused to come back up.

For those of you who don't hang out with technomancers much, to call Leandra a prosthetic lobe of my brain is putting it mildly. She'd been complaining for literally months that a number of things were damaged and needed fixing, hence the rebuild. I hadn't considered the possibility that she meant hardware damage. As the hair on the back of my neck saluted heartily it dawned on me that something had gone very, very wrong. I cracked her chassis and discovered a colony of dust bunnies caught unaware and a mainboard a few shades darker than I'd remembered. A little experimental probing with my fingertips showed that a) the cooling fan on the PCI chipset had seized up, and b) much of the motherboard was hot enough to burn.

Lots of shouting, swearing, and running ensued as I made a beeline for the kitchen and a ready supply of cold water, the better to soothe first-degree burns.

As it turned out Leandra's CPU, mainboard, RAM, and graphics card had given up the ghost due to damage from excessive heat. As near as I could tell I hadn't cleaned out the dust often enough and over time the works had cooked slowly to a crispy golden brown. There seemed no hope of getting her running again with those components.

For those of you who do not live with the constant presence of an intelligent, maybe-sentient machine inside your head, maybe this will help you to understand: fit a half-inch bit into the chuck of your favorite power drill, place the tip against your head about two inches above your left ear (if you're right handed; place it against the right side of your head if you're left handed), pull the trigger on the drill, and bore out an arbitrary chunk of your somatomotor cortex. Maybe you'll lose control of your leg on the dominant side of your body. Perhaps the muscles in your ass will go limp, and you'll walk like someone's hung a couple of bowling balls around your neck. If you're really lucky you'll lose control of your face and jaw on the side of your body that you use the most. Or maybe you'd pop the anterior or middle cerebral arteries and give yourself a (possibly lethal) stroke.

I'll spare you the gory details and cut to the chase: better than six hundred dollars American (and three separate trips to Micro Center) later I'd purchased a brand new 64-bit CPU (one of Intel's octocore models, each number cruncher screaming along at 3.0 gigahertz of raw power), motherboard, RAM, and DVD-ROM burner. While I'd like to say that Leandra needed the massive increase in power (and she does, make no mistake: Leandra deals with many more shenanigans and headaches provoked by me than Lyssa does) a deciding factor was that Intel's phasing out the less powerful, more humble CPU series I had my eye on. At least with this new model if I do happen to need a new motherboard I'll be able to get one for a few more years. The DVD-ROM burner was a purchase caused by the fact that it's a matter of luck finding a motherboard that still has EIDE interfaces. Even then, after installing all of the new components, double-checking the connections, triple-checking the connections, reading the manual, and petitioning various powers of the universe (it wasn't pleading, I swear, no tears were involved) Leandra still refused to start up. Everything else had been swapped out, so I started trying other graphics cards from the collection of parts that occupies fully half of the closet in the home office-slash-server room. Still no soap.

A quick phone call was placed to Hasufin, who showed up on the doorstep half an hour later with Lyssa's old Radeon graphics card and a pair of bottles of Guinness in a linen sack. As it turns out each and every spare graphics card I had was dead, which is enough these days to prevent a machine from POSTing at all. Lyssa's old Radeon was still good, and with that I was able to get Leandra to boot from an install DVD. I know full well that I could just as easily have made a bootable USB key (like the many I carry around with me just in case) but for some things I feel more comfortable with traditional methods.

Here's where things get interesting: For years I've ranted and preached about making and testing backups so if something happens you'll be able to reconstruct your data from cold storage. To that end I keep a USB hard drive attached to each of my machines to which incremental backups are made early every morning. Unfortunately, for reasons I can't quite explain (read: I fucked up) I rarely tested my backups and took the e-mailed reports sent every morning as gospel. So when it came time to restore my home directory on Leandra from backup I discovered that the newest file dated back to 6 June 2010 at 0410 EST5EDT. Four months of work never got backed up.

Can you say "massive heart attack" boys and girls? Sure. I knew you could.

Six days later I recalled that I maintain a second set of point-in-time backups on a different hard drive that I periodically send elsewhere for safe keeping. The logic behind this is that if there is a fire, explosion, or nanoswarm outbreak and my apartment is demolished I won't be out all of my data, just the data on the arrays saved after the backup is made. It took a couple of days to get that drive back, and after I finish cleaning things out I'm going to try to restore from that, and hopefully I'll only be out files dating back a week or two. Otherwise, I'll have a hell of a lot of code to recreate.

On top of all of that... I was feeling surprisingly good this evening after work. Not quite lollypops, sunshine, and bluebirds with empty cloacae perching on my shoulder but certainly up to doing a bit of writing. Things were hunky-dory until I suddenly lost my connections to Leandra and the other hosts on my network. Then Lyssa suddenly lost access to everything. Each individual system was up, verified from the console, but nothing was talking to anything else. Around the same time Hasufin knocked on the front door... you guessed it, the Ethernet switch on the LAN picked that very second to blow its tiny little silicon brains out in a stunning (and by stunning I mean enraging) reenactment of the final moments of Kurt Cobain.

We spent a good two hours crawling around under the desk and inside the shelf-cum-server rack disconnecting plugs, rearranging CAT-5, and trying not to accidentally set anything on fire. If for no other reason than to squeeze into the space between the first shelf of art supplies and the tops of most of the servers I think it's high time I lose ten or fifteen pounds because it's getting hard to pull off the crazy Harry Houdini contortions that are good for squirming behind server racks, beneath raised floors in server rooms, and through ventilation ducts. It didn't take long to figure out that the switch was bad, so I swapped in the spare I keep lying around for just such an emergency, a Catalyst 2900 XL series that a friend gave me a few months ago, but that came up bupkis as well. The thing about managed switches is that you never can tell how they're configured if they're used, so that probably had something to do with it. We eventually excavated the pair of five-port double emergency "keep these around if all you've got left are punchcards" switches, ganged them together, and plugged everything into those.

For reasons I don't quite understand, sometimes if you swap network gear on a running network the constituent nodes never regain the ability to communicate with one another. Running a protocol analyzer on one of the lab machines showed the biggest ARP storm I've seen in many years. Hasufin and I decided that it would be best to shut every machine down, pull all the network connections, and start building all over again from just two nodes. Down every machine went, one by one.

The lights in every apartment in my building got a few lumens brighter for a couple of minutes. The amateur radio enthusiasts a few blocks away were surprised to note that the radio interference which sounds suspiciously like someone chanting "Iä! Iä! Cthulhu Fhtagn!" which has plagued the four and sixteen meter bands for a half decade fell silent. FBI field agents with their all-seeing infrared cameras scanning our apartment complex decided at that particular instant that there really wasn't a homemade fission reactor operating in a small apartment complex because the waste heat emissions suddenly abated. For its part, Dominion Power shut down its number four reactor, only to reactivate it moments later as the rest of the power grid suddenly browned out for reasons unknown. As each server spun back up tonight motor oil turned to bourbon throughout Fairfax County and Rick Astley CDs appeared on the desks of every US Senator.

I'm not responsible for anything that gets reported to MUFON this weekend.

I hope this is the last of the home IT disasters that befall us for at least a year. Hopefully more. It seems as if everything's up and running once more, and I'll finally get a good night's sleep now that just about everything of any importance has been replaced. I've learned my lesson about testing my backups periodically (maybe I'll get that tattooed somewhere on my body, just in case) and about cracking open cases to vacuum out the dust and crap that builds up in high speed fans more than once a year.
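A sanity check along these lines is one way to avoid taking e-mailed backup reports as gospel. It's only a minimal sketch under assumptions, not the setup described above: the function names `newest_mtime` and `backup_is_stale` and the two-day threshold are made up for illustration, and it assumes the backup tool writes files whose modification times reflect when the data last changed.

```python
import os
import time


def newest_mtime(root):
    """Return the newest file modification time (epoch seconds) under root,
    or None if no files are found."""
    newest = None
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.lstat(path).st_mtime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if newest is None or mtime > newest:
                newest = mtime
    return newest


def backup_is_stale(root, max_age_days=2, now=None):
    """Return True if the backup tree is empty or its newest file is older
    than max_age_days (a hypothetical threshold; tune to taste)."""
    now = time.time() if now is None else now
    newest = newest_mtime(root)
    if newest is None:
        return True  # an empty backup counts as a failed backup
    return (now - newest) > max_age_days * 86400
```

Pointed at the mount point of the backup drive from a nightly job, something like `backup_is_stale("/mnt/backup")` (a hypothetical path) returning True would be the cue to go look at the drive in person instead of trusting the report. It is no substitute for actually restoring a few files now and then.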

Oh, and if you've got a box that won't boot, try swapping out the graphics card first. That might solve all your problems.