Crash handlers in Python

03 August 2023

Some weeks ago, when I was trying to get the bot that runs my weather station stable, I ran yet again into a problem that, for various reasons, I hadn't put forth the brainpower to solve. Stability implies that a system doesn't crash, which Weather Station Bot was doing occasionally. Part of this turned out to be because the microSD card Clavicula [1] was running on wasn't well suited to being outside all the time, but part of it was due to bugs in my code that I hadn't quite shaken out yet. While systemfail does a fairly good job of restarting things when they crash, what it's not good at is letting the user know when something crashes or why. In other words, I wanted to know when Weather Station Bot crashed so that I could shell in and take a look around.

Academically, I knew that crash handlers (functions or subsystems that do something when a system trips over its shoelaces and faceplants) are a thing. Windows' blue screen of death is a fine example of a crash handler. Rather than just freeze up, leaving the display stuck on the last frame it showed with an unresponsive keyboard and mouse, the BSoD gives you at least some idea of what happened [2] before the machine reboots. Plenty of other software has crash handlers as well, so the question became "How do I write one of my own in Python?"

As it turns out, it's pretty easy to do once you know the trick. Tucked away in the sys module is a hook named sys.excepthook that you can assign a callback function to. When an exception goes uncaught and is about to kill the running process, the Python interpreter calls sys.excepthook with the exception's type, value, and traceback. By default all it does is print the traceback to stderr before the interpreter exits. If the programmer replaces it with a callback of their own, however, in theory anything could be done as a last-ditch effort to save life, limb, and sanity. What I elected to do was stick with the running gag of six robots, all distinct, [3] and send myself a message:

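import logging

# scream_and_die(): A function that is registered as the sys.excepthook
#   handler, and fires if and when the bot crashes on an uncaught exception.
#   send_message_to_user() is defined elsewhere in the bot.
def scream_and_die(exc_type, exc_value, exc_traceback):
    # Log the crash along with the traceback that caused it.
    logging.critical("Crash handler executed!",
        exc_info=(exc_type, exc_value, exc_traceback))
    send_message_to_user("FC: So much for that robot.  Too bad.")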
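If you want the message to say why the bot fell over and not just that it did, the three arguments the interpreter hands to the hook are enough to rebuild the usual traceback text. What follows is a minimal sketch of that idea, not the code running on my bots: format_crash_report() and scream_and_die_verbose() are names made up for illustration, and the part where the report gets sent somewhere is left as a comment because that depends on how your own bot talks to you.

```python
import logging
import sys
import traceback


def format_crash_report(exc_type, exc_value, exc_tb):
    """Render an uncaught exception as the text the interpreter would print."""
    return "".join(traceback.format_exception(exc_type, exc_value, exc_tb))


def scream_and_die_verbose(exc_type, exc_value, exc_tb):
    """Hypothetical crash handler that records the traceback as well."""
    # Let Ctrl-C behave normally instead of reporting it as a crash.
    if issubclass(exc_type, KeyboardInterrupt):
        sys.__excepthook__(exc_type, exc_value, exc_tb)
        return

    report = format_crash_report(exc_type, exc_value, exc_tb)
    logging.critical("Crash handler executed!\n%s", report)
    # A real bot would hand `report` to its own notifier at this point,
    # for example the send_message_to_user() helper used earlier.
```

sys.__excepthook__ is the interpreter's original hook, which stays available even after sys.excepthook has been replaced, so handing KeyboardInterrupt back to it keeps a manual Ctrl-C from looking like a crash.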

Adding a crash handler was far simpler than I imagined it'd be, to be honest. I'd been putting it off in part because I was concerned about how much work it would be to retrofit the functionality into my bots. In theory I could put anything in that function, but what I wanted was to get notice of a crash so I could examine the system state as quickly as possible (which helped me iron the problems out). Once the function was written it was a simple matter to set scream_and_die() as the callback function and then get back to business:

# Set the crash handler.  This needs "import sys" at the top of the file.
sys.excepthook = scream_and_die
logging.debug("scream_and_die() set as sys.excepthook crash handler.")
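For a quick sanity check that the hook really fires, here's a rough sketch that runs a deliberately crashing script in a child interpreter and looks at what comes back. The script body, CHILD_SCRIPT, and run_crashing_child() are hypothetical stand-ins for illustration, not Weather Station Bot's code, and the handler in it just prints to stderr instead of sending a message.

```python
import subprocess
import sys

# Hypothetical stand-in for a bot: it installs a crash handler and then
# dies on an uncaught exception.
CHILD_SCRIPT = """
import sys

def scream_and_die(exc_type, exc_value, exc_tb):
    print("crash handler fired:", exc_type.__name__, file=sys.stderr)

sys.excepthook = scream_and_die
raise RuntimeError("simulated bot crash")
"""


def run_crashing_child():
    """Run the script in a fresh interpreter; return (exit code, stderr)."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT],
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stderr


if __name__ == "__main__":
    code, stderr = run_crashing_child()
    print("exit code:", code)
    print("stderr:", stderr.strip())
```

Two things are worth noticing when you run it. The custom hook replaces the default one, so the usual traceback no longer shows up on stderr unless the handler prints or logs it itself. And the process still exits with a non-zero status after the handler returns, so whatever is supervising the bot still sees a crash and can restart it.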

I'll get around to retrofitting the rest of my bots and the bot template at some point RSN.

  1. The hostname of my weather station. 

  2. If you're a Windows developer, anyway. 

  3. Tip of the pin to MC Frontalot for the turn of phrase.