Algorithm for implementing a dead man's switch.

09 March 2018

So, you're probably wondering why I'm posting this, because it's a bit off my usual fare.  The reason is that I think it would be useful to make available a fairly simple algorithm for implementing a general purpose dead man's switch in whatever language you want, which is to say a DMS that could conceivably do just about anything when it's triggered.

But what's a dead man's switch?  Ultimately, it's a mechanism that has to be manually engaged at all times if you want something to keep happening; if the switch is released for any reason, something else happens (like a failsafe).  A good example of this is the bar on the handle of a self-propelled lawnmower that you have to hold down so the mower will move while the engine's running.  If you let go of the bar, the engine keeps running but the lawnmower stops rolling forward.  Another example can be found in locomotives: the engineer has to hold down a switch or lever so the engine will pull the train, and if that lever is ever let go (say the engineer has a heart attack or is otherwise incapacitated), the throttle closes, the brakes are applied, and the train grinds to a halt.  More along the lines of what I'll be talking about are the watchdog timers found in industrial controllers and real-time operating systems.  While running normally, a software process inside the device flips a bit somehow - say, by writing a 0 into a certain device node.  If the underlying hardware ever finds that the bit didn't get flipped within a certain period of time, it reacts somehow to fix things (for example, it might reboot in an attempt to un-stick the gizmo).

Okay, enough theory, time for the good stuff.  Let's set down some vocabulary:

  • target - the thing the DMS is monitoring
  • payload - the thing the DMS will do if it goes off
  • timeout - the amount of time in seconds that must pass before the DMS can go off
  • delta - the difference, in seconds, between the current time and the last time the target was updated
  • killswitch - a marker that, once set, keeps the DMS from going off over and over again
  • scheduler - the thing that tells the DMS to check the target
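The vocabulary maps fairly directly onto a small data structure.  Here's a minimal Python sketch of that, assuming the target and killswitch are ordinary files (with the target's last-modified time standing in for "last updated") and the payload is a plain Python function.  The class name `DeadMansSwitch` and its fields are hypothetical, made up for illustration.  The scheduler isn't represented because it lives outside the program (cron, a systemd timer, and so on).

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional


@dataclass
class DeadMansSwitch:
    """Hypothetical record tying the DMS vocabulary together.

    Assumes the target is a file whose mtime records its last update.
    """

    target: Path                        # file whose last-modified time is watched
    payload: Callable[[], None]         # what to do if the switch goes off
    timeout: float = 600.0              # seconds that must pass before it can go off
    killswitch: Optional[Path] = None   # marker file; None means "may fire repeatedly"

    def delta(self, now: float) -> float:
        """Seconds between `now` and the target's last modification."""
        return now - self.target.stat().st_mtime
```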

So, the algorithm looks something like this.  With a little work, it can be implemented in pretty much any programming language you like.

  • The scheduler runs the DMS.
  • Get the timeout.  Let's say it's 600 seconds.
  • Find the target and make sure it exists.
  • If the target doesn't exist, throw an error message.  This should get caught and fixed during testing!
  • Get the current time and date and convert it into seconds since a standard epoch (such as the Unix epoch).
  • Get the timestamp of the last time the target was updated.
  • Convert that timestamp into seconds since the same epoch.
  • Compute delta: (current time in seconds) - (time of the target's last update in seconds)
  • If delta > timeout:
      ◦ Check whether the killswitch exists.  If it does, terminate, because the payload has already run.
      ◦ Run the payload.
      ◦ Set the killswitch to keep the DMS from running the payload over and over again.  Skip this step if the payload doesn't need to run once and only once.
  • Otherwise, exit quietly; the target was updated recently enough.
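Here's one way the steps above might look in Python.  It's a sketch, not a drop-in tool: it assumes the target is a file whose modification time gets refreshed whenever you "hold the switch down", that the killswitch is a second marker file, and that the payload is a Python callable.  The function name `check_dead_mans_switch` and the optional `now` parameter (which only exists so the check can be tested without waiting ten minutes) are illustrative assumptions.

```python
import time
from pathlib import Path
from typing import Callable, Optional


def check_dead_mans_switch(
    target: Path,
    killswitch: Path,
    payload: Callable[[], None],
    timeout: float = 600.0,
    now: Optional[float] = None,
) -> bool:
    """Run one DMS check; return True if the payload ran.

    Assumes the target and killswitch are files and that the target's
    modification time records the last time it was updated.
    """
    # Find the target; a missing target is a setup error, not a trigger.
    if not target.exists():
        raise FileNotFoundError(f"DMS target not found: {target}")

    # Current time and last update, both in seconds since the Unix epoch.
    if now is None:
        now = time.time()
    last_update = target.stat().st_mtime
    delta = now - last_update

    if delta <= timeout:
        return False  # Switch is still held down; nothing to do.

    # Killswitch already set? The payload has run before, so stay quiet.
    if killswitch.exists():
        return False

    payload()
    # Set the killswitch so the payload runs once and only once.
    killswitch.touch()
    return True
```

A scheduler such as cron would call this from a small script every few minutes, and whatever keeps the switch engaged just has to update the target periodically (for example, by running `touch` on it).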

Something you may wish to do is add a test feature, such as a dry-run argument that makes your DMS go through the motions without actually running the payload.  In fact, you may want to go whole-hog and write unit tests for your DMS so that you can shake it down thoroughly before you decide to rely on it.  You'll also probably want more sanity checks than usual, say, on the value of delta and on the last-modified time (e.g., make sure the last-modified time is greater than 0 and that delta isn't negative), just in case something bizarre happens, like filesystem corruption or an RTC failure.  It goes without saying (but I'll say it anyway) that you'll want to test the hell out of both your payload and your implementation of the dead man's switch, as benignly as possible, just to be safe.
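The dry-run flag and the sanity checks could be sketched in Python like this.  It assumes timestamps are floats in seconds since the Unix epoch, and the names `sanity_checked_delta`, `fire`, `build_parser`, and the `--dry-run` option are all hypothetical.  Keeping these pieces as small functions also makes them easy to unit-test without touching real files or running a real payload.

```python
import argparse
import time
from typing import Callable, Optional


def sanity_checked_delta(last_modified: float, now: Optional[float] = None) -> float:
    """Return delta in seconds, refusing obviously bogus timestamps (hypothetical helper)."""
    if now is None:
        now = time.time()
    if last_modified <= 0:
        # An mtime at or before the epoch suggests filesystem trouble.
        raise ValueError(f"implausible last-modified time: {last_modified}")
    delta = now - last_modified
    if delta < 0:
        # The target claims to have been modified in the future: clock or RTC trouble.
        raise ValueError(f"negative delta ({delta:.0f} s); check the clock")
    return delta


def fire(payload: Callable[[], None], dry_run: bool) -> str:
    """Run the payload, or only report what would happen in dry-run mode."""
    if dry_run:
        return "dry run: payload skipped"
    payload()
    return "payload ran"


def build_parser() -> argparse.ArgumentParser:
    """Command-line parser with a hypothetical --dry-run switch."""
    parser = argparse.ArgumentParser(description="Dead man's switch check")
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="go through the motions but do not run the payload",
    )
    return parser
```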

There you go.  Good luck.