Migrating to Restic for offsite backups.

Apr 11 2020

20200426: UPDATE: Fixed the "pruned oldest snapshots" command.

A couple of years back I did a how-to about using a data backup utility called Duplicity to make offsite backups of Leandra to Backblaze B2. (referrer link) It worked just fine; it was stable, it was easy to script, you knew what it was doing.  But over time it started to show its warts, as everything does.  For starters, it was unusually slow when compared to the implementation of rsync Duplicity uses by itself.  I spent some time digging into it and benchmarking as many functional modules as I could and it wasn't that.  The bottleneck also didn't seem to be my network link, as much as I may complain about DSL from AT&T.  Upgrading Leandra's network interface didn't really fix the issue.  Encryption before upload is a hard requirement for me but that didn't seem to be bogging backup runs down either upon investigation.  I even thought it might have been the somewhat lesser read performance of RAID-5 on Leandra's storage array adding up, which is one of the reasons I started using RAID-1 when I upgraded her to btrfs.  That didn't seem to make a difference, either.

Ultimately I decided that Duplicity was just too slow for my needs.  Initial full backups aside (because uploading everything to offsite storage always sucks), it really shouldn't take three hours to do an incremental backup of at most 500 megabytes (out of over 30 terabytes).  On top of that, Duplicity's ability to rotate out the oldest backups... just doesn't seem to work. I wasn't able to clean anything up automatically or manually.  Even after making a brand-new full backup (which I try to do yearly regardless of how much time it takes) I wasn't able to coax Duplicity into rotating out the oldest increments and had to delete the B2 bucket manually (later, of course).  So I did some asking around the Fediverse and listed my requirements.  Somebody (I don't remember whom, sorry) turned me on to Restic because they use it on their servers in production.  I did some research and decided to give it a try.

Rigging up Raspbian Buster to run on a Pi-Top

Jan 06 2020

It doesn't seem that long ago that I put together a Pi-Top and started tricking it out to use as a backup system.  It was problematic in some important ways (the keyboard's a bit wonky), but most of all the supported respin of Raspbian for use with the Pi-Top was really, really slow and a bit fragile.  While Windbringer was busy doing a full backup last week I took my Pi-Top for a spin while out and about, and to be blunt it was too bloody slow to use.  At first I figured that the microSD card I was using for the boot device was one of the lower-quality ones that bogs down once in a while, but that turned out not to be the case.  Out of desperation I started looking into possibly upgrading the RasPi in that particular shell to the latest and greatest version, which I happen to have received as a Yule gift last year.  Lo and behold, I was not the only person to think along these lines. (local mirror)  While the article in question talked at some length about the hardware challenges involved (mostly due to the different arrangement of connectors) the software part was the most valuable to me because it answered, concretely and concisely, how to get unmodified Raspbian working with a Pi-Top's unusual control hardware.  So that this information doesn't get lost in the ether I'm going to write up what I did.

Pictures from my trip to San Diego, summer 2019.

Nov 29 2019

Last summer my day job sent me down to San Diego, CA to attend the Linux Security Summit and report back.  Unfortunately just about all of the content there intersected in no way, shape, or form with anything we're working on so it was largely a dog wash.  I probably won't attend again because, balancing the cost against the information gotten, it just wasn't worth it.  I did, however, take a couple of engineers from Oracle for their first good sushi dinner ever, took an amphibious boat tour of San Diego Bay, and hiked along the waterfront for a couple of hours.

Pictures.

Echoes of popular culture and open source.

Oct 03 2019

(Note: This post is well beyond the seven year limit for spoilers.  If you haven't seen 2001 or 2010 by now, I can't help you.)

Many years ago, as a loomling, one of my very first memories was of seeing the movie 2010: The Year We Make Contact on cable.  That the first 'real' record I ever listened to was the soundtrack to that movie should come as no surprise, but that's not really relevant.  I was quite young so I didn't get most of it, but I remembered enough about it that it gave me some interesting questions (so I thought; I was six, okay?) to ask at the library later.  The thing that struck me the most about the movies was, unsurprisingly, the monolith.  The universal alien device, which manipulated proto-hominids on Earth by teaching them how to hunt, gather, and make war, as well as making unspecified changes to their evolutionary path; which served as a monitoring outpost; which implemented the endpoints of a vast interstellar (intergalactic? interdimensional?) wormhole network; which turned a gas giant into a miniature star.  If you like, the monolith was a universal key to unlock the mysteries of the universe and inspire growth and change.

Many, many years later I was a computer geek in my late teens, just dumb enough to think I knew the right questions to ask, just smart enough to know that I didn't know nearly as much as I should.  I knew that college was coming up one way or another and I'd have to get my ducks in a row to do work there and hopefully get some research done.  I also knew that it wasn't going to be easy.  I'd just graduated from a hotwired Atari microcomputer with a modem to a modest PC clone, a 386 cobbled together out of hand-me-down components, stuff I'd scavenged out of dumpsters, and the odd weekend trip to the computer show.  I knew that there was this thing called Ethernet, and the college I was going to had just started rolling out connections of same to dorm rooms, and it was a pre-req for a comp.sci major.  I also knew that I needed an OS that could connect to the Net somehow, but I didn't have the connections to get my hands on the new hotness back then, nor did Leandra have the specs to run it if I did.

Summer vacation is rapidly coming to an end.

Aug 31 2019

It seems as if another summer is rapidly coming to an end.  The neighbors' kids are now back in school, school buses are now picking their way down the streets, and due to Burning Man coming up it's now possible to eat in a real restaurant in the Bay Area for the next couple of days.  I've been pretty quiet lately, not because I've been spending any amount of time offline but because I've been spending more time doing stuff and just not writing it up.  I've been tinkering with Systembot lately, adding functionality that I really have a need for at home, namely, remotely monitoring a wireless access point running OpenWRT in the same way that I watch the rest of my stuff.  Due to the extreme system constraints on your average high-end wireless access point (2 CPUs, 128 megs of storage, 512 megs of RAM) it's not feasible to install Python and a Halo checkout, so I had to figure out how to get the system stats I need remotely.  What I wound up doing was standing up another copy of the standard OpenWRT web server daemon and writing a bunch of tiny CGI scripts which run local commands and return the information to Systembot for processing and analysis.  It wound up being a fun exercise in working with tight constraints, though I think there are still some bugs to be shaken out.

Accelerating a RAID-5 array with a solid-state hard drive.

May 19 2019

A couple of weeks ago, one of my co-workers mentioned in passing that he'd surprised himself by adding an SSD (solid state drive) to his file server at home.  To recap a bit, Leandra, my primary server at home, has a sizable RAID-5 array storing all of my data.  However, one of the tradeoffs is that stuff recently written to the array is a little slow to be read back.  It's really not noticeable unless you're logged in and running commands, and even then the lag is something like one or two seconds.  Noticeable but not actually problematic.  At any rate, I'd been wanting to do some tinkering lately and had an Amazon order planned because I wanted to do some electronic work on my warwalking rig so I figured that, depending on the cost, I might add an SSD to my order.  Much to my surprise, a 120 gigabyte SSD is incredibly cheap; I paid a hair under $20us for a Kingston A400.  Eminently affordable.

Linux on the Dell XPS 15 Touch (9570)

Mar 03 2019

UPDATED: 18 March 2019 - External display adapters that actually work with this model (and Arch Linux) added.

For various reasons, I found that I had a need to upgrade Windbringer's hardware very recently.  This might be the first time that a catastrophic failure of some kind was not involved, so it's kind of a weird feeling to have two laptops side by side, one in process and one to do research as snags cropped up.  This time around I bought a Dell XPS 15 Touch (9570) - I was expecting things to be substantially the same, but this did not seem to be the case.  Some things that I found myself ignoring because I had no use for them aren't in this newer model, and some things have changed as technology has advanced rather a lot in the last five years.

As before, first I'll post the hardware specs, and then follow up with everything I had to tinker with to get working as well as how I went about it.  As usual, I went with 64-bit Arch Linux (2019.02 installation build).

Systembot: Adventures in system monitoring.

Dec 28 2018

If you've been following the development activity of Systembot, the bot I wrote to monitor my machines (physical as well as virtual), you've probably noticed that I changed a number of things around pretty suddenly.  This is because the version of Systembot in question had some pretty incorrect assumptions about how things should work.  For starters, I thought I was being clever when I wrote the temperature monitoring code and decided to use what the drivers thought were high or critical values for sending "something is wrong" alerts.  No math (aside from a Centigrade-to-Fahrenheit conversion), just a couple of values helpfully supplied by the drivers by way of psutil (which is a fantastic module, by the way; I don't play with it enough).  This was hunky-dory until Leandra started running a backup job and her CPU temperature spiked to 125 degrees Fahrenheit while encrypting the data.  125 degrees isn't terribly hot as servers go, but the lm_sensors drivers seem to disagree.  Additionally, my assumptions of how often to send the "high temperature" alerts (after every four cycles through the "do stuff" loop) were... naive? Optimistic?

Let's go with optimistic.

What it boiled down to was that I was getting hammered with "temperature is too high!" warning messages roughly six times a second.  Some experiments with changing the delay were equally optimistic and futile.  I bit the bullet and made the delay-between-alerts configurable.  What I have yet to do is make the frequency of different kinds of warning events configurable, because right now they all use the same delay (defined in time_between_alerts).  Setting this value to 0 disables sending warnings entirely.  This is less than optimal at best but it's not waking me up every few seconds so I think it'll hold for a couple of days until I can break this logic out a little.
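
To make the throttling idea concrete, here's a minimal sketch of how a single shared delay could gate repeated alerts.  This is not Systembot's actual code: the AlertThrottle class, its should_send() method, and the per-kind bookkeeping are all hypothetical.  The only things taken from the description above are the time_between_alerts name and the rule that 0 turns warnings off.

```python
import time


class AlertThrottle:
    """Hypothetical helper: rate-limits alerts using one shared delay."""

    def __init__(self, time_between_alerts, clock=time.monotonic):
        # Seconds that must pass between two alerts of the same kind.
        # A value of 0 disables alerts entirely, as described in the post.
        self.time_between_alerts = time_between_alerts
        # The clock is injectable so the logic can be tested without sleeping.
        self.clock = clock
        # Maps an alert kind (e.g. "temperature") to when it was last sent.
        self.last_sent = {}

    def should_send(self, kind):
        """Return True if an alert of this kind may be sent right now."""
        if self.time_between_alerts == 0:
            return False
        now = self.clock()
        last = self.last_sent.get(kind)
        if last is not None and now - last < self.time_between_alerts:
            return False
        self.last_sent[kind] = now
        return True
```

The monitoring loop would call should_send("temperature") before sending anything.  Giving each kind of warning its own delay would mean swapping the single number for a per-kind lookup.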

The second assumption that came back to bite me (hardcoding values until something like this happened aside) was that alerting on 80% of a disk being in use without any context isn't necessarily a good idea.  My media server at home was also chirping several times a second because one of the hard drives is currently at 85% of capacity.  This seems reasonable at first scratch but when you dig a little deeper it's not.  85% of capacity in this case means that there are "only" 411 gigabytes of space left on a 4 terabyte hard drive.  Stuff doesn't get added to that drive very often, so that 400+ gigs will last me another couple of months, at least.  There's no reason to alert on this, so making this value a parameter in the config file buys me some time before I have to buy another hard drive.
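
As a rough illustration of moving that threshold out of the code and into a config file, here's a small sketch.  The option name disk_usage, the sample config text, and both function names are made up for the example and aren't taken from Systembot.  The default of 80 mirrors the hardcoded value mentioned above.

```python
import configparser

# Hypothetical config file contents; the option name is illustrative only.
SAMPLE_CONFIG = """
[DEFAULT]
disk_usage = 90
"""


def load_disk_threshold(config_text, default=80.0):
    """Read the alert threshold (percent used) from config text.

    Falls back to the old hardcoded 80% if the option is missing.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return parser.getfloat("DEFAULT", "disk_usage", fallback=default)


def disk_over_threshold(total_bytes, used_bytes, threshold_percent):
    """Return True if the percentage of the disk in use meets the threshold."""
    percent_used = 100.0 * used_bytes / total_bytes
    return percent_used >= threshold_percent
```

With the threshold raised to 90 in the config, a drive at 85% of capacity stops tripping the alert, while the default of 80 would still fire.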

Ansible: Reboot the server and pick up where it left off.

Nov 26 2018

Here's the situation: You're using Ansible to configure a machine on your network, like a new Raspberry Pi.  Ansible has done a bunch of things to the machine and needs to reboot it - for example, when you grow a Raspbian disk image so that it takes up the entire device, it has to be rebooted to notice the change.  The question is, how do you reboot the machine, have Ansible pick up where it left off, and do it in one playbook only (instead of two or more)?

I spent the last couple of days searching for specifics and found a number of techniques that just don't work. After some experimentation, however, I pieced together a small snippet of Ansible playbook that does what I need.  Because it was such a pain to figure out I wanted to save other folks the same trouble.  Here's the code, suitable for copying and pasting into your playbook:

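# ...the first part of your playbook goes here.
    # Fire-and-forget reboot: the short sleep gives the task time to return before the box goes down.
    - name: Reboot the system.
      shell: sleep 2 && shutdown -r now
      async: 1
      poll: 0
      ignore_errors: true
    # Runs on the control machine: wait for SSH on the target to come back up.
    - name: Reconnect and resume.
      local_action: wait_for
      args:
        host: bob-newhart
        port: 22
        state: started
        delay: 10
        timeout: 30
# ...the rest of your playbook goes here.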
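
If it helps to see what the second task is waiting on, here's a rough Python sketch of the same idea: pause for a bit, then keep trying to open a TCP connection to the SSH port until it answers or time runs out.  This is not how Ansible implements wait_for.  The wait_for_port() function and its interval parameter are hypothetical, and it doesn't try to reproduce how the real module counts the delay against the timeout.

```python
import socket
import time


def wait_for_port(host, port, delay=10, timeout=30, interval=1.0):
    """Return True once host:port accepts a TCP connection, else False.

    delay    -- seconds to wait before the first attempt
    timeout  -- seconds to keep trying after the delay (an assumption here)
    interval -- seconds between attempts, also the per-attempt connect timeout
    """
    time.sleep(delay)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused, unreachable, or timed out: wait a moment and retry.
            time.sleep(interval)
    return False
```

Calling wait_for_port("bob-newhart", 22) would roughly correspond to the values in the playbook snippet.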

Specifics of proof of concept for later reference:

  • Ansible v2.7.0
  • Raspberry Pi 3
  • Raspbian 2018-06-27

Build your own time server with a GPS receiver.

Nov 24 2018

If you've had your ear to the ground lately, you might have heard that the NIST timekeeping radio station used by devices all over the world as a time reference for Coordinated Universal Time as well as some experiments in signal propagation and geophysical event notices might be on the chopping block in 2019, leaving the HF bands quieter and, let's face it, we can't have nice things.  Clocks that rely on this time source signal won't have any way to stay in sync and the inevitable drift due to the imperfections in everything will cause fractions of a second to be lost and a fresh outbreak of kinetic pattern baldness.  The ultimate effects of this latest bit of clueless petulance on the part of Donald Trump remain to be seen, but it seems likely that this isn't a sexy enough problem to catch brainshare like Y2k did.  If you work extensively with computers chances are you're not that worried because your machines use NTP - the Network Time Protocol - to synch their internal clocks with a known time reference server on the Net someplace.  Something to consider, however, is whether or not your upstream tier-one and tier-two time sources are actually using the NIST WWV time signals as their reference signals.  There is, however, a nifty way around this: Build your own NTP server that uses a reference time source that can't be shut off as a source, the Global Positioning System.

First, I'll show you how to build your own GPS time server, and then I'll explain why it works.