Dec 28 2018
If you've been following the development activity of Systembot, the bot I wrote to monitor my machines (physical as well as virtual), you've probably noticed that I changed a number of things around pretty suddenly. This is because the version of Systembot in question had some pretty incorrect assumptions about how things should work. For starters, I thought I was being clever when I wrote the temperature monitoring code and decided to use what the drivers thought were high or critical values for sending "something is wrong" alerts. No math (aside from a Centigrade-to-Fahrenheit conversion), just a couple of values helpfully supplied by the drivers by way of psutil (which is a fantastic module, by the way; I don't play with it enough). This was hunky-dory until Leandra started running a backup job and her CPU temperature spiked to 125 degrees Fahrenheit while encrypting the data. 125 degrees isn't terribly hot as servers go, but the lm_sensors drivers seem to disagree. Additionally, my assumptions of how often to send the "high temperature" alerts (after every four cycles through the "do stuff" loop) were... naive? Optimistic?
Let's go with optimistic.
What it boiled down to was that I was getting hammered with "temperature is too high!" warning messages roughly six times a second. Some experiments with changing the delay were equally optimistic and futile. I bit the bullet and made the delay-between-alerts configurable. What I have yet to do is make the frequency of different kinds of warning events configurable, because right now they all use the same delay (defined in time_between_alerts). Setting this value to 0 disables sending warnings entirely. This is suboptimal at best, but it's not waking me up every few seconds, so I think it'll hold for a couple of days until I can break this logic out a little.
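Here's a minimal sketch of that kind of throttling. It is not Systembot's actual code: the AlertThrottle class and its should_send() method are names I made up for illustration, and only time_between_alerts and the "0 disables alerts" behavior come from the description above. The clock value can be passed in by hand so the logic can be poked at without waiting around.

```python
import time


class AlertThrottle:
    """Allow at most one alert every time_between_alerts seconds."""

    def __init__(self, time_between_alerts):
        # A value of 0 means "never send alerts", as described above.
        self.time_between_alerts = time_between_alerts
        self.last_sent = None

    def should_send(self, now=None):
        """Return True if enough time has passed to send another alert."""
        if self.time_between_alerts <= 0:
            return False
        if now is None:
            now = time.monotonic()
        if self.last_sent is None or now - self.last_sent >= self.time_between_alerts:
            # Remember when this alert went out so later ones get suppressed.
            self.last_sent = now
            return True
        return False
```

Giving each kind of warning its own AlertThrottle would be one way to break the single shared delay out later, though that's a guess at a design rather than what the bot does.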
The second assumption that came back to bite me (hardcoding values until something like this happened aside) was that alerting on 80% of a disk being in use without any context isn't necessarily a good idea. My media server at home was also chirping several times a second because one of the hard drives is currently at 85% of capacity. This seems reasonable at first glance, but when you dig a little deeper it's not. 85% of capacity in this case means that there are "only" 411 gigabytes of space left on a 4 terabyte hard drive. Stuff doesn't get added to that drive very often, so that 400+ gigs will last me another couple of months, at least. There's no reason to alert on this, so making this value a parameter in the config file buys me some time before I have to buy another hard drive.
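A rough sketch of what a configurable disk threshold could look like, assuming the limit is a percentage read from the config file. The function names and the max_percent_used parameter are hypothetical, not taken from Systembot; shutil.disk_usage() is from the Python standard library.

```python
import shutil


def disk_needs_alert(used_bytes, total_bytes, max_percent_used):
    """Return True when the percentage of space in use exceeds the limit."""
    percent_used = 100.0 * used_bytes / total_bytes
    return percent_used > max_percent_used


def check_path(path, max_percent_used):
    """Check the filesystem that holds path against the configured limit."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return disk_needs_alert(usage.used, usage.total, max_percent_used)
```

With the limit at 80, a drive that's 85% full trips the alert; raise it to 90 in the config and the same drive stays quiet.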
Nov 26 2018
Here's the situation: You're using Ansible to configure a machine on your network, like a new Raspberry Pi. Ansible has done a bunch of things to the machine and needs to reboot it - for example, when you grow a Raspbian disk image so that it takes up the entire device, it has to be rebooted to notice the change. The question is, how do you reboot the machine, have Ansible pick up where it left off, and do it in one playbook only (instead of two or more)?
I spent the last couple of days searching for specifics and found a number of techniques that just don't work. After some experimentation, however, I pieced together a small snippet of Ansible playbook that does what I need. Because it was such a pain to figure out I wanted to save other folks the same trouble. Here's the code, suitable for copying and pasting into your playbook:
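...the first part of your playbook goes here.
- name: Reboot the system.
  shell: sleep 2 && shutdown -r now
  async: 1   # assumed: run in the background so the dropped SSH session isn't a failure
  poll: 0
- name: Reconnect and resume.
  wait_for_connection:   # assumed module; waits until the machine answers again
    delay: 30
...the rest of your playbook goes here.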
Specifics of proof of concept for later reference:
- Ansible v2.7.0
- Raspberry Pi 3
- Raspbian 2018-06-27
Nov 24 2018
If you've had your ear to the ground lately, you might have heard that the NIST timekeeping radio station, used by devices all over the world as a time reference for Coordinated Universal Time as well as for some experiments in signal propagation and geophysical event notices, might be on the chopping block in 2019. That would leave the HF bands quieter because, let's face it, we can't have nice things. Clocks that rely on this time signal won't have any way to stay in sync, and the inevitable drift due to the imperfections in everything will cause fractions of a second to be lost and a fresh outbreak of kinetic pattern baldness. The ultimate effects of this latest bit of clueless petulance on the part of Donald Trump remain to be seen, but it seems likely that this isn't a sexy enough problem to catch brainshare like Y2K did. If you work extensively with computers, chances are you're not that worried because your machines use NTP - the Network Time Protocol - to sync their internal clocks with a known time reference server on the Net someplace. Something to consider, however, is whether or not your upstream tier-one and tier-two time sources are actually using the NIST WWV time signals as their reference signals. There is, however, a nifty way around this: Build your own NTP server that uses a reference time source that can't be shut off, the Global Positioning System.
First, I'll show you how to build your own GPS time server, and then I'll explain why it works.
Jun 28 2018
So, here's the situation:
On Windbringer, I habitually run LXDE as my desktop environment because it's lightweight and does what I need: It manages windows, gives me a menu, and stays out of my way so I can do interesting things. For years I've been using a utility called GKrellm to implement not only system monitoring on my desktop (because I like to know what's going on), but to set and change my desktop background every 24 hours. However, GKrellm has gotten somewhat long in the tooth and I've started using something different for realtime monitoring (but that's not the point of this post). So, the question is, how do I set my background now? Conky doesn't have that capability.
I tried a few of the old standbys like feh and nitrogen, but they didn't seem to work. The reason for this appears to be PCManFM, which is both the file manager and the desktop... stuff... of LXDE. By this, I refer to the desktop icons as well as the background image. As it turns out, nothing I tried to change the background worked, and that is due to the fact that PCManFM is a jealous desktop module and doesn't let other tools frob the settings it's in charge of. After some tinkering, here's how I did it:
Short form: pcmanfm -w `ls -d -1 /home/drwho/backgrounds/* | shuf -n 1`
Long form (from inside to outside):
ls -d -1 /home/drwho/backgrounds/* - List all of the files in /home/drwho/backgrounds. Show the full path to each file. List everything in a single column.
| - Feed the output of the last command to the input of the next command.
shuf -n 1 - shuf is a little-known GNU Coreutils tool which randomly shuffles whatever lines you give it. With -n 1 it returns only one line of output, a randomly chosen image file.
`...` - The output of the previous two commands (captured between back-ticks) is passed to...
pcmanfm -w - Set the current desktop background to whatever filename is passed on the command line as a free action.
To set an initial background when I log in, I added the following command to my ~/.config/lxsession/LXDE/autostart file: @pcmanfm -w `ls -d -1 /home/drwho/backgrounds/* | shuf -n 1`
This means that the command will run every time my desktop starts up. The @ symbol tells lxsession to re-run the command if it ever crashes. However, how do I change my background periodically?
The easiest way to set that up was to set a cron job that runs every day. Every user gets their own set of cron jobs (called a crontab) so you don't need any particular privileges to do this (unless your machine's really locked down). If you've never set a cronjob before, the command I used was this: crontab -e
My cronjob looks like this: 00 10 * * * pcmanfm -w `ls -d -1 /home/drwho/backgrounds/* | shuf -n 1`
"At 10:00 hours every day, run the following command..."
And there you have it. One randomly set desktop background in LXDE.
Incidentally, if you're curious about all the nifty things you can do with cron, I recommend playing around at crontab.guru, an online editor for crontab settings. It's good for experimenting in such a way that you don't have to worry about messing up your system, and it's also handy for figuring out particularly arcane cronjobs.
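For what it's worth, the same pick-a-random-file-and-hand-it-to-pcmanfm trick can be sketched in Python. This is an alternative, not what I actually run; pick_background() and set_background() are made-up names. Two differences from the shell one-liner: it only considers regular files (ls -d on a glob would list subdirectories too), and because the path is passed as a single argument, filenames with spaces in them survive intact, which unquoted backticks don't guarantee.

```python
import random
import subprocess
from pathlib import Path


def pick_background(directory):
    """Return one randomly chosen file from directory, or None if it's empty."""
    candidates = [p for p in Path(directory).iterdir() if p.is_file()]
    return random.choice(candidates) if candidates else None


def set_background(directory):
    """Pick a random image and ask pcmanfm to use it as the wallpaper."""
    image = pick_background(directory)
    if image is not None:
        # One list element per argument, so spaces in the path are harmless.
        subprocess.run(["pcmanfm", "-w", str(image)], check=True)
    return image
```

Calling set_background("/home/drwho/backgrounds") from a small script would stand in for the command used in the autostart file and the cronjob above.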
Mar 31 2018
GSCA - acronym, verb - Using grep, sed, cut, and awk on a Linux or UNIX box to chop up, mangle, or otherwise process data on the command line prior to doing anything serious with it. This is not to preclude the use of additional tools (such as sort).
Jan 14 2018
As frequent readers may or may not remember, I rebuilt my primary server last year, and in the process set up a fairly hefty RAID-5 array (24 terabytes) to store data. As one might reasonably expect, backing all of that stuff up is fairly difficult. I'd need to buy enough external hard drives to fit a copy of everything on there, plus extra space to store incremental backups for some length of time. Another problem is that both Leandra and the backup drives would be in the same place at the same time, so if anything happened at the house I'd not only not have access to Leandra anymore, but there's an excellent chance that the backups would be wrecked, leaving me doubly screwed.
Here are the requirements I had for making offsite backups:
- Backups of Leandra had to be offsite, i.e., not in the same state, ideally not on the same coast.
- Reasonably low cost. I ran the numbers on a couple of providers and paying a couple of hundred dollars a month to back up one server was just too expensive.
- Linux friendly.
- My data gets encrypted with a key only I know before it gets sent to the backup provider.
- A number of different backup applications had to support the provider, in case one was no longer supported.
- Easy to restore data from backup.
After a week or two of research and experimentation, as well as pinging various people to get their informed opinions, I decided to go with Backblaze as my offsite backup provider, and Duplicity as my backup software. Here's how I went about it, as well as a few gotchas I ran into along the way.