Happy New Year, everyone.
If you've been following the development activity of Systembot, the bot I wrote to monitor my machines (physical as well as virtual) you've probably noticed that I changed a number of things around pretty suddenly. This is because the version of Systembot in question had some pretty incorrect assumptions about how things should work. For starters, I thought I was being clever when I wrote the temperature monitoring code when I decided to use what the drivers thought were high or critical values for sending "something is wrong" alerts. No math (aside from a Centigrade-to-Fahrenheit conversion), just a couple of values helpfully supplied by the drivers by way of psutil (which is a fantastic module, by the way; I don't play with it enough). This was hunky-dory until Leandra started running a backup job and her CPU temperature spiked to 125 degrees Fahrenheit while encrypting the data. 125 degrees isn't terribly hot as servers go, but the lm_sensors drivers seem to disagree. Additionally, my assumptions of how often to send the "high temperature" alerts (after every four cycles through the "do stuff" loop) were... naive? Optimistic?
Let's go with optimistic.
What it boiled down to was that I was getting hammered with "temperature is too high!" warning messages roughly six times a second. Some experiments with changing the delay were equally optimistic and futile. I bit the bullet and made the delay-between-alerts configurable. What I have yet to do is make the frequency of different kinds of warning events configurable, because right now they all use the same delay (defined in time_between_alerts). Setting this value to 0 disables sending warnings entirely. This is less suboptimal at best but it's not waking me up every few seconds so I think it'll hold for a couple of days until I can break this logic out a little.
The second assumption that came back to bite me (hardcoding values until something like this happened aside) was that alerting on 80% of a disk being in use without any context isn't necessarily a good idea. My media server at home was also chirping several times a second because one of the hard drives is currently at 85% of capacity. This seems reasonable at first scratch but when you dig a little deeper it's not. 85% of capacity in this case means that there are "only" 411 gigabytes of space left on a 4 terabyte hard drive. Stuff doesn't get added to that drive very often, so that 400+ gigs will last me another couple of months, at least. There's no reason to alert on this, so making this value a parameter in the config file buys me some time before I have to buy another hard drive.Click for the rest of the article...
I've updated my .plan file yet again. The usual warnings apply.
Q-hole - noun - A Nyquil hangover.
Here's the situation: You're using Ansible to configure a machine on your network, like a new Raspberry Pi. Ansible has done a bunch of things to the machine and needs to reboot it - for example, when you grow a Raspbian disk image so that it takes up the entire device, it has to be rebooted to notice the change. The question is, how do you reboot the machine, have Ansible pick up where it left off, and do it in one playbook only (instead of two or more)?
I spent the last couple of days searching for specifics and found a number of techniques that just don't work. After some experimentation, however, I pieced together a small snippet of Ansible playbook that does what I need. Because it was such a pain to figure out I wanted to save other folks the same trouble. Here's the code, suitable for copying and pasting into your playbook:
...the first part of your playbook goes here. - name: Reboot the system. shell: sleep 2 && shutdown -r now async: 1 poll: 0 ignore_errors: true - name: Reconnect and resume. local_action: wait_for args: host: bob-newhart port: 22 state: started delay: 10 timeout: 30 ...the rest of your playbook goes here.
Specifics of proof of concept for later reference:
- Ansible v2.7.0
- Raspberry Pi 3
- Raspbian 2018-06-27
This took me a while to figure out, so here's a fix for an annoying problem:
Let's say that you have a media box running Kodi on your local area network. You have uPNP turned on so you can stream videos from your media box across your LAN. You want to use VLC to watch stuff across your LAN.
Problem: When you select your Kodi box in VLC and double-click on the server to open the directory of media to watch, VLC crashes with no error message (even in debug mode).
Explanation: VLC is configured to exit when the current playlist is over. This includes downloading a playlist across the network, and is really irritating.
If you've had your ear to the ground lately, you might have heard that the NIST timekeeping radio station used by devices all over the world as a time reference for Coordinated Universal Time as well as some experiments in signal propagation and geophysical event notices might be on the chopping block in 2019, leaving the HF bands quieter and, let's face it, we can't have nice things. Clocks that rely on this time source signal won't have any way to stay in sync and the inevitable drift due to the imperfections in everything will cause fractions of second to be lost and a fresh outbreak of kinetic pattern baldness. The ultimate effects of this latest bit of clueless petulance on the part of Donald Trump remain to be seen, but it seems likely that this isn't a sexy enough problem to catch brainshare like Y2k did. If you work extensively with computers chances are you're not that worried because your machines use NTP - the Network Time Protocol - to synch their internal clocks with a known time reference server on the Net someplace. Something to consider, however, is whether or not your upstream tier-one and tier-two time sources are actually using the NIST WWV time singnals as their reference signals. There is, however, a nifty way around this: Build your own NTP server that uses a reference time source that can't be shut off as a source, the Global Positioning System.
First, I'll show you how to build your own GPS time server, and then I'll explain why it works.Click for the rest of the article...
ablumeditation - verb - The act of taking a bath or shower to take a break from working on a problem, in the hope that insight will strike.
sharkfinning - verb, gerund - Learning something from scratch in an entirely hands-on way, which is to say, "Swimming with the sharks." When you don't know what you're doing or how to do it, but you have a job to do.
Some time ago I began a search for a decent note-taking tool that I could carry around with me. For many years I was a devotee of the notes.txt file on my desktop, constantly open in a text editor so I could add and refer to it as necessary. When that ceased to scale I turned to software that replicated the legions of sticky notes on my desks at work and home, such as Tomboy. And that worked well enough for a while, but when I started relying upon my mobile more and more for things it too stopped being as useful as I wanted it to be. For about a year I turned to Simplenote, which is pretty much what it says on the tin: It's a note-taking system with a nice web interface, applications on all of the platforms that I use regularly, and even a command line utility which I used to back up my notes a couple of times a day. However, Simplenote is a centralized service and there is always a risk that it could go away at any time. At the very least, the switchover to the Simperium API could have caused problems in the near term for me, and I have enough on my plate these days that I didn't feel like fighting that particular war. So, the search for a replacement that relied more upon my own infrastructure than someone else's began.Click for the rest of the article...