It seems as if another summer is rapidly coming to an end. The neighbors' kids are now back in school, school buses are now picking their way down the streets, and due to Burning Man coming up it's now possible to eat in a real restaurant in the Bay Area for the next couple of days. I've been pretty quiet lately, not because I've been spending any amount of time offline but because I've been spending more time doing stuff and just not writing it up. I've been tinkering with Systembot lately, adding functionality that I really have a need for at home, namely, remotely monitoring a wireless access point running OpenWRT in the same way that I watch the rest of my stuff. Due to the extreme system constraints on your average high-end wireless access point (2 CPUs, 128 megs of storage, 512 megs of RAM) it's not feasible to install Python and a Halo checkout, so I had to figure out how to get the system stats I need remotely. What I wound up doing was standing up another copy of the standard OpenWRT web server daemon and writing a bunch of tiny CGI scripts which run local commands and return the information to Systembot for processing and analysis. It wound up being a fun exercise in working with tight constraints, though I think there are still some bugs to be shaken out.
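To give a rough idea of the Systembot side of that arrangement, here is a minimal sketch in Python. It assumes, purely for illustration, that one of the CGI scripts on the access point returns the contents of /proc/loadavg as plain text; the /cgi-bin/loadavg path and both function names are hypothetical, not what Systembot actually uses.

```python
from urllib.request import urlopen


def parse_loadavg(text):
    """Parse the contents of /proc/loadavg into a dict of floats."""
    fields = text.split()
    if len(fields) < 3:
        raise ValueError("unexpected load average format: %r" % text)
    one, five, fifteen = (float(field) for field in fields[:3])
    return {"1min": one, "5min": five, "15min": fifteen}


def fetch_loadavg(base_url, timeout=10):
    """Ask the access point's CGI script for its load averages.

    The /cgi-bin/loadavg path is a made-up example endpoint.
    """
    with urlopen(base_url + "/cgi-bin/loadavg", timeout=timeout) as response:
        return parse_loadavg(response.read().decode("utf-8"))
```

Keeping the parsing separate from the HTTP request means the fiddly part can be tested without an access point on hand.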
I spend a lot of time digging around in other people's data. If I'm not hunting for anything in particular then it's a bit of a crapshoot, to be honest, if only because you never know what you're in for. You can pretty much take it to the bank that if you didn't assemble it yourself, you can't count on it being complete, well formed, or anything approximating the output of a human being (it usually came out of a database, but I think you see what I'm getting at). Sometimes, if I'm really lucky I'll just get hold of a JSON dump of the database, which to be fair is better than nothing when there isn't even an API to use. From time to time I'll make an attempt at fitting the data into a database of some kind, sometimes MySQL, sometimes SQLite, or occasionally an API layer like Sandman2. This is all well and good, but it winds up being more of an adventure than I'm looking for. I'd much rather be Indiana Jones prowling around in the temple than Rambo going through a preparation montage because Indy was actually getting stuff done.
Wow, this article went a little off the rails. I was never good at writing intros to new code... anyway.
Longtime readers are aware that I've been a customer of Dreamhost for quite a few years now, and by and large they've done all right by me. They haven't complained (much) about all the stuff I have running there, and I try to keep my hosted databases in good condition. However, the server they have my stuff on is starting to act wonky. Periodic outages mostly, but when my Wallabag installation started throwing all sorts of errors and generally not working right, that got under my skin in a fairly big hurry. I reinstalled. I upgraded to the latest stable release. I installed the latest commit from the source code repository. 401 and 500 errors as far as the eye could see whenever I tried to do anything regardless of what I did.
In a misguided attempt to figure out what was going on, I bit the bullet and installed PHP on one of my servers, along with all of the usual dependencies, and tried to replicate my setup at Dreamhost. While that was a bit tricky and took some debugging, I eventually got it to work. It was getting my data out of the sorta-kinda-broken setup that proved troublesome.
If you've been following the development activity of Systembot, the bot I wrote to monitor my machines (physical as well as virtual), you've probably noticed that I changed a number of things around pretty suddenly. This is because the version of Systembot in question had some pretty incorrect assumptions about how things should work. For starters, I thought I was being clever when I wrote the temperature monitoring code to use whatever the drivers thought were high or critical values as the triggers for "something is wrong" alerts. No math (aside from a Centigrade-to-Fahrenheit conversion), just a couple of values helpfully supplied by the drivers by way of psutil (which is a fantastic module, by the way; I don't play with it enough). This was hunky-dory until Leandra started running a backup job and her CPU temperature spiked to 125 degrees Fahrenheit while encrypting the data. 125 degrees isn't terribly hot as servers go, but the lm_sensors drivers seem to disagree. Additionally, my assumptions about how often to send the "high temperature" alerts (after every four cycles through the "do stuff" loop) were... naive? Optimistic?
Let's go with optimistic.
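For reference, the driver-threshold check described above might look something like the following. This is a sketch, not Systembot's actual code: the Reading tuple just mirrors the fields (label, current, high, critical) of the entries that psutil.sensors_temperatures() returns, so the example runs without psutil installed, and the function names are made up for illustration.

```python
from collections import namedtuple

# Same field layout as the entries psutil.sensors_temperatures() returns.
# Values are in Celsius; high and critical may be None if the driver
# doesn't report them.
Reading = namedtuple("Reading", ["label", "current", "high", "critical"])


def c_to_f(celsius):
    """Convert Centigrade to Fahrenheit."""
    return celsius * 9.0 / 5.0 + 32.0


def check_temperatures(sensors):
    """Return an alert string for every reading at or above the
    driver-supplied high or critical value."""
    alerts = []
    for chip, readings in sensors.items():
        for reading in readings:
            name = reading.label or chip
            if reading.critical is not None and reading.current >= reading.critical:
                level = "critical"
            elif reading.high is not None and reading.current >= reading.high:
                level = "high"
            else:
                continue
            alerts.append("%s temperature is %s: %.1f F"
                          % (name, level, c_to_f(reading.current)))
    return alerts
```

With psutil available, the dictionary it returns from sensors_temperatures() could be passed straight in. The weakness is the one described above: the check is only as sensible as the thresholds the drivers report.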
What it boiled down to was that I was getting hammered with "temperature is too high!" warning messages roughly six times a second. Some experiments with changing the delay were equally optimistic and futile. I bit the bullet and made the delay-between-alerts configurable. What I have yet to do is make the frequency of different kinds of warning events configurable, because right now they all use the same delay (defined in time_between_alerts). Setting this value to 0 disables sending warnings entirely. This is less than optimal at best, but it's not waking me up every few seconds, so I think it'll hold for a couple of days until I can break this logic out a little.
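A minimal sketch of that kind of throttle, assuming time_between_alerts is measured in seconds (the post doesn't say) and using a hypothetical AlertThrottle class rather than anything from the Systembot source:

```python
import time


class AlertThrottle:
    """Allow at most one alert per time_between_alerts seconds."""

    def __init__(self, time_between_alerts, clock=time.monotonic):
        self.time_between_alerts = time_between_alerts
        self.clock = clock      # injectable so the logic can be tested
        self.last_sent = None   # time the last alert went out, if any

    def should_send(self):
        # A value of 0 disables warnings entirely, as described above.
        if self.time_between_alerts == 0:
            return False
        now = self.clock()
        if (self.last_sent is not None
                and now - self.last_sent < self.time_between_alerts):
            return False
        self.last_sent = now
        return True
```

Giving each kind of warning its own AlertThrottle instance would be one way to make the different alert types independently configurable later on.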
The second assumption that came back to bite me (hardcoding values until something like this happened aside) was that alerting on 80% of a disk being in use without any context isn't necessarily a good idea. My media server at home was also chirping several times a second because one of the hard drives is currently at 85% of capacity. This seems reasonable at first scratch but when you dig a little deeper it's not. 85% of capacity in this case means that there are "only" 411 gigabytes of space left on a 4 terabyte hard drive. Stuff doesn't get added to that drive very often, so that 400+ gigs will last me another couple of months, at least. There's no reason to alert on this, so making this value a parameter in the config file buys me some time before I have to buy another hard drive.
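As an illustration of what a configurable threshold looks like in practice, here is a short sketch built on shutil.disk_usage() from the standard library. The function names and the idea of passing the threshold in as a percentage are assumptions for the sake of the example, not the actual Systembot config format.

```python
import shutil


def disk_alert(total, used, percent_threshold):
    """Return a warning string if usage is at or above the threshold,
    otherwise None. total and used are in bytes."""
    if total <= 0:
        return None
    percent_used = used / total * 100.0
    if percent_used >= percent_threshold:
        return "disk is %.0f%% full (threshold: %.0f%%)" % (
            percent_used, percent_threshold)
    return None


def check_mount(path, percent_threshold):
    """Check one mount point against a threshold read from the config."""
    usage = shutil.disk_usage(path)
    return disk_alert(usage.total, usage.used, percent_threshold)
```

With the threshold at 80 a drive that is 85% full raises an alert; bumping the configured value to 90 quiets it down without touching the code.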
As the title of this post implies, I've been working on some stuff lately that's been taking up enough compute cycles that I haven't been around to post much. Some of this is due to work, because we're getting into the really busy time of year, and when I haven't been at work I've been relaxing. Some of this is due to yet another run of dental work that, while it hasn't really been worth writing about, has resulted in my going to bed and sleeping straight through until the next day. And some of it's due to my hacking on a new project that wound up being... not as hard as I'd imagined it would be, but there certainly has been a steep learning curve.
A couple of weeks ago I ran into some of the functional limits of my web search bot, a bot that I wrote for my exocortex which accepts English-like commands ("Send me top 15 hits for HAL 9000 quotes.") and runs web searches in response using the Searx meta-search engine on the back end. This is to say that I gave my bot a broken command ("Send hits for HAL 9000 quotes.") and the parser got into a state where it couldn't cope, threw an exception, and crashed. To be fair, my command parser was very brittle and it was only a matter of time before I did something dumb and wrecked it. At the time I patched it with a bunch of if..then checks for truncated and incorrect commands, but if you look at all of the conditionals and ad-hoc error handling I probably made the situation worse, as well as much more difficult to maintain in the long run. Time for a rewrite.
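To make the failure mode concrete, here is a small sketch of a more forgiving way to handle those two commands. It uses a single regular expression rather than a proper grammar-based parser, and the default of 10 hits and the parse_command name are assumptions for illustration, not how the bot actually behaves.

```python
import re

DEFAULT_HITS = 10  # assumed default when the command gives no number

COMMAND = re.compile(
    r"^\s*send\s+(?:me\s+)?(?:top\s+(?P<count>\d+)\s+)?"
    r"hits\s+for\s+(?P<terms>.+?)\s*\.?\s*$",
    re.IGNORECASE,
)


def parse_command(text):
    """Return {'count': int, 'terms': str}, or None if the command
    can't be understood. Never raises on malformed input."""
    match = COMMAND.match(text)
    if match is None:
        return None
    count = int(match.group("count")) if match.group("count") else DEFAULT_HITS
    return {"count": count, "terms": match.group("terms")}
```

Here "Send me top 15 hits for HAL 9000 quotes." yields a count of 15, the truncated "Send hits for HAL 9000 quotes." falls back to the default, and anything unrecognizable returns None so the caller can reply with an error message instead of crashing. A regex like this gets unwieldy quickly as the command language grows, which is the argument for a real parser.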
Back to my long-term memory field. What to do?
I knew from comp.sci classes long ago that compilers use things called parsers and grammars to interpret code so that it can be converted into an executable. I also knew that the parser Infocom used in its interactive fiction was widely considered to be the best anyone had come up with in a long time, and it was efficient enough to run on humble microcomputers like the C-64 and the Apple II. For quite a few years I also ran and hacked on a MOO, which for the purposes of this post you can think of as a massive interactive fiction environment that the players can modify as well as play in; a MOO's command parser does pretty much the same thing as Infocom's IF parser but is responsive to the changes the users make to their environments. I also recalled something called a parse tree, which I sort-of-kind-of remembered from comp.sci, but because I'd never actually done anything with them I only recalled a low-res sketch. At least I had someplace to start from, so I turned my rebooted web search bot loose with a couple of search terms and went through the results after work. I also spent some time chatting with a colleague whose knowledge of the linguistics of programming languages is significantly greater than mine and bouncing some ideas off of him (thanks, TQ!).
But how do I do something with all this random stuff?
A couple of days ago I got it into my head to upgrade one of my Exocortex servers from Ubuntu Server 14.04 LTS to 16.04 LTS, the latest stable release. While Ubuntu long-term support releases are good for a couple of years (14.04 LTS would be supported until at least 2020) I had some concerns about the packages themselves being too stale to run the later releases of much of my software. To be more specific, I could continue to hope that the Ruby and Python interpreters I have installed could be upgraded as necessary but at some point the core system libraries would be too old and they'd no longer compile. Not good for long-term planning.
First off, whenever you're about to do a major upgrade of anything, read the release notes so you know what you're getting yourself into. You'll also usually find some notes about all the new goodies you'll be able to play with.
In the past I've had nothing but trouble using the documented Ubuntu release upgrade process, so much so that I've had clients sign "I told you so" documents when they pressured me to use it, because the procedure could reliably be expected to leave the system completely trashed, with a full rebuild as the only recourse. This time I set up a testbed in VirtualBox which consisted of a fully patched Ubuntu Server 14.04.5 LTS install. I ran through the documented upgrade process, and much to my surprise it went smoothly, leaving me with a functional virtual machine at the end of a 45-minute procedure (most of which was automatic; I only had to answer a few questions along the way). The process consisted of becoming the root user (sudo -s) and running the updater (do-release-upgrade).
So, if it's so easy, why am I writing a blog post about it? Why worry?
Why worry, indeed. Read on.
In my last post on the topic of exocortices I discussed the Huginn project, how it works, what the code for the agents actually looks like, and some of the stuff I use Huginn's agent networks for in my everyday life. In short, I call it my exocortex - an extension of the information processing capabilities of my brain running in silico instead of in vivo. Now I'm going to talk about Exocortex Halo, a separate suite of bots which augment Huginn to carry out tasks that Huginn by itself isn't designed to carry out very easily, and thus extend my personal capabilities significantly.
Now, don't get me wrong, Huginn has a fantastic suite of agents built into it already and more are being added every day. However, good design techniques require one to realize when an existing software architecture is suited for some things and not others, and allowances should be made for that. To put it another way, it was highly unlikely that I would be able to shoehorn the additional functionality I wanted into Huginn and have a hope in hell of it working. However, what Huginn has a multitude of are interfaces for getting events into and out of itself, and I could make use of those interfaces for plugging my own bots into it. The Website Agent is ideal for pinging REST API interfaces of my own design; Jabber Agent implements a simple XMPP client which can send events to an address on an XMPP server (assuming that it has its own login credentials); oversimplifying a bit, Webhook Agent basically sets up a custom REST API rail that external software can use to send events into Huginn for processing; Data Output Agent is used for sending events out of Huginn in the form of an RSS feed or a JSON document that can be consumed and parsed by other software.
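As a rough sketch of what plugging an external bot into a Webhook Agent involves, the following builds an HTTP POST carrying a JSON event using only the Python standard library. The URL used in the usage note is a placeholder (the real one, including its secret, comes from the agent's own page in Huginn), and the function names are hypothetical rather than taken from Exocortex Halo.

```python
import json
from urllib.request import Request, urlopen


def build_webhook_request(url, event):
    """Build a POST request that delivers one event as a JSON body."""
    body = json.dumps(event).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_event(url, event, timeout=10):
    """Send the event to the webhook and return the HTTP status code."""
    with urlopen(build_webhook_request(url, event), timeout=timeout) as response:
        return response.status
```

A bot would call something like send_event("https://huginn.example.com/your-webhook-url", {"message": "Search complete."}), and the event would then flow through the agent network like any other.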