Jan 11 2020
A couple of years ago I spent some time trying to set up Matrix, a self-hosted instant messaging and chat system that works a little like Jabber, a little like IRC, a little like Discord and a little like Slack. The idea is that anyone can set up their own server which can federate with other servers (in effect making a much larger network), and it can be used for group chat or one-on-one instant messaging. Matrix also has voice and video conferencing capabilities so you could hold conference calls over the network if you wanted. For example, one possible use case I have in mind is running games over the Matrix network. You could even build more exotic forms of conferencing on top of Matrix if you wanted to. Even more handy is that the Matrix protocol supports end-to-end encryption of message traffic between everyone in a channel as well as between private chats between pairs of people. If you turn encryption on in a channel it can't be turned off; you'd have delete the channel entirely (which would then cause the chat history to be purged).
Chat history is something that was a stumbling block in my threat model the last time I ran a Matrix server, somewhen in 2016. Things have changed quite a bit since then. For usability Matrix servers store chat history in their database, in part as a synchronization mechanism (channels can exist across multiple servers at the same time) and in part to provide a history that users can search through to find stuff, especially if they've just joined a channel. For some applications, like collaboration inside a company this can be a good thing (and in fact, may be legally required). For other applications (like a bunch of sysadmins venting in a back channel), not so much. This is why Matrix has three mechanisms for maintaining privacy: End to end encryption of message traffic (of entire channels as well as private chats), peer-to-peer voice and video using WebRTC (meaning that there is no server that can record the traffic, it merely facilitates the initial connection), and deleting the oldest chat logs from the back-end database. While it is true that there is no guarantee that other servers are also rotating out their message databases, end-to-end encryption helps ensure that only someone who was in the channel would have the keys to decrypt any of it. It also seems feasible to set up Matrix channels such that all of the users are on a single server (such as an internal chat) which means that the discussion will not be federated to other servers. Channels can also be made invite-only to limit who can join them. Additionally, who can see a channel's history and how much of it can be set on a by-channel basis.
For the record, on the server I built for writing this article the minimum lifetime of conversation history is one calendar day, and the maximum lifetime of conversation history is seven calendar days. If I could I'd set it to Signal's default of "delete everything before the last 300 messages" but Synapse doesn't support that so I tried to split the difference between usability and privacy (maybe I should file a pull request?) A maintenance mole crawls through the database once every 24 hours and deletes the oldest stuff. I could probably make it run more frequently than that but I don't yet know what kind of performance impact that would have.
One of the things I'm going to do in this article is gloss over the common fiddly stuff. I'm not going to explain how to create an account on a server because I'm going to assume that you know how to look up instructions for doing that. Hell, I google it from time to time because I don't do it often. I'm also going to break this process up into a couple of articles. This one will give you a basic, working install of Synapse (a minimum viable server, if you like). I also won't go over how to install Certbot (the Let's Encrypt client) to get SSL certificates even though it's a crucial part of the process. I will explain how to migrate Synapse's database off of SQLite and over to Postgres for better performance in a subsequent article. For what it's worth I have next to no experience with Postgres, so I'm figuring it out as I go along. Seasoned Postgres admins will no doubt have words for me. After that I'll talk about how to make Matrix's VoIP functionality work a little more reliably by installing a STUN server on the same machine. Later, I'll go over a simple integration of Huginn with a Matrix server (because you just know it's not a technical article unless I bring Huginn into it).
A piece of advice: Don't try to go public with a Matrix server all at once. The instructions are complex and problematic in places, so this article is written from my notes. Take your time. If you rush it you will screw it up, just like I did. Get what you need working, then move on to the next bit in a day or so. There's no rush.
Nov 20 2019
This is the initial announcement of a new project pwnagotchi-bt-power (mirrored at Gitlab), a short utility written in Python which will cleanly shut down a Pwnagotchi from an Android phone over Bluetooth using the Bluedot app.
Share and enjoy!
Aug 31 2019
It seems as if another summer is rapidly coming to an end. The neighbors' kids are now back in school, school buses are now picking their way down the streets, and due to Burning Man coming up it's now possible to eat in a real restaurant in the Bay Area for the next couple of days. I've been pretty quiet lately, not because I've been spending any amount of time offline but because I've been spending more time doing stuff and just not writing it up. I've been tinkering with Systembot lately, adding functionality that I really have a need for at home, namely, remotely monitoring a wireless access point running OpenWRT in the same way that I watch the rest of my stuff. Due to the extreme system constraints on your average high-end wireless access point (2 CPUs, 128 megs of storage, 512 megs of RAM) it's not feasible to install Python and a Halo checkout, so I had to figure out how to get the system stats I need remotely. What I wound up doing was standing up another copy of the standard OpenWRT web server daemon and writing a bunch of tiny CGI scripts which run local commands and return the information to Systembot for processing and analysis. It wound up being a fun exercise in working with tight constraints, though I think there are still some bugs to be shaken out.
Jul 28 2019
I spend a lot of time digging around in other people's data. If I'm not hunting for anything in particular then it's a bit of a crapshoot, to be honest, if only because you never know what you're in for. You can pretty much take it to the bank that if you didn't assemble it yourself, you can't count on it being complete, well formed, or anything approximating the output of a human being (it usually came out of a database, but I think you see what I'm getting at). Sometimes, if I'm really lucky I'll just get hold of a JSON dump of the database, which to be fair is better than nothing when there isn't even an API to use. From time to time I'll make an attempt at fitting the data into a database of some kind, sometimes MySQL, sometimes SQLite, or occasionally an API layer like Sandman2. This is all well and good, but it winds up being more of an adventure than I'm looking for. I'd much rather be Indiana Jones prowling around in the temple than Rambo going through a preparation montage because Indy was actually getting stuff done.
Wow, this article went a little off the rails. I was never good at writing intros to new code... anyway.
May 30 2019
Longtime readers are aware that I've been a customer of Dreamhost for quite a few years now, and by and large they've done all right by me. They haven't complained (much) about all the stuff I have running there, and I try to keep my hosted databases in good condition. However, the server they have my stuff on is starting to act wonky. Periodic outages mostly, but when my Wallabag installation started throwing all sorts of errors and generally not working right, that got under my skin in a fairly big hurry. I reinstalled. I upgraded to the latest stable release. I installed the latest commit from the source code repository. 401 and 500 errors as far as the eye could see whenever I tried to do anything regardless of what I did.
In a misguided attempt to figure out what was going on, I bit the bullet and installed PHP on one of my servers, along with all of the usual dependencies and tried to replicate my setup at Dreamhost. What that was a bit tricky and took some debugging I eventually got it to work. It was getting my data out of the sorta-kinda-broken setup that proved troublesome.
Dec 28 2018
If you've been following the development activity of Systembot, the bot I wrote to monitor my machines (physical as well as virtual) you've probably noticed that I changed a number of things around pretty suddenly. This is because the version of Systembot in question had some pretty incorrect assumptions about how things should work. For starters, I thought I was being clever when I wrote the temperature monitoring code when I decided to use what the drivers thought were high or critical values for sending "something is wrong" alerts. No math (aside from a Centigrade-to-Fahrenheit conversion), just a couple of values helpfully supplied by the drivers by way of psutil (which is a fantastic module, by the way; I don't play with it enough). This was hunky-dory until Leandra started running a backup job and her CPU temperature spiked to 125 degrees Fahrenheit while encrypting the data. 125 degrees isn't terribly hot as servers go, but the lm_sensors drivers seem to disagree. Additionally, my assumptions of how often to send the "high temperature" alerts (after every four cycles through the "do stuff" loop) were... naive? Optimistic?
Let's go with optimistic.
What it boiled down to was that I was getting hammered with "temperature is too high!" warning messages roughly six times a second. Some experiments with changing the delay were equally optimistic and futile. I bit the bullet and made the delay-between-alerts configurable. What I have yet to do is make the frequency of different kinds of warning events configurable, because right now they all use the same delay (defined in time_between_alerts). Setting this value to 0 disables sending warnings entirely. This is less suboptimal at best but it's not waking me up every few seconds so I think it'll hold for a couple of days until I can break this logic out a little.
The second assumption that came back to bite me (hardcoding values until something like this happened aside) was that alerting on 80% of a disk being in use without any context isn't necessarily a good idea. My media server at home was also chirping several times a second because one of the hard drives is currently at 85% of capacity. This seems reasonable at first scratch but when you dig a little deeper it's not. 85% of capacity in this case means that there are "only" 411 gigabytes of space left on a 4 terabyte hard drive. Stuff doesn't get added to that drive very often, so that 400+ gigs will last me another couple of months, at least. There's no reason to alert on this, so making this value a parameter in the config file buys me some time before I have to buy another hard drive.
Sep 18 2018
As the title of this post implies, I've been working on some stuff lately that's been taking up enough compute cycles that I haven't been around to post much. Some of this is due to work, because we're getting into the really busy time of year and when I haven't been at work I've been relaxing. Some of this is due to yet another run of dental work that, while it hasn't really been worth writing about has resulted in my going to bed and sleeping straight through until the next day. And some of it's due to my hacking on a new project that wound up being... not as hard as I'd imagined it would be, but there certainly has been a steep learning curve.
Feb 04 2017
A couple of weeks ago I ran into some of the functional limits of my web search bot, a bot that I wrote for my exocortex which accepts English-like commands ("Send me top 15 hits for HAL 9000 quotes.") and runs web searches in response using the Searx meta-search engine on the back end. This is to say that I gave my bot a broken command ("Send hits for HAL 9000 quotes.") and the parser got into a state where it couldn't cope, threw an exception, and crashed. To be fair, my command parser was very brittle and it was only a matter of time before I did something dumb and wrecked it. At the time I patched it with a bunch of if..then checks for truncated and incorrect commands, but if you look at all of the conditionals and ad-hoc error handling I probably made the situation worse, as well as much more difficult to maintain in the long run. Time for a rewrite.
Back to my long-term memory field. What to do?
I knew from comp.sci classes long ago that compilers use things called parsers and grammars to interpret code so that it can be converted into an executable. I also knew that the parser Infocom used in its interactive fiction was widely considered to be the best anyone had come up with in a long time, and it was efficient enough to run on humble microcomputers like the C-64 and the Apple II. For quite a few years I also ran and hacked on a MOO, which for the purposes of this post you can think of as a massive interactive fiction environment that the players can modify as well as play in; a MOO's command parser does pretty much the same thing as Infocom's IF parser but is responsive to the changes the user's make to their environments. I also recalled something called a parse tree, which I sort-of-kind-of remembered from comp.sci but because I'd never actually done anything with them, I only recalled a low-res sketch. At least I had someplace to start from so I turned my rebooted web search bot loose with a couple of search terms and went through the results after work. I also spent some time chatting with a colleague whose knowledge of the linguistics of programming languages is significantly greater than mine and bouncing some ideas off of him (thanks, TQ!)
But how do I do something with all this random stuff?
Oct 29 2016
A couple of days ago I got it into my head to upgrade one of my Exocortex servers from Ubuntu Server 14.04 LTS to 16.04 LTS, the latest stable release. While Ubuntu long-term support releases are good for a couple of years (14.04 LTS would be supported until at least 2020) I had some concerns about the packages themselves being too stale to run the later releases of much of my software. To be more specific, I could continue to hope that the Ruby and Python interpreters I have installed could be upgraded as necessary but at some point the core system libraries would be too old and they'd no longer compile. Not good for long-term planning.
First off, whenver you're about to do a major upgrade of anything, read the release notes so you know what you're getting yourself into. You'll also usually find some notes about all the new goodies you'll be able to play with.
In the past I've had nothing but trouble using the documented Ubuntu release upgrade process, so much so that I've had clients sign "I told you so," documents when they pressured me to do so because the procedure could reliably be expected to leave the system completely trashed, and a full rebuild was the only recourse. This time I set up a testbed in Virtualbox which consisted of a fully patched Ubuntu Server 14.04.5 LTS install. I ran through the documented upgrade process, and much to my surprise it went smoothly, leaving me with a functional virtual machine at the end of a 45 minute procedure (most of which was automatic, I only had to answer a few questions along the way). The process consisted of logging in as the root user (sudo -s) and running the updater (do-release-upgrade).
So, if it's so easy, why am I writing a blog post about it? Why worry?
Why worry, indeed. Read on.
Mar 26 2016
In my last post on the topic of exocortices I discussed the Huginn project, how it works, what the code for the agents actually look like, and some of the stuff I use Huginn's agent networks for for in my everyday life. In short, I call it my exocortex - an extension of the information processing capabilities of my brain running in silico instead of in vivo. Now I'm going to talk about Exocortex Halo, a separate suite of bots which augment Huginn to carry out tasks that Huginn by itself isn't designed to carry out very easily, and thus extend my personal capabilities significantly.
Now, don't get me wrong, Huginn has a fantastic suite of agents built into it already and more are being added every day. However, good design techniques require one to realize when an existing software architecture is suited for some things and not others, and allowances should be made for that. To put it another way, it was highly unlikely that I would be able to shoehorn the additional functionality I wanted into Huginn and have a hope in hell of it working. However, what Huginn has a multitude of are interfaces for getting events into and out of itself, and I could make use of those interfaces for plugging my own bots into it. The Website Agent is ideal for pinging REST API interfaces of my own design; Jabber Agent implements a simple XMPP client which can send events to an address on an XMPP server (assuming that it has its own login credentials); oversimplifying a bit, Webhook Agent basically sets up a custom REST API rail that external software can use to send events into Huginn for processing; Data Output Agent is used for sending events out of Huginn in the form of an RSS feed or a JSON document that can be consumed and parsed by other software.