Feb 03 2016
In the last post in this series I talked about the origins of my exocortex and a few of the things I do with it. In this post I'm going to dive a little deeper into what my exocortex does for me and how it's laid out.
My agent networks ("scenarios" in the terminology of Huginn) are collections of specialized agents which each carry out one function (like requesting a web page or logging into an XMPP server to send a message). Those agents communicate by sending events to one another; those events take the form of structured, packaged pieces of information that the receiving agent can pick values out of or pass along depending on how it's configured. Below the cut is what one kind of event looks like.
"pretty": "7:00 PM PST on January 10, 2016",
"conditions": "Chance of Rain",
That looks like a king-hell mess of acronyms and numbers, but if you take it one line or block (delineated by curly braces) at a time it's pretty easy to puzzle out. The event contains many different kinds of information from the web service the agent queried, all of which is accessible by agents but not all of which needs to be used; you can pick and choose the stuff you need out of the event and ignore the rest. For example, the date the event was received by Huginn can be seen at the very top in several different formats (time_t, the standard parts of a time/date stamp broken into separate pieces, and the time zone in a few different ways) so that you don't have to write code to transform one date/timestamp into a different format. The event also carries the predicted high and low temperatures in Fahrenheit and Centigrade, projected weather conditions for the day, the wind speed and direction, humidity, and zip code of the city in question. My agent network uses a few elements of these events to build the weather report I read every morning before getting dressed for work. This agent network, called Butterfly In China, also monitors the air quality index in the same locations at the same time it pulls the weather forecasts. The AQI projection is filtered through a series of if-then conditionals that determine what AQI message should be merged, if any (none, moderate, unhealthy, hazardous, or dangerous). Then the resulting message is e-mailed to my phone while I'm in the shower so I have some idea of what the weather will be like that day.
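To make the if-then filtering a little more concrete, here is a minimal Python sketch of the idea. It is not the Huginn configuration itself: the threshold numbers are placeholders (the post names the message levels but not the cut-offs), the function names are made up for illustration, and the event is assumed to be a flat dictionary holding only the "pretty" and "conditions" keys seen in the sample above.

```python
# Hypothetical cut-offs, checked from worst to best. Adjust to taste.
AQI_LEVELS = [
    (300, "dangerous"),
    (200, "hazardous"),
    (150, "unhealthy"),
    (50, "moderate"),
]


def aqi_message(aqi):
    """Return the warning to merge into the report, or None for clean air."""
    for floor, label in AQI_LEVELS:
        if aqi > floor:
            return f"Air quality is {label} (AQI {aqi})."
    return None


def build_report(event, aqi):
    """Combine a forecast event and an AQI reading into one message."""
    # Only two keys are picked out of the event; everything else is ignored.
    lines = [f"Forecast as of {event['pretty']}: {event['conditions']}."]
    warning = aqi_message(aqi)
    if warning is not None:
        lines.append(warning)
    return "\n".join(lines)
```

Calling build_report() with a low AQI yields a one-line forecast; anything above the lowest cut-off adds a second line with the warning.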
After a decidedly anomalous event in 2011 I built a simple but functional agent network called Shake, Rattle, and Roll which monitors the United States Geological Survey's tectonic activity surveillance network. Every few minutes, Shake, Rattle, and Roll polls their API service for earthquake activity stronger than a 4.0 on the Richter scale and if a positive event comes back (meaning that an earthquake was detected) it is fed into a Data Formatting Agent which turns the event from the API service into a human-readable message suitable for transmission through one or more instant messaging services. I've found it useful to extend Shake, Rattle, and Roll to monitor other geographic locations but, as it is now, it makes a good example of a relatively simple agent network and a demonstration of interfacing with a public web API service.
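For readers who want to see the poll-then-format pattern outside of Huginn, here is a rough Python equivalent. It assumes the USGS public event query service and its GeoJSON output (the "mag", "place", "time", and "url" properties); the endpoint and field names come from the USGS API rather than from this post, and the function names are invented for the sketch. Fetching the URL is left out so the example stays offline.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Public USGS event query endpoint (an assumption; the post does not
# say which USGS service the agents poll).
USGS_QUERY = "https://earthquake.usgs.gov/fdsnws/event/1/query"


def build_query_url(min_magnitude=4.0, limit=20):
    """Build a URL asking for recent quakes at or above min_magnitude."""
    params = {
        "format": "geojson",
        "minmagnitude": min_magnitude,
        "orderby": "time",
        "limit": limit,
    }
    return f"{USGS_QUERY}?{urlencode(params)}"


def format_quake(feature):
    """Turn one GeoJSON feature into a short human-readable message."""
    props = feature["properties"]
    # USGS reports event time as milliseconds since the Unix epoch.
    when = datetime.fromtimestamp(props["time"] / 1000, tz=timezone.utc)
    return (
        f"M{props['mag']:.1f} earthquake, {props['place']}, "
        f"at {when:%Y-%m-%d %H:%M} UTC. {props['url']}"
    )


def messages_from(feed):
    """Format every event in a decoded GeoJSON response."""
    return [format_quake(feature) for feature in feed["features"]]
```

An empty "features" list simply produces no messages, which matches the "only speak up when something came back" behavior described above.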
Unfortunately, not all websites or services have nice, neat APIs that you can request information from periodically. In point of fact, not all websites have update feeds like RSS or Atom, for whatever reason (and yes, some organizations do this maliciously). However, if you can load a web page in a web browser you can write code to load a web page and take it apart to find just the bits you're interested in, using a procedure called web scraping or web harvesting. Huginn's WebsiteAgent implements a fairly sophisticated HTML scraping engine that lets you very precisely specify which parts of a web page you're interested in and emit them as events for processing by other agents. It gets kind of technical here (and it's a pain in the ass to get right, really) but I'll try to simplify it if you're not a web developer (and I most certainly am not, I really hope I'm doing this right) - there are things called CSS selectors, which tell your web browser which parts of the HTML to apply which CSS rules to. CSS selectors can be used more generally to identify and pick out parts of web pages. If you're very skilled with HTML and the CSS selector language you can do this by hand; I'm not and I suck at both so I used a tool called Selector Gadget to visually pick out what part of a page I want and it generates the CSS selector for me. One of them looks like this:
Huginn's WebsiteAgent class also implements XPath selectors for taking apart XML documents (some servers return XML rather than HTML and some API services send XML rather than JSON), but I don't want to go too far afield here. Anyway, as an exercise in learning how to do practical website scraping I decided to build an agent network named Tripwire to monitor some pages on the websites of the Federal Bureau of Investigation and the Bureau of Industry and Security. Nothing sensitive, mind you, this is all public stuff but it's also not necessarily easy to watch automatically. So, I wrote a couple of agents that download the page once a day, pick out the parts I want and diff them to see if they've changed. If they have, I get an e-mailed message and if they haven't I have one less message to glance at.
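The download, pick out, and diff loop can be sketched in plain Python. This is a stand-in rather than what the WebsiteAgent does internally: the standard library has no CSS selector engine, so the sketch matches a single element by its id attribute (roughly what a selector like #some-id would pick), and the class and function names are invented for the example. Downloading the page and storing yesterday's copy are left to the reader.

```python
import difflib
from html.parser import HTMLParser

# Tags that never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class IdTextExtractor(HTMLParser):
    """Collect the text inside the first element whose id matches."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # greater than zero while inside the target
        self.done = False   # set once the target element has closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1
            self.done = self.depth == 0

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def extract(html, target_id):
    """Return the text fragments found inside the element with target_id."""
    parser = IdTextExtractor(target_id)
    parser.feed(html)
    return parser.chunks


def changes(old_lines, new_lines):
    """Return only the added ("+ ") and removed ("- ") lines."""
    return [line for line in difflib.ndiff(old_lines, new_lines)
            if line.startswith(("+ ", "- "))]
```

If changes() returns an empty list, nothing in the watched part of the page moved and no message needs to go out. The parser assumes reasonably well-formed HTML; badly nested markup would need a more forgiving library.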
As a radio scanner listener since childhood (dzięki, babcia) I like to listen to what's happening in my city. If I'm at home I usually use a Uniden hand-held like the Uniden BC75XLT, but if I'm traveling I employ an ultra-small software defined radio like the super-cheap but incredibly useful Ronsit TV28T USB tuner, based on the RTL2832U chip, along with a copy of GQRX for Linux. However, there are certain functional limits to these radios, chief among them that they can't be used to listen to radio broadcasts several states distant. So I built an agent network that monitors several online radio scanner websites like Broadcastify, DX Zone, and police-scanner.net. My working hypothesis, backed up with several years of data, is this: Other people listen to these online scanners, which typically have tickers showing the number of listeners at any given time. When something goes down - a fire, a police action, anything that would cause significant radio traffic - the number of listeners goes up. So, my agents periodically peek at the number of people listening to scanner feeds and if the numbers spike I get an alert and a link to the online scanner in question. I've named this agent network Cherrybomb as a reference to the game Gunshots or Fireworks?
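One simple way to decide that "the numbers spike" is to compare each new listener count against a rolling baseline. The Python sketch below is an illustration of that idea, not a description of how Cherrybomb's agents are actually configured; the window size, the three-sigma rule, and the minimum jump are all assumed values.

```python
from collections import deque
from statistics import mean, pstdev


class SpikeDetector:
    """Flag readings that sit well above the recent average."""

    def __init__(self, window=12, sigmas=3.0, min_jump=25):
        self.history = deque(maxlen=window)  # most recent readings only
        self.sigmas = sigmas                 # how many deviations count
        self.min_jump = min_jump             # floor for very quiet feeds

    def observe(self, listeners):
        """Record one reading; return True if it looks like a spike."""
        spike = False
        # Stay silent until a full window of history is available.
        if len(self.history) == self.history.maxlen:
            baseline = mean(self.history)
            spread = pstdev(self.history)
            threshold = baseline + max(self.sigmas * spread, self.min_jump)
            spike = listeners > threshold
        # Every reading joins the history, so a sustained crowd becomes
        # the new normal after a few polls and the alerts stop.
        self.history.append(listeners)
        return spike
```

The min_jump floor keeps a feed that normally has a handful of listeners from alerting every time two more people tune in.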
Something else of interest to me is the vicissitudes of various cryptocurrency networks like Bitcoin and Litecoin and all the shenanigans that come from people with stupid amounts of money and nothing holding them back butting heads. By this I mean DDoS attacks that take down entire ISPs and death threats because someone dares disagree. But, I digress. So, to that end I built a fairly large agent network (a little over one hundred agents) that tracks exchange rates on a number of popular and somewhat trustworthy gateways and exchanges, transaction fees, and hash rates to monitor the market and see what direction it's going as well as pick up a little security-related OSINT. The nice thing about cryptocurrencies is that you can't swing a stuffed mouse without hitting at least a dozen websites with free APIs that offer information about this, that, or the other blockchain (note that I do not care to argue about what does or does not constitute a blockchain). After discovering and spending way too much time listening to Bitlisten I named the agent network Firefly, because after a while all of those data points start looking like a gallon pickle jar full of fireflies flashing on and off and dancing around. It took me a while to fine-tune all of the detection metrics to get the noise down, but once I figured out what was and was not appropriate data Firefly started sending events which are informative and useful.
The last of the constructs I'd like to talk about is called Switchboard. As you might expect I keep many lines of communication open for both personal and professional purposes: e-mail, instant messaging, private messages, and chat systems of several different kinds. It got to be way too big a problem to triage and manage everything a long time ago, so I started building a construct which analyzed and kept track of the whole mess. A software-based secretary, if you will. Switchboard carries out many secretarial tasks on my behalf, and while she's not perfect and there are things she is not yet capable of (though I'm working on it - I've been studying recent advances in data science and AI to see if I can make them happen) there are many things that she does extremely well that I find utterly essential. For example, Switchboard regularly sorts my e-mail into folders and transmits prioritized alerts depending on topics found in messages and who sent them, she enforces my killfile with ruthless efficiency (an essential task), lets me know when private messages come in that I haven't acknowledged in a certain period of time (this is usually work-related stuff because we practically run on instant messaging), and pokes me if certain keywords or events come up in any of the chats she monitors while I'm not paying attention (while chats are useful I also tend to relegate them to lowest priority so I can get more important things done). While Switchboard isn't yet capable of responding on my behalf (experiments are promising but I don't think she's sophisticated enough yet) triaging and organizing communications is about 80% of the work that goes into managing communications today (90% if you count deleting the crap).
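The sorting and killfile parts of that job boil down to a handful of rules applied to each message. Here is a toy Python version of that kind of triage. Everything in it is hypothetical: the addresses, keywords, folder names, and the triage() function are made up for illustration and say nothing about how Switchboard is actually wired together.

```python
from email.message import EmailMessage
from email.utils import parseaddr

# Hypothetical rule tables; a real setup would load these from config.
KILLFILE = {"spammer@example.com"}
PRIORITY_SENDERS = {"boss@example.com"}
# (keyword in subject, destination folder, send an alert?)
TOPIC_RULES = [
    ("outage", "Ops", True),
    ("invoice", "Finance", False),
]


def triage(msg):
    """Return (folder, alert) for one message; folder None means discard."""
    sender = parseaddr(str(msg.get("From", "")))[1].lower()
    if sender in KILLFILE:
        return None, False

    subject = str(msg.get("Subject", "")).lower()
    folder, alert = "Inbox", False
    for keyword, destination, wants_alert in TOPIC_RULES:
        if keyword in subject:
            folder, alert = destination, wants_alert
            break

    # Who sent it can raise the priority regardless of the topic.
    if sender in PRIORITY_SENDERS:
        alert = True
    return folder, alert


def make_message(sender, subject):
    """Build a small test message (helper for experimenting)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["Subject"] = subject
    return msg
```

For example, triage(make_message("Boss <boss@example.com>", "Lunch?")) files the message in the inbox but still raises an alert because of who sent it.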
And with that, I don't have anything more for this post. This has been a whirlwind tour through the core of my exocortex, the hundreds of software agents that abstract away a lot of tedious and time-consuming work so that I can get on with the interesting stuff... like living my life. It's a fact of life in the twenty-first century that, to quote the band Plan-B, we're drowning in information but still starving for knowledge, and not being able to at least keep your head above water, if not swim, constitutes a serious handicap. Like it or not, we live in a world where we more or less always have to be online; work doesn't sleep and everybody's on call at least part of the time regardless of position. Having close friends that we've never actually met face-to-face because they're on the other side of the world is no longer an aberration, it's a fact of life, and while it's all well and good to chat from time to time, sometimes real friends need real help, and real friends are there for each other. Unlike on television, emergencies are not known for their propensity to wait for convenient times before occurring, and being able to respond (even if it means broadcasting a distress call on someone's behalf) is essential. And, let's be honest, the amount of crap that everyone has to deal with in their inboxes is such that nobody in their right mind wants to deal with it manually. I hope that the agent networks in the GitHub repository are useful in learning how to build your own and interact with APIs. I'll probably come up with a few other ones later. If you're curious about Huginn by all means get yourself a cheap VPS or a Raspberry Pi, set it up, and play around with it a little bit.
I encourage you to experiment with the example agents I wrote to get a feel for what Huginn can do, and if you're already a GitHub user I invite you to hack on the repository, add your own agent networks, and file pull requests for enhancements or new agent networks to grow the library of examples to play with. In my next post I'll talk a little bit about the Halo, which is a suite of code I'm writing to interface with Huginn and carry out other tasks that it isn't yet well suited for.