Pretty serious anomalies in the stock market on Monday.

Feb 07 2017

As I've mentioned a few times in the past, diverse parts of my exocortex monitor many different aspects of the world.  One of them, called Ironmonger, constantly data mines the global stock markets looking for anomalies.  Ordinarily, Ironmonger only triggers when trading events more than three standard deviations outside the norm hit the market.  On Monday, 6 Feb at 14:50:38 hours UTC-0800 (PST), Ironmonger did an acrobatic pirouette off the fucking handle.  Massive trades in three different tech companies (Intel, Apple, and Facebook) hit the US stock market within the same thirty-second period.  By "massive," I mean that 3,271,714,562 shares of Apple, 3,271,696,623 shares of Intel, and 2,030,897,857 shares of Facebook all hit the market at the same time.  The time_t datestamps of the transactions were 1486421438 (Intel), 1486421431 (Apple), and 1486421442 (Facebook) (I use time.is to convert them back into organic-readable time/date specifiers).  I grabbed some screenshots from the Exocortex console at the time - check them out:

Screenshots: Intel ; Apple ; Facebook

The tall blue slivers at the far right-hand edges of each graph represent the stock trades. I waited a couple of hours and took another set of screenshots (Intel, Apple, Facebook) because the graph had moved on a bit and the transaction spikes were much more visible.  While my knowledge of the stock market is limited, I have to admit that I've never seen multi-billion share stock trades happen before.  Out of curiosity, I took a look at the historical price per share of each of those stocks to see what those huge offers did to them.  The answer, somewhat surprisingly, was "not much."  Check out these extracts from Ironmonger's memory: Facebook, Intel, and Apple.
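For the curious, the trigger logic itself is nothing exotic.  Here's a minimal Python sketch of the kind of three-standard-deviation volume check Ironmonger's monitoring boils down to, along with the time_t conversion I mentioned above - the volume history is made up for illustration, and the real thing obviously watches live market data feeds rather than a hardcoded list:

```python
import statistics
from datetime import datetime, timezone, timedelta

# Hypothetical recent per-interval trade volumes for one ticker.
recent_volumes = [1200000, 950000, 1100000, 1300000, 1050000]

def is_anomalous(volume, history, threshold=3.0):
    """Flag a volume more than `threshold` standard deviations above the recent mean."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    return volume > mean + (threshold * stdev)

# The multi-billion share blocks are obvious outliers by this test.
print(is_anomalous(3271714562, recent_volumes))  # True

# Converting a time_t datestamp back into an organic-readable form, here in UTC-0800 (PST).
pst = timezone(timedelta(hours=-8))
print(datetime.fromtimestamp(1486421438, tz=pst))  # 2017-02-06 14:50:38-08:00
```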

Because I am a paranoid and curious sort, I immediately wondered if there was a correlation with the large spike in the Bitcoin transaction fee earlier that day (at 13:19:16 UTC-0800, to be precise).  The answer is... probably not.  A transaction fee of 2.35288902 BTC (approximately $2510.93us as of 22:32 hours UTC-0800 on 7 February 2017, as I write this article ahead of time) is a sizeable sum that would certainly guarantee that someone's transaction made it into a block at that very instant, but that by itself doesn't mean it was involved.  There just isn't enough data; it stands on its own as another anomaly that day.  I wish I knew who put those huge blocks of stock up for sale all at once.  The only thing they seem to have in common is that they're all listed on the Singularity Index, which is mildly noteworthy.

Anybody have any ideas?

Parsing simple commands in Python.

Feb 04 2017

A couple of weeks ago I ran into some of the functional limits of my web search bot, a bot I wrote for my exocortex which accepts English-like commands ("Send me top 15 hits for HAL 9000 quotes.") and runs web searches in response using the Searx meta-search engine on the back end.  Which is to say, I gave my bot a broken command ("Send hits for HAL 9000 quotes.") and the parser got into a state it couldn't cope with, threw an exception, and crashed.  To be fair, my command parser was very brittle, and it was only a matter of time before I did something dumb and wrecked it.  At the time I patched it with a bunch of if..then checks for truncated and incorrect commands, but if you look at all of the conditionals and ad-hoc error handling, I probably made the situation worse, and certainly made the code much more difficult to maintain in the long run.  Time for a rewrite.
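Just to illustrate the failure mode (this isn't the bot's actual code, only the shape of the mistake): a naive parser splits the command on whitespace and indexes into the word list by position, which works fine right up until somebody hands it a command that's missing a piece.

```python
# A deliberately naive parser sketch - not the bot's real code, just the failure mode.
def parse_search_command(command):
    words = command.strip().rstrip(".").split()
    # "Send me top 15 hits for HAL 9000 quotes." parsed by word position:
    number_of_hits = int(words[3])       # assumes the hit count is always the fourth word
    search_terms = " ".join(words[6:])   # assumes the search terms always start at word seven
    return (number_of_hits, search_terms)

print(parse_search_command("Send me top 15 hits for HAL 9000 quotes."))
# (15, 'HAL 9000 quotes')

print(parse_search_command("Send hits for HAL 9000 quotes."))
# ValueError - the truncated command shifts every word over and int("HAL") blows up
```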

Back to my long-term memory field.  What to do?

I knew from comp.sci classes long ago that compilers use things called parsers and grammars to interpret code so that it can be converted into an executable.  I also knew that the parser Infocom used in its interactive fiction was widely considered to be the best anyone had come up with in a long time, and it was efficient enough to run on humble microcomputers like the C-64 and the Apple II.  For quite a few years I also ran and hacked on a MOO, which for the purposes of this post you can think of as a massive interactive fiction environment that the players can modify as well as play in; a MOO's command parser does pretty much the same thing as Infocom's IF parser, but it has to keep up with the changes the users make to their environments.  I also recalled something called a parse tree, which I sort-of-kind-of remembered from comp.sci, but because I'd never actually done anything with one, all I could call up was a low-res sketch.  At least I had someplace to start from, so I turned my rebooted web search bot loose with a couple of search terms and went through the results after work.  I also spent some time chatting with a colleague whose knowledge of the linguistics of programming languages is significantly greater than mine and bouncing some ideas off of him (thanks, TQ!).

But how do I do something with all this random stuff?

Huginn: Writing a simple agent network.

Jan 15 2017

EDIT: 20170123 - My reviewers have suggested some edits to the article, many of which I've applied.

It's been a while since I wrote a Huginn tutorial, so let's start with a basic one to get you comfortable with the idea of building an agent network.  This agent network will run every half hour, poll a REST API endpoint, and e-mail you what it gets.  You'll have to have access to an already running Huginn instance that can send outbound e-mail.  This post is going to be kind of lengthy, but that's because I'm laying out some fundamentals.  Once you understand those you can skip past the explanations and move on to the good stuff.
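To make the data flow concrete before we touch Huginn itself, here's the same pipeline expressed as a throwaway Python sketch - poll, extract, e-mail, repeat.  The endpoint URL and the e-mail addresses are placeholders.  In Huginn you won't write any code like this (the whole point is that a few agents wired together do the same job), but it helps to keep in mind what we're aiming for:

```python
import json
import smtplib
import time
import urllib.request
from email.message import EmailMessage

API_ENDPOINT = "https://api.example.com/status"   # placeholder REST API endpoint
SENDER = "huginn@example.com"                     # placeholder addresses
RECIPIENT = "you@example.com"

def poll_and_mail():
    # Step 1: poll the REST API endpoint (the job of a polling agent in Huginn).
    with urllib.request.urlopen(API_ENDPOINT) as response:
        payload = json.loads(response.read().decode("utf-8"))
    # Step 2: e-mail what we got (the job of an e-mail agent in Huginn).
    message = EmailMessage()
    message["Subject"] = "REST API poll results"
    message["From"] = SENDER
    message["To"] = RECIPIENT
    message.set_content(json.dumps(payload, indent=4))
    with smtplib.SMTP("localhost") as mailer:
        mailer.send_message(message)

# Step 3: do it every half hour (Huginn's scheduler handles this for you).
while True:
    poll_and_mail()
    time.sleep(1800)
```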

First, a little background - what's a REST API?  If you already know, just skip down past the cut and move on, but if you don't know what I'm talking about I'll try to explain.  I'm going to assume that you've been able to install Huginn using my instructions or someone else's, or that you've got access to a running instance.  I'm also going to assume that you're not a hardcore coder, just someone who's trying to apply a useful tool to your everyday life.

At its simplest, an API (Application Program Interface) is a way to interact with a system or part of a system.  It's (hopefully) designed to be regular, which means that once you understand the basics you can figure out the more complex parts with a little messing around, because the same conventions keep applying.  Let's say that I've written a library called myLib, which implements a bunch of really annoying stuff (like opening and closing files and sorting elements of data) so you don't have to.  My library has a bunch of functions that carry out those tasks (openStupidFile(), readAllOfFilesContents(), sortIntegers(), sortFloatingPointValues(), searchThisCrapForAString()) when you call them in your own code.  Those functions are part of my library's API.  In the documentation are instructions for calling each function, which include the arguments you need to pass to each function (e.g., openStupidFile() takes two arguments, a full path to a file and 'r' for read-only or 'rw' for read-write, and it returns a handle to the file that you can pass to another function, or NULL if it failed).  The data type each function returns (the file handle or NULL value) is part of the API, as are the arguments each function takes (the path to the file and 'r' or 'rw').
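To put that in concrete terms, here's a toy Python rendering of that made-up myLib - the function names come straight from the paragraph above, and the implementations are just illustrative filler, since the library doesn't actually exist:

```python
# A toy rendering of the hypothetical myLib API described above.
def openStupidFile(path, mode):
    """Returns a file handle, or None (the post's NULL) if opening the file failed."""
    try:
        return open(path, "r" if mode == "r" else "r+")
    except OSError:
        return None

def readAllOfFilesContents(handle):
    return handle.read()

def searchThisCrapForAString(text, needle):
    return [line for line in text.splitlines() if needle in line]

# Calling the API: the arguments each function takes and what it returns are the contract.
handle = openStupidFile("/etc/hostname", "r")   # any text file you have lying around
if handle is None:
    print("Couldn't open the file.")
else:
    print(searchThisCrapForAString(readAllOfFilesContents(handle), "exocortex"))
```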

The same principle has been applied to the Web in several different ways.  What concerns us right now is something called the RESTful API (REpresentational State Transfer), which basically means interacting with a web service using HTTP verbs (GET, PUT, POST, and so forth) and referencing URLs instead of functions in a library.  Like HTTP, requests are stateless, which means that you make a request, the server responds, and there's no further context beyond that.  You can think of RESTful APIs as fire-and-forget.  The general idea is that there is a web server of some kind, which could be a traditional one like Apache or a specialized one running inside a web app built around a server like web.py, which responds to those URLs in some way.  If you make a GET request to a URL, it'll send you some data.  If you make a PUT request, you replace something on the server at that URL with something you send it.  If you make a POST request, you create a new something on the server.  If you make a DELETE request, that something on the server gets erased.  All of this depends on the HTTP verbs the server supports (not all REST APIs need to support all of them), your access permissions (not every account can do everything), whether or not you've authenticated to the server (it is sometimes the case that read-only access doesn't require an account but read-write access requires an account, an API token, or something else along those lines), and who owns a particular resource (Alice's resources are read-only for every other account on the server, but read-write for her alone).  REST makes life easier but it's not carte blanche to run hog wild.  Additionally, many REST API services enforce access limits - you get so many requests per minute, hour, or day, and after that they return errors.  For example, Twitter's API will return an Error 420 (enhance your calm) if you trip their rate limiter.
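If you want to poke at a REST API by hand, Python's requests module makes the verb-to-URL mapping pretty obvious.  The URL and token below are placeholders, and whether any given verb actually works depends on the API and your permissions, as described above:

```python
import requests

BASE = "https://api.example.com/v1/notes"              # placeholder REST API
headers = {"Authorization": "Bearer YOUR_API_TOKEN"}   # only if the API wants a token

# GET - read the something at that URL.
response = requests.get(BASE + "/42", headers=headers)
print(response.status_code, response.json())

# POST - create a new something.
response = requests.post(BASE, headers=headers, json={"text": "a new note"})

# PUT - replace the something at that URL with what you send.
response = requests.put(BASE + "/42", headers=headers, json={"text": "replacement text"})

# DELETE - erase the something at that URL.
response = requests.delete(BASE + "/42", headers=headers)

# Rate limiting: many services start returning errors once you exceed your quota
# (Twitter's API uses 420, "Enhance Your Calm"; 429 is the standard code).
if response.status_code in (420, 429):
    print("Slow down - you've tripped the rate limiter.")
```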

The 2016 election and weird patterns on Twitter.

Dec 18 2016

You've already read my opinion of the 2016 election's outcome so I'll not subject you to it again. However, I would like to talk about some weird stuff I (we, really) kept noticing on Twitter in the days and weeks leading up to Election Day.

As I've often spoken of in the past, a nontrivial portion of my Exocortex is tasked with monitoring global activity on Twitter by hooking into the back-end API service and pulling raw data out to analyze. Those agents fire on a staggered schedule, anywhere from every 30 minutes to every two hours; a couple of dozen follow specific accounts while others use the public streaming API and grab large samples of every tweet that hits Twitter around the world.

If you want to look at a simplified version of that agent network to see how it works, I've made it available on Github. As you can see, the output of that particular agent network is batched into e-mails of arbitrary size using the Email Digest Agent and is sent to one of my e-mail addresses as a single batch. The reason for this is twofold: it's easier to scan through one large e-mail and look for patterns visually than it is to scan through several dozen to several hundred separate messages in sequence, and it uses fewer system resources on my e-mail provider's end to store that output and present it to me.

Six or seven weeks before Election Day, Lifeline (the recognition code for the agent network which carries out these sorts of tasks for me) started sending me gigantic e-mail digests every hour or so, containing something like several hundred tweets at a time (the biggest was nearly a thousand, as I recall). Scanning through those e-mails showed that most of the tweets were largely identical, save for the @username that sent them. Tweets about CNN and the Washington Post being GRU and SVR disinformation projects, or on-the-ground reporting tagged with #fakenews. Links pointing to Infowars articles (the tweets consisted of the titles of posts, links, and the same sets of hashtags; if you ran the Twitter-compressed URLs through a URL unshortener they all pointed to the same posts). Anti-Bernie and anti-Hillary tweets that all had the same content and the same hashtags. Trump-as-the-second-coming messages and calls to action. Rivers of bile directed at political commentators and reporters. Links to fake Wikileaks Podesta e-mails that went to Pastebin or other post-and-forget sites (there wasn't even enough data in the fakes to attempt to validate them (by the bye, the method linked to is really easy to automate)). I saw the same phenomenon with #pizzagate tweets, only those posts came in shorter, more irregular bursts. It went on and on, day and night for weeks, hundreds upon hundreds of unique copies of the same text from hundreds of different accounts. I had to throw more CPUs at Exocortex to keep up with the flood.

All of these posts, when taken together as groups or families, consisted of exactly the same text each and every time, though the t.co URLs were different (a brief digression: Twitter's URL shortening service seems to generate different outputs for the same input URL to implement statistics gathering and user tracking as part of its business strategy). Additionally, all of those posts went up more or less within the same minute. The Twitter API doesn't let you pull the IP addresses tweets were sent from, but the timestamps are available to the second. If you looked at the source field of each tweet (you'll need to scroll down a bit), they were all largely the same, usually empty (""), with a few minor exceptions here and there. The activity pattern strongly suggests that bots were used to strafe circles of human-controlled accounts on Twitter that roughly correspond to memetic communities. Figuring that somebody had already done some kind of visualization analysis (which I suck at), I had Argus (one of my web search bots) do some digging, and he found a bunch of pages like this study, which seem to back up my observations.
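Spotting this kind of pattern doesn't take anything fancy, by the way. Once you have a pile of tweets in hand (account, text, timestamp), you can group them by normalized text and flag any group where lots of different accounts posted identical text within the same minute. Here's a quick Python sketch of the idea - the sample tweets and the threshold are made up, and the real data obviously comes in through the Twitter API rather than a hardcoded list:

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample of collected tweets: (account, text, timestamp).
tweets = [
    ("@account_0001", "CNN is a GRU disinformation project #fakenews", "2016-10-12 18:04:11"),
    ("@account_0002", "CNN is a GRU disinformation project #fakenews", "2016-10-12 18:04:37"),
    ("@account_0003", "CNN is a GRU disinformation project #fakenews", "2016-10-12 18:04:52"),
    ("@account_0004", "Completely unrelated tweet about lunch",        "2016-10-12 18:04:30"),
]

def normalize(text):
    # Collapse case and whitespace so trivially-different copies still group together.
    return " ".join(text.lower().split())

clusters = defaultdict(set)
for account, text, stamp in tweets:
    when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
    # Bucket by identical text plus the minute it was posted in.
    clusters[(normalize(text), when.strftime("%Y-%m-%d %H:%M"))].add(account)

for (text, minute), accounts in clusters.items():
    if len(accounts) >= 3:   # arbitrary threshold for "suspiciously coordinated"
        print("%d accounts posted the same text at %s: %s" % (len(accounts), minute, text))
```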

The sort of horsepower needed to create such an army of bots would be very easy to assemble: Buy a bunch of virtual machines on Amazon's EC2. Write a couple of bots using Ruby or Python. Sign up for a bunch of Twitter accounts or just buy them in bulk. Make a Docker image that'll effectively turn one EC2 instance into as many as you can reasonably run without crashing the VM. Deploy lots and lots of copies of your bots into those Docker containers. Use an orchestration mechanism like Ansible to configure the bots with API keys and command them en masse; if you're in a time crunch you could even use something like pssh to fire them all up with a single command. Turn them loose. If you've been in IT for a year, this is a Saturday afternoon project that won't cost you a whole lot, but could make you a lot of money.

"Well, yeah, there was an army of bots advertising on Twitter. What else is new?" you're probably saying.

What I am saying is simply this: This post describes a little bit about how this sort of media strategy works, what the patterns look like from the 50,000-foot view, and my/our observations. I don't think I did anything really ground-breaking here; the only novelty is that I used a bunch of AI systems that stumbled across what was going on by accident. It was the hardcore data scientists who did the real academic work on it (though that work is a bit inaccessible unless you're a computer geek).

Memetic warfare is here, and our social networks are the battlegrounds. Armor up.

Upgrading Ubuntu Server 14.04 to 16.04.

Oct 29 2016

A couple of days ago I got it into my head to upgrade one of my Exocortex servers from Ubuntu Server 14.04 LTS to 16.04 LTS, the latest stable release. While Ubuntu long-term support releases are good for several years (14.04 LTS is supported into 2019), I had some concerns about the packages themselves being too stale to run the later releases of much of my software. To be more specific, I could keep hoping that the Ruby and Python interpreters I have installed could be upgraded as necessary, but at some point the core system libraries would be too old and they'd no longer compile. Not good for long-term planning.

First off, whenever you're about to do a major upgrade of anything, read the release notes so you know what you're getting yourself into. You'll also usually find some notes about all the new goodies you'll be able to play with.

In the past I've had nothing but trouble using the documented Ubuntu release upgrade process, so much so that when clients pressured me into it I've had them sign "I told you so" documents, because the procedure could reliably be expected to leave the system completely trashed, with a full rebuild as the only recourse. This time I set up a testbed in VirtualBox which consisted of a fully patched Ubuntu Server 14.04.5 LTS install. I ran through the documented upgrade process, and much to my surprise it went smoothly, leaving me with a functional virtual machine at the end of a 45 minute procedure (most of which was automatic; I only had to answer a few questions along the way). The process consisted of logging in as the root user (sudo -s) and running the updater (do-release-upgrade).

So, if it's so easy, why am I writing a blog post about it? Why worry?

Why worry, indeed. Read on.

Exocortex: Setting up Huginn

Sep 11 2016

In my last post I said that I'd describe in greater detail how to set up the software that I use as the core of my exocortex, called Huginn.

First, you need someplace for the software to live. I'll say up front that you can happily run Huginn on your laptop, desktop workstation, or server so long as it's not running Windows. Huginn is developed under Linux; it might run under one of the BSDs, but I've never tried. I don't know if it'll run as expected on Mac OS X because I don't have a Mac. If you want to give Huginn a try but you run Windows, I suggest installing VirtualBox and building a quick virtual machine. I recommend sticking with the officially supported distributions and using the latest stable version of Ubuntu Server. At the risk of sounding self-serving, I also suggest using one of my open source Ubuntu hardening sets to lock down the security on your new VM all in one go. If you're feeling adventurous you can get a VPS from a hosting provider like Amazon's AWS or Linode. I run some of my stuff at Digital Ocean and I'm very pleased with their service. If you'd like to give Digital Ocean a try, here's my referral link, which will give you $10us of credit, and you are not obligated to continue using their service after it's used up. If I didn't like their service (both commercial and customer) that much I wouldn't bother passing it around.

As serious web apps go, Huginn's system requirements aren't very high, so you can build a very functional instance without putting a lot of effort or money toward it. You can run Huginn in about one gigabyte of RAM and one CPU, with a relatively small amount of disk space (twenty gigabytes or so, a fairly small amount for servers these days). Digital Ocean's $10us/month droplet (one CPU, one gigabyte of RAM, and 30 gigabytes of storage) is sufficient for experimentation and light use. To really get serious usage out of Huginn you'll need about two gigabytes of RAM to fit multiple worker daemons into memory. I personally use the following specs for all of my Huginn virtual machines: at least two CPUs, 60 gigabytes of disk space, and at least four gigabytes of RAM. Chances are, any physical machine you have on your desk exceeds these requirements, so don't worry too much about it (but see these special instructions if you plan on using an ultra-mini machine like the Raspberry Pi). If you build your own virtual machine, take these requirements into account.

Exocortex: Identity and Agency

Sep 05 2016

Some time ago I was doing a longform series on Exocortex, my cognitive prosthetic system. I left off with some fairly broad and open-ended questions about the implications of such a software system for identity and agency. Before I go on, though, I think I'd better define some terms. Identity is one of those slippery concepts that you think you get until you have to actually talk about it. One possible definition is "the arbitrary boundary one draws between the self and another," or "I am me and you are you." A more technical definition might be "the condition or character as to who a person or what a thing is; the qualities, beliefs, et cetera that distinguish or identify a person or thing." That said, in this context I think that a useful working definition for the word 'identity' might be "the arbitrary boundary one draws between the self and another being, which may or may not incorporate the integration of tools or other augmentations." Let us further modify the second, technical definition to read "the condition or character as to who a person or what a thing is or consists of due to the presence or absence of augmentations that modify the capabilities and/or attributes thereof," because the definition should explicitly take into account the presence or absence of software or hardware augmentations.

We also need to examine the definition of the word agency, which seems even more problematic. The Free Dictionary says that one definition is "the condition of being in action or operation," or loosely "being able to do stuff." The Stanford Encyclopedia of Philosophy says (among other things) the following about agency as a concept: "the exercise or manifestation of the capacity to act." Of course, there are also arguments about the philosophy of agency that involve actors that should not be capable of having the intention to act doing so anyway, sometimes in ways that are functionally indistinguishable from organic life (which we usually think of as actors in the philosophical sense, anyway). And that's where things start getting tangled up.

Before I move on, I should set up two additional definitions. For the purposes of this post, 'agent' will refer to one of the functional units of Huginn used to construct solutions to larger problems. 'Constructs' will refer to the separate pieces of more complex software that plug into Huginn from outside.

Video from my HOPE XI talk is now online.

Aug 02 2016

The Internet Society has re-uploaded the video from my HOPE XI talk. Here it is:

Feel free to get a chuckle out of how nervous I am, but I hope you enjoy my talk, too.

HOPE XI - This one went to eleven!

Jul 30 2016

It's mostly been radio silence for the past couple of days. If you're reading this you've no doubt noticed that Switchboard (one of my constructs) posted the slides from my talk earlier this week. As sophisticated and helpful as she is, Switchboard can't yet pick thoughts out of my wetware to write blog posts. And so, here I am, my primary organic terminal sitting at Windbringer's console keying in notes, saving them, and then going back to turn them into something approaching prose. I've only just now had the time to sit down and start writing about HOPE XI, largely because all hell broke loose at my dayjob (per usual) as soon as I got back. In point of fact, this writeup will probably happen over the course of a couple of days, so it might come off as a bit disjointed.

It felt kind of strange attending this HOPE. I missed the last one two years ago because I was in the middle of moving into our new place on the other coast, so I felt a little out of the loop. I missed just about everything that happened there and I keep forgetting to go back and track down the video recordings (so I'll have another part of me do that). It didn't take long to get back into the stride, though. Once you start attending hacker cons regularly it's easy to see how everything comes together, dive in, and get out of it what you're looking for. There weren't many vendors there because HOPE is largely a talks-and-talking-to-people kind of conference, but I did come home with a few things to practice with, as I always do. I also went out of my way to not buy another full wardrobe of t-shirts because, even after getting rid of 4/5 of my collection (including, I hasten to add, much of my collection of hacker convention shirts), space in my dresser is still at a premium. So goes the life of a self-admitted clothes horse. I also found one of Seeed Studio's FST-01 ultra-miniature 32-bit computers for sale at a table and snapped it up to use with NeuG as a random number generator in a few of my projects, because my Geiger counter died some months ago - but that's a writeup for another time.

After landing, picking up my luggage, and catching a cab to the hotel, I met up with Seele, Genetik, and Nuke, whom I was splitting a hotel room with. I was a bit chagrined when Seele told me that there'd been a booking mixup and the Hotel Pennsylvania had to give us a different room. What I hadn't expected was that they gave us what amounted to a con suite, two full-sized rooms hooked together like a smallish apartment that easily had room for twice as many people as would be staying there. There was sufficient room that we were able to spread out as much as we liked with room left over, so sleeping was quite comfortable. I never really got over the jet lag this time, so my sleep schedule was all messed up. I may have averaged about four hours of sleep a night all weekend, modulo having to take a nap for a couple of hours on Saturday afternoon because I could neither concentrate on anything nor tune out background noise for very long. Either one gave me a dizzying sense of sensory overload that left me unable to see straight. It also meant that I spent the next couple of days trying to catch up and crashing hard after work for ten to twelve hours, with very strong but fragmentary dreams as my primary long-term memory optimized itself. It was the kind of sleep deprivation that you don't know you have, as opposed to the kind where you know full well you've been awake for three days straight and you feel it in your bones, your fingers, and even in your hair. I didn't make it to all of the talks I wanted to, but I did make a point of picking up a couple of DVDs of the ones I really wanted to hit before I left; I also downloaded most of the livestream recordings to watch later on the media box, probably after I get off the road the week after next.

A colleague of mine once remarked that there comes a point where you pretty much level out of most of the stuff that happens at hacker cons, and you get more out of interacting with everyone there than you do from attending talks or seminars. I was somewhat skeptical at the time but open-minded about the possibility. Now I'm wondering if he wasn't right, because from reading a whitepaper or two and having part of me do a search, I can pretty much reconstruct the content of a talk (as verified by actually watching a recording of the talk later) and get the same thing out of it. I definitely came away from most of the discussions I found myself in with new perspectives on a lot of things.

So it goes.