Distributing Huginn workers across servers.

For quite a few years I've written about strange and sundry things you can do with Huginn, but not a lot about what to do when you run into systemic limitations. The nice thing about Huginn is that you can spin up as many workers (subprocesses that execute agents from the database) as you want, subject to the limitations of whatever you happen to be running it on. The downside is that it's easy to gradually upgrade your VPS to the point where it's just really expensive. I ran into this myself recently and spent a day or two pulling back on the reins, as it were.

While fixing something the other day (namely, re-cutting and re-issuing Nebula certificates, because Nebula doesn't warn you when they expire) I realized that it would be possible to use Nebula to set up a tunnel to my managed database server and run a couple of job workers on Leandra (who has more than enough processing power to handle the load).

It took some research, trial and error, and a few outages on my end to figure this out, because the existing documentation out there wasn't complete enough for my purposes, so here's my attempt at a more complete document for the community. Much of this information came from hanging out in the Nebula Slack and from The Orange One's writeup (which was almost, but not quite, enough for my purposes). I'll put the steps in the order in which it seems easiest to carry them out.

Let's say that we have three machines: Alpha (a server at Digital Ocean with, for the sake of example, the public IP address, Beta (a managed database server at Digital Ocean with the private IP address, and Gamma (a server at home). For security reasons the access policy on Beta is such that only Alpha can contact it directly. Gamma has a dynamic IP address that can change at any moment, so we can't add it to the access policy. What we can do, though, is use Nebula to make a bidirectional connection between Alpha and Gamma (c.f. my Nebula tutorial over here) as a prerequisite; let's say Alpha is and Gamma is on the Nebula overlay network. Then, as part of using the unsafe_routes feature of Nebula, we have to cut a new certificate for Alpha. This won't mess with the other servers in your Nebula network. So, log into Alpha, shut down Nebula, and generate a new certificate:

root@alpha:/home/drwho# systemctl stop nebula.service
root@alpha:/home/drwho# cd /etc/nebula
root@alpha:/etc/nebula# mv alpha.crt alpha.crt.old
root@alpha:/etc/nebula# mv alpha.key alpha.key.old
root@alpha:/etc/nebula# /usr/local/sbin/nebula-cert sign -name "alpha" -ip "" \
    -subnets ""
root@alpha:/etc/nebula# systemctl start nebula.service

We don't have to edit the /etc/nebula/config.yml file on Alpha to make this work. Another gotcha here is that Nebula's unsafe_routes feature only works with network subnets, not individual IP addresses. So, even though the database server's IP address is, we have to tell Nebula about the whole network. Ordinarily, allowing broader access than we absolutely need is a bad thing, but in this case it's traffic from our own Nebula network only, and we only have access to one IP address in that netblock because the others are firewalled off on Digital Ocean's side of things. Putting the database server's network in the certificate announces its presence to the rest of the Nebula hosts in your network. It's basically saying, "Hey, I know about a service you can connect to over here."

Next, we have to set up IP forwarding on Alpha, so that if it receives network traffic matching specific criteria from Gamma, it'll forward that traffic along and relay the responses back to Gamma.

root@alpha:/etc/nebula# sysctl -w net.ipv4.ip_forward=1
net.ipv4.ip_forward = 1
root@alpha:/etc/nebula# vim /etc/sysctl.conf
# Uncomment or add the line "net.ipv4.ip_forward=1" in this file and save it
# so the setting survives a reboot.

We're almost done on Alpha. Now we have to set up NAT on Alpha using iptables, so that Alpha transparently rewrites traffic from Gamma (which arrives over the Nebula connection) to look like it came from Alpha itself. This is both so that Digital Ocean's firewall in front of Beta will accept the traffic and so that Alpha knows where to send traffic in the other direction; as far as Beta cares, it's just traffic from Alpha. Run this command on Alpha, and also append it to the /etc/rc.local file so it'll be executed at boot time:

root@alpha:/etc/nebula# /usr/sbin/iptables -t nat -A POSTROUTING -s \
    -o eth0 -j MASQUERADE
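For reference, here's one way the relevant chunk of /etc/rc.local might look. This is a sketch that assumes the example Nebula network from earlier; the iptables -C test just keeps the rule from being added twice if the file gets run more than once:

```shell
#!/bin/sh
# Re-enable IP forwarding at boot, in case sysctl.conf didn't catch it.
/usr/sbin/sysctl -w net.ipv4.ip_forward=1

# Masquerade traffic arriving from the Nebula overlay ( in this
# example) so it leaves eth0 looking like it came from Alpha.
/usr/sbin/iptables -t nat -C POSTROUTING -s -o eth0 -j MASQUERADE 2>/dev/null || \
    /usr/sbin/iptables -t nat -A POSTROUTING -s -o eth0 -j MASQUERADE

exit 0
```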

Now we bounce over to Gamma, our server at home, to do a little reconfiguration. Edit the /etc/nebula/config.yml file on Gamma and add the following:

    tun:
      unsafe_routes:
        # This is the netblock we baked into Alpha's cert.
        - route:
          # This is Alpha's Nebula IP address.

Restart Nebula on Gamma so the changes take effect. If you check the routing table on Gamma, you should see a new route matching the unsafe route configured on Alpha (see what they did there?):

root@gamma:/etc/nebula# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface         UG    3004   0        0 wlp6s0     UG    0      0        0 nebula1     U     0      0        0 nebula1     U     3004   0        0 wlp6s0

If you see that, you're almost there. Let's give MySQL a try, just to be sure:

drwho@gamma:/home/drwho$ mysql -h -u huginn -P 25060 -p huginn
Enter password:
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MySQL connection id is 24299
Server version: 8.0.23 Source distribution

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MySQL [huginn]>
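If the mysql client isn't installed on Gamma yet, a bare TCP check will tell you whether the route and NAT are working at all. This is a hypothetical helper, using the example database endpoint from this article:

```python
import socket

def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# The example database endpoint from this article.
print(can_reach("", 25060))
```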


Now you can set up Huginn on Gamma, but we're going to do it a little differently. We're only going to run delayed_job workers on Gamma; we don't need to start the Ruby on Rails control panel, the scheduler, or any of that other stuff. (Well, you could if you wanted to, but I wouldn't run more than one scheduler process because that might mess with the database.) When you set up the .env configuration file, set the DATABASE_HOST and DATABASE_PORT variables to match the ones on Alpha. The idea here is that Gamma will see the connection attempt to, which matches the route in its routing table, and send the MySQL traffic from the worker processes over your Nebula connection to Alpha. Alpha then transparently forwards the MySQL traffic to the database server at The relevant settings in .env will look like this:


Now try starting a job runner:

drwho@gamma:/home/drwho/huginn$ RAILS_ENV=production bundle exec script/delayed_job -i 00 run
delayed_job.00: process with pid 3729528 started.

When the delayed_job runner starts outputting messages from the agents, you know it's up and running. Success. You did it.

Once you're sure it's working, you'll want to arrange for the job workers to start up automatically on Gamma. When I put this into production on Leandra I created a new GNU Screen configuration file called ~/.screenrc.huginn that looks a little like this:

# Make the default login shell bash.
shell "/usr/bin/bash"
defshell -bash

# Turn off the startup message.
startup_message off

# Start up Huginn job workers.
chdir /home/drwho/huginn
screen -t "Job Runner 0" bash -icl 'RAILS_ENV=production bundle exec script/delayed_job -i 00 run'
screen -t "Job Runner 1" bash -icl 'RAILS_ENV=production bundle exec script/delayed_job -i 01 run'
screen -t "Job Runner 2" bash -icl 'RAILS_ENV=production bundle exec script/delayed_job -i 02 run'
screen -t "Job Runner 3" bash -icl 'RAILS_ENV=production bundle exec script/delayed_job -i 03 run'
screen -t "Job Runner 4" bash -icl 'RAILS_ENV=production bundle exec script/delayed_job -i 04 run'
screen -t "Job Runner 5" bash -icl 'RAILS_ENV=production bundle exec script/delayed_job -i 05 run'

That Screen session gets started at boot time automatically from a cron job with the following command:

@reboot . $HOME/.bashrc ; /usr/bin/screen -d -l -m -c /home/drwho/.screenrc.huginn

Of course, if you have a method for managing Huginn processes that works better for you, by all means use it.
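For instance, if Gamma runs systemd, a templated unit is a tidier way to get the same effect. This is a sketch, with the paths and the drwho user taken from the examples above; adjust them to match your install:

```ini
# /etc/systemd/system/huginn-worker@.service
# Start with: systemctl enable --now huginn-worker@00 huginn-worker@01 ...
Description=Huginn delayed_job worker %i

ExecStart=/usr/bin/env bundle exec script/delayed_job -i %i run

```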

At any rate, that's it. You're up and running.

Incidentally, I'm going to file a pull request to fix this in the documentation.