smoke and mirrors system administration - noun phrase - When you bring a problem to your support team and they go silent for hours to days at a time. No amount of poking and prodding is sufficient to get anyone on the team to respond to your requests for status updates. When they finally get back to you they say that nothing's wrong and you must have made a mistake. Your thing is now unbroken. They never tell you (or anyone, for that matter) what they fixed or how they fixed it.
basketball mode - noun phrase - When a service or application crashes and restarts itself over and over, i.e., bouncing like a basketball every few seconds. Considered an outage.
Here's the situation: You're using Ansible to configure a machine on your network, like a new Raspberry Pi. Ansible has done a bunch of things to the machine and now needs to reboot it - for example, when you grow a Raspbian disk image so that it takes up the entire device, the machine has to be rebooted before the change takes effect. The question is, how do you reboot the machine, have Ansible pick up where it left off, and do it all in a single playbook (instead of two or more)?
I spent the last couple of days searching for specifics and found a number of techniques that just don't work. After some experimentation, however, I pieced together a small snippet of Ansible playbook that does what I need. Because it was such a pain to figure out I wanted to save other folks the same trouble. Here's the code, suitable for copying and pasting into your playbook:
...the first part of your playbook goes here.

- name: Reboot the system.
  shell: sleep 2 && shutdown -r now
  async: 1
  poll: 0
  ignore_errors: true

- name: Reconnect and resume.
  local_action: wait_for
  args:
    host: bob-newhart
    port: 22
    state: started
    delay: 10
    timeout: 30

...the rest of your playbook goes here.
Specifics of proof of concept for later reference:
- Ansible v2.7.0
- Raspberry Pi 3
- Raspbian 2018-06-27
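As an aside, Ansible 2.7 (the version used in this proof of concept) also introduced a built-in reboot module that wraps this whole dance in a single task; a minimal sketch, with the timeout value being an arbitrary choice of mine:

```yaml
# Sketch using the reboot module added in Ansible v2.7; it reboots the
# host and blocks until it is reachable again. The timeout is arbitrary.
- name: Reboot the system and wait for it to come back.
  reboot:
    reboot_timeout: 300
```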
A couple of weeks back, somebody I know asked me how I went about deploying SSL certificates from the Let's Encrypt project across all of my stuff. Without going into too much detail about what SSL and TLS are (here's a good introduction to them), the Let's Encrypt project will issue SSL certificates to anyone who wants one, provided that they can somehow prove that they control the domain they're requesting a certificate for. You can't use Let's Encrypt to generate a certificate for google.com because Let's Encrypt would try to verify the request with the server at google.com (which you don't control; bear with me), fail, and error out. The actual process is complex and kind of involved (it's crypto, so this isn't surprising), but the nice thing is that there are a couple of software packages out there that automate practically everything, so all you have to do is run a handful of commands (which you can then copy into a shell script) and turn the result into a cron job. The software I use on my systems is called Acme Tiny, and here's what I did to set everything up...
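The first step is generating the keys and the certificate signing request that Acme Tiny works from. A sketch, assuming a hypothetical domain example.com and filenames of my own choosing:

```shell
# Account key (identifies you to Let's Encrypt) and the site's private key.
openssl genrsa 4096 > account.key
openssl genrsa 4096 > domain.key
# Certificate signing request for the (placeholder) domain:
openssl req -new -sha256 -key domain.key -subj "/CN=example.com" > domain.csr
# Acme Tiny then completes the HTTP challenge and fetches the signed
# certificate (not run here; it needs a reachable web server):
#   python acme_tiny.py --account-key account.key --csr domain.csr \
#       --acme-dir /var/www/challenges/ > signed.crt
```

The challenge directory has to be served by your web server at /.well-known/acme-challenge/ for the validation step to succeed.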
I know I haven't posted much this month. The holiday season is in full effect and life, as I'm sure you know, has been crazy. I wanted to take the time to throw a quick tip up that I just found out about which, if nothing else, will make it easier to get up and running on a Raspberry Pi that you've received as a gift. Here's the situation:
You have a new account on a machine that you want to SSH into easily. So, you want to quickly and easily transfer over one or more of your SSH public keys to make it easier to log in automatically, and maybe make running Ansible a bit faster. Now, you could do it manually (which I did for many, many years) but you'll probably mess it up at least once if you're anything like me. Or, you could use the ssh-copy-id utility (which comes for free with SSH) to do it for you. Assuming that you already have SSH authentication keys this is all you have to do:
[drwho@windbringer ~]$ ssh-copy-id -i .ssh/id_ecdsa.pub pi@jukebox
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: ".ssh/id_ecdsa.pub"
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
pi@jukebox's password:

Number of key(s) added: 1

Now try logging into the machine, with: "ssh 'pi@jukebox'"
and check to make sure that only the key(s) you wanted were added.
Now let's try to log into the new machine:
[drwho@windbringer ~]$ ssh pi@jukebox
Linux jukebox 4.9.70-v7+ #1068 SMP Mon Dec 18 22:12:55 GMT 2017 armv7l

The programs included with the Debian GNU/Linux system are free software;

# I didn't have to enter a password because my SSH pubkey authenticated me
# automatically.

pi@jukebox:~ $ cat .ssh/authorized_keys
ecdsa-sha2-nistp521 AAAAE....
You can run this command again and again with a different pubkey, and it'll append it to the appropriate file on the other machine (~/.ssh/authorized_keys). And there you have it; your SSH pubkey has been installed all in one go. I wish I'd known about this particular trick... fifteen years ago?
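For comparison, the manual approach I used for all those years boils down to something like the following function (the host name and key path in the usage line are placeholders, and unlike ssh-copy-id this doesn't filter out keys that are already installed):

```shell
# Hypothetical manual equivalent of ssh-copy-id: stream a public key to
# the remote host and append it to ~/.ssh/authorized_keys with sane
# permissions.
copy_key() {
    local keyfile="$1" remote="$2"
    ssh "$remote" 'mkdir -p ~/.ssh && chmod 700 ~/.ssh &&
        cat >> ~/.ssh/authorized_keys &&
        chmod 600 ~/.ssh/authorized_keys' < "$keyfile"
}
# Usage (not run here): copy_key ~/.ssh/id_ecdsa.pub pi@jukebox
```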
Difficulty rating: 8. Highly specific use case, highly specific setup, assumes that you know what these tools are already.
Let's assume that you have a couple of servers that you can SSH into over Tor as hidden services.
Let's assume that your management workstation has SSH, the Tor Browser Bundle, and Ansible installed. Ansible does all of its work over an SSH connection, so there's no agent to install on any of your servers.
Let's assume that you only use SSH public key authentication to log into those servers. Password authentication is disabled with the directive PasswordAuthentication no in the /etc/ssh/sshd_config file.
Let's assume that you have sudo installed on all of those servers, and at least one account can use sudo without needing to supply a password. Kind of dodgy, kind of risky, mitigated by only being able to log in with the matching public key. That seems to be the devopsy way to do stuff these days.
Problem: How to use Ansible to log into and run commands on those servers over the Tor network?
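One common approach (a sketch of the general technique, not necessarily my exact setup) is to point SSH's ProxyCommand at Tor's SOCKS port so that every connection to a .onion host gets tunneled through Tor. In an Ansible inventory that might look like this; the .onion address is a placeholder, and the SOCKS port depends on whether you run a system tor daemon (9050) or the Tor Browser Bundle (9150):

```ini
[hidden_servers]
abcdefghijklmnopqrst.onion

[hidden_servers:vars]
# OpenBSD netcat's -X 5 selects SOCKS5; %h and %p are SSH's host and
# port tokens.
ansible_ssh_common_args=-o ProxyCommand="nc -X 5 -x 127.0.0.1:9150 %h %p"
```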
EDIT - 20171011 - Added a bit about getting real login shells inside of this Screen session, which fixes a remarkable number of bugs. Also cleaned up formatting a bit.
To keep the complexity of my exocortex down, I've opted not to separate everything into larger chunks using technologies popular these days, such as Linux containers (though I did Dockerize the XMPP bridge as an experiment), because there are already quite a few moving parts, and increasing complexity does not make for a more secure or stable system. However, this brings up a valid and important question: how do you restart everything if you have to reboot a server for some reason?
A valid question indeed. Servers need to be rebooted periodically to apply patches, upgrade kernels, and generally blow the cruft out of the memory field. There are all sorts of hoops and gymnastics one can jump through with traditional initscripts, but for home-grown and third-party stuff it's difficult to run things from initscripts in such a way that they don't have elevated privileges, which matters for security. The hands-on way of doing it is to start a GNU Screen session when you log in and bring everything up inside it (or reconnect to one if it's already running). This process can also be automated to run when a system reboots. Here's how:
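The skeleton of that automation can be sketched as a single crontab entry; the session name and startup script path here are placeholders of mine, not part of the original setup:

```
# Start a detached Screen session named "exocortex" at boot. "bash -l"
# gives the session a real login shell; the startup script is hypothetical.
@reboot /usr/bin/screen -dmS exocortex bash -lc "$HOME/bin/start-agents.sh"
```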
A persistent risk of running a website is the possibility of somebody finding a vulnerability in the CMS and backdooring the code so that commands and code can be executed remotely. At the very least it means that somebody can poke around in the directory structure of the site without being noticed; at worst, it would seem the sky's the limit. In the past I've seen cocktails of browser exploits injected remotely into a site's theme that try to pop everybody who visits the site, but that is by no means the nastiest thing somebody could do. This raises the question: how would you detect such a thing happening to your site?
I'll leave the question of logfile monitoring aside, because that depends on your hosting situation and everybody has their own opinions. What I want to discuss is the possibility of monitoring the state of every file of a website to detect unauthorized tampering. There are solutions out there, to be sure: the venerable Tripwire, the open source AIDE, and auditd (which I do not recommend; you'd need to write your own analysis software for its logs to determine which files, if any, have been edited, and it'll kill a busy server faster than drilling holes in a car battery). If you're in a shared hosting situation like I am, your options are pretty limited because you won't have the access necessary to install new packages, and you might not be able to compile and install anything to your home directory. However, you can still put something together that isn't perfect but is fairly functional and will get the job done, within certain limits. Here's how I did it:
Most file monitoring systems store cryptographic hashes of the files they're set to watch over. Periodically, the files in question are re-hashed and the outputs are compared. If the resulting hashes of one or more files are different from the ones in the database, the files have changed somehow and should be manually inspected. The process that runs the comparisons is scheduled to run automatically, while generation of the initial database is normally a manual process. What I did was use command line utilities to walk through every file of my website, generate a SHA-1 hash (I know, SHA-1 is considered harmful these days; my threat model does not include someone spending large amounts of computing time to construct a boobytrapped index.php file with the same SHA-1 hash as the existing one; in addition, I want to be a good customer and not crush the server my site is hosted on several times a day when the checks run), and store the hashes in a file in my home directory.
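The core of that approach can be sketched in a few lines of shell; the directory and database paths here are placeholders, not the ones from my actual setup:

```shell
# Directory to watch and file holding the baseline hashes (placeholders).
SITE_DIR="${SITE_DIR:-$HOME/public_html}"
HASH_DB="${HASH_DB:-$HOME/.site-hashes}"

# Walk every file under the site and record a SHA-1 hash for each one.
baseline() {
    find "$SITE_DIR" -type f -print0 | xargs -0 sha1sum > "$HASH_DB"
}

# Re-hash everything and report any files that differ from the baseline;
# exits nonzero if something changed.
check() {
    sha1sum --check --quiet "$HASH_DB"
}
```

Run baseline once by hand after you've verified the site is clean, then schedule check from cron and have its output mailed to you.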
We seem to have reached a unique point in history: available to the average home user are gargantuan amounts of disk space (8 terabyte hard drives are a thing, and their prices are rapidly coming down to widespread affordability), and enough processing power fits in the palm of your hand to make the computers that put the human race on the Moon look like a grain of sand next to a beach. For most people this means the latest phone upgrade or more space for the media box. For others, though, it poses an unusual challenge: how to make the best use of the hardware without wasting it. By this I mean building a server that doesn't squander hard drive space or SATA ports on the mainboard, while still having enough room for all of that lovely (and by "lovely" I really mean "utterly disorganized") data that accumulates without even trying. I mentioned last year that I rebuilt Leandra (specs in here) so I could work on some machine learning and search engine projects. What I didn't mention was that I had some design constraints to follow so that I could get the most out of her.
To get the best use possible out of all of those hard drives I had to figure out how to structure the RAID, where to put the guts of the Arch Linux install, and most importantly figure out how to set everything up so that if Leandra did blow a hard drive the entire system wouldn't be hosed. If I partitioned all of the drives as described here and used one as the /boot and / partitions, and RAIDed the rest, if the first drive blew I'd be out an entire operating system. Also, gauging the size of the / partition can be tricky; I like to keep my system installs as small as possible and add only packages that I absolutely need (and ruthlessly purge the ones that I don't use anymore). 20 gigs is way too big (currently, Leandra's OS install is 2.9 gigabytes after nearly a year of experimenting with this and that) but it would leave room to grow.
So, what did I finally decide on?
Since PivotX went out of support I've been running the Bolt CMS for my website at Dreamhost (referral link). A couple of weeks back you may have noticed some trouble my site was having, due to significant difficulty I ran into when upgrading from the v2.x release series to the v3.x release series. Some stuff went sideways, and I had to restore from backup at least once before I got the upgrade procedure straightened out with the help of some of the developers in the Bolt IRC channel on Freenode. If it hadn't been for help from rossriley it would have taken significantly longer to un-fuck my website.
Here's the procedure that I used to get my site upgraded to the latest release of Bolt.