Fully remote backups of websites.
A couple of weeks ago my webhosting provider sent me a polite e-mail to inform me that I was using too much disk space. A cursory examination of their e-mail showed that they were getting upset about the daily backups of my site that I was stashing in a hidden directory, and they really prefer that all files in your home directory be accessible. I ran a quick check and, sure enough, about twenty gigabytes times two weeks of daily backups adds up to a fair amount of disk space. So, the question is, how do I keep backing up all my stuff and not bother the admins any more than I have to?
Thankfully, that's a fairly straightforward operation. Beneath the cut is how I did it.
A friendly heads-up: Much of this howto will reference drwho.virtadpt.net, the website you're reading right now. Please substitute your own website where appropriate.
The MySQL databases that store most of the content for my website are still set up the way I described in this post so I didn't have to do any more database surgery. If you haven't created and set up permissions on the backup@mysql.database user, follow those instructions. That'll be one less thing to worry about.
The first thing to do is set up someplace for the backed up files to go. Easy enough: mkdir ~/backups
Next, we have to set up public key authentication for SSH so we can run everything over a very flexible and encrypted network connection that will let us do some interesting tricks. What this means is that we don't have to type in a password to log into the server at Dreamhost, the crypto key will do that automagically for us. Create someplace to keep the keys: mkdir ~/backups/.ssh
Generate an SSH keypair that we'll only use for backups: ssh-keygen -t ecdsa -b 521 -f ~/backups/.ssh/backup
"ssh-keygen, generate ecdsa crypto keys. Make it 521 bits in size (the maximum for this type of key). Write the private and public keys to ~/backups/.ssh with the filename 'backup'."
(Note: SSH on Dreamhost's servers doesn't seem to support Ed25519 crypto keys, or at least they didn't a month ago when I tried using them. Stick with ecdsa keys.)
When prompted for a passphrase, just hit the enter key twice because we don't want a passphrase on this key. Ordinarily that's a very bad idea, but we want the backups to run automatically, i.e., without personal intervention, and if we did rig up some way of storing the passphrase and passing it to SSH, wherever we stored it would be just as vulnerable as the key itself. We can take some steps to mitigate the risk, though: chmod 0700 ~/backups && chmod 0700 ~/backups/.ssh && chmod 0600 ~/backups/.ssh/backup*
Translated, "The owner of the directory ~/backups can read from, write to, and execute things inside that directory; nobody else can." "The owner of the directory ~/backups/.ssh can read from, write to, and execute things inside that directory; nobody else can." "The owner of the files backup and backup.pub inside the directory ~/backups/.ssh can read and write those files; nobody else can." That will do a lot to prevent other unprivileged users on the system from abusing that access.
Now, to prime the pump. SSH requires that a server's hostkey be accepted the first time you connect to it. The machine I'm backing up to is brand new and had never logged into my server at Dreamhost before, so this is an essential step: ssh -i ~/backups/.ssh/backup username@server.dreamhost.com
When prompted, accept the hostkey. Then log out. Now we've got our SSH connection ready to rock.
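If you want to make sure the key is actually doing the work, run a quick one-off command over the same connection; if it comes back without asking for a password, you're in business:

ssh -i ~/backups/.ssh/backup username@server.dreamhost.com 'echo It works.'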
I'm using rsync to make full backups of the files that comprise my website. It's smart about only downloading what's changed and has enough features that you can figure out how to make it do what you mean (instead of just what you say) and grab everything from the server. We're going to run rsync over SSH for two reasons. First, SSH will provide the transport tunnel for the data. Second, Dreamhost doesn't run an rsync server, so the only way we're going to pull this off is SSH tunnel shenanigans. The encryption SSH provides is really secondary to our purpose here, but of course we won't say no to such fringe benefits. Something to keep in mind is that the first time you back up your website, rsync is going to pull down a copy of every file there, and that's going to take time, anywhere from a couple of hours to a day. So, before you set about writing a script to make this happen, start a GNU Screen session, run your initial rsync inside of it, then disconnect from Screen and go do something else. Here's how I did it: rsync -ah -e "ssh -l my-username -i ~/backups/.ssh/backup" server.dreamhost.com:drwho.virtadpt.net/ ~/backups/drwho.virtadpt.net/
Translation, "rsync, recurse into every subdirectory of drwho.virtadpt.net/ on server.dreamhost.com and preserve all file attributes. Use human-readable values in all user output. Use the command in double quotes as your transport mechanism (i.e., SSH). As a sidebar, SSH will use my-username as the username to log in and authenticate with the crypto key ~/backups/.ssh/backup. Connect to server.dreamhost.com. Mirror the directory drwho.virtadpt.net/ on server.dreamhost.com to the local directory ~/backups/drwho.virtadpt.net/."
Now go do something else. If your site's as big as mine, this will take some time. Thankfully, this is the only time it'll take an extended period of time to execute.
...
Good, you're back, and the initial backup should be complete. Now we need to back up the database, which we're going to do over SSH as well. Create a directory to hold the database dumps: mkdir ~/backups/drwho.virtadpt.net-databases
Now create a configuration file that the mysqldump utility will use to log into the MySQL server as the backup user: vi ~/backups/server.my.cnf
[mysqldump]
password=mysqldatabasebackupuserspassword
Where mysqldatabasebackupuserspassword is the password you set on the user 'backup' in your MySQL database. Set restrictive file permissions on that file as well: chmod 0600 ~/backups/server.my.cnf
Now here's where things get tricky, so I'll break it down into discrete steps. Don't worry about writing a shell script to do this for you, I'll get to that later. Get your proof-of-concept working first. Set up an SSH connection to the back-end MySQL server and have SSH drop into the background: ssh -i ~/backups/.ssh/backup -f -N -L 3306:mysql.server:3306 my-username@server.dreamhost.com
"SSH, authenticate with the crypto key ~/backups/.ssh/backup. After the connection is set up, drop into the background and execute silently. Do not execute a remote command, just sit there. Forward local port 3306 to the machine mysql.server, port 3306 on the far end. Log in as my-username to server.dreamhost.com." The "Do not execute a remote command, just sit there" is the important bit, because it frees up the shell to do other stuff, like run mysqldump:
mysqldump --defaults-file=~/backups/server.my.cnf --opt --add-drop-database -h 127.0.0.1 -l -v my_database_here --single-transaction -u database_backup_user
Translation: "mysqldump, use the contents of file ~/backups/server.my.cnf to configure yourself after you've automatically read in the rest of the MySQL config files you know about. Use the standard set of command line options when you run (--opt). Put the SQL command DROP DATABASE before each CREATE DATABASE statement in the output file. Connect to host 127.0.0.1, or the localhost. Lock each table so it can't be updated before you dump its contents. Run verbosely so we can keep an eye on what you're doing. Connect to the database my_database_here. When you send SQL statements to run the backup, send them all at once, i.e. as a single transaction, so they all hit the server at the same time. Log into the database as database_backup_user; you'll use the password for this user in the ~/backups/server.my.cnf file, incidentally."
When this command runs, your screen will be flooded with SQL statements and the contents of your database. This is good, this means it works. Now we're going to make a real backup of the database, so hit the up-arrow to repeat the mysqldump command but add the following to the end before you hit the enter key:
mysqldump --defaults-file=~/backups/server.my.cnf --opt --add-drop-database -h 127.0.0.1 -l -v my_database_here --single-transaction -u database_backup_user | bzip2 -9 -c > ~/backups/drwho.virtadpt.net-databases/my_database_here-`date +%Y%m%d`.sql.bz2
When the mysqldump utility runs, its output (the dumped database) will be run through the bzip2 utility, which will compress it and write the dump to the file ~/backups/drwho.virtadpt.net-databases/my_database_here-[year][month][day].sql.bz2. Congratulations, you've made your first point-in-time database dump. Now let's pull it all together into a shell script, with a little extra magick thrown in to rotate out the oldest database dumps and save space:
#!/bin/bash

BASE="/home/user/backups"
SSH="$BASE/.ssh/backup"
DATE=`date +%Y%m%d`
PID=""

echo "Starting download of backups from offsite systems."

# Change to the base backup directory.
cd $BASE

# Start backing everything up.
echo "Backing up website.example.com..."
rsync -ah -e "ssh -l my_username -i $SSH" \
    server.dreamhost.com:website.example.com/ website.example.com/

ssh -i $SSH -f -N -L3306:mysql.example.com:3306 my_username@server.dreamhost.com
PID=`ps ax | grep [3]306 | awk '{print $1}'`

mysqldump --defaults-file=/home/user/backups/user.my.cnf --opt \
    --add-drop-database -h 127.0.0.1 -l -v my_database --single-transaction -u \
    my_database_backup | bzip2 -9 -c > \
    website.example.com-databases/my_database-$DATE.sql.bz2

kill $PID
PID=""

find website.example.com-databases/ -type f -mtime +14 -exec rm -v {} \;

echo

# Fin.
exit 0
Now, you should be able to copy-and-paste the above into a shell script as-is, make a few minor modifications so it fits the specifics of your website on Dreamhost, and it'll back up your website to your local server or laptop computer. If you're running it on a box at home you can set it up as a cronjob to run once every day or so, and it'll maintain a perfect mirror of your website (insofar as the server side is concerned). It'll prune any database dumps older than 14 days to conserve disk space and make restoration easier to manage if it comes to that. Please note that this script deliberately doesn't delete the oldest copies of your website's files, because there is a significantly greater than zero probability that doing so would wreck your backup and make it useless. You can also extend this script to back up multiple websites and databases by copy-and-pasting the relevant commands in order. I don't recommend doing any nifty loop tricks in this script because you'll trip over your own two feet and it'll be harder to debug; trust me on this, okay?
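As for the cronjob: assuming you saved the script as /home/user/backups/backup.sh (a name I made up for this example; use whatever you like), make it executable with chmod +x /home/user/backups/backup.sh, then run crontab -e and add a line along these lines to kick it off at 4:00 AM local time every day, with its output logged so you can see what happened:

0 4 * * * /home/user/backups/backup.sh > /home/user/backups/backup.log 2>&1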