UPDATE: 20191229 - Added how to rotate out the oldest backups.
As frequent readers may or may not remember, I rebuilt my primary server last year, and in the process set up a fairly hefty RAID-5 array (24 terabytes) to store data. As one might reasonably expect, backing all of that stuff up is fairly difficult. I'd need to buy enough external hard drives to fit a copy of everything on there, plus extra space to store incremental backups for some length of time. Another problem is that both Leandra and the backup drives would be in the same place at the same time, so if anything happened at the house I'd not only not have access to Leandra anymore, but there's an excellent chance that the backups would be wrecked, leaving me doubly screwed.
Here are the requirements I had for making offsite backups:
- Backups of Leandra had to be offsite, i.e., not in the same state, ideally not on the same coast.
- Reasonably low cost. I ran the numbers on a couple of providers and paying a couple of hundred dollars a month to back up one server was just too expensive.
- Linux friendly.
- My data gets encrypted with a key only I know before it gets sent to the backup provider.
- The provider had to be supported by a number of different backup applications, in case the one I picked stopped being maintained.
- Easy to restore data from backup.
After a week or two of research and experimentation, as well as pinging various people to get their informed opinions, I decided to go with Backblaze as my offsite backup provider, and Duplicity as my backup software. Here's how I went about it, as well as a few gotchas I ran into along the way.
First of all, I signed up for a personal account at Backblaze. If you want to give them a try, here's my referral link. Full disclosure: for every month someone pays for after signing up through that link, I get a free month. Just to be safe I also set up two-factor authentication on my account for additional protection (and if you don't habitually do this, you really should).
Next, I installed Duplicity from the Arch Linux package repository:
drwho@leandra:(9) ~$ sudo pacman -S duplicity
drwho@leandra:(9) ~$ yaourt -S backblaze-b2
For Debian and Ubuntu users out there, it might look a little more like this:
deb@bian:~$ sudo apt-get install -y duplicity backblaze-b2
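If your distribution doesn't package the B2 command-line client, it's also published on PyPI as the b2 package, so you can install it that way instead (adjust for however you manage Python packages on your system):

pip install --user b2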
At Backblaze I set up a bucket to back Leandra up into: Buckets -> Create a Bucket -> named "backups", private access only -> Create Bucket
You should now see your new bucket in the list of buckets. You don't need to fiddle with anything else here, but take note of the Bucket ID (in this example, 24ee5229206af57d68040c16) because you'll need it later.
Generate a passphrase to encrypt your backed up data with. I wanted something fairly lengthy but still easy to type, so I headed over to the Diceware homepage, grabbed my gaming bag, and generated a pass sentence. A dozen words or so is about right. You won't need to type these over and over because we're going to script the backup to automate it. I highly recommend putting your passphrase into a password manager of some kind for safekeeping, as well as making an offline copy and putting it somewhere safe and not-at-home in case you have to access your backups from somewhere that isn't home during an emergency.
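If you don't have dice handy, the word-selection step can be approximated in shell. This is just a sketch, not part of my actual setup: the inline word list below is a placeholder standing in for a real Diceware list of 7,776 words.

```shell
# Toy sketch of Diceware-style word selection. The inline WORDS list is
# a placeholder; substitute a real Diceware wordlist for actual use.
WORDS="correct horse battery staple orbit lantern walrus ember quartz"
# Pick four words at random and join them with spaces.
PASSPHRASE=$(echo "$WORDS" | tr ' ' '\n' | shuf -n 4 | paste -sd ' ' -)
echo "$PASSPHRASE"
```

For a real passphrase you'd want more words than that, and rolling physical dice against the published list takes the quality of your computer's random number generator out of the equation entirely.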
On the "B2 Cloud Storage Buckets" page, you'll need to get your account ID (a 12 hex digit string) and you'll need to generate an application key (like an API key) for Duplicity to use. Note that you can only have one application key per account (generating a new one invalidates the old one). You do this by clicking the "Show Account ID and Application Key" link and following the instructions. Copy and paste the application key someplace safe, because you're going to put it in your backup script.
Duplicity needs someplace to store its state database, where it keeps track of what it has and has not backed up. By default, Duplicity uses ~/.cache/duplicity/; because you'll be running Duplicity with sudo so that you can back up files you may not ordinarily have access to (including some system configuration files), that works out to /root/.cache/duplicity/. For reasons I went into much earlier (in the other blog post I linked to), I didn't want this, so I created another directory that wasn't on solid state storage, both for speed and to prolong the life of the drive:
drwho@leandra:(9) ~$ sudo mkdir -p /var/cache/duplicity
drwho@leandra:(9) ~$ sudo chown root:root /var/cache/duplicity
drwho@leandra:(9) ~$ sudo chmod 0750 /var/cache/duplicity
Now we need to start building a shell script that will run Duplicity once a day and back up directories you're concerned about; let's say for the sake of example that you want to back up /home, /opt, and /var. Create a shell script in your favorite text editor called offsite_backup.sh that starts like this:
#!/bin/bash

# Account ID, application key, and bucket.
ID="01234567890a"
KEY="0123456789abcdef0123456789abcdef0123456789a"
BUCKET="blogposttestbucket"

# Encryption passphrase for backed up files.
PASSPHRASE="This is where the passphrase to encrypt your data goes."

echo "Backing up /home."
sudo PASSPHRASE=$PASSPHRASE duplicity -v4 --tempdir /var/tmp \
    --exclude /home/bots/Downloads \
    --exclude /home/network-mounts \
    --archive-dir /var/cache/duplicity \
    /home b2://$ID:$KEY@$BUCKET/home
If you look at the above commands, you'll note a couple of unusual things:
- PASSPHRASE=$PASSPHRASE just after the sudo command. The PASSPHRASE environment variable is how you pass an encryption passphrase to Duplicity. It goes right after sudo and right before duplicity to set that variable inside the sudo shell.
- --tempdir /var/tmp - By default, Duplicity uses /tmp for its temp directory, but on most modern systems that exists only in RAM, and during full backups it can bog the system down. For backups I like to use the /var/tmp directory, which should already exist on any sane system (and if it doesn't, something's weird):
drwho@leandra:(9) ~$ sudo mkdir /var/tmp
drwho@leandra:(9) ~$ sudo chmod 1777 /var/tmp
- There are a couple of instances of the --exclude option, which tells Duplicity to ignore those directories. This is a handy thing to know so that you don't accidentally back up temporary files or network mounts without doing so deliberately.
- --archive-dir /var/cache/duplicity - This tells Duplicity to look in this directory tree for its state database.
- /home - This is the directory structure you're backing up.
- b2://$ID:$KEY@$BUCKET/home - This is the destination URL to back up to. Note that $ID, $KEY, and $BUCKET are the variables you filled in.
You have account credentials ($ID, $KEY, and $BUCKET) in this shell script, so I strongly recommend that you make it neither readable nor executable by anyone else on the system. There really aren't any good ways to store credentials, but I think this is one of the least bad ones.
drwho@leandra:(9) ~$ chmod 0700 offsite_backup.sh
If you run this script it'll execute duplicity, which will start backing up /home. Duplicity creates multiple smaller files, each about 210 megabytes in size, that hold compressed and encrypted chunks of backed up data. They have names like duplicity-full.20180103T065015Z.vol1.difftar.gpg and duplicity-full.20180103T065015Z.vol1006.difftar.gpg in the B2 bucket. If no other backups of a directory structure exist, Duplicity will make a full backup (it'll copy everything). If backup files do exist, it'll make an incremental backup (which is to say, it'll copy only what's changed since the last backup). If this is the first time you've run this script it's going to make a full backup, so it's going to take a while depending on how much data you have in /home. For Leandra, the first full backup took about six days(!)
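If you want to check on what Duplicity thinks it has backed up so far, the collection-status command summarizes the backup chains for a target URL. This isn't part of the backup script, just a handy diagnostic, shown here using the same variables:

sudo PASSPHRASE=$PASSPHRASE duplicity collection-status -v4 --archive-dir /var/cache/duplicity b2://$ID:$KEY@$BUCKET/home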
When it's done (or if you just want to move on with this blog post), let's add /var to the backup. Open offsite_backup.sh in a text editor again and add this to the bottom:
echo "Backing up /var."
sudo PASSPHRASE=$PASSPHRASE duplicity -v4 --tempdir /var/tmp \
    --exclude /var/tmp \
    --archive-dir /var/cache/duplicity \
    /var b2://$ID:$KEY@$BUCKET/var
Of note is the argument --exclude /var/tmp, which says to Duplicity "Because /var/tmp is your temporary directory, don't back it up; you're going to clean it out when you're done anyway." If you re-run the script it should skip just about all of /home because it's already backed up and not much will have changed there, but it'll make a full backup of your /var directory structure, which will grab (among other things) your system logs, the databases your Linux distro uses to keep track of which packages are and aren't installed, and anything else the system stashes under /var. Not all of it will be immediately useful, but it's better to capture too much than to restore after a system crash and discover that something important wasn't backed up.
Something we want to keep on top of is deleting the oldest backup files so that files that don't exist anymore (and outdated copies of files) aren't maintained. This will help to keep the volume of backup data down, and hence keep your monthly bill down. Duplicity keeps track of data in terms of "What backup files do I need to keep that constitute a full backup if the user restores them?", so by telling it to delete the oldest ones it won't wreck your backups. It'll only delete the oldest copies of files that still exist as of the last backup. Open offsite_backup.sh in a text editor and add this to the bottom:
echo "Deleting oldest backups."
# Note the --force flag: without it, remove-older-than only lists what
# it would delete. Run it once per directory structure you back up.
sudo PASSPHRASE=$PASSPHRASE duplicity remove-older-than 1M --force -v4 \
    --tempdir /var/tmp --archive-dir /var/cache/duplicity \
    b2://$ID:$KEY@$BUCKET/home
sudo PASSPHRASE=$PASSPHRASE duplicity remove-older-than 1M --force -v4 \
    --tempdir /var/tmp --archive-dir /var/cache/duplicity \
    b2://$ID:$KEY@$BUCKET/var
echo
If you re-run the script it should skip just about all of /home and /var because they're already backed up (though Duplicity will probably make an incremental of /var because some stuff will have changed since the last time it ran) and then it'll try to delete backed up data that's older than a month (the 1M part of the command). There isn't any yet so it won't do anything, but after a couple of months it'll groom the backed up data by deleting backed up copies of files that you don't keep around anymore, older versions of executables from before the last time you ran a system update, old log files, and things of that ilk.
The last thing we want to do is have Duplicity clean up the backup bucket by deleting broken backup files (sometimes a backup file needs to be retransmitted to ensure that it arrives intact) and giving B2 the go-ahead to purge deleted files. Add this command at the end of offsite_backup.sh:
echo "Cleaning up the storage bucket."
# Like remove-older-than, cleanup needs --force to actually delete
# anything, and runs once per directory structure.
sudo PASSPHRASE=$PASSPHRASE duplicity cleanup --force -v4 \
    --tempdir /var/tmp --archive-dir /var/cache/duplicity \
    b2://$ID:$KEY@$BUCKET/home
sudo PASSPHRASE=$PASSPHRASE duplicity cleanup --force -v4 \
    --tempdir /var/tmp --archive-dir /var/cache/duplicity \
    b2://$ID:$KEY@$BUCKET/var
echo
There you go. Once you've made a full backup of your system (or the parts of it that you care about, anyway) you can put a reference to this script into a cronjob so it runs every day, every other day, or on any other schedule you might need. I have Leandra run the script once a day so that there's always a fresh copy of everything offsite in case I need it. I have yet to do this with Windbringer due to how long it took a full backup of Leandra to run. I might have to do that the next time I'll be someplace with really good bandwidth for a couple of days. If you want to take a look at a version of the script I use for my offsite backups, it's here in my GitHub repository. It's a regular shell script with a couple of variables that you should fill in with the correct values. I made it fairly tweakable and extensible - there are invocations of duplicity that should match all the common use cases. It doesn't do anything fancy like dumping databases to back them up, but I've got a couple of tutorials in earlier posts that you might want to reread if that's something you need.
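For reference, a crontab entry that runs the script every night at 2am might look like the line below. The path is hypothetical, so substitute wherever you actually saved offsite_backup.sh, and remember that cron jobs run with a minimal environment:

0 2 * * * /home/drwho/offsite_backup.sh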
Now, a few of the gotchas that I ran into during my experiments.
There are two ways of using a B2 bucket: The quick and dirty way (which is also the wrong way to make backups), and the smart way. When I first started figuring out Duplicity, I ran my backups like this:
sudo PASSPHRASE=$PASSPHRASE duplicity -v4 --tempdir /var/tmp \
    --exclude /home/bots/Downloads \
    --exclude /home/network-mounts \
    --archive-dir /var/cache/duplicity \
    /home b2://$ID:$KEY@$BUCKET/
This backed up /home into the root of the B2 bucket. It also meant that I wasn't able to back up any other directory structure (like /root) into that bucket because Duplicity is picky about that sort of thing. Don't do that.
Duplicity's verbosity seems to be logarithmic. The -v5 switch produced too much output to keep track of. I find -v4 is just right for daily use.
The first backup you make (the full backup) is always the longest. Holy cats, is it the longest. I'm wondering how long it'll take to get the rest of my machines backed up. I may have to use Ethernet to get better speed than wireless affords right now.
Leandra's /root directory is on a solid state storage device. Duplicity was putting its cache files there and using up disk space, which wasn't so bad, but when I started thinking about the useful lifespan of the drive and all the deletions from the cache over time, I figured it made more sense to put the Duplicity cache on the RAID array.
Now that I've got a couple of years of backups under my belt, I figured it was high time to work out how to delete the oldest files - both the file signature databases on Leandra and the backup files themselves at Backblaze. I had to fiddle with it for a bit, but here's the command line that does exactly what I want. Assuming that there is more than one year's worth of backups in a B2 bucket, you would do this:
sudo duplicity remove-all-but-n-full 1 --force -v4 --tempdir /var/tmp \
    --archive-dir /var/cache/duplicity b2://$ID:$KEY@$BUCKET/$DIR
where $DIR is home, opt, or whatever directory structure you've backed up into that part of the bucket.
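And since easy restores were one of my requirements, here's the other direction for the record. Restoring is roughly the inverse of backing up; this is a sketch rather than part of my script, and the file path is hypothetical. The restore command with --file-to-restore (the path is relative to the root of the backed up tree) pulls a single file back out; leave that option off to restore the whole tree:

sudo PASSPHRASE=$PASSPHRASE duplicity restore -v4 --archive-dir /var/cache/duplicity --file-to-restore drwho/.bashrc b2://$ID:$KEY@$BUCKET/home /tmp/restored-bashrc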