Migrating to Restic for offsite backups.

Apr 11 2020

20200426: UPDATE: Fixed the "pruned oldest snapshots" command.

A couple of years back I did a how-to about using a data backup utility called Duplicity to make offsite backups of Leandra to Backblaze B2. (referrer link) It worked just fine; it was stable, it was easy to script, you knew what it was doing.  But over time it started to show its warts, as everything does.  For starters, it was unusually slow compared to running the implementation of rsync that Duplicity uses by itself.  I spent some time digging into it and benchmarking as many functional modules as I could, and the rsync code wasn't the bottleneck.  The bottleneck also didn't seem to be my network link, as much as I may complain about DSL from AT&T.  Upgrading Leandra's network interface didn't really fix the issue, either.  Encryption before upload is a hard requirement for me, but upon investigation that didn't seem to be bogging backup runs down.  I even thought it might have been the somewhat lesser read performance of RAID-5 on Leandra's storage array adding up, which is one of the reasons I started using RAID-1 when I upgraded her to btrfs.  That didn't seem to make a difference, either.

Ultimately I decided that Duplicity was just too slow for my needs.  Initial full backups aside (because uploading everything to offsite storage always sucks), it really shouldn't take three hours to do an incremental backup of at most 500 megabytes (out of over 30 terabytes).  On top of that, Duplicity's ability to rotate out the oldest backups... just doesn't seem to work. I wasn't able to clean anything up automatically or manually.  Even after making a brand-new full backup (which I try to do yearly regardless of how much time it takes) I wasn't able to coax Duplicity into rotating out the oldest increments and had to delete the B2 bucket manually (later, of course).  So I did some asking around the Fediverse and listed my requirements.  Somebody (I don't remember whom, sorry) turned me on to Restic because they use it on their servers in production.  I did some research and decided to give it a try.

When you get right down to it, working with Restic isn't that different from working with Duplicity.  I was able to make relatively minor edits to the existing backup script on Leandra and drop Restic in with little difficulty.  I'll intersperse brief snippets of the replacement backup script, in order, through this article to help you bootstrap a new backup script of your own if you want to do the same thing I did.  Installing Restic was as trivial as installing Duplicity was:

[drwho @ leandra:(4) ~]$ sudo pacman -S restic

Restic doesn't have any dependencies, so I didn't have to figure out how to make sure a B2 service module was installed.  That was another problem I occasionally ran into with Duplicity after upgrading Leandra's systemware.

The setup part of the backup script isn't that different from before:

#!/bin/bash
# This is a backup script.  There are many out there like it,
# but this one is mine.

# Lockfile to prevent multiple simultaneous runs.
LOCKFILE="/tmp/.leandra_backup_lock"

# Account ID, application key, and bucket.
export B2_ACCOUNT_ID="12345"
export B2_ACCOUNT_KEY="67890"
export BUCKET="restic"

# Encryption passphrase for backed up files.
export PASSPHRASE="Joey-Hardcastle-is-the-real-hero-of-Hackers."

# Temporary directory because /tmp is a ramfs.
export TMPDIR="/var/tmp"

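# Test for a running btrfs scrub job.  btrfs scrub status appears to exit 0
# whether or not a scrub is in progress, so check its output for the word
# "running" rather than its exit code.
echo "Testing for a running btrfs scrub..."
if sudo btrfs scrub status /btrfs | grep -q "running"; then
    echo "A btrfs scrub is running.  Terminating offsite backup."
    exit 1
else
    echo "btrfs scrub not running.  Proceeding."
    echo
fi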

# Test for a running btrfs balance job.
echo "Testing for a running btrfs balance..."
sudo btrfs balance status /btrfs > /dev/null
if [ $? -gt 0 ]; then
    echo "A btrfs rebalance is running.  Terminating offsite backup."
    exit 1
else
    echo "btrfs rebalance not running.  Proceeding."
    echo
fi

# Test for the lockfile.  If it exists, abort because a backup job is still
# running.
if [ -f "$LOCKFILE" ]; then
    echo "ERROR - offsite backup is already running.  Terminating."
    exit 1
fi

# Set the lockfile.
touch "$LOCKFILE"

echo "Beginning system backup to Backblaze."
echo -n "Starting date and time: "
date
echo
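One caveat with the lockfile handling above: if the script dies partway through, the lockfile stays behind and every later run refuses to start until somebody deletes it by hand.  Here is a minimal sketch of one way around that, assuming bash; the acquire_lock function name and the default lockfile path are made up for illustration and aren't part of my actual script.  It creates the lockfile atomically and registers a trap so that the file is removed whenever the script exits.

```shell
#!/bin/bash
# Hypothetical lockfile path for illustration; override it as needed.
LOCKFILE="${LOCKFILE:-/var/tmp/.example_backup_lock}"

# acquire_lock (hypothetical helper)
# Returns 0 if this run now holds the lock, 1 if another run already has it.
acquire_lock() {
    # With noclobber set, the redirection fails if the file already exists,
    # so testing for the lockfile and creating it happen in a single step.
    if ( set -o noclobber; echo "$$" > "$LOCKFILE" ) 2> /dev/null; then
        # Remove the lockfile whenever the script exits, even after an error.
        trap 'rm -f "$LOCKFILE"' EXIT
        return 0
    fi
    return 1
}

# Example usage in place of the test-and-touch steps (commented out here):
#   if ! acquire_lock; then
#       echo "ERROR - offsite backup is already running.  Terminating."
#       exit 1
#   fi
```

With the trap in place, an explicit rm of the lockfile at the end of the script becomes redundant but harmless.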

Now some code to make a backup of the /home volume on Leandra:

echo "Backing up /home, which also backs up the boot drive."
sudo RESTIC_PASSWORD="$PASSPHRASE" \
    B2_ACCOUNT_ID="$B2_ACCOUNT_ID" \
    B2_ACCOUNT_KEY="$B2_ACCOUNT_KEY" \
    TMPDIR="$TMPDIR" \
    restic -r "b2:$BUCKET:home/" \
    -o b2.connections=55 \
    --cache-dir /var/cache/restic \
    --exclude /home/bots/Downloads \
    --exclude /home/network-mounts \
    backup /home
echo
echo

That's it.  Repeat and modify as necessary for other parts of your file system.
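If you do end up repeating that command for several volumes, wrapping it in a function keeps the copies from drifting apart.  What follows is a rough sketch, assuming bash and the same exported variables as the setup section (BUCKET, PASSPHRASE, B2_ACCOUNT_ID, B2_ACCOUNT_KEY, TMPDIR).  The backup_volume name and its argument order are hypothetical and aren't something Restic provides.

```shell
#!/bin/bash
# backup_volume REPO_PATH SOURCE_DIR [EXCLUDE...]
# Hypothetical helper: backs up SOURCE_DIR into its own repository path
# inside the bucket, skipping any EXCLUDE paths given after it.
backup_volume() {
    local repo_path="$1" source_dir="$2"
    shift 2

    # Turn each remaining argument into an --exclude option.
    local excludes=() pattern
    for pattern in "$@"; do
        excludes+=(--exclude "$pattern")
    done

    sudo RESTIC_PASSWORD="$PASSPHRASE" \
        B2_ACCOUNT_ID="$B2_ACCOUNT_ID" \
        B2_ACCOUNT_KEY="$B2_ACCOUNT_KEY" \
        TMPDIR="$TMPDIR" \
        restic -r "b2:${BUCKET}:${repo_path}" \
        -o b2.connections=55 \
        --cache-dir /var/cache/restic \
        "${excludes[@]}" \
        backup "$source_dir"
}

# Example usage (commented out here; the second path is made up):
#   backup_volume home/ /home /home/bots/Downloads /home/network-mounts
#   backup_volume etc/ /etc
```

Keep in mind that each repository path has to be initialized once with Restic's init command before the first backup into it will work.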

The only thing really of note in the above command is the -o b2.connections=55 part.  What that does is tell Restic's B2 subsystem to use at most 55 simultaneous connections to Backblaze when uploading data (the default is 5).  You can adjust this number up or down as you deem necessary; 55 seems to work pretty well for Leandra.

Now for the "clean out the oldest data in the collection" part of the backup run.  I have to admit, this part hasn't been fully tested yet because I haven't been using Restic for longer than two years (my usual lifetime for offsite backups), but running it manually (which doesn't actually delete anything yet, because no snapshot is that old) seemed to work just fine:

echo "Forgetting the oldest snapshots."
sudo RESTIC_PASSWORD="$PASSPHRASE" \
    B2_ACCOUNT_ID="$B2_ACCOUNT_ID" \
    B2_ACCOUNT_KEY="$B2_ACCOUNT_KEY" \
    TMPDIR="$TMPDIR" \
    restic -r "b2:$BUCKET:home/" \
    -o b2.connections=55 \
    --cache-dir /var/cache/restic \
    forget --prune --keep-within 2y
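
Because the cleanup above is destructive once snapshots do age out, it can be worth previewing a retention policy first.  Restic's forget command accepts a --dry-run flag that only prints what it would remove, and --keep-within counts back from the newest snapshot rather than from today's date.  Here is a small sketch, assuming bash and the same exported variables as the setup section; the preview_forget name is hypothetical.

```shell
#!/bin/bash
# preview_forget REPO_PATH [KEEP_WITHIN]
# Hypothetical helper: shows which snapshots a --keep-within policy would
# drop from REPO_PATH without deleting anything.  KEEP_WITHIN defaults to 2y.
preview_forget() {
    local repo_path="$1" keep_within="${2:-2y}"

    sudo RESTIC_PASSWORD="$PASSPHRASE" \
        B2_ACCOUNT_ID="$B2_ACCOUNT_ID" \
        B2_ACCOUNT_KEY="$B2_ACCOUNT_KEY" \
        TMPDIR="$TMPDIR" \
        restic -r "b2:${BUCKET}:${repo_path}" \
        --cache-dir /var/cache/restic \
        forget --dry-run --keep-within "$keep_within"
}

# Example usage (commented out here):
#   preview_forget home/        # the two year policy used above
#   preview_forget home/ 6m     # try a shorter window to see it select something
```

Trying a deliberately short window, like six months, is one way to see the policy actually select snapshots long before the two year mark comes around.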

Now the boilerplate stuff at the end of the backup script:

# Delete the lockfile.
rm -f "$LOCKFILE"

echo -n "Ending date and time: "
date

exit 0
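
One thing to be aware of: as written, the script exits with status 0 even if one of the Restic commands failed, so cron or a monitoring job can't tell a good run from a bad one.  Here is a minimal sketch of one way to track failures, assuming bash; the run_step helper, the STATUS variable, and the backup_home function in the usage comment are all hypothetical names made up for this example.

```shell
#!/bin/bash
# Overall exit status for the script; stays 0 unless a step fails.
STATUS=0

# run_step DESCRIPTION COMMAND [ARGS...]
# Hypothetical helper: announces a step, runs it, and remembers any failure
# without stopping the rest of the script.
run_step() {
    local description="$1"
    shift
    echo "$description"
    if ! "$@"; then
        echo "ERROR - step failed: $description" >&2
        STATUS=1
    fi
}

# Example usage at the end of the script (commented out here), where
# backup_home stands in for a function wrapping the restic command:
#   run_step "Backing up /home." backup_home
#   rm -f "$LOCKFILE"
#   exit "$STATUS"
```

The design choice here is to keep going after a failed step so that the lockfile cleanup still happens, and to report the failure through the exit status at the very end.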

That's pretty much it.  You now have all of the parts you need to assemble your own Restic backup script.

So, about that backup speed?  It's significantly faster than Duplicity ever was.  The initial full backup of Leandra still took about two weeks running nonstop day and night because of the sheer volume of data I have stored on her.  However, making an incremental backup that totals about 500 megs every morning finishes in not more than an hour under normal circumstances.  This includes Restic scanning all of the files in the backup target to figure out which ones changed since the last backup run, splitting them into encrypted, deduplicated data blocks (Restic doesn't compress data as of this writing, and a lot of mine is already compressed anyway), and uploading them to Backblaze.  An occasional (not daily) incremental backup of Windbringer that consists of about 875 megabytes (out of two terabytes) finished in about 36 minutes flat.  Windbringer has a pretty fast solid state drive, granted, but Duplicity used to take about four hours to back up the same volume of data on the same hardware with the same drive and network link.  I'd call this a win.