Migrating to Restic for offsite backups.

  backblaze backups duplicity libreops linux sysadmin restic

20201023: UPDATE: Added command to clean the local backup cache.

20200426: UPDATE: Fixed the "pruned oldest snapshots" command.

A couple of years back I did a how-to about using a data backup utility called Duplicity to make offsite backups of Leandra to Backblaze B2. (referrer link)  It worked just fine: it was stable, it was easy to script, and you knew what it was doing.  But over time it started to show its warts, as everything does.  For starters, it was unusually slow compared to running the rsync implementation Duplicity is built on by itself.  I spent some time digging into it and benchmarking as many of its functional modules as I could, and the code itself didn't seem to be the bottleneck.  The bottleneck also didn't seem to be my network link, as much as I may complain about DSL from AT&T; even upgrading Leandra's network interface didn't really fix the issue.  Encryption before upload is a hard requirement for me, but upon investigation that didn't seem to be bogging backup runs down either.  I even thought it might have been the somewhat lesser read performance of RAID-5 on Leandra's storage array adding up, which is one of the reasons I started using RAID-1 when I upgraded her to btrfs.  That didn't seem to make a difference, either.

Ultimately I decided that Duplicity was just too slow for my needs.  Initial full backups aside (because uploading everything to offsite storage always sucks), it really shouldn't take three hours to do an incremental backup of at most 500 megabytes (out of over 30 terabytes).  On top of that, Duplicity's ability to rotate out the oldest backups... just doesn't seem to work. I wasn't able to clean anything up automatically or manually.  Even after making a brand-new full backup (which I try to do yearly regardless of how much time it takes) I wasn't able to coax Duplicity into rotating out the oldest increments and had to delete the B2 bucket manually (later, of course).  So I did some asking around the Fediverse and listed my requirements.  Somebody (I don't remember whom, sorry) turned me on to Restic because they use it on their servers in production.  I did some research and decided to give it a try.

When you get right down to it, working with Restic isn't that different from working with Duplicity.  I was able to make relatively minor edits to the existing backup script on Leandra and drop Restic in with little difficulty.  I'll intersperse brief snippets of the replacement backup script in order through this article to help you bootstrap a new backup script of your own if you want to do the same thing I did.  Installing Restic was as trivial as Duplicity was:

[drwho @ leandra:(4) ~]$ sudo pacman -S restic

Restic doesn't have any dependencies, so there was no figuring out how to make sure a B2 service module was installed.  Broken dependencies were another thing I kept running into with Duplicity whenever I upgraded Leandra's systemware.

The setup part of the backup script isn't that different from before:

#!/bin/bash
# This is a backup script.  There are many out there like it,
# but this one is mine.

# Lockfile to prevent multiple simultaneous runs.
LOCKFILE="/tmp/.leandra_backup_lock"

# Account ID, application key, and bucket.
export B2_ACCOUNT_ID="12345"
export B2_ACCOUNT_KEY="67890"
export BUCKET="restic"

# Encryption passphrase for backed up files.
export PASSPHRASE="Joey-Hardcastle-is-the-real-hero-of-Hackers."

# Temporary directory because /tmp is a ramfs.
export TMPDIR="/var/tmp"

# Test for a running btrfs scrub job.
echo "Testing for a running btrfs scrub..."
sudo btrfs scrub status /btrfs > /dev/null
if [ $? -gt 0 ]; then
    echo "A btrfs scrub is running.  Terminating offsite backup."
    exit 1
else
    echo "btrfs scrub not running.  Proceeding."
    echo
fi

# Test for a running btrfs balance job.
echo "Testing for a running btrfs balance..."
sudo btrfs balance status /btrfs > /dev/null
if [ $? -gt 0 ]; then
    echo "A btrfs rebalance is running.  Terminating offsite backup."
    exit 1
else
    echo "btrfs rebalance not running.  Proceeding."
    echo
fi

# Test for the lockfile.  If it exists, abort because a backup job is still
# running.
if [ -f $LOCKFILE ]; then
    echo "ERROR - offsite backup is already running.  Terminating."
    exit 1
fi

# Set the lockfile.
touch $LOCKFILE

echo "Beginning system backup to Backblaze."
echo -n "Starting date and time: "
date
echo
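
One step the script above doesn't show: a Restic repository has to be initialized once before the first backup run.  Assuming the same environment variables as the script, the one-time setup looks something like this:

```shell
# One-time initialization of the B2-backed repository.  Restic reads the
# encryption passphrase from RESTIC_PASSWORD and creates the repository
# structure inside the bucket.
sudo RESTIC_PASSWORD=$PASSPHRASE \
    B2_ACCOUNT_ID=$B2_ACCOUNT_ID \
    B2_ACCOUNT_KEY=$B2_ACCOUNT_KEY \
    restic -r b2:$BUCKET:home/ init
```

You only have to do this once per repository; if you split your backups across several repositories in the same bucket (like the home/ and boot/ paths in this article), each one gets its own restic init.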

Now some code to make a backup of the /home volume on Leandra:

echo "Backing up /home, which also backs up the boot drive."
sudo RESTIC_PASSWORD=$PASSPHRASE \
    B2_ACCOUNT_ID=$B2_ACCOUNT_ID \
    B2_ACCOUNT_KEY=$B2_ACCOUNT_KEY \
    TMPDIR=$TMPDIR \
    restic -r b2:$BUCKET:home/ \
    -o b2.connections=55 \
    --cache-dir /var/cache/restic \
    --exclude /home/bots/Downloads \
    --exclude /home/network-mounts \
    backup /home
echo
echo

That's it.  Repeat and modify as necessary for other parts of your file system.

The only thing really of note in the above command is the -o b2.connections=55 part.  What that does is tell Restic's B2 subsystem to use at most 55 simultaneous connections to Backblaze when uploading data.  You can adjust this number up or down as you deem necessary; 55 seems to work pretty well for Leandra.

Now for the "clean out the oldest data in the collection" part of the backup run.  I have to admit, this part hasn't been fully tested yet because I haven't been using Restic for longer than two years (my usual lifetime for offsite backups), but running it manually with restic's --dry-run option (which shows what would be removed without actually deleting anything) seemed to work just fine:

echo "Forgetting the oldest snapshots."
sudo RESTIC_PASSWORD=$PASSPHRASE \
    B2_ACCOUNT_ID=$B2_ACCOUNT_ID \
    B2_ACCOUNT_KEY=$B2_ACCOUNT_KEY \
    TMPDIR=$TMPDIR \
    restic -r b2:$BUCKET:home/ \
    -o b2.connections=55 \
    --cache-dir /var/cache/restic \
    forget --prune --keep-within 2y
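
If a flat two-year window doesn't fit your needs, restic's forget subcommand also supports tiered retention policies.  A hypothetical example (adjust the counts to taste):

```shell
# Keep 7 daily, 4 weekly, and 12 monthly snapshots.  The --dry-run flag
# makes restic print what it would forget without deleting anything,
# which is handy for testing a policy before committing to it.
sudo RESTIC_PASSWORD=$PASSPHRASE \
    B2_ACCOUNT_ID=$B2_ACCOUNT_ID \
    B2_ACCOUNT_KEY=$B2_ACCOUNT_KEY \
    restic -r b2:$BUCKET:home/ \
    forget --dry-run --keep-daily 7 --keep-weekly 4 --keep-monthly 12
```

Once the dry run shows what you expect, drop --dry-run and add --prune to actually reclaim the space.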

Now the boilerplate stuff at the end of the backup script:

# Delete the lockfile.
rm -f $LOCKFILE

echo -n "Ending date and time: "
date

exit 0

That's pretty much it.  You now have all of the parts you need to assemble your own Restic backup script.
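
One more thing worth having in your toolbox before you need it: checking that you can actually get files back out.  Listing snapshots and restoring the most recent one look something like this (the target directory is just an example; restore to wherever you like):

```shell
# List the snapshots stored in the repository.
sudo RESTIC_PASSWORD=$PASSPHRASE \
    B2_ACCOUNT_ID=$B2_ACCOUNT_ID \
    B2_ACCOUNT_KEY=$B2_ACCOUNT_KEY \
    restic -r b2:$BUCKET:home/ snapshots

# Restore the latest snapshot into a scratch directory for inspection.
sudo RESTIC_PASSWORD=$PASSPHRASE \
    B2_ACCOUNT_ID=$B2_ACCOUNT_ID \
    B2_ACCOUNT_KEY=$B2_ACCOUNT_KEY \
    restic -r b2:$BUCKET:home/ restore latest --target /var/tmp/restore-test
```

Doing a test restore every once in a while is cheap insurance; a backup you've never restored from is a backup you only think you have.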

So, about that backup speed?  It's significantly faster than Duplicity ever was.  The initial full backup of Leandra still took about two weeks running nonstop, day and night, because of the sheer volume of data stored on her.  However, the incremental backup of roughly 500 megabytes that runs every morning finishes in not more than an hour under normal circumstances.  This includes Restic scanning all of the files in the backup target to figure out which ones changed since the last backup run, packing them into data blocks, and uploading them to Backblaze.  A not-daily incremental backup of Windbringer consisting of about 875 megabytes (out of two terabytes) finished in about 36 minutes flat.  Windbringer has a pretty fast solid state drive, granted, but Duplicity used to take about four hours to back up the same volume of data on the same hardware with the same drive and network link.  I'd call this a win.

If you've been running your backup script for a while, eventually you'll start seeing Restic complain about old cache directories, especially if you've had to Ctrl-C your backup script a few times.  Here is one such complaint from Windbringer during his last backup:

...
Backing up /boot.
repository 28598212 opened successfully, password is correct
found 7 old cache directories in /var/cache/restic, run `restic cache --cleanup` to remove them
...

It took some tinkering but I figured out how to fix this problem (non-problem, really, it's largely harmless if visually annoying).  Here's what the command looks like for a reasonably small backup repository (formatted for clarity, output included):

sudo RESTIC_PASSWORD=$PASSPHRASE \
    B2_ACCOUNT_ID=$B2_ACCOUNT_ID \
    B2_ACCOUNT_KEY=$B2_ACCOUNT_KEY \
    TMPDIR=$TMPDIR \
    restic -r b2:restic-windbringer:boot/ \
    --cache-dir /var/cache/restic \
    prune --cleanup-cache

repository 28598212 opened successfully, password is correct
counting files in repo
building new index for repo
[0:05] 100.00%  64 / 64 packs
repository contains 64 packs (286 blobs) with 221.868 MiB
processed 286 blobs: 0 duplicate blobs, 0 B duplicate
load all snapshots
find data that is still in use for 12 snapshots
[0:00] 100.00%  12 / 12 snapshots
found 286 of 286 data blobs still in use, removing 0 blobs
will remove 0 invalid files
will delete 0 packs and rewrite 0 packs, this frees 0 B
counting files in repo
[0:01] 100.00%  64 / 64 packs
finding old index files
saved new indexes as [c8802696]
remove 10 old index files
done
{12:30:02 @ Fri Oct 23}
[drwho @ windbringer ~] () $ 

This command seems to generalize nicely, and under ideal circumstances could be added to your backup script just before the commands to prune the oldest backups in the repository.  However, the amount of time it takes will vary with how much data is backed up to that particular repository, so it may not be wise to run it on every backup.  In that case, adding it to your backup script but keeping it commented out to serve as documentation may be a good idea.  Or you could just write a clean_out_restic_cache.sh script that you execute only as needed.
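
If you go the separate-script route, a minimal clean_out_restic_cache.sh might look something like this (the credential values are placeholders; substitute your own, and repeat the command for each repository you maintain):

```shell
#!/bin/bash
# clean_out_restic_cache.sh - prune a Restic repository and clean out
# its stale local cache directories.  Run manually, as needed, rather
# than on every backup run.

# Placeholder credentials - fill in your own.
export B2_ACCOUNT_ID="12345"
export B2_ACCOUNT_KEY="67890"
export PASSPHRASE="your-passphrase-here"
export BUCKET="restic"

sudo RESTIC_PASSWORD=$PASSPHRASE \
    B2_ACCOUNT_ID=$B2_ACCOUNT_ID \
    B2_ACCOUNT_KEY=$B2_ACCOUNT_KEY \
    restic -r b2:$BUCKET:home/ \
    --cache-dir /var/cache/restic \
    prune --cleanup-cache

exit 0
```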