Leave nothing to chance.

Sep 04, 2009

Something that I keep meaning to write about is the topic of practical data backups - how to back your data up in such a way that you won't go bonkers trying to manage it, but if you blow a drive you'll be able to restore something at least. The thing about backups is that they're at once easy to overthink until you confuse yourself horribly (which means that you'll never make or use them) and easy to do in such a fashion that they won't be usable when you need them the most. At the enterprise level, there are at least a dozen information backup and archival packages out there - you don't need these. You also don't need a tape drive to back your data up. I've used them for my personal systems, and they're just as much a pain in the ass at home as they are at work; that's why there are people in IT whose primary job is to wrangle tape drives.

At the risk of sounding like I'm shilling for Western Digital, if you run Windows on your personal workstation, I can't recommend the WD Sync application (which is really a rebrand of Dmailer Sync) highly enough. Setting it up is trivial: plug the drive in and double-click on WD Sync.exe. Set up a profile and enter a strong passphrase when it asks you for it. Make sure that all of your applications (like your web browser and mail client) are closed and turn it loose. If you ever need to restore your data, plug the drive in and re-sync back to the hard drive. It really is that simple.

But not so much for Linux or BSD machines. You need to put a little thought into how you're going to back everything up. First of all, don't just run your backup from / (the root directory) and hope things will turn out. Chances are you'll grab a bunch of directories that you really don't need to worry about, like /dev, /proc, and /sys. Second, if you're running a database server on the machine in question, you can't just back up the database files while the server is running, because the files will be open and thus likely in an inconsistent state. If you try to restore them they probably won't work. Third, pick your software carefully. After a lot of mucking around with packages like AFbackup and writing my own backup scripts (complete with storage management and error detection), none of which worked exactly the way I wanted them to, my boss turned me on to rdiff-backup, which is easily the hottest thing since sliced bread. Not only does it copy data to backup media, it also keeps incremental backups by storing only the differences from older versions of backed-up files, so you can restore from a particular point in time if necessary.

Using rdiff-backup requires a two-step process: first, purging the oldest backups on the storage media to free up space, and second, actually backing everything up. Here is how I carry out the first step on Leandra, who gets backed up to a USB hard drive mounted at location $BACKUP_DEST:
rdiff-backup --force --remove-older-than 3W $BACKUP_DEST/bin
rdiff-backup --force --remove-older-than 3W $BACKUP_DEST/boot
rdiff-backup --force --remove-older-than 3W $BACKUP_DEST/dev
rdiff-backup --force --remove-older-than 3W $BACKUP_DEST/emul
rdiff-backup --force --remove-older-than 3W $BACKUP_DEST/etc
rdiff-backup --force --remove-older-than 3W $BACKUP_DEST/home
rdiff-backup --force --remove-older-than 3W $BACKUP_DEST/lib32
rdiff-backup --force --remove-older-than 3W $BACKUP_DEST/lib64
rdiff-backup --force --remove-older-than 3W $BACKUP_DEST/opt
rdiff-backup --force --remove-older-than 3W $BACKUP_DEST/root
rdiff-backup --force --remove-older-than 3W $BACKUP_DEST/sbin
rdiff-backup --force --remove-older-than 3W $BACKUP_DEST/usr
rdiff-backup --force --remove-older-than 3W $BACKUP_DEST/var

The --remove-older-than 3W directive means that any backups of the specified directory that were made more than three weeks ago should be deleted to free up disk space. The --force option is needed in case a couple of purge runs were missed: by default rdiff-backup refuses to delete more than one backup session (increment) at a time, so without --force the command will complain and fail whenever more than one increment is older than the cutoff.
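
The purge commands above differ only in the directory name, so they can be folded into a loop. What follows is a minimal sketch, not my actual script: the function name purge_old_backups and the RUN variable (set RUN=echo for a dry run) are made up for illustration, and it assumes every backed-up directory contains the rdiff-backup-data subdirectory that rdiff-backup creates.

```shell
# Directory names taken from the purge commands above; adjust to your system.
BACKUP_DIRS="bin boot dev emul etc home lib32 lib64 opt root sbin usr var"

# purge_old_backups DEST [AGE]
# RUN is a hypothetical dry-run switch: RUN=echo prints commands instead of running them.
purge_old_backups() {
    dest=$1
    age=${2:-3W}
    status=0
    for dir in $BACKUP_DIRS; do
        # rdiff-backup keeps its metadata in rdiff-backup-data; if that is
        # missing, this directory has never been backed up, so skip it.
        [ -d "$dest/$dir/rdiff-backup-data" ] || continue
        ${RUN:-} rdiff-backup --force --remove-older-than "$age" "$dest/$dir" || status=1
    done
    return "$status"
}
```

Calling purge_old_backups "$BACKUP_DEST" 3W does the same work as the list of commands, and the function returns non-zero if any single purge failed.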

Now to actually back the data up to $BACKUP_DEST:
rdiff-backup /bin $BACKUP_DEST/bin
rdiff-backup /boot $BACKUP_DEST/boot
rdiff-backup /dev $BACKUP_DEST/dev
rdiff-backup /emul $BACKUP_DEST/emul
rdiff-backup /etc $BACKUP_DEST/etc
rdiff-backup /home $BACKUP_DEST/home
rdiff-backup /lib32 $BACKUP_DEST/lib32
rdiff-backup /lib64 $BACKUP_DEST/lib64
rdiff-backup /opt $BACKUP_DEST/opt
rdiff-backup /root $BACKUP_DEST/root
rdiff-backup /sbin $BACKUP_DEST/sbin
rdiff-backup /usr $BACKUP_DEST/usr
rdiff-backup --exclude /var/lib/mysql /var $BACKUP_DEST/var

Note the final command - the directory /var/lib/mysql doesn't get backed up due to the aforementioned database file problem. What you have to do to get a usable database backup is dump the database into a SQL file somewhere that gets backed up normally. This is usually done just after the purge of older backups (technically, this should be the real step 2) but requires some minor modifications to the databases.
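
One thing the backup commands don't guard against is the USB drive not being plugged in, in which case everything lands on the internal disk underneath the empty mount point. Here is a rough sketch of a wrapper that checks first. It assumes the mountpoint utility from util-linux is installed; backup_all, backup_if_mounted, and the RUN dry-run variable are names invented for this example.

```shell
# Same directory set as the backup commands above; adjust to your system.
BACKUP_DIRS="bin boot dev emul etc home lib32 lib64 opt root sbin usr var"

# backup_all DEST: run rdiff-backup for each directory that exists on this machine.
# RUN is a hypothetical dry-run switch: RUN=echo prints commands instead of running them.
backup_all() {
    dest=$1
    status=0
    for dir in $BACKUP_DIRS; do
        [ -d "/$dir" ] || continue
        if [ "$dir" = "var" ]; then
            # The live MySQL data files are skipped; the SQL dump covers them.
            ${RUN:-} rdiff-backup --exclude /var/lib/mysql "/$dir" "$dest/$dir" || status=1
        else
            ${RUN:-} rdiff-backup "/$dir" "$dest/$dir" || status=1
        fi
    done
    return "$status"
}

# backup_if_mounted DEST: refuse to run unless DEST is an actual mount point.
backup_if_mounted() {
    dest=$1
    if ! mountpoint -q "$dest"; then
        echo "$dest is not a mount point; refusing to back up" >&2
        return 1
    fi
    backup_all "$dest"
}
```

With this in place the script would call backup_if_mounted "$BACKUP_DEST" instead of the individual commands.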

I really only have experience with MySQL, so you're on your own if you run Postgres or Oracle personal edition, but the principle should be the same. What I did was create a database user called backup@localhost (username 'backup', can only log in from Leandra itself) that requires a strong passphrase to log in and has a very restricted set of privileges on every database and every table in each database: SELECT, LOCK TABLES, and RELOAD, to be specific. These privileges are sufficient to dump everything to a file using the mysqldump utility. Depending on how you go about it, you'll have to add the backup user's login credentials to the /etc/my.cnf or /etc/mysql/my.cnf file and set it to mode 0600 (only the file's owner can read its contents).
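
To make that a little more concrete, here is a sketch of one way to set this up. It is an assumption-laden illustration rather than my exact configuration: the SQL in the comments is the standard CREATE USER / GRANT form, 'CHANGE-ME' is a placeholder passphrase, and write_dump_credentials is a made-up helper that writes a [mysqldump] section to a separate option file instead of editing /etc/my.cnf in place (it overwrites whatever file you point it at).

```shell
# SQL for creating the restricted account (run once as the MySQL root user);
# 'CHANGE-ME' is a placeholder passphrase:
#   CREATE USER 'backup'@'localhost' IDENTIFIED BY 'CHANGE-ME';
#   GRANT SELECT, LOCK TABLES, RELOAD ON *.* TO 'backup'@'localhost';

# write_dump_credentials FILE PASSPHRASE
# Writes an option file that mysqldump can read, readable only by its owner.
write_dump_credentials() {
    file=$1
    pass=$2
    old_umask=$(umask)
    umask 077    # a newly created file is never group- or world-readable
    printf '[mysqldump]\nuser=backup\npassword=%s\n' "$pass" > "$file"
    status=$?
    umask "$old_umask"
    # Also tighten permissions if the file already existed with looser ones.
    chmod 0600 "$file" || status=1
    return "$status"
}
```

If the file is written somewhere other than the standard my.cnf locations (say, a hypothetical /root/mysqldump.cnf), mysqldump has to be told about it with --defaults-extra-file=/root/mysqldump.cnf, which must be the first option on the command line.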

In your backup script, between the purge of the old backups and the start of the heavy lifting, you'll have to put the following commands:
mysqldump --add-drop-table --all-databases | bzip2 -c > /path/to/mysql_backup_file.sql.bz2
chmod 0600 /path/to/mysql_backup_file.sql.bz2

This will dump the contents of every database and run them through the bzip2 utility to compress the file and save space. So long as this file isn't inside the /var/lib/mysql directory, it'll be picked up automatically when the backups are made.
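
One weakness of the pipeline is that if mysqldump dies halfway through, bzip2 still exits happily and you are left with a truncated dump that looks fine. The sketch below is one way around that, under a few assumptions: dump_databases and the MYSQLDUMP override variable are names invented for this example, and instead of piping it writes the uncompressed SQL to a temporary file next to the target first (so it briefly needs the extra disk space), then compresses it and moves it into place only if every step succeeded.

```shell
# dump_databases OUTFILE
# MYSQLDUMP is a hypothetical override for the dump command (handy for testing).
dump_databases() {
    out=$1
    sql="$out.sql.tmp"    # uncompressed dump, removed afterwards
    bz="$out.bz2.tmp"     # compressed dump, renamed to OUTFILE on success
    old_umask=$(umask)
    umask 077             # everything created here is owner-only (mode 0600)
    if "${MYSQLDUMP:-mysqldump}" --add-drop-table --all-databases > "$sql" &&
       bzip2 -c "$sql" > "$bz" &&
       mv "$bz" "$out"; then
        status=0
    else
        status=1
    fi
    rm -f "$sql" "$bz"
    umask "$old_umask"
    return "$status"
}
```

A failed dump then leaves the previous /path/to/mysql_backup_file.sql.bz2 untouched and returns non-zero, so the backup script can bail out or at least complain.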

Restoring the most recent backup made with rdiff-backup is simple - copy the files from the storage media to where they need to be. To restore an older backup, you'll need to use a command along the lines of rdiff-backup --restore-as-of YYYY-MM-DD /mnt/backup/usr /usr to restore from a particular calendar date, or rdiff-backup --restore-as-of 1W3D /mnt/backup/usr /usr to restore files as they were ten days ago. When restoring from a MySQL backup dump, you'll have to use a command string like the following: bzip2 -dc /path/to/mysql_backup_file.sql.bz2 | mysql -u root -p

When prompted, enter the password for the database software's 'root' user. Because the dump was made with --all-databases, it contains the CREATE DATABASE and USE statements for every database, so a single run of this command restores all of them.
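
For point-in-time restores it helps to see which increments actually exist before picking a date, and to restore into a scratch directory rather than straight over the live files. Here is a small sketch along those lines, using the classic rdiff-backup command-line form from this post. The function name restore_as_of and the RUN dry-run variable are invented for the example, and the target directory is assumed not to exist yet.

```shell
# restore_as_of WHEN SOURCE TARGET
# WHEN is anything --restore-as-of accepts, e.g. 2009-08-25 or 1W3D.
# RUN is a hypothetical dry-run switch: RUN=echo prints commands instead of running them.
restore_as_of() {
    when=$1
    src=$2
    target=$3
    if [ -e "$target" ]; then
        echo "$target already exists; pick a fresh scratch directory" >&2
        return 1
    fi
    # Show the available restore points first, then do the restore.
    ${RUN:-} rdiff-backup --list-increments "$src" || return 1
    ${RUN:-} rdiff-backup --restore-as-of "$when" "$src" "$target"
}
```

Something like restore_as_of 1W3D /mnt/backup/usr /tmp/restore-usr pulls the ten-day-old copy into /tmp/restore-usr, where it can be inspected before anything is copied back into place.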