Gargantuan file servers and tiny operating systems.

02 May 2017

We seem to have reached a unique point in history: Your average home user has access to gargantuan amounts of disk space (8 terabyte hard drives are a thing, and the prices are rapidly coming down to widespread affordability), and the processing power available in the palm of your hand makes the computational power that put the human race on the Moon compare the same way a grain of sand does to a beach.  For most people, this means the latest phone upgrade or more space for the media box.  For others, though, it poses an unusual challenge: how to make the best use of the hardware without wasting it needlessly.  By this, I mean how one might build a server that doesn't result in wasted hard drive space or wasted SATA ports on the mainboard, while still having enough room to put all of that lovely (and by "lovely" I really mean "utterly disorganized") data that accumulates without even trying.  I mentioned last year that I rebuilt Leandra (specs in here) so I could work on some machine learning and search engine projects.  What I didn't mention was that I had some design constraints to follow so that I could get the most out of her.

To get the best use possible out of all of those hard drives, I had to figure out how to structure the RAID, where to put the guts of the Arch Linux install, and, most importantly, how to set everything up so that if Leandra did blow a hard drive the entire system wouldn't be hosed.  If I partitioned all of the drives as described here, used one for the /boot and / partitions, and RAIDed the rest, then losing that first drive would take the entire operating system with it.  Gauging the size of the / partition can also be tricky; I like to keep my system installs as small as possible and add only packages that I absolutely need (and ruthlessly purge the ones that I don't use anymore).  20 gigs would be way too big (currently, Leandra's OS install is 2.9 gigabytes after nearly a year of experimenting with this and that), though it would leave room to grow.
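
As an aside, the ruthless purging is easy to automate on Arch.  This isn't part of Leandra's setup per se, just a sketch of the kind of housekeeping I mean:

# List packages that were installed as dependencies but are no longer
# required by anything, and remove them along with their config files.
pacman -Qtdq | sudo pacman -Rns -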

Decisions, decisions.

So, what did I finally decide on?

What I did was purchase a high-end USB3 flash drive from Amazon and use that as Leandra's / device, with both /boot and / combined.  I saw no need to separate them at this time, contrary to best practice.  The flash drive in question is plenty fast, very stable, and designed to stay plugged into a USB3 port all the time.  It's also bootable (not always a sure thing these days, but most of the time, if you didn't get a flash drive as convention swag, you can boot from it).  I did have to make some allowances, namely that it's a 32GB flash drive and not something a bit more suited to my usual minimalist installs, but it's getting very hard to find smaller flash drives that are still reliable.  I did say that "20 gigs was way too big," but putting everything else on separate storage drives simplifies things in the long run.  Needs must, after all.

I then built a RAID-5 array out of the six 4TB hard drives plugged into Leandra's mainboard: five active drives, with data and parity striped across all of them (the striping helps read speed; the parity, which eats the equivalent of one drive's capacity, is what lets the array reconstruct data in emergencies), plus one hot-spare drive to rebuild onto in case a drive dies.  Total capacity: 14.55 terabytes of storage space (thus sayeth the RAID calculator).  This RAID is where all the files that get updated fairly often on a server go; systemware gets installed onto the / device and doesn't change very often, only when updates are installed.  This also prolongs the life of the flash drive.

To get the best use out of the RAID (which is to say, not creating actual partitions that are hard to manipulate when needed) I set up Logical Volume Management, which functionally acts like partitioning but lets me restructure the file systems when and if I need to.  Ultimately, I put four logical volumes (think of them like fake partitions) on the RAID, because the stuff in them changes fairly frequently (a sketch of the commands involved follows the list):

  • /home - Where all of my crap goes. (traditional)
  • /opt - Where I put software that didn't come as an Arch Linux package. (traditional)
  • /srv - Where web apps and web content seem to go these days.
  • /var - Where system logs, databases, and stuff that the OS updates often goes. (traditional)

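For the curious, the setup boils down to something like the following.  Treat this as a hedged reconstruction rather than a transcript: the device names (/dev/sdb through /dev/sdg), the volume group name "storage", the logical volume sizes, and the choice of ext4 are all stand-ins, not Leandra's actual values.

# Build the RAID-5: five active drives plus one hot spare.
mdadm --create /dev/md0 --level=5 --raid-devices=5 --spare-devices=1 \
    /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg

# Layer LVM on top of the array.
pvcreate /dev/md0
vgcreate storage /dev/md0

# Carve out the four logical volumes (sizes made up for illustration).
lvcreate -L 4T -n home storage
lvcreate -L 100G -n opt storage
lvcreate -L 500G -n srv storage
lvcreate -L 500G -n var storage

# Put filesystems on them and mount each one in place.
for lv in home opt srv var; do
    mkfs.ext4 /dev/storage/$lv
    mount /dev/storage/$lv /$lv
done
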
This also means that if and when the flash drive packs it in, everything I really care about is on the RAID, so it'll take less time to reconstruct everything.  This then brings up the question, "What happens when the flash drive packs it in?  Are you going to perform major exploratory surgery to figure out what to reinstall, and how, and rebuild all of the config files under /etc?"  To be honest, I used to do exactly that to keep my skills sharp, but a good system is one that you don't need to reconfigure very often, and... I'm really tired of doing that.  I've got better things to do with my time.  So, here's what I did, seeing as how I've got more disk space on the RAID than I know what to do with: I just backed it up.  Common sense, right?  15TB RAID, 3GB Linux install.  That's a drop in the bucket (and a vanishingly small fraction of the size of my personal mail spool, to be honest).  First, let's figure out which directories in the / partition actually have files in them (a quick du one-liner, sketched after the list, makes this easy to check):

  • /boot
  • /etc
  • /usr

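Something like this will surface them; -x keeps du from crossing into the RAID's mountpoints, so only what actually lives on the flash drive gets counted:

# Tally per-directory usage on the root filesystem only, one level
# deep, sorted smallest to largest.
sudo du -xh -d 1 / 2>/dev/null | sort -h
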
What directories are in the / partition that don't have files in them, but need to be there for other reasons?

  • /home, /opt, /srv, /var - Mountpoints for stuff on the RAID.  This'll save me time fumbling around in the future.
  • /bin, /sbin - Symlinks to the directory /usr/bin, which is an eccentricity of Arch Linux.
  • /lib, /lib64 - Symlinks to the directory /usr/lib, which is another eccentricity of Arch Linux.
  • /dev, /proc, /run, /sys, /tmp - Mountpoints for virtual filesystems like tmpfs and sysfs.  You get a broken system if these aren't present.
  • /root - The root user's home directory.  Always handy (and essential during a system crisis).
  • /mnt - Generic, universal "Mount this gizmo here" directory.  Useful to have lying around.

Creating these was as simple as mkdir and a few symlinks, the work of a minute.

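Recreating that skeleton inside the backup directory looks something like this (the symlink targets are taken straight from the listing below):

cd ~/backups/boot_drive
# Empty mountpoints and traditional directories.
mkdir -p home opt srv var dev proc run sys tmp root mnt
# Arch's merged-/usr symlinks.
ln -s usr/bin bin
ln -s usr/bin sbin
ln -s usr/lib lib
ln -s usr/lib lib64
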
This is what the contents of my ~/backups/boot_drive directory look like:

[drwho@leandra: boot_drive]$ ls -alF
total 64
drwxr-xr-x 16 drwho drwho 4096 Jan  1 17:20 ./
drwx------ 14 drwho drwho 4096 Apr 15 16:29 ../
lrwxrwxrwx  1 drwho drwho    7 Jan  1 17:19 bin -> usr/bin/
drwxr-xr-x  3 root  root  4096 Mar 28 20:22 boot/
drwxr-xr-x  2 drwho drwho 4096 Jan  1 16:37 dev/
drwxr-xr-x 62 root  root  4096 Apr 14 20:46 etc/
drwxr-xr-x  2 drwho drwho 4096 Jan  1 16:37 home/
lrwxrwxrwx  1 drwho drwho    7 Jan  1 17:20 lib -> usr/lib/
lrwxrwxrwx  1 drwho drwho    7 Jan  1 17:20 lib64 -> usr/lib/
drwxr-xr-x  2 drwho drwho 4096 Jan  1 16:37 mnt/
drwxr-xr-x  2 drwho drwho 4096 Jan  1 16:37 opt/
drwxr-xr-x  2 drwho drwho 4096 Jan  1 16:37 proc/
drwxr-xr-x  2 drwho drwho 4096 Jan  1 16:37 root/
drwxr-xr-x  2 drwho drwho 4096 Jan  1 16:37 run/
lrwxrwxrwx  1 drwho drwho    7 Jan  1 17:20 sbin -> usr/bin/
drwxr-xr-x  2 drwho drwho 4096 Jan  1 16:37 srv/
drwxr-xr-x  2 drwho drwho 4096 Jan  1 16:37 sys/
drwxr-xr-x  2 drwho drwho 4096 Jan  1 16:37 tmp/
drwxr-xr-x  8 root  root  4096 Apr 14 20:46 usr/
drwxr-xr-x  2 drwho drwho 4096 Jan  1 16:37 var/

Restoring to a new flash drive is as simple as using cp -a (archive mode, so ownership and permissions survive the trip) after booting from an Arch Linux installation key and bringing the RAID back online.  Getting access to the RAID during reconstruction requires just two commands, which take advantage of automatic probing and the configuration data written to the drives themselves:

# Record the array configuration, as detected from the drives, in
# mdadm's config file...
mdadm --detail --scan >> /etc/mdadm.conf
# ...then assemble and start the array.
mdadm --assemble --scan

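The rest of the restore, sketched from memory with made-up names (/dev/sdh1 as the new flash drive's partition, "storage" as the volume group, and my home directory spelled out in full), would go something like this:

# After assembling the RAID with the two mdadm commands above,
# activate the logical volumes and mount the one holding the backups.
vgchange -ay
mount /dev/storage/home /home

# Format and mount the replacement flash drive.  This assumes a single
# partition has already been created on it with fdisk.
mkfs.ext4 /dev/sdh1
mount /dev/sdh1 /mnt

# Copy the backed-up system over, preserving ownership and permissions.
cp -a /home/drwho/backups/boot_drive/. /mnt/

# Reinstall the bootloader on the new drive before rebooting; the
# exact incantation depends on how your machine boots.
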
The actual work of backing up the flash drive for preservation is done by a shell script that runs once a day and uses rsync to update the backed-up files only if anything has changed:

#!/bin/sh
# Mirror the flash drive's populated directories into the backup copy.
# rsync's -a preserves permissions and ownership, and only copies
# files that have actually changed.
cd ~/backups/boot_drive
sudo rsync -ah /boot/ boot/
sudo rsync -ah /etc/ etc/
sudo rsync -ah /usr/ usr/
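
Scheduling it is a one-line cron job.  The path to the script here is hypothetical, and if it runs from root's crontab instead, the sudo calls above become unnecessary:

# Run the backup script every morning at 03:00 (crontab -e to edit).
0 3 * * * /home/drwho/bin/backup_boot_drive.sh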

I'm certain there are other ways to structure a server; as many as there are sysadmins.  For what it's worth, this way seems to work pretty well for me because it minimizes waste as much as possible, is as modular as possible, and has enough redundancy to prevent a total system meltdown in the event that one or two components flame out.  As for adding more disk space to the RAID when bigger drives get cheap enough and the need appears... I have no idea.  I've got ideas but nothing that I've tested, so I'll burn that bridge when I come to it.

If you've been using Linux for a while and you're getting ready to make the jump from a hosted VPS or a Raspberry Pi to building and running a full-scale server, I hope that my notes were somewhat helpful in making some important decisions about things that might otherwise cause you problems in the long run.  I also hope that I was able to present one possible answer to the eternal and surprisingly practical question of "What do I do with all this disk space?"  If nothing else, I made some notes for myself to use later in case I have to revisit the process.