Turtles All the Way Down: Bootstrapping an operating system.

Mar 10, 2014

Now we need an operating system for the trusted, open source computer. As previously mentioned, Windows and MacOSX are out because we can't audit the code, and it is known that weaponized 0-days are stockpiled by some agencies for the purpose of exploitation and remote manipulation of systems, and are also sold on the black and grey markets for varying amounts of money (hundreds to multiple thousands of dollars). It has been observed by experts many a time that software being open source is not a panacea for security. It does, however, mean that the code can be audited for bugs (vulnerabilities and otherwise) via any number of methods, from manual inspection to dynamic analysis. It also means that the OS and its associated applications can be ported to platforms it's not already available on. And, let's face it, the chances that Microsoft will port Windows to our trusted open source platform are Slim and None, and Slim's in his hotel room in Berlin recovering from CCC.

So, what are our options?

Linux is always a good place to start. It's been ported to some pretty strange hardware in the past and there is even at least one full distribution for the LatticeMico32 that we could make use of if we had to. If, previously, the project went with an ARM architecture then we could use a distribution like Arch Linux for ARM, Debian, or Slackware for ARM. If it followed the MIPS path then it seems probable that one of the distros from this extensive list could be used. It is also conceivable that we could repurpose one of the more popular embedded distros of linux, such as OpenWRT as the software platform for this project if we had to (fun fact: OpenWRT has been ported to the x86 platform even though it's actually meant for building embedded devices). From these links, it seems reasonable to state that if we pick a sufficiently developed and popular hardware platform there is a good chance that a distro will already be available for it.

Moving a little farther afield of the penguin, there are other options available to this project. FreeBSD has already been ported to a couple of embedded(-ish) systems like the Raspberry Pi and the Beaglebone ARM development board. That's a good sign, though it also implies that the software is dictating the hardware again. OpenBSD is already something of a wildcard; love it though I do, it's probably not a good fit for our project unless we show up with a complete set of diffs for the OpenBSD source tree that adds full support for our trusted computing platform. Even then, someone that did so might be turned away. We need to make the best use of our time on an ongoing basis, so OpenBSD's probably off the table. The least-well publicized of the BSDs, NetBSD, has the motto is "Of course it runs NetBSD," because it was designed from the bottom up to be portable to as many platforms as possible. Consequently it runs on some pretty exotic hardware like the Alpha, Super-H, ARC, NeXT 68k, and the Sharp Zaurus. It's also readily portable to embedded systems, which arguably our trusted and open platform is if we're talking CPUs emulated on FPGAs and Systems-On-A-Chip.

NetBSD also makes a point of having a proactive security community and model without needing hype, which may be one of the reasons that it's not quite as well known, so we should not discount it without further research. As a platform it is still under heavy development so it's not going away anytime soon. If NetBSD isn't already ported to our trusted reference platform then chances are it should be pretty easy to do with a little work. I'll not make any suggestions here; I would personally go with whatever works the best. If there is a distribution of Linux already available for our platform then let's use it. If there isn't then we have a few options to explore: We can try to find a distro that runs on a similar platform and try to make it work on our hardware. If it doesn't work we can try to port it over, which might be as simple as recompiling the source tree with our trusted crossdev toolchain. Or not. Or we can try something else suitably creative, which I leave as an exercise to the reader because I'm still working on my first cup of coffee of the day. Alternatively we can try a BSD on our trusted and open hardware platform, probably NetBSD due to its stated goals of extreme portability. It does not stand to reason that somebody would go to all the trouble of designing and debugging a CPU or full system for an FPGA without having anything to run on it, even as a proof of concept to build on top of later.

What? You're not a fan of Linux or BSD? Too bad. Today's threat model explicitly states that you can't trust any closed-source software because many companies are strongly motivated to backdoor their code. Besides, there is too much money to be made in selling 0-day vulnerabilities for many hackers to bother disclosing what they find. Doubly so in today's job market, where it's not unusual for highly skilled people to be out of work for several years at a time. A single 0-day can easily bring in a month's wages. That said, Windows and MacOSX are right out. If our software is open source, then not only do we have a better chance of fiding and fixing vulnerabilities before they become a problem but we also have additional security options that many closed-source OSes don't, such as setting hard and soft limits on resource usage, Address Space Layout Randomization, jails, and others. It's a very long list and I won't go into all of them. There are lots of options open to us.

Now for the obligatory discussion of a famous paper, Ken Thompson's Reflections on Trusting Trust (Cheers!), which describes an attack in which a compiler is boobytrapped to do to things: Detect when certain software (like OpenSSH's sshd or the Linux kernel's TCP/IP stack) is being compiled and insert a backdoor of some kind into it, and detect whenever it's compiling a copy of itself, determine if the compiler has the backdoor injector or not, and put the injector back if it doesn't. The theory is that if the backdoor injector is discovered in the compiler, then the injector can be removed and the compiler suite can be recompiled with the backdoored toolchain... only the backdoor would be silently re-added to the now-considered-clean compiler suite causing a chicken-and-egg problem. It is a problem. It is also a problem with a solution.

Dr. David A. Wheeler wrote as his Ph.D dissertation a paper called Fully Countering Trusting Trust Through Diverse Double Compiling (local mirror). The paper provides a formal proof and a technique by which a potentially backdoored compilation suite can be detected by compiling it from source code multiple times with multiple untrusted (and independent) compilers. The process does not assume or require deterministic builds, which most compiler suites do not generate by default because, at a minimum, compilers tend to put unique data into binaries to identify them, like the date and time of compilation, configuration variables from the compilation environment, varying amounts of debugging information... it's kind of weird when you think about it because it's non-obvious, but there you have it. When carrying out the process the source code (which we can read and understand) and the generated executable are compared and differences (like boobytraps) are noted.

First, our assumptions:
  • We have the source code to Compiler A.
  • We have the source code to Compiler B.
  • We have a working build of Compiler A, which we don't necessarily trust.
  • We have a working build of Compiler B, which we don't necessarily trust.
  • Compilers A and B are totally different. They're unrelated to one another but do the same thing, i.e. compile the same language into executables appropriate to our environment.
We can probably use more than two compilers in this process. It'll take much longer and the process will have to be planned in advance but there will be a higher probability that one or more of the compilers used will not have been tampered with. Happily, there are many different F/OSS compilers available to us, GCC, Clang/LLVM, TCC, and PCC, to name a few. It's possible that someone wrote their own equivalent compiler in a language that we didn't expect, like Perl, Go, or JavaScript. If they exist we can use one or more of those to compile our canary-in-the-coal-mine compiler as well. It's not too far-fetched a notion, certainly no more so than our discussing the construction of a general purpose computer from the ground up. There are people who write emulators and development toolchains for fictitious CPUs when the fictions haven't yet been written for fun. We have options.

Bruce Schneier did an excellent job of summarizing the technique, which I'll reiterate briefly here using his text as a base (because his overview is what helped me make sense of the paper). Here is roughly, how the diverse compilation process would work:
  • Build the source code to Compiler A with Compiler A. This results in Executable X, which is a new copy of Compiler A.
  • Build the source code to Compiler A with Compiler B. This results in Executable Y, which is another new copy of Compiler A.
  • RESULT: Two copies of compiler A from two different sources which are functionally equivalent to one another, but not bit-for-bit identical.
  • Build the source code to Compiler A again with Executable X. This results in Executable V, which is another new copy of Compiler A.
  • Build the source code to Compiler A yet again with Executable Y. This results in Executable W, which is another new copy of Compiler A.
  • Compare Executables V and W.
  • If V and W are bit-for-bit identical, then there is a high probability that your toolchain is not compromised.
  • If V and W are not bit-for-bit identical (because they were compiled with the same compiler and the same options), then one or more of your compilers are up to something sketchy.
It's a huge paper (199 pages) but one well worth sitting down and reading. Summaries just aren't enough. Dr. Wheeler published his proof of concept shell script for executing these tests on his website. Glancing at it, it seems fairly well written, and would make an excellent base for a script that meets our purposes nicely. As with many things these days, it's a trust problem. We can mitigate the risk of attackers boobytrapping toolchains by picking compilers from all through the set of all development environments like a frog jumping around on a hotplate. An attacker can't possibly get to all of them before we do, statistically not all versions of any one tool, and if it comes down to it we can use a couple of older toolchains on the hypothesis that newer ones are more likely to be compromised than older ones. All you need is one un-spiked compiler out of the set of all compilers to be able to detect if the generated executables are functionally identical or if one or more are potentially backdoored. At this point, a good test of our probably-not-backdoored toolchain is to try to compile the OS that we intend to run on our trusted and open computing platform. I don't know what OS you would want to run, so the best advice I can give is to start with the OS' developers' documentation. Figure out how they do a cross-compilation build and get to it. If you're concerned that the environment you're cross-compiling on is going to misuse you trusted crossdev toolkit to backdoor the code anyway, you can and should repeat the Wheeler process on our platform before it's ever hooked up to a network. If you need to (and, let's face it, crossdev kits aren't a cakewalk) consider using an open and auditable tool like Scratchbox to set up a trusted cross-compilation chain. I haven't really looked at it but I wish I'd known about it in a previous life...

Time to install our recompiled-for-paranoia's-sake OS. There are two ways that I can see to go about this, which are to build new installation media (like an .ISO image or a file system image that can be pressed onto a USB key), or you can plug one or more storage devices for the trusted and open computing platform into the machine you're bootstrapping from and do a manual install using a process along the lines of Linux From Scratch. I'm not going to go over this process because they literally wrote the book on it. If you opted to rebuild the installation media, now's the time to boot it on the open and trusted computing platform and install it. If you're not already familiar with alternative operating systems, start by reading their documentation, at the very least reading over the installation process twice. By rebuilding the installation media for a distro you've basically done a remaster on it, and if you did it right their installation should work normally. Chances are you'll want to set up encrypted file systems as well. This is when you would do so using the instructions for whatever OS you're installing (Linux, BSD, et cetera). Install as minimal a set of packages as feasible; this is so that the new system has as small a vulnerability footprint as can be managed. Fewer features mean fewer potential vulnerabilities, which is important at this stage of the game. If all goes according to plan (and for the purposes of this series of articles, no unexpected difficulties will have been encountered that could not be overcome) our trusted open workstation should boot to a login prompt.

Login as the root user. Now harden the OS you just installed. I shouldn't have to say this. There are lots of hardening guidelines out there for open source operating systems, from Ubuntu Linux to OpenBSD. Google them, save a local copy, read through the document at least once before you try anything, and then follow the process. If you're feeling sufficiently motivated (or hate doing the same thing over and over again), write a shell script as you go that implements as much of the hardening process as is feasible. You may wish to consider setting up a Git repository to put the configuration files that changed during hardening under verseion control and copy them into appropriate places in the repo. Some people suggest turning your whole /etc directory into a Git repository with an app like etckeeper or metastore. I personally think this is a good idea because it becomes easy to determine if any of the files have been altered, and if so when and how they were altered. Then use software like AIDE or OSSEC to take a snapshot of the state of your file system so you can detect if anything has been changed without your knowing it. Configure a local firewall to deny all traffic that isn't a response to an outbound request, like Ubuntu does. Run as few services listening on the local network as you can get away with. If this sounds like InfoSec 101 stuff, that's because it is. You'd be horrified at the number of entities (people and corporations alike) that don't bother.

All locked down? Good. Now install all extant patches for your OS (unless you downloaded and compiled the latest and greatest of everything). This will probably mean plugging it into your network at this point, but if you're hardcore enough to want to keep your new system airgapped, you'll have to develop processes by which you can sneakernet data onto and out of your isolated system using removable media in such a way that you run less of a risk of getting gigged. This includes diffs, updated packages, and source code. Now start patching. The OS you selected undoubtedly has a toolkit for managing packages, which includes system updates. The construction process for this trusted and open computing platform involved recompiling the installation binaries from scratch with a more-trusted compiler, so we're talking tarballs full of source code. Unpack them or install diffs as appropriate, then start compiling. Alternatively, it should be possible to cross-compile the updates on our bootstrap workstation using the more-trusted crossdev toolchain that we built earlier. It's likely an incredible pain to do this but we went so far as to build our own computers from the ground up, so if integrity is to be maintained that's the route we're going to have to take. This is why a BSD style system updating process is a better fit for what we're trying to accomplish, they've got it down to a science and it takes only a few commands to get everything updated. If you opted to use a BSD for our hypothetical trusted and open computing platform, then there isn't much you have to worry about here so long as the compilation toolchain we built that we're pretty sure hasn't been backdoored is the one used.

If you didn't rebuild your own distro-native packages (say, you used LFS to compile everything from scratch) then read what they have to say about package management and decide what steps you're going to take. The gotcha, and it's a big one is that we compiled everything with a toolchain that we're pretty sure hasn't been backdoored to boobytrap anything we compile with it, so if we want to maintain that sense of trust then we have to keep using it for everything that gets installed, from security updates to new applications. This is where I think I have to advise against airgapping the systems built - to keep the system updated (as well as install new applications) we have to get the necessary code onto the system, and downloading it (via SSL/TLS and checking the cryptographic signatures wherever possible) is the most workable way. If installing updates isn't reasonably easy most people (even folks like us) won't bother, which means vulnerabilities tend to pile up and increase risk of compromise over time. That seems like a pretty big waste of effort to me, going to the lengths discussed in this series and then not maintaining it. The Linux From Scratch FAQ has some suggestions, one of which boils down to "If you did it right, it's safe to compile and install the new version over the old version," which is true but you'll want to keep an eye on things to make sure that cruft doesn't build up.

After pondering it a little bit, this is what I came up with: When you compile the stuff on your root partition - the most basic of systemware - that is one package. root.tar.bz2, if you like. All of that stuff goes under / as one would expect. Everything else, from a native development environment to scripting languages to your SSH client are compiled and turned into installable packages using one of the methods described, be it Arch Linux style .tar.xz files, one of the package formats supported by Checkinstall, installing every package into a separate subdirectory and then symlinking as appropriate (which is a huge pain in the ass and easy to get wrong, I've done it), or some other way that I'm forgetting. At this point I have to leave it up to you, because everyone has a different style of package management and different needs. Speaking only for myself, if I didn't already have a package management system in place I'd use Checkinstall to make Slackware-style .tgz files (which I've done in the past) or if I had enough processing power the newer .txz files. I've used it to build Debian style .deb packages, too.

More thoughts on this will be in the final post in this series.

This work by The Doctor [412/724/301/703][ZS] is published under a Creative Commons By Attribution / Noncommercial / Share Alike v3.0 License.