If Microsoft buys Github, there are alternatives.

Jun 03 2018

If you're plugged into the open source or business communities to any degree, you've probably heard buzz that Microsoft is considering buying Github, an online service that has a history of a toxic work environment due to pervasive sexual harassment but remains the de facto core of collaboration for the open source community - source code hosting, ticket tracking, archival, release management, documentation, project webpage hosting, and generally learning how to use the Git version control system.  At this point it's unclear if they're considering merely investing in the company (currently valued in the neighborhood of $5 billion US) or buying it outright, the way they did LinkedIn.  Github is certainly an attractive property for Microsoft to consider: The service currently has something like 23 million user accounts and 1.5 million organizations.  I don't think anybody's tried to count the lines of code that Github stores and serves copies of.  It's been observed that Microsoft seems to be carrying out a strategy of controlling as many of the access points to the tech job market as it can.  Not only is Github a highly useful service for managing software projects, but if you're trying to get a job in a technical field, having a Github account and a couple of repositories is practically a prerequisite.

There's also the issue that at least some parts of Microsoft have no qualms about stealing things they think will be useful and filing the identifying features off (local mirror), and fuck the license.  By this, I refer to Learna.  But now I'm getting a little off-track.

As one might imagine, once word got around, people began expressing their intention to bail on Github if the takeover went through.  Note that there are alternatives to Github which not only have many of the same features but are self-hosted, meaning that all you need to do is get an inexpensive virtual machine someplace, install the package, set up backups (you DO back your stuff up, right?), pull your stuff out of Github (easy to do because just about everything is a Git repository), and then push it all back up to your new server.  This is possible because when you clone a Git repository, you get the entire history of the repo - every change ever made, from the very first, gets copied to your workstation.  This means that if you then do a `git push` to a new repository, you're effectively making a backup copy of the entire thing to that new remote.  This also means that if there is even a single copy of a Git repository someplace, you can reconstitute the entire project.  This is how I maintain multiple copies of my projects' source code repos simultaneously.  Among these self-hosted alternatives to Github are Gitlab (which is a bit of a bear to maintain, I'm told), Gogs, Gitea, and even Keybase's Git support.
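The clone-then-push move described above can be sketched in a few lines of Python that shell out to Git.  This is only a rough illustration, not a polished migration tool: the function names and the `mirror.git` directory name are made up for the example, and it assumes that `git` is on your PATH and that an empty repository already exists at the destination URL.  It uses `git clone --mirror` and `git push --mirror` so that every branch and tag comes along, not just the branch you happen to have checked out.

```python
import subprocess
import tempfile
from pathlib import Path


def mirror_commands(source_url, dest_url, workdir):
    """Build the two git commands that copy a repo's full history.

    Hypothetical helper for illustration: it only builds argument lists,
    it does not run anything.
    """
    mirror_dir = str(Path(workdir) / "mirror.git")
    return [
        # A bare copy of every ref (branches, tags) and the whole history.
        ["git", "clone", "--mirror", source_url, mirror_dir],
        # Push all of those refs to the new remote.  --mirror also removes
        # refs on the destination that the source lacks, so point it at a
        # fresh, empty repository.
        ["git", "-C", mirror_dir, "push", "--mirror", dest_url],
    ]


def migrate(source_url, dest_url):
    """Run the commands in a scratch directory that is deleted afterward."""
    with tempfile.TemporaryDirectory() as workdir:
        for argv in mirror_commands(source_url, dest_url, workdir):
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Placeholder URLs; swap in your own source and destination.
    for argv in mirror_commands(
        "https://github.com/example/project.git",
        "https://git.example.org/example/project.git",
        "/tmp/scratch",
    ):
        print(" ".join(argv))
```

Keep in mind that this only moves what lives in the Git repository itself.  Issues and pull request discussions are not stored in the repo, so they don't come along for the ride.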

There is, however, another option that I'd like to talk about a little, which I think would be a good alternative to Github.  It's called Fossil.

Git aside, Github made it big because it provides a couple of services that people find incredibly useful: A ticketing system for tracking the lifecycle of bugs (including closing them with a commit), a way to track pull requests, an easy way to fork a repository, and wikis to make notes, write documentation, and what have you.  To really be a working replacement for Github, any alternative needs to have comparable features.

Fossil is unique, not only in that it offers all of these features in a single executable file available for all of the usual operating systems (the server software is the same as the client software), but also in that all of the content of those tools is stored alongside the project's codebase.  So, if you clone a Fossil repo, you not only get the source code it's managing, you also get everything in the project's bug tracker, you get everything in the project's wiki, and you also get everything in the project's technotes, which are kind of like a CMS for announcements, kind of like project milestone notes, and kind of like a blog.  This means that you don't need to be online to interact with the project because you take it all with you, including the web interface.  When you push changes up to a Fossil server, you push all of the changes you made to those tools, and when you pull from a server you synchronize them locally.  This also means that any copy of Fossil can be a project's core server, until you decide that it's not anymore.  If I were at a hackathon, I could create a Fossil repo and expose the server on Windbringer so that everybody (including myself) could use it to manage the project, go offsite to get coffee, and push the entire project to my own Fossil server to back it up, then return and tell everybody the URL to my server.  If somebody decided that they didn't trust my server (or it kept running into bandwidth caps) they could stand up their own Fossil server, push the project up to it, and everybody could use that one instead.  At the risk of misquoting a certain movie, you are the core server, and the core server is you.

Fossil also aims to be as lightweight as possible.  As I mentioned before, the entire application (because it's both a client and a server depending on how you use it) is a single executable, so all you have to do is download it and stick it someplace.  No installer, no splatting files all over the place, just one file.  It also doesn't use any complicated protocols to do its job.  Fossil supports something called autosync mode, which means that you can configure your local Fossil repository to automatically pull from and push to another Fossil server.  This was a design decision meant to replace Git's fork-and-merge workflow, which works most of the time, right up until it doesn't, usually at the worst possible moment.

Git and Fossil, however, use very different structures for their repositories.  Ultimately, Git repos are a multitude of files that are kept in a hidden directory inside the repo named, unsurprisingly, .git/.  Everything is implemented as a file.  Fossil, on the other hand, takes a different approach.  Every Fossil project (because it's more than just a source code repository, more than just a wiki... you get the picture) is stored in a single SQLite database file.  I don't want to go into the weeds talking about SQLite, so I'll just point you to the Wikipedia page (which is surprisingly accurate).  Suffice it to say that you have probably been using SQLite for many years, day in and day out, and you probably don't even know it.  When you clone a project, you're basically making a copy of a single SQLite database file, and when you update your local project you're updating a local database.  Using a database also means that every change happens inside a transaction, so the whole project gets a degree of integrity checking automatically.  Git does check the hashes of its individual objects, but it has no single transactional store wrapped around everything the way Fossil does.
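To give a feel for what "everything lives in one SQLite file" means in practice, here is a small Python sketch that uses the standard library's `sqlite3` module.  It builds a stand-in database and then inspects it.  To be clear, the table names (`code`, `ticket`, `wiki`) and both function names are invented for this example and are not Fossil's actual schema; the point is only that a single file can hold several kinds of project data and that SQLite can check its own consistency with `PRAGMA integrity_check`.

```python
import sqlite3
import tempfile
from pathlib import Path


def make_standin(path):
    """Create a toy single-file 'project'.  Table names are hypothetical."""
    conn = sqlite3.connect(path)
    try:
        with conn:  # one transaction: everything is committed, or nothing is
            conn.execute("CREATE TABLE code (name TEXT PRIMARY KEY, content BLOB)")
            conn.execute("CREATE TABLE ticket (id INTEGER PRIMARY KEY, title TEXT)")
            conn.execute("CREATE TABLE wiki (title TEXT PRIMARY KEY, body TEXT)")
            conn.execute("INSERT INTO ticket (title) VALUES ('first bug')")
    finally:
        conn.close()


def inspect_sqlite_file(path):
    """Return (sorted table names, integrity_check result) for a SQLite file."""
    conn = sqlite3.connect(path)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        status = conn.execute("PRAGMA integrity_check").fetchone()[0]
    finally:
        conn.close()
    return tables, status


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        db_path = str(Path(scratch) / "standin.db")
        make_standin(db_path)
        print(inspect_sqlite_file(db_path))
```

Copying that one file someplace else copies the whole toy project, which is roughly the property that makes cloning and backing up a Fossil project so simple.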

Fossil is also a very efficient piece of software.  In case you're curious, the main Fossil website is a Fossil repo, served by a copy of the Fossil software.  The project eats its own dogfood, which is one of the reasons that I both recommend and use it.

Depending on how things go in the future, I might write a tutorial for pulling the various components of a project out of Github and migrating them into a Fossil repository.  It seems like a useful thing to know how to do.

Anyway, I think I've rambled enough.  If you're considering jumping ship from Github in the near future, I suggest looking into Fossil as an alternative for managing your project.  It's an interesting and mature piece of software, and I think it's worth investigating as an alternative to using centralized services for managing projects.  Why not download it and play with it a little to see how you like it?