LOCKSS and Git.
The archival community has a saying: LOCKSS. Lots Of Copies Keep Stuff Safe.
Ultimately, if you trust someone else to hold your data for you, there is always a chance that the service can disappear, taking your stuff with it. A notorious case in point is Google - the Big G has terminated so many useful services that there is an online graveyard dedicated to them. Some years ago a company called Code Spaces, which was in pretty much the same business as Github, was utterly destroyed in an attack. Whoever cracked them got into their Amazon EC2 control panel and left a ransom note, and when the company investigated, the attackers wiped everything, from the virtual machines to the code repositories to the backups. Everybody lost everything.
While anybody who's cloned a Git repo can, in theory, reconstruct the project anywhere they want, there are still repercussions when a hosted project suddenly vanishes. For starters, it's demoralizing as hell. If you lose your project hosting it's a real kick betwixt wind and water. Collaborators on the project may (and in the past, have) kicked the project in the head and given up after such a loss. Additionally, the value provided by a project hosting outfit lies in the bug tracker integrated with the code repository, occasionally the wiki, and integration with CI/CD (continuous integration/continuous delivery) pipelines. While there are software packages out there that integrate all of these things with the code repository, like Fossil, good luck getting anybody to start using them.
So, what can you do?
Of course, you can always stand up your own project hosting software someplace. There are some excellent alternatives out there like Gitea or Gitosis, but a mere application just doesn't go far enough because you have to use it correctly as well. Plus, you have to figure out just how you're going to use them. So, here's what I did:
Let's take my public repository of Huginn networks as an example. It's on Github, which is simultaneously the de facto hub of the open source community these days as well as a potential single point of failure. So on Leandra (a machine I control, because she's installed in my office) I set up a bare Git repository (slightly reformatted for clarity):
{22:49:18 @ Sat Dec 19}
[drwho @ leandra:(3) ~]$ mkdir exocortex-agents
{15:04:10 @ Sun Dec 20}
[drwho @ leandra:(3) ~]$ cd exocortex-agents/
{15:04:13 @ Sun Dec 20}
[drwho @ leandra:(3) exocortex-agents]$ git init --bare
Initialized empty Git repository in /home/drwho/exocortex-agents/
{15:04:17 @ Sun Dec 20}
[drwho @ leandra:(3) exocortex-agents]$ ls -alF
drwxr-xr-x drwho drwho 98 B Sun Dec 20 15:04:17 2020 ./
drwxr-xr-x drwho drwho 4.6 KB Sun Dec 20 15:04:10 2020 ../
drwxr-xr-x drwho drwho 0 B Sun Dec 20 15:04:17 2020 branches/
.rw-r--r-- drwho drwho 66 B Sun Dec 20 15:04:17 2020 config
.rw-r--r-- drwho drwho 73 B Sun Dec 20 15:04:17 2020 description
.rw-r--r-- drwho drwho 23 B Sun Dec 20 15:04:17 2020 HEAD
drwxr-xr-x drwho drwho 460 B Sun Dec 20 15:04:17 2020 hooks/
drwxr-xr-x drwho drwho 14 B Sun Dec 20 15:04:17 2020 info/
drwxr-xr-x drwho drwho 16 B Sun Dec 20 15:04:17 2020 objects/
drwxr-xr-x drwho drwho 18 B Sun Dec 20 15:04:17 2020 refs/
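If you want to double-check that a directory like this really is a bare repository, something along these lines should do it. This is only a sketch: it works in a throwaway temporary directory (my assumption for illustration, not the actual path on Leandra).

```shell
# Create a scratch bare repository under a temporary directory (hypothetical path).
scratch="$(mktemp -d)"
git init --bare --quiet "$scratch/exocortex-agents"

# Ask Git whether the repository is bare, i.e. has no working tree.
is_bare="$(git -C "$scratch/exocortex-agents" rev-parse --is-bare-repository)"
echo "bare: $is_bare"
```

A bare repository is what you want on the receiving end, because nobody edits files in it directly; it only receives pushes and serves clones.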
Then, in my working copy on Windbringer, I set up a Git remote which, as the name implies, is a Git repository accessed remotely (i.e., on a different machine).
{15:05:14 @ Sun Dec 20}
[drwho @ windbringer exocortex-agents] () $ git remote add leandra ssh://leandra/home/drwho/exocortex-agents
{15:09:01 @ Sun Dec 20}
[drwho @ windbringer exocortex-agents] () $ git remote -v
git.hackers.town ssh://git@git.hackers.town:2222/drwho/exocortex-agents.git (fetch)
git.hackers.town ssh://git@git.hackers.town:2222/drwho/exocortex-agents.git (push)
gitlab git@gitlab.com:virtadpt/exocortex-agents.git (fetch)
gitlab git@gitlab.com:virtadpt/exocortex-agents.git (push)
leandra ssh://leandra/home/drwho/exocortex-agents (fetch)
leandra ssh://leandra/home/drwho/exocortex-agents (push)
origin git@github.com:virtadpt/exocortex-agents.git (fetch)
origin git@github.com:virtadpt/exocortex-agents.git (push)
If you look at the above output, you'll note that I have multiple remotes for that code repository. The new one (leandra) I just added breaks down like this:
leandra ssh://leandra/home/drwho/exocortex-agents (fetch)
leandra ssh://leandra/home/drwho/exocortex-agents (push)
leandra
- The name of the remote. You refer to it by name for convenience.
ssh://
- The remote is accessed over SSH, so I can work with it at home.
leandra
- Leandra's hostname.
/home/drwho/exocortex-agents
- Full path to the repository on Leandra.
(fetch)
- This means that I can pull from this copy of the repo with the URL on that line.
(push)
- This means that I can also push to that copy of the repo with the URL on that line.
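To see the fetch and push URLs for a single remote without reading through the whole `git remote -v` listing, a sketch like the following can help. The temporary directory and the local path standing in for Leandra are assumptions for illustration; in real use the URL would be the ssh:// one shown above.

```shell
# Scratch working repository plus a bare repository standing in for Leandra
# (both under a hypothetical temporary directory).
scratch="$(mktemp -d)"
git init --quiet "$scratch/work"
git init --bare --quiet "$scratch/leandra.git"

# Register the bare repository as a remote named "leandra".
git -C "$scratch/work" remote add leandra "$scratch/leandra.git"

# Unless a separate push URL is configured, fetch and push share one URL.
fetch_url="$(git -C "$scratch/work" remote get-url leandra)"
push_url="$(git -C "$scratch/work" remote get-url --push leandra)"
echo "fetch: $fetch_url"
echo "push:  $push_url"
```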
At the moment it's empty. Let's fix that.
{15:17:05 @ Sun Dec 20}
[drwho @ windbringer exocortex-agents] () $ git push leandra
X11 forwarding request failed
Enumerating objects: 67, done.
Counting objects: 100% (67/67), done.
Delta compression using up to 12 threads
Compressing objects: 100% (67/67), done.
Writing objects: 100% (67/67), 27.90 KiB | 1.16 MiB/s, done.
Total 67 (delta 35), reused 0 (delta 0), pack-reused 0
To ssh://leandra/home/drwho/exocortex-agents
* [new branch] master -> master
Now there is a full copy of the repo in question on Leandra. Let's test it.
{15:18:26 @ Sun Dec 20}
[drwho @ windbringer exocortex-agents] () $ cd ~/tmp
{15:18:27 @ Sun Dec 20}
[drwho @ windbringer tmp] () $ git clone ssh://leandra/home/drwho/exocortex-agents
Cloning into 'exocortex-agents'...
X11 forwarding request failed
remote: Enumerating objects: 67, done.
remote: Counting objects: 100% (67/67), done.
remote: Compressing objects: 100% (67/67), done.
remote: Total 67 (delta 35), reused 0 (delta 0), pack-reused 0
Receiving objects: 100% (67/67), 27.90 KiB | 595.00 KiB/s, done.
Resolving deltas: 100% (35/35), done.
{15:18:41 @ Sun Dec 20}
[drwho @ windbringer tmp] () $ cd exocortex-agents/
{15:18:45 @ Sun Dec 20}
[drwho @ windbringer exocortex-agents] () $ ls
butterfly-in-china.json searx-answering-api-examples.json
coronavirus-news-agents.json shake-rattle-and-roll.json
demo-weather-forecaster.json test-matrix-integration.json
elephant.json test-scenario.json
mastodon-integation-demo.json tripwire.json
README.md twitter-activity-monitor.json
sample-rss-feed-consumer.json user_credentials.json
searcherizer.json
There we go.
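A quicker check than a full clone is to compare the commit your working copy is on with what the remote reports. Here is a minimal sketch, again using a temporary directory and a local bare repository as a stand-in for Leandra (both assumptions on my part), with a throwaway commit identity passed inline:

```shell
scratch="$(mktemp -d)"
git init --quiet "$scratch/work"
git init --bare --quiet "$scratch/leandra.git"
cd "$scratch/work"

# Make one commit; the identity is passed inline so no global config is needed.
echo "demo" > README.md
git add README.md
git -c user.name="Example" -c user.email="example@example.invalid" \
    -c commit.gpgsign=false commit --quiet -m "Initial commit"

# Push the current branch to the stand-in remote.
git remote add leandra "$scratch/leandra.git"
git push --quiet leandra HEAD

# Compare the local branch tip with what the remote advertises for it.
branch="$(git symbolic-ref --short HEAD)"
local_tip="$(git rev-parse HEAD)"
remote_tip="$(git ls-remote leandra "refs/heads/$branch" | cut -f1)"
echo "local:  $local_tip"
echo "remote: $remote_tip"
```

If the two hashes match, the remote has everything on that branch.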
As you can see from earlier, that particular project has a bunch of remotes. Now, when I'm working in a repository I have to push updates to each and every one of them. I could push to each one in sequence but that kind of sucks as a workflow because it's easy to forget one. There's an easier way that someone showed me elsewhere on the net (I wish I could remember whom - please ping me and I'll credit you). When you use Git you can set up a .gitconfig file in your home directory to set some personal defaults. Here's mine:
{15:23:02 @ Sun Dec 20}
[drwho @ windbringer exocortex-agents] () $ cat ~/.gitconfig
[user]
email = drwho at virtadpt dot net
name = The Doctor
signingkey = 0x807B17C1
[push]
default = simple
[alias]
pushall = !git remote | xargs -L1 -P0 git push --all --follow-tags
The [user] and [push] bits are there because Git yells at you if they're not set, which is a bit of a misfeature as far as I'm concerned. But it is what it is. It's the [alias] block that is of interest to us. Here's what it means when you break it down:
pushall
- The name of the new git command to create.
!git
- The leading ! tells Git to run the rest of the line as a shell command, starting with git.
remote
- List just the names of the configured remotes, without their URLs.
|
- Run the output into another command.
xargs
- A standard command line utility (see its manpage) that means "for every thing you pass me that is separated by a newline or whitespace, I will do the following thing to it."
-L1
- Take at most one full line from the input to xargs at a time.
-P0
- Run as many processes simultaneously as possible. This basically lets xargs run the pushes in parallel instead of one after another. You probably don't need this but I find it handy.
git push
- Push new commits.
--all
- Push all branches with new commits, all at once.
  - This is a thing folks usually do at work. If it's just you there isn't really much of a need for this. The command line option won't hurt anything, though.
--follow-tags
- Also push any annotated tags that point at the commits being pushed and aren't on the remote yet.
  - Same. If you use tags, you know. If you don't use tags, don't worry.
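If you'd like to see what that pipeline will actually run before trusting it, one approach is to put echo in front of the command so that xargs prints each push instead of executing it. In this sketch the scratch repository and the two remote paths are made up for illustration, and -P0 is left out so the preview comes out in a stable order:

```shell
scratch="$(mktemp -d)"
git init --quiet "$scratch/work"
cd "$scratch/work"

# Two hypothetical remotes; "git remote add" does not need them to exist yet.
git remote add leandra "$scratch/leandra.git"
git remote add gitlab "$scratch/gitlab.git"

# Prefixing the command with echo prints what would run, one line per remote.
preview="$(git remote | xargs -L1 echo git push --all --follow-tags)"
echo "$preview"
```

Each line of the preview ends with one remote name, which shows how xargs appends its input to the end of the command it was given.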
Once the above line is in your ~/.gitconfig file you can use it regardless of what you're working on. Let's try it out:
{15:25:36 @ Sun Dec 20}
[drwho @ windbringer exocortex-agents] () $ git pushall
X11 forwarding request failed
Everything up-to-date
Host key fingerprint is SHA256:nThb...
Host key fingerprint is SHA256:HbW3...
Host key fingerprint is SHA256:IyW9...
X11 forwarding request failed on channel 1
X11 forwarding request failed on channel 1
Everything up-to-date
X11 forwarding request failed on channel 1
Everything up-to-date
Everything up-to-date
As you can see, I just pushed all of my changes (there weren't any at the moment I wrote this, but just pretend there were) to all four remotes. The output is a little out of order due to the -P0 argument to xargs, but that's okay.
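If you want to try the alias out without touching your real ~/.gitconfig or your real remotes, here is a rough end-to-end sketch. It defines the same alias in a scratch repository's own config and pushes to two local bare repositories that stand in for the real hosts; the paths, remote setup, and commit identity are all assumptions for the sake of the example.

```shell
scratch="$(mktemp -d)"
git init --quiet "$scratch/work"
git init --bare --quiet "$scratch/leandra.git"
git init --bare --quiet "$scratch/gitlab.git"
cd "$scratch/work"

# One throwaway commit so there is something to push.
echo "demo" > README.md
git add README.md
git -c user.name="Example" -c user.email="example@example.invalid" \
    -c commit.gpgsign=false commit --quiet -m "Initial commit"

git remote add leandra "$scratch/leandra.git"
git remote add gitlab "$scratch/gitlab.git"

# Define the alias in this repository's config rather than in ~/.gitconfig.
git config alias.pushall '!git remote | xargs -L1 -P0 git push --all --follow-tags'
git pushall

# Each bare repository should now hold the same commit as the working copy.
branch="$(git symbolic-ref --short HEAD)"
local_tip="$(git rev-parse HEAD)"
leandra_tip="$(git --git-dir="$scratch/leandra.git" rev-parse "refs/heads/$branch")"
gitlab_tip="$(git --git-dir="$scratch/gitlab.git" rev-parse "refs/heads/$branch")"
echo "local:   $local_tip"
echo "leandra: $leandra_tip"
echo "gitlab:  $gitlab_tip"
```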
20210214 - NOTE - I think I found where I learned about this trick
And there we go. I hope you find this useful. Happy hacking!