Saving stuff before it vanishes down the memory hole.

31 January 2017

UPDATE - 20170302 - Added Firefox plugin for the Internet Archive.

UPDATE - 20170205 - Added Chrome plugin for the Internet Archive.

Note: This article is aimed at people all across the spectrum of levels of experience with computers.  You might see a lot of stuff you already know; then again, you might learn one or two things that hadn't shown up on your radar yet.  Be patient.

In George Orwell's novel 1984, one of the plot points was something called the Memory Hole.  Memory holes were slots all over the building in which Winston Smith worked, into which documents that the Party considered seditious or merely inconvenient were deposited for incineration.  Anything that the Ministry of Truth decided had to go because it posed a threat to the party line was destroyed.  This meant that if anyone wanted to go back and double check what history might have been, the only things they could get hold of were "officially sanctioned" documents written to reflect the revised Party policy.  Human memory's funny: If you don't have any static representation of something to refer back to periodically, eventually you come to think that whatever people have been telling you is the real deal, regardless of what you just lived through.  No mind tricks are necessary, just repetition.

The Net's a lot like that.  There are piles and piles of information everywhere you look, but most of it resides on systems that aren't yours.  This blog is running on somebody else's server, and it wouldn't take much to wipe it off the face of the Net.  All it would take is a DMCA takedown notice with no evidence (historically speaking, this is usually the case).  This has happened a number of times in the past, including to an archive maintained by Project Gutenberg and to documents explicitly placed into the public domain, so that somebody could try to make a buck off of them.  This is a common enough thing that the IETF has made a standard HTTP error code to reflect it, Error 451 - Unavailable For Legal Reasons.

So, how would you make local copies of information that you think might be pulled down because somebody thought it was inconvenient?  For example, climatological data archives?

The fastest and easiest way is to use your web browser.  Just about every web browser today can save web pages as PDF files which you can open with software like Acrobat Reader (remember that?) or your web browser (most modern browsers can display PDF files natively).  For Firefox, Iceweasel and Internet Explorer, hit control-p and select "Print to file."  The same goes for Google Chrome and Chromium.  You can then copy these PDF files to offline storage to archive them, if you so choose.
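If you're comfortable with a little scripting, the same idea can be automated.  What follows is a minimal Python sketch rather than a polished tool: it fetches one URL and writes the raw response to a timestamped file.  The function names (archive_filename, save_page) and the "archive" directory are made up for illustration, and it only grabs the page itself, not the images or stylesheets a browser would pull in.

```python
import re
import urllib.request
from datetime import datetime, timezone
from pathlib import Path
from urllib.parse import urlparse


def archive_filename(url, when):
    """Build a filesystem-safe, timestamped filename for a URL."""
    parts = urlparse(url)
    # Collapse anything that isn't a letter, digit, dot or dash into "_".
    slug = re.sub(r"[^A-Za-z0-9.-]+", "_", parts.netloc + parts.path).strip("_")
    return f"{when:%Y%m%dT%H%M%SZ}_{slug or 'page'}.html"


def save_page(url, directory="archive"):
    """Fetch a URL and write the raw response body to a local file."""
    target_dir = Path(directory)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / archive_filename(url, datetime.now(timezone.utc))
    with urllib.request.urlopen(url, timeout=30) as response:
        target.write_bytes(response.read())
    return target
```

Calling save_page("https://example.com/") should leave a file in the archive directory whose name starts with the UTC time of the download, so repeated snapshots of the same page don't overwrite each other.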

Because I'm predominantly a Firefox user, my go-to utility for saving anything is an add-on called Scrapbook; it's one of the first five add-ons I install, in point of fact.  Whenever I come to an interesting web page that I want to keep a copy of (which comes in handy when I'm working on something but I'm on an airplane flight that doesn't have wi-fi) I reflexively right-click someplace inside the page and select Save Page, and then pick a folder to drop the page into.  I've lost count of the number of web pages I've saved this way, easily a couple of thousand.  There seem to be two similar extensions for Chrome called PageArchiver and SingleFile.  I've tried the former (incidentally, it depends on the add-on SingleFile Core) and I've yet to get it to work.  I haven't tried the latter add-on yet.

The Internet Archive has released a Wayback Machine add-on for Chrome which does two things: First, if you try to visit a page that doesn't exist anymore (including one that returns an Error 451) it'll offer to show you the latest copy in the Wayback Machine if it has one.  Second, it lets you submit the page you're looking at to the Wayback Machine so it can be archived.  I've only experimented with this add-on a bit, but it seems to work pretty well.  There is also an official version of the add-on for Firefox.

For saving video footage (like much of the mirrored street and news footage scattered all over my website) I use Video Download Helper.  When I come to a page that has a video I want to save (like YouTube) I click on the icon in the bottom-right corner, select the highest resolution video (width by height), and pick a directory to save it to.  There is a similar add-on for Chrome called Video Downloader Professional, but I haven't tried it yet.

I don't know if there are similar plug-ins for Internet Explorer or Safari.  I've not used either of them.

There are some online archives that you can ask to mirror single web pages and make them available for posterity, provided you're not concerned that they'll go away anytime soon or be asked to remove certain archived entries (see also HTTP Error 451).  The first is - drop in a URL or use their handy bookmarklet (which I've used often to good effect) and they'll make a complete snapshot of the web page at the URL.  If a snapshot of the page in question already exists it'll tell you so, along with how old the copy is; you have the option to replace it with a newer snapshot.  You can also search their archive for a page and capture the shortened URL of their copy of the page to return to it later.  The Internet Archive also has a page to which you can submit a URL (look for "Save Page Now" on the page) and one of their bots will make a snapshot of it when they reach it in their input queue.  Of course, you can also use one of the above personal methods to make local copies of those snapshots.
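The Internet Archive side of this can also be scripted.  The Python sketch below rests on two assumptions that are worth checking against the Archive's current documentation before you lean on them: that prefixing a URL with https://web.archive.org/save/ requests a capture, and that https://archive.org/wayback/available answers with JSON containing an "archived_snapshots" entry.  The function names are made up for illustration.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Assumed endpoints; verify against the Internet Archive's own docs.
SAVE_PREFIX = "https://web.archive.org/save/"
AVAILABILITY_API = "https://archive.org/wayback/available"


def save_page_now_url(url):
    """URL that asks the Wayback Machine to capture the given page."""
    return SAVE_PREFIX + url


def availability_url(url):
    """URL that asks the Wayback Machine for its closest snapshot of a page."""
    return AVAILABILITY_API + "?" + urlencode({"url": url})


def closest_snapshot(api_response):
    """Pull the snapshot URL out of a parsed availability response, if any."""
    closest = api_response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup(url):
    """Query the availability API over the network and return a snapshot URL or None."""
    with urllib.request.urlopen(availability_url(url), timeout=30) as response:
        return closest_snapshot(json.load(response))
```

Only lookup() touches the network; the other functions just build or pick apart strings, so you can paste save_page_now_url()'s result into a browser if you'd rather submit pages by hand.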

If you're feeling adventurous you can copy the content into a page on a public site, such as somebody's public (or private) Etherpad instance.  If you've never used it, Etherpad is basically an online, collaborative text editor not too different from Google Docs.  Everything you'd expect of a WYSIWYG text editor is there, plus multiple people can edit the same document at the same time, and there is a chat function (which you can toggle on and off) so the editors can talk through a side channel.  The downside of this is that anybody who finds the page can change it for whatever reason, and that means that the contents of the document can be altered without your knowing it (though you can also roll the page back to an earlier point in its history).  If you're going to go the Etherpad route, make sure the instance is your own or run by somebody you trust, and that it's behind a password.  You could also copy-and-paste the contents of the page into a site like Pastebin, where you can't edit the page once you save it, but on the other hand a page can also mysteriously vanish if someone demands it of the people running the site.

So, there you have it.  If you feel that you need to save news articles, video footage, or press releases from the Memory Hole, here are the resources you can use to do it.

Never forget the way things used to be, so things to come will not seem normal.