Technomancer Tools: Creating a local web archive with Chrome and PageArchiver.

Sep 24 2017

Some time ago I wrote an article of suggestions for archiving web content offline, at the very least to have local copies in the event that connectivity was unavailable.  I also expressed some frustration that there didn't seem to be any workable options for the Chromium web browser, because I'd been having trouble getting the candidates I'd found to work.  After my attempt at fixing up Firefox fell far short of my goal (it worked for all of a day, if that) I realized that I needed to come up with something that would let me do what I needed to do.  I installed Chromium on Windbringer (I'm not a fan of Chrome because Google puts a great deal of tracking and monitoring crap into the browser and I'm not okay with that) and set to work.  Here's how I did it:

First I spent some time configuring Chromium with my usual preferences.  That always takes a while, and involved importing my bookmarks from Firefox, an automated process that took several hours to run.  I also exported everything I had cached in Scrapbook, which wound up taking all night.  I then installed the SingleFile Core plugin for Chrome/Chromium, which does the actual work of turning web pages open in browser tabs into a cacheable single file.  I restarted Chromium, which I probably didn't need to do, but I really wanted a working solution so I opted for caution.  Then I installed PageArchiver from the Chrome store and restarted Chromium again.  This added the little "open file folder" icon to the Chromium menu bar.  The order the add-ons are installed in seems to matter: add SingleFile Core first if you do nothing else.

Now get ready for me to feel stupid: If you want to store something using PageArchiver, click on the file folder icon to open the PageArchiver pop-up, click "Tabs" to show a list of tabs you have open in Chromium/Chrome, click the checkboxes for the ones you want to save, and then hit the save button.  For systems like Windbringer which have extremely high resolution screens, that save button may not be visible.  You can, however, scroll both horizontally and vertically in the PageArchiver pop-up panel to expose that button.  I didn't realize that before so I never found that button.  That's all it took.

Here's what didn't work:

I can't import my Scrapbook archives because they're sitting in a folder on Windbringer's desktop as a couple of thousand separate subdirectories, each of them containing all of the web content for a single web page.  I need to figure out what to do there.  It may consist of writing a utility that turns directories full of HTML into SQL commands to inject them into PageArchiver's SQLite database which, by default, resides in the directory $HOME/.config/chromium/Default/databases/chrome-extension_ihkkeoeinpbomhnpkmmkpggkaefincbn_0 (the directory name is constant; the jumble of letters at the end is the same as the one in the Chrome Store URL) and has the filename 2 (yes, just the number 2).  You can open it up with the SQLite browser of your choice if you wish and go poking around.  Somebody may have come up with a technique for it and I just haven't found it yet; I don't know.  I may not be able to add them in any reasonable way at all and have to resort to running an ad-hoc local web server with Python or something if I want to access them, like this:

[drwho@windbringer ~]$ python2 -m SimpleHTTPServer 8000
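If you'd rather poke around in the PageArchiver database mentioned above from a script instead of a GUI browser, here is a minimal sketch using Python's built-in sqlite3 module.  It makes no assumptions about PageArchiver's schema (I haven't documented it here); it only asks SQLite which tables exist and how they were created.  The function name list_tables is made up for illustration, and the file is opened read-only so nothing the extension has stored can be changed by accident.  It's probably safest to run it with Chromium closed, or against a copy of the file.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return {table_name: CREATE statement} for every table in an SQLite file.

    list_tables is a hypothetical helper, not part of PageArchiver.
    """
    # Build a file: URI with mode=ro so the database is opened read-only.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        # sqlite_master is SQLite's own catalog of tables, indexes, and views.
        rows = conn.execute(
            "SELECT name, sql FROM sqlite_master "
            "WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return dict(rows)
```

To use it, pass the path to the file named 2 in the directory given above, for example list_tables(Path.home() / ".config/chromium/Default/databases/chrome-extension_ihkkeoeinpbomhnpkmmkpggkaefincbn_0/2"), and print the names and CREATE statements it returns.  What those tables actually contain is something you'd have to work out by looking.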

Cleaning up Firefox... somewhat.

Sep 04 2017

Chances are you're running one of two major web browsers on the desktop to read my website - Firefox or Google's Chrome.

Chrome isn't bad; I have to use it at work (it's the only browser we're allowed to have, enforced centrally).  In point of fact, I'd have switched to it a long time ago if it wasn't for one thing.  I make heavy use of a plugin for Firefox called Scrapbook Plus, which makes it possible to take a full snapshot of a web page and store it locally so that it can be read offline, annotated, and full-text searched.  I never count on having connectivity (I live in the United States, after all, and right now my home connection is running quite poorly and has been for several days due to an ongoing situation at my local CO) so I try to keep both essential documentation and reading material in general stored locally for those dry periods.  However, there is no port of Scrapbook Plus for Chrome, nor is there a workable equivalent addon for same (I think I've tried them all).  I'm not about to do without my traveling hoard of information (which at this time numbers around 10,000 unique web pages and 15 gigabytes of disk space).  Out of desperation last night I did some research into how I might be able to speed up Firefox just a little and get more use out of it until I figure out what to do.  Here's what I found: