Feb 10, 2017
Let's say that you want to mirror a website chock full of data before it gets 451'd - say it's epadatadump.com. You've got a boatload of disk space free on your Linux box (maybe a terabyte or so) and a relatively stable network connection. How do you do it?
wget. You use wget. Here's how you do it:
[user@guerilla-archival:(9) ~]$ wget --mirror --continue \
-e robots=off --wait 30 --random-wait http://epadatadump.com/
Let's break this down:
- wget - Self explanatory.
- --mirror - Mirror the site.
- --continue - If you have to re-run the command, pick up where you left off (including the exact location in a file).
- -e robots=off - Ignore robots.txt because it will be in your way otherwise. Many archive owners use this file to prevent web crawlers (and wget) from riffling through their data. Assuming the data is important enough to justify it, this is the option you want.
- --wait 30 - Wait 30 seconds between downloads.
- --random-wait - Actually wait for 0.5 * (value of --wait) to 1.5 * (value of --wait) seconds in between requests to evade rate limiters.
- http://epadatadump.com/ - The URL of the website or archive you're copying.
If the archive you're copying requires a username and password to get in, you'll want to add the --user=<your username> and --password=<your password> options to the above command line.
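As a rough sketch of what that looks like, here is the same command with the two authentication options added. The username and password are placeholders (assumptions for illustration, not credentials for any real archive), and the function name is made up for this example. It only prints the command so you can look it over before running it for real.

```shell
# Placeholder credentials (hypothetical); substitute your own.
archive_user="your-username"
archive_pass="your-password"

# Print the mirror command from above with HTTP authentication added.
# This is a dry run: nothing is downloaded until you run the printed
# command yourself.
print_mirror_command() {
    printf '%s\n' "wget --mirror --continue -e robots=off --wait 30 --random-wait --user=$archive_user --password=$archive_pass http://epadatadump.com/"
}

print_mirror_command
```

Keep in mind that a password given on the command line ends up in your shell history and is visible in the process list while wget runs. wget also has an --ask-password option that prompts for the password instead, which may suit a shared machine better.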
Happy mirroring. Make sure you have enough disk space.
Feb 07, 2017
As I've mentioned a few times in the past, diverse parts of my exocortex monitor many different aspects of the world. One of them, called Ironmonger, constantly data mines the global stock markets looking for anomalies. Ordinarily, Ironmonger only triggers when stock trading events greater than three standard deviations hit the market. On Monday, 6 Feb at 14:50:38 hours UTC-0800 (PST), Ironmonger did an acrobatic pirouette off the fucking handle. Massive trades in three different tech companies (Intel, Apple, and Facebook) hit the US stock market within the same thirty second period. By "massive," I mean that 3,271,714,562 shares of Apple, 3,271,696,623 shares of Intel, and 2,030,897,857 shares of Facebook all hit the market at the same time. The time_t datestamps of the transactions were 1486421438 (Intel), 1486421431 (Apple), and 1486421442 (Facebook) (I use time.is to convert them back into organic-readable time/date specifiers). I grabbed some screenshots from the Exocortex console at the time - check them out:
Intel ; Apple ; Facebook
The tall blue slivers at the far right-hand edges of each graph represent the stock trades. I waited a couple of hours and took another set of screenshots (Intel, Apple, Facebook) because the graph had moved on a bit and the transaction spikes were much more visible. While my knowledge of the stock market is limited, I have to admit that I've never seen multi-billion share stock trades happen before. Out of curiosity, I took a look at the historical price per share of each of those stocks to see what those huge offers did to them. The answer, somewhat surprisingly, was "not much." Check out these extracts from Ironmonger's memory: Facebook, Intel, and Apple.
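If you'd rather check the time_t datestamps quoted above without a website, here is a minimal sketch that does the conversion locally. It assumes GNU date (the "-d @SECONDS" form found on most Linux boxes), and the function name to_utc is just something made up for this example. The output is in UTC; subtract eight hours to get the UTC-0800 times used in this post.

```shell
# Convert a time_t value to a human-readable UTC timestamp.
# Assumes GNU date, which accepts "-d @SECONDS".
to_utc() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S UTC'
}

# Datestamps of the three trades, as quoted in the post.
intel=1486421438
apple=1486421431
facebook=1486421442

echo "Intel:    $(to_utc "$intel")"      # 2017-02-06 22:50:38 UTC
echo "Apple:    $(to_utc "$apple")"      # 2017-02-06 22:50:31 UTC
echo "Facebook: $(to_utc "$facebook")"   # 2017-02-06 22:50:42 UTC

# Gap between the earliest (Apple) and latest (Facebook) trade.
echo "Spread: $((facebook - apple)) seconds"   # Spread: 11 seconds
```

The eleven second spread is comfortably inside the thirty second window mentioned earlier.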
Because I am a paranoid and curious sort, I immediately wondered if there was a correlation with the large spike in the Bitcoin transaction fee earlier that day (at 13:19:16 UTC-0800, to be precise). The answer is... probably not. A transaction fee of 2.35288902 BTC (approximately $2510.93 US as of 22:32 hours UTC-0800 on 7 February 2017, as I write this article ahead of time), while a sizeable sum that would certainly guarantee that someone's transaction made it into a block at that very instant, does not mean that it was involved. There just isn't enough data, but it stands on its own as another anomaly that day. I wish I knew who put those huge blocks of stock up for sale all at once. The only thing they seem to have in common is that they're all listed on the Singularity Index, which is mildly noteworthy.
Anybody have any ideas?
Feb 04, 2017
A couple of weeks ago I ran into some of the functional limits of my web search bot, a bot that I wrote for my exocortex which accepts English-like commands ("Send me top 15 hits for HAL 9000 quotes.") and runs web searches in response using the Searx meta-search engine on the back end. This is to say that I gave my bot a broken command ("Send hits for HAL 9000 quotes.") and the parser got into a state where it couldn't cope, threw an exception, and crashed. To be fair, my command parser was very brittle and it was only a matter of time before I did something dumb and wrecked it. At the time I patched it with a bunch of if..then checks for truncated and incorrect commands, but if you look at all of the conditionals and ad-hoc error handling I probably made the situation worse, as well as much more difficult to maintain in the long run. Time for a rewrite.
Back to my long-term memory field. What to do?
I knew from comp.sci classes long ago that compilers use things called parsers and grammars to interpret code so that it can be converted into an executable. I also knew that the parser Infocom used in its interactive fiction was widely considered to be the best anyone had come up with in a long time, and it was efficient enough to run on humble microcomputers like the C-64 and the Apple II. For quite a few years I also ran and hacked on a MOO, which for the purposes of this post you can think of as a massive interactive fiction environment that the players can modify as well as play in; a MOO's command parser does pretty much the same thing as Infocom's IF parser but is responsive to the changes the users make to their environments. I also recalled something called a parse tree, which I sort-of-kind-of remembered from comp.sci, but because I'd never actually done anything with them, I only recalled a low-res sketch. At least I had someplace to start from, so I turned my rebooted web search bot loose with a couple of search terms and went through the results after work. I also spent some time chatting with a colleague whose knowledge of the linguistics of programming languages is significantly greater than mine and bouncing some ideas off of him (thanks, TQ!).
But how do I do something with all this random stuff?
Feb 02, 2017
Come out, come out, wherever you are...
Feb 01, 2017
The Magick Poke - noun - When you touch a failing appliance, light bulb, or other gizmo in just the right way as you're replacing it, and it spontaneously starts working again. This usually saves it from the trashcan or dumpster. Comes from the POKE command in Commodore BASIC which could let you do some pretty strange things by putting just the right value into just the right memory location, usually by fat-fingering a value.
Jan 26, 2017
UPDATE - 20170302 - Added Firefox plugin for the Internet Archive.
UPDATE - 20170205 - Added Chrome plugin for the Internet Archive.
Note: This article is aimed at people all across the spectrum of levels of experience with computers. You might see a lot of stuff you already know; then again, you might learn one or two things that hadn't shown up on your radar yet. Be patient.
In George Orwell's novel 1984, one of the plot points was something called the Memory Hole. Memory holes were slots all over the building in which Winston Smith worked, into which documents that the Party considered seditious or merely inconvenient were deposited for incineration. Anything that the Ministry of Truth decided had to go because it posed a threat to the party line was destroyed. This meant that if anyone wanted to go back and double check to see what history might have been, the only things they could get hold of were "officially sanctioned" documents written to reflect the revised Party policy. Human memory's funny: If you don't have any static representation of something to refer back to periodically, eventually you come to think that whatever people have been telling you is the real deal, regardless of what you just lived through. No mind tricks are necessary, just repetition.
The Net's a lot like that. There are literally piles and piles of information everywhere you look, but most of it resides on systems that aren't yours. This blog is running on somebody else's server, and it wouldn't take much to wipe it off the face of the Net. All it would take is a DMCA takedown notice with no evidence (historically speaking, this is usually the case). This has happened in the past a number of times, including to an archive maintained by Project Gutenberg and to documents explicitly placed into the public domain, so that somebody could try to make a buck off of them. This is a common enough thing that the IETF has made a standard HTTP status code to reflect it, Error 451 - Unavailable for legal reasons.
So, how would you make local copies of information that you think might be pulled down because somebody thought it was inconvenient? For example, climatological data archives?
Jan 30, 2017
I hate the word "cyber" but it's in the title.
Download and analyze, please!
Jan 26, 2017
Everything I need to know in life, I learned from reading William Gibson novels.
Jan 29, 2017
Due to extenuating circumstances, I don't think I can keep updating this entry. For the sake of my mental, emotional, and physical health I'm going to let it go. Lifeline, Edison, and other parts of me are going to continue monitoring and archiving the USian political situation but I, the organic core of everything, need to step back and do other things.
In response to reading this tweet, I thought I'd type up the following list, and add links to some stuff I've observed. I'll update it as necessary. List beneath the cut.
Jan 28, 2017
-----BEGIN PGP SIGNED MESSAGE-----
I very much want to be wrong.
Within 180 days of 0000 hours UTC, Friday, 27 January 2017, the United States of
America will declare war once again. That puts it at Wednesday, 26 July 2017
at 0000 hours UTC. I do not know for sure, but countries in the Middle East
seem the most likely targets.
This seems due, in part, that the USA seems to be trying to start the Crusades
again (George W. Bush tried once). The Trump administrations' public and
flagrant distrust, disapproval, and seeming pants-shitting-fear of Muslims
around the world would seem to point to this.
This seems due, in part, to the low approval ratings of President Donald J.
Trump. As of 28 January 2017 @ 1454 hours PST8PDT, they are at 36% approval,
44% disapproval on http://www.pollingreport.com/djt_job.htm. I've made a PDF
copy of this file here:
SHA-512 hash of trump_approval_ratings-20170128.pdf:
To jack up his approval ratings (and improve his chances of re-election, if
nothing else) this is a logical national action to take. It certainly worked
for George W. Bush after 9/11 but wasn't enough to sustain his approval ratings
in the long run.
Evidence of the above claim: http://www.gallup.com/poll/116500/presidential-approval-ratings-george-bush.aspx
Local copy of the above claim:
SHA-512 hash of george_w_bush-overall_approval_ratings.pdf:
I have received no orders, demands, or suggestions to write or post this. I
have received no pay or other compensation to write or post this.
I have no insider knowledge.
I have only the patterns of history to suggest this, and I'm seeing everything
happen all over again. And again.
I don't want to be right again.
- --The Doctor [412/724/301/703/415]
-----BEGIN PGP SIGNATURE-----
-----END PGP SIGNATURE-----