Guerilla archival using wget.

  datarefuge howto datarescue wget mirroring linux

Let's say that you want to mirror a website chock-full of data before it gets 451'd - say it's epadatadump.com.  You've got a boatload of free disk space on your Linux box (maybe a terabyte or so) and a relatively stable network connection.  How do you do it?

wget.  You use wget.  Here's how you do it:

[user@guerilla-archival:(9) ~]$ wget --mirror --continue \
    -e robots=off --wait 30 --random-wait http://epadatadump.com/

Let's break this down:

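  • wget - Self-explanatory.
  • --mirror - Mirror the site: recursive download with timestamping and unlimited depth (shorthand for -r -N -l inf --no-remove-listing).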
  • --continue - If you have to re-run the command, pick up where you left off (including the …
Read more...
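
If the network drops partway through, re-running the same command by hand gets old fast.  Here's a rough sketch of a wrapper that does the re-running for you.  The function name mirror_site and the variables MIRROR_CMD, MAX_TRIES and RETRY_DELAY are made up for this example; they aren't wget features.  It assumes wget's documented exit statuses, where 4 means a network failure, and only retries on that one.  Anything else (for instance 8, which wget returns when the server answered some request with an error such as a 404) is handed back to you to look at.

```shell
#!/bin/sh
# Sketch only: re-run the mirror command after network failures.
# MIRROR_CMD, MAX_TRIES and RETRY_DELAY are hypothetical knobs for this example.

MIRROR_CMD="${MIRROR_CMD:-wget}"     # program to run; swap in a stub for dry runs
MAX_TRIES="${MAX_TRIES:-5}"          # give up after this many attempts
RETRY_DELAY="${RETRY_DELAY:-60}"     # seconds to sleep between attempts

mirror_site() {
    url="$1"
    try=1
    while [ "$try" -le "$MAX_TRIES" ]; do
        echo "attempt $try of $MAX_TRIES: $url"
        # Same options as the command in the post.
        if "$MIRROR_CMD" --mirror --continue -e robots=off \
               --wait 30 --random-wait "$url"; then
            status=0
        else
            status=$?
        fi
        case "$status" in
            0)  echo "mirror finished"
                return 0 ;;
            4)  # wget exit status 4: network failure, worth another try
                echo "network failure, will retry" >&2 ;;
            *)  # anything else is not retried; report it and stop
                echo "wget exited with status $status, not retrying" >&2
                return "$status" ;;
        esac
        try=$((try + 1))
        # Only sleep if another attempt is coming.
        if [ "$try" -le "$MAX_TRIES" ]; then
            sleep "$RETRY_DELAY"
        fi
    done
    echo "giving up after $MAX_TRIES attempts" >&2
    return 1
}

# Usage (this would start the real download):
#   mirror_site http://epadatadump.com/
```

Because --mirror turns on timestamping, each retry skips files that are already up to date locally, so repeated attempts shouldn't start the whole download over.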

#datarefuge in the Bay Area - 11 February 2017.

  announcement archival climate_change data datarefuge guerilla

UPDATE: 20170131 - The Eventbrite page for this event has gone live!  Sign up!

I haven't had time to write about #datarefuge yet, in part because people a lot closer to the matter have been doing so, and much better than I could at the moment.  An entire movement has arisen around scientific data being 451'd because it's politically inconvenient, and not many of us know whether it's being erased or just shut down.  We also don't know for certain whether it's being copied elsewhere for safekeeping, so we're doing it ourselves.  To do my part, I've been communicating with some …

Read more...