Tag: datarescue

  1. Guerilla archival using wget.

    11 February 2017

    Let's say that you want to mirror a website chock full of data before it gets 451'd - say it's epadatadump.com.  You've got a boatload of disk space free on your Linux box (maybe a terabyte or so) and a relatively stable network connection.  How do you do it?

    wget.  You use wget.  Here's how you do it:

    [user@guerilla-archival:(9) ~]$ wget --mirror --continue \
        -e robots=off --wait 30 --random-wait http://epadatadump.com/
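
    A mirror throttled like this can run for days, and a dropped connection or a reboot will kill it partway through.  One way to cope is to wrap the command in a small retry loop.  This is only a sketch: the retry function and the RETRY_DELAY variable are hypothetical names made up for illustration, not part of wget.

```shell
#!/bin/sh
# Hypothetical helper (not part of wget): re-run a command until it exits 0
# or the attempt limit is reached.
# Usage: retry MAX_ATTEMPTS command [args...]
retry() {
    max_attempts=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt of $max_attempts failed" >&2
        attempt=$((attempt + 1))
        # Pause between attempts only; RETRY_DELAY (seconds) is an assumed knob.
        if [ "$attempt" -le "$max_attempts" ]; then
            sleep "${RETRY_DELAY:-60}"
        fi
    done
    return 1
}

# Example invocation (commented out so that sourcing this file fetches nothing):
# retry 5 wget --mirror --continue \
#     -e robots=off --wait 30 --random-wait http://epadatadump.com/
```

    The attempt cap matters because wget exits non-zero if even one request fails (a single dead link is enough), so an uncapped loop could spin forever on a site with broken links.  The wget command itself is passed through unchanged.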

    Let's break this down:

    • wget - Self-explanatory.
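    • --mirror - Mirror the site: follow links recursively with no depth limit, and use timestamps so files that haven't changed are skipped on a re-run.  It's shorthand for -r -N -l inf --no-remove-listing.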
    • --continue - If you have to re-run the command, pick up where you left off (including the …