Guerilla archival using wget.

Let's say that you want to mirror a website chock full of data before it gets 451'd - say it's epadatadump.com.  You've got a boatload of disk space free on your Linux box (maybe a terabyte or so) and a relatively stable network connection.  How do you do it?

wget.  You use wget.  Here's how you do it:

[user@guerilla-archival:(9) ~]$ wget --mirror --continue \
    -e robots=off --wait 30 --random-wait http://epadatadump.com/

Let's break this down:

  • wget - Self explanatory.
  • --mirror - Mirror the site.
  • --continue - If you have to re-run the command, pick up where you left off (including the …
Read more...