Let's say there's a website that you want to make a local mirror of. This means that you can refer to it offline, and you can make offline backups of it for archival. Let's further state that you have access to some server someplace with enough disk space to hold the copy, and that you can start a task, disconnect, and let it run to completion some time later, with GNU Screen for example. Let's further state that you want the local copy of the site to not be broken when you load it in a browser; all the links …
Let's say that you want to mirror a website chock full of data before it gets 451'd - say it's epadatadump.com. You've got a boatload of disk space free on your Linux box (maybe a terabyte or so) and a relatively stable network connection. How do you do it?
wget. You use wget. Here's how you do it:
[user@guerilla-archival:(9) ~]$ wget --mirror --continue \ -e robots=off --wait 30 --random-wait http://epadatadump.com/
Let's break this down:
- wget - Self explanatory.
- --mirror - Mirror the site.
- --continue - If you have to re-run the command, pick up where you left off (including the …