
wget mirroring with external references

I was having trouble mirroring a website whose images were all hosted on a different domain, which happened to be random subdomains of cloudfront.net.  I tried adding *.cloudfront.net to the -D parameter, but that didn’t work, because -D matches domain names rather than wildcard patterns.  It turns out wget doesn’t need a wildcard anyway: a domain in the -D list also covers all of its subdomains, so plain cloudfront.net is enough:

wget -mkpEK -D www.allshepherdrescue.org,cloudfront.net -H -t 3 \
     --restrict-file-names=windows http://www.allshepherdrescue.org/

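This goes into mirror mode (-m), grabs the images and stylesheets each page needs (-p), and rewrites the links in the downloaded pages to point at the local copies (-k), keeping the unconverted originals as .orig backups (-K).  -E appends .html to pages whose URL doesn’t already end in it, and --restrict-file-names=windows swaps the ? in query-string URLs for @, so a page like page.php?id=3 is saved as an ordinary static file.  -H lets wget leave the starting host, -D limits that to the listed domains, and -t 3 gives each file up to three tries.  The manpage details all of this.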
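For reference, here is a sketch of the same command with every short flag spelled out as its long option, one per line with a comment. It assumes bash (for the array) and a wget recent enough to know --adjust-extension; older releases call -E --html-extension. By default it only prints the command; pass --run to actually start the mirror.

```bash
#!/usr/bin/env bash
# The wget command from the post, with each short flag written out long-hand.
site="http://www.allshepherdrescue.org/"

wget_opts=(
  --mirror                      # -m: same as -r -N -l inf --no-remove-listing
  --convert-links               # -k: point links in saved pages at the local copies
  --page-requisites             # -p: also fetch the images, CSS, etc. each page needs
  --adjust-extension            # -E: append .html to HTML pages (older wget: --html-extension)
  --backup-converted            # -K: keep the unconverted file as X.orig
  --span-hosts                  # -H: allow following links to other hosts...
  --domains=www.allshepherdrescue.org,cloudfront.net   # -D: ...but only these domains
  --tries=3                     # -t 3: up to three attempts per file
  --restrict-file-names=windows # Windows-safe local names, e.g. '?' becomes '@'
)

if [[ "${1:-}" == "--run" ]]; then
  wget "${wget_opts[@]}" "$site"
else
  # Dry run: just show the command that would be executed.
  echo wget "${wget_opts[@]}" "$site"
fi
```

Keeping the options in an array also makes it easy to reuse them for another site by changing only site and the --domains entry.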
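To make the -D behaviour concrete, here is a rough bash model of the check. This is my own simplified sketch, not wget’s code: the function name host_in_domains is made up, and it assumes a host is accepted when it equals a listed domain or ends with a dot followed by that domain, compared case-insensitively. The exact suffix rule may differ between wget versions, so treat it as an illustration only.

```bash
#!/usr/bin/env bash
# Hypothetical helper: a simplified model of the -D check, not wget's own code.
# A host passes if it equals a listed domain or ends in "." plus that domain.

lower() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]'
}

host_in_domains() {
  local host domain
  host="$(lower "$1")"
  shift
  for domain in "$@"; do
    domain="$(lower "$domain")"
    # The quoted parts are matched literally, so a "*" in the list is just
    # a character in this model, not a wildcard.
    if [[ "$host" == "$domain" || "$host" == *".$domain" ]]; then
      return 0
    fi
  done
  return 1
}

domains=(www.allshepherdrescue.org cloudfront.net)
for h in www.allshepherdrescue.org d1a2b3c4.cloudfront.net example.com; do
  if host_in_domains "$h" "${domains[@]}"; then
    echo "$h: followed"
  else
    echo "$h: skipped"
  fi
done

# A literal wildcard entry matches nothing in this model.
if host_in_domains d1a2b3c4.cloudfront.net '*.cloudfront.net'; then
  echo "wildcard entry: followed"
else
  echo "wildcard entry: skipped"
fi
```

Run as-is, it reports the first two hosts as followed and both example.com and the wildcard entry as skipped (d1a2b3c4 is a made-up CloudFront subdomain).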