I was having trouble mirroring a website that hosted all its images on a different domain, namely random subdomains of cloudfront.net. I tried adding *.cloudfront.net to the -D parameter, but that didn’t work. It turns out the wildcard isn’t needed: wget is smart enough to include every subdomain of any domain in the list, so listing cloudfront.net itself is enough:
This goes into mirror mode, converts relative links to the proper form, rewrites query-string URLs as static ones, and downloads all files from the domains in the -D parameter. The man page details all of this.
There’s a site with an equipment exchange that I wanted to keep track of. However, it seems to be built with a custom PHP file rather than vBulletin, so none of the site’s usual RSS feeds cover it. So I decided to write a scraper/feed generator that fetches the latest version every 5 minutes and produces a nice RSS feed I can read in Google Reader. The volume of posting is low enough that it won’t be annoying to see in my daily feeds.
I usually use Ruby for this because it offers Hpricot, a very nice and fast scraper with an XPath interface. This time, I resolved to find something that does RSS generation better, and I stumbled upon RubyRSS, which happens to ship with the standard Ruby distribution!
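As a rough illustration of the feed-generation half, here is a minimal sketch using RSS::Maker from Ruby's rss library. The build_feed helper, the entry hash keys (:title, :link, :posted_at), the channel text, and the example.com URLs are all assumptions made for illustration, not the actual script. The scraping step is left out entirely.

```ruby
require "rss"

# Hypothetical helper: turns already-scraped entries into an RSS 2.0 document.
# Each entry is assumed to be a hash with :title, :link and :posted_at (a Time).
def build_feed(entries, feed_url: "http://example.com/exchange")
  feed = RSS::Maker.make("2.0") do |maker|
    # RSS 2.0 requires a title, link and description on the channel.
    maker.channel.title = "Equipment exchange"
    maker.channel.link = feed_url
    maker.channel.description = "Latest equipment exchange listings"

    # Let the maker order items by date, newest first.
    maker.items.do_sort = true

    entries.each do |entry|
      maker.items.new_item do |item|
        item.title = entry[:title]
        item.link = entry[:link]
        item.date = entry[:posted_at]
      end
    end
  end

  feed.to_s # serialize the feed object to XML
end
```

A script run from cron every 5 minutes could call build_feed with the scraped entries and write the returned string to a file served by the web server. On Ruby 3.0 and later, rss is a bundled gem rather than part of the default library, so it may need to be installed with gem install rss.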