There’s this site that has an equipment exchange I wanted to keep track of. However, it’s built with what seems to be a custom PHP file rather than vBulletin, so none of the site’s usual RSS feeds cover it. So I decided to write a scraper/feed generator that fetches the latest listings every 5 minutes and generates a nice RSS feed I can view in Google Reader. The posting volume is low enough that this won’t be annoying to see in my daily feeds.
I usually use Ruby for this because it offers Hpricot, a very nice and fast scraper with an XPath interface. This time, I resolved to find something that does RSS generation better, and I stumbled upon RubyRSS, which happens to ship with the standard Ruby distribution!
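To give a feel for the RSS side, here is a minimal sketch using the `RSS::Maker` API from Ruby's `rss` library (on recent Ruby versions the same library is packaged as the `rss` gem). The channel details, the `build_feed` helper, and the sample listing are all made up for illustration; this is not the script described in this post.

```ruby
require 'rss'

# Hypothetical sample data standing in for scraped listings.
SAMPLE_ITEMS = [
  { title: 'Example listing',
    link: 'http://example.com/item.php?id=1',
    description: 'Sample description',
    date: Time.utc(2010, 1, 1) }
].freeze

# Build an RSS 2.0 feed object from an array of hashes.
# build_feed is an illustrative helper name, not part of the rss library.
def build_feed(entries)
  RSS::Maker.make('2.0') do |maker|
    # RSS 2.0 requires a title, link and description on the channel.
    maker.channel.title = 'Example classifieds'
    maker.channel.link = 'http://example.com/'
    maker.channel.description = 'Latest listings (sample data)'

    entries.each do |entry|
      maker.items.new_item do |item|
        item.title = entry[:title]
        item.link = entry[:link]
        item.description = entry[:description]
        item.date = entry[:date] # emitted as pubDate
      end
    end
  end
end

feed = build_feed(SAMPLE_ITEMS)
puts feed # prints the feed as XML
```

Writing the result of `feed.to_s` to a file served by a web server would be one way to expose it to a feed reader.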
Here’s what I ended up with after about an hour:
Now this is impressive if you consider the lack of useful HTML class attributes in the original page. I had to base everything off of the links to the items that were not images, and then walk the structure up the tree from there (see the liberal use of .parent). I’ve rediscovered that Hpricot is awesome (_why, come back to us!), and that it truly only takes 30 lines of code to generate a nice RSS feed in Ruby. The resulting RSS feed for the MDShooters Classifieds site is here.
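For a rough idea of the "start from the non-image links, then climb the tree" trick, here is a small sketch. It uses REXML (bundled with Ruby) as a stand-in for Hpricot so it runs without extra gems, and it works on an invented, well-formed table; the real page's markup, the column layout, and the field names below are all assumptions.

```ruby
require 'rexml/document'

# Invented markup: each row has a thumbnail link, a text link and a price cell,
# with no class attributes to hook into.
SAMPLE_HTML = <<~'HTML'
  <table>
    <tr>
      <td><a href="item.php?id=1"><img src="thumb1.jpg"/></a></td>
      <td><a href="item.php?id=1">Example scope</a></td>
      <td>100 USD</td>
    </tr>
    <tr>
      <td><a href="item.php?id=2"><img src="thumb2.jpg"/></a></td>
      <td><a href="item.php?id=2">Example sling</a></td>
      <td>20 USD</td>
    </tr>
  </table>
HTML

doc = REXML::Document.new(SAMPLE_HTML)

# Keep only the links that do not wrap an image.
text_links = REXML::XPath.match(doc, '//a').reject { |a| a.elements['img'] }

listings = text_links.map do |link|
  cell = link.parent # the <td> holding the link
  row = cell.parent  # the <tr> holding the whole listing
  {
    title: link.text,
    href: link.attributes['href'],
    price: row.elements.to_a('td').last.text # assumes price is the last cell
  }
end

listings.each { |l| puts "#{l[:title]} (#{l[:price]}): #{l[:href]}" }
```

Anchoring on the text links and navigating relative to them keeps the scraper independent of class names, at the cost of breaking if the row layout changes.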
And now, yet another RSS feed generator: MD Super Ads
Here’s the code: