I usually use Ruby for this because it offers Hpricot, a very nice and fast scraper with an XPath interface. This time, I resolved to find something that does RSS generation better, and I stumbled upon RubyRSS, which happens to be in the core Ruby distribution!
Here’s what I ended up with after about an hour:
Now this is impressive if you look at the fail of HTML class attributes coming out of the original page. I had to base everything off of the links to the items that were not images, and then the structure up the tree from there (see the liberal use of .parent). I’ve rediscovered that Hpricot is awesome (_why, come back to us!), and that it truly only takes 30 lines of code to generate a nice RSS feed in Ruby. The resultant RSS feed for the MDShooters Classifieds site is here.
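For readers who want to see the shape of the approach, here is a minimal sketch of it, not the script described above. It picks the item links that are not images, walks up the tree with .parent, and hands the result to the standard rss library. Several things in it are assumptions: the sample markup, the example.com base URL, and the helper names extract_items and build_feed are all made up for illustration, and REXML stands in for Hpricot so the snippet runs with only what ships with Ruby (on Ruby 3.0 and later, rexml and rss are bundled gems).

```ruby
require "rexml/document"
require "rss"

# Hypothetical, well-formed stand-in for the scraped page.
# The real site's markup is different and far messier.
SAMPLE_HTML = <<~HTML
  <table>
    <tr>
      <td><a href="/item/1"><img src="/thumb/1.jpg"/></a></td>
      <td><a href="/item/1">First item</a></td>
      <td>Description of the first item</td>
    </tr>
    <tr>
      <td><a href="/item/2"><img src="/thumb/2.jpg"/></a></td>
      <td><a href="/item/2">Second item</a></td>
      <td>Description of the second item</td>
    </tr>
  </table>
HTML

BASE_URL = "http://example.com" # placeholder, not the real site

# Find the text links (skipping the ones that wrap a thumbnail image),
# then climb the tree with .parent to reach the row each link lives in.
def extract_items(html)
  doc = REXML::Document.new(html)
  text_links = doc.get_elements("//a").reject { |a| a.elements["img"] }
  text_links.map do |link|
    row = link.parent.parent # a -> td -> tr
    cells = row.get_elements("td")
    {
      title: link.text.strip,
      link: BASE_URL + link.attributes["href"],
      description: cells.last.text.strip
    }
  end
end

# Turn the extracted hashes into an RSS 2.0 feed with RSS::Maker.
def build_feed(items)
  RSS::Maker.make("2.0") do |maker|
    maker.channel.title = "Classifieds (sketch)"
    maker.channel.link = BASE_URL
    maker.channel.description = "Feed generated from scraped listings"
    items.each do |item|
      maker.items.new_item do |entry|
        entry.title = item[:title]
        entry.link = item[:link]
        entry.description = item[:description]
      end
    end
  end
end

puts build_feed(extract_items(SAMPLE_HTML))
```

Against a real page you would fetch the HTML first and swap in a forgiving HTML parser, since REXML only accepts well-formed XML; the select-then-climb structure stays the same.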
And now, yet another RSS feed generator: MD Super Ads
Here’s the code: