Malicious enjoyment derived from observing someone else’s misfortune
 

Tag Archives: amazon

OSCON Sessions, Day 2

Oh man, what a day. I attended quite a few talks, grabbed a lot of swag, and entered a few contests. I ended up buying the Arduino Starter Kit from MAKE so I can do some awesome embedded Ruby like I saw at FOSCON. It looks really fun; I can’t wait to try it out. The talks I attended were halfway decent, but I learned a lot more on the first day.

Hadoop and EC2

A good overview of [...]

Parsing Amazon with Hpricot

_why made a really sweet HTML parser called Hpricot. Combined with Open-URI, it lets you easily parse a remote document. Here’s how to do it:

```ruby
require 'rubygems'
require 'hpricot'
require 'open-uri'

puts "Grabbing Page…"
# URI.open is open-uri's URL opener (Kernel#open no longer fetches URLs on Ruby 3.0+)
html = URI.open("http://www.amazon.com/gp/product/1844300439/ref=amb_cob_bh_194691301/002-0086113-2532879?n=283155")

puts "Parsing…"
doc = Hpricot.parse(html)

# Find the product-image cell, then print the attributes of every <img> inside it
(doc.search("//table//td[@id='prodImageCell']")/:img).each do |link|
  p link.attributes
end
```

This prints:

```
{"src"=>"http://ec1.images-amazon.com/images/P/1844300439.01._AA240_SCLZZZZZZZ_V54614147_.jpg", "border"=>"0", "id"=>"prodImage", "height"=>"240", "alt"=>"Cobblers", "width"=>"240"}
```

Or, as a one-liner:

```shell
ruby -rrubygems -ropen-uri -e "require 'hpricot'; (Hpricot.parse(URI.open('http://www.amazon.com/gp/product/1844300439/ref=amb_cob_bh_194691301/002-0086113-2532879?n=283155')).search(\"//table//td[@id='prodImageCell']\")/:img).each { |link| p link.attributes }"
```

Amazing stuff, really. The parser is very fast. All the time is spent fetching the [...]
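If you want to try the XPath query without hitting the network or installing Hpricot, here is a minimal offline sketch using Ruby's REXML library, swapped in here in place of Hpricot. Assumptions: the markup in `SAMPLE` is a hypothetical, hand-written snippet modeled on the attributes printed above (not Amazon's actual page), `product_image_attributes` is a hypothetical helper name, and the `//img` descendant step stands in for Hpricot's `/:img`. REXML only accepts well-formed XML, so unlike Hpricot it will reject most real-world HTML; treat this as a way to check the query, not as a replacement for the scraper.

```ruby
require 'rexml/document'

# Hypothetical sample markup modeled on the attributes printed above.
# REXML needs well-formed XML, so the <img> is self-closed XHTML-style.
SAMPLE = <<~HTML
  <html><body>
    <table><tr>
      <td id="prodImageCell">
        <img src="http://ec1.images-amazon.com/images/P/1844300439.01._AA240_SCLZZZZZZZ_V54614147_.jpg"
             border="0" id="prodImage" height="240" alt="Cobblers" width="240"/>
      </td>
    </tr></table>
  </body></html>
HTML

# Hypothetical helper: returns one Hash (attribute name => value string)
# for each <img> found anywhere inside the product-image cell.
def product_image_attributes(markup)
  doc = REXML::Document.new(markup)
  REXML::XPath.match(doc, "//table//td[@id='prodImageCell']//img").map do |img|
    # REXML::Attributes#each yields [name, value] pairs as Strings
    img.attributes.each_with_object({}) { |(name, value), h| h[name] = value }
  end
end

product_image_attributes(SAMPLE).each { |attrs| p attrs }
```

Because REXML stores attributes in document order, the printed Hash lists the keys in the same order as they appear in the markup.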