<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Schadenfreude &#187; parsing</title>
	<atom:link href="http://www.ralree.com/tag/parsing/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.ralree.com</link>
	<description>Malicious enjoyment derived from observing someone else's misfortune</description>
	<lastBuildDate>Thu, 09 Feb 2012 01:49:15 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Parsing Amazon with Hpricot</title>
		<link>http://www.ralree.com/2006/07/06/parsing-amazon-with-hpricot/</link>
		<comments>http://www.ralree.com/2006/07/06/parsing-amazon-with-hpricot/#comments</comments>
		<pubDate>Thu, 06 Jul 2006 01:07:34 +0000</pubDate>
		<dc:creator>Erik</dc:creator>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[amazon]]></category>
		<category><![CDATA[hpricot]]></category>
		<category><![CDATA[parsing]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[_why]]></category>

		<guid isPermaLink="false">http://www.ralree.info/2007/10/13/parsing-amazon-with-hpricot</guid>
		<description><![CDATA[_why made a really sweet HTML parser called Hpricot. This allows you to easily parse a remote document using Open-URI. Here&#8217;s how to do it: require 'rubygems' require_gem 'hpricot' require 'open-uri' puts &#34;Grabbing Page...&#34; html = open(&#34;http://www.amazon.com/gp/product/1844300439/ref=amb_cob_bh_194691301/002-0086113-2532879?n=283155&#34;) puts &#34;Parsing...&#34; doc = Hpricot.parse(html) (doc.search(&#34;//table//td[@id='prodImageCell']&#34;)/:img).each do &#124;link&#124; p link.attributes end {&#34;src&#34;=&#62;&#34;http://ec1.images-amazon.com/images/P/1844300439.01._AA240_SCLZZZZZZZ_V54614147_.jpg&#34;, &#34;border&#34;=&#62;&#34;0&#34;, &#34;id&#34;=&#62;&#34;prodImage&#34;, &#34;height&#34;=&#62;&#34;240&#34;, &#34;alt&#34;=&#62;&#34;Cobblers&#34;, &#34;width&#34;=&#62;&#34;240&#34;} ruby -rrubygems -ropen-uri -e "require 'hpricot';(Hpricot.parse(open('http://www.amazon.com/gp/product/1844300439/ref=amb_cob_bh_194691301/002-0086113-2532879?n=283155')).search(\"//table//td[@id='prodImageCell']\")/:img).each {&#124;link&#124; p link.attributes }" Amazing stuff really. The parser is so amazingly fast. All the time is spent fetching the [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://redhanded.hobix.com">_why</a> made a really sweet HTML parser called <a href="http://redhanded.hobix.com/inspect/okayGiveHpricot02AGo.html">Hpricot</a>. Paired with open-uri, it makes parsing a remote document easy. Here&#8217;s how to do it:</p>
<div class="CodeRay">
<div class="code">
<pre><code>
require <span class="s"><span class="dl">'</span><span class="k">rubygems</span><span class="dl">'</span></span>
require <span class="s"><span class="dl">'</span><span class="k">hpricot</span><span class="dl">'</span></span>
require <span class="s"><span class="dl">'</span><span class="k">open-uri</span><span class="dl">'</span></span>
puts <span class="s"><span class="dl">&quot;</span><span class="k">Grabbing Page...</span><span class="dl">&quot;</span></span>
html = URI.open(<span class="s"><span class="dl">&quot;</span><span class="k">http://www.amazon.com/gp/product/1844300439/ref=amb_cob_bh_194691301/002-0086113-2532879?n=283155</span><span class="dl">&quot;</span></span>)
puts <span class="s"><span class="dl">&quot;</span><span class="k">Parsing...</span><span class="dl">&quot;</span></span>
doc = <span class="co">Hpricot</span>.parse(html)
(doc.search(<span class="s"><span class="dl">&quot;</span><span class="k">//table//td[@id='prodImageCell']</span><span class="dl">&quot;</span></span>)/<span class="sy">:img</span>).each <span class="r">do</span> |link|
  p link.attributes
<span class="r">end</span>
</code></pre>
</div>
</div>
<p>Running it prints the attributes of the product image:</p>
<div class="CodeRay">
<div class="code">
<pre><code>
{<span class="s"><span class="dl">&quot;</span><span class="k">src</span><span class="dl">&quot;</span></span>=&gt;<span class="s"><span class="dl">&quot;</span><span class="k">http://ec1.images-amazon.com/images/P/1844300439.01._AA240_SCLZZZZZZZ_V54614147_.jpg</span><span class="dl">&quot;</span></span>, <span class="s"><span class="dl">&quot;</span><span class="k">border</span><span class="dl">&quot;</span></span>=&gt;<span class="s"><span class="dl">&quot;</span><span class="k">0</span><span class="dl">&quot;</span></span>, <span class="s"><span class="dl">&quot;</span><span class="k">id</span><span class="dl">&quot;</span></span>=&gt;<span class="s"><span class="dl">&quot;</span><span class="k">prodImage</span><span class="dl">&quot;</span></span>, <span class="s"><span class="dl">&quot;</span><span class="k">height</span><span class="dl">&quot;</span></span>=&gt;<span class="s"><span class="dl">&quot;</span><span class="k">240</span><span class="dl">&quot;</span></span>, <span class="s"><span class="dl">&quot;</span><span class="k">alt</span><span class="dl">&quot;</span></span>=&gt;<span class="s"><span class="dl">&quot;</span><span class="k">Cobblers</span><span class="dl">&quot;</span></span>, <span class="s"><span class="dl">&quot;</span><span class="k">width</span><span class="dl">&quot;</span></span>=&gt;<span class="s"><span class="dl">&quot;</span><span class="k">240</span><span class="dl">&quot;</span></span>}
</code></pre>
</div>
</div>
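<p>Since <code>link.attributes</code> printed like a plain Hash above, pulling individual values out is ordinary Ruby. Here is a minimal sketch that works on a copy of that output instead of a live page; the <code>PROD_IMAGE_ATTRS</code> constant and the <code>image_info</code> helper are names made up for illustration, not part of Hpricot:</p>

```ruby
require 'uri'

# Copy of the attributes printed above, so the sketch runs without a network call.
PROD_IMAGE_ATTRS = {
  "src"    => "http://ec1.images-amazon.com/images/P/1844300439.01._AA240_SCLZZZZZZZ_V54614147_.jpg",
  "border" => "0",
  "id"     => "prodImage",
  "height" => "240",
  "alt"    => "Cobblers",
  "width"  => "240"
}.freeze

# Hypothetical helper: summarize an img attribute hash
# (image host, file name, and pixel size as integers).
def image_info(attrs)
  uri = URI.parse(attrs.fetch("src"))
  {
    host: uri.host,
    file: File.basename(uri.path),
    size: [Integer(attrs.fetch("width")), Integer(attrs.fetch("height"))]
  }
end

info = image_info(PROD_IMAGE_ATTRS)
puts info[:host]
puts info[:file]
puts info[:size].join("x")
```

<p>Using <code>fetch</code> rather than <code>[]</code> means a missing attribute raises a <code>KeyError</code> right away instead of passing <code>nil</code> along.</p>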
<p>The same thing fits in a one-liner:</p>
<pre><code>
ruby -rrubygems -ropen-uri -e "require 'hpricot';(Hpricot.parse(URI.open('http://www.amazon.com/gp/product/1844300439/ref=amb_cob_bh_194691301/002-0086113-2532879?n=283155')).search(\"//table//td[@id='prodImageCell']\")/:img).each {|link| p link.attributes }"
</code></pre>
<p>Amazing stuff, really. The parser is seriously fast: nearly all the time goes to fetching the page, not parsing it!</p>
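<p>If you want to check the fetch-versus-parse split yourself, something like the following would do. It is only a sketch: <code>time_step</code> and <code>fetch_vs_parse</code> are hypothetical helpers, and the two lambdas are offline stand-ins (a <code>sleep</code> and a <code>split</code>) rather than a real download and a real <code>Hpricot.parse</code> call:</p>

```ruby
# Run the block and return [result, elapsed_seconds] using the monotonic clock.
def time_step
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  [result, Process.clock_gettime(Process::CLOCK_MONOTONIC) - started]
end

# Time a fetch callable and a parse callable separately.
def fetch_vs_parse(fetch, parse)
  body, fetch_seconds = time_step { fetch.call }
  parsed, parse_seconds = time_step { parse.call(body) }
  { parsed: parsed, fetch_seconds: fetch_seconds, parse_seconds: parse_seconds }
end

# Offline stand-ins (assumptions): swap in the real download and Hpricot.parse(html).
fake_fetch = -> { sleep 0.05; "stand-in page body" }
fake_parse = ->(html) { html.split }

timings = fetch_vs_parse(fake_fetch, fake_parse)
printf("fetch: %.3fs, parse: %.6fs\n", timings[:fetch_seconds], timings[:parse_seconds])
```

<p>The monotonic clock is used so that a system clock adjustment in the middle of a run cannot skew the numbers.</p>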
<p>Also, &#8220;Sunset, Sunrise&#8221; by Razor Ramon is awesome.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ralree.com/2006/07/06/parsing-amazon-with-hpricot/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

