<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Schadenfreude &#187; sql</title>
	<atom:link href="http://www.ralree.com/tag/sql/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.ralree.com</link>
	<description>Malicious enjoyment derived from observing someone else's misfortune</description>
	<lastBuildDate>Thu, 09 Feb 2012 01:49:15 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Fulltext Indexing Wikipedia with Sphinx</title>
		<link>http://www.ralree.com/2007/09/15/fulltext-indexing-wikipedia-with-sphinx/</link>
		<comments>http://www.ralree.com/2007/09/15/fulltext-indexing-wikipedia-with-sphinx/#comments</comments>
		<pubDate>Sat, 15 Sep 2007 22:17:00 +0000</pubDate>
		<dc:creator>Erik</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[benchmark]]></category>
		<category><![CDATA[howto]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[sphinx]]></category>
		<category><![CDATA[sql]]></category>
		<category><![CDATA[tutorial]]></category>

		<guid isPermaLink="false">http://www.ralree.info/2007/10/13/fulltext-indexing-wikipedia-with-sphinx</guid>
		<description><![CDATA[So, earlier this year, I decided it would be cool to mirror Wikipedia. So, I successfully set up a local copy on my system, and it&#8217;s been just sitting there ever since. But lately, I&#8217;ve been interested in fulltext indexing offered by various indexing engines, and Sphinx has looked especially tasty. So, I figured I&#8217;d sit down and try it today. I pointed it at my 16GB of Wikipedia text in my MySQL database. [...]]]></description>
			<content:encoded><![CDATA[<p>Earlier this year, I decided it would be cool to mirror Wikipedia.  I set up a local copy on my system, and it&#8217;s just been sitting there ever since.  Lately, though, I&#8217;ve been interested in the fulltext indexing offered by various indexing engines, and <a href="http://www.sphinxsearch.com/">Sphinx</a> has looked especially tasty, so I figured I&#8217;d sit down and try it today.</p>
<p>I pointed it at my 16GB of Wikipedia text in my MySQL database.</p>
<p>            <span id="more-3411"></span></p>
<h2>sphinx.conf</h2>
<pre><code>
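source src1
{
  # Pull rows from the MediaWiki MySQL database
  type              = mysql
  strip_html        = 0
  index_html_attrs  =
  sql_host          = localhost
  sql_user          = wikipedia
  sql_pass          = wikipedia
  sql_db            = wikidb

  # MediaWiki keeps page text in the text table: old_id becomes the
  # Sphinx document ID and old_text is the field that gets indexed
  sql_query_pre     =
  sql_query         = \
    SELECT old_id, old_text \
    FROM text
  sql_query_post    =

  # Used by the command-line search utility to print matched rows
  sql_query_info    = SELECT * FROM text WHERE old_id=$id
}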

</code></pre>
<h2>Next, I set up the index and indexer sections.</h2>
<pre><code>
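index wikipedia
{
  source            = src1
  path              = /nexus/rofl/sphinx/wikipedia.sphinx
  docinfo           = extern
  # No stemming: words are indexed exactly as they appear
  morphology        = none
  stopwords         =
  # Index every word, even one-letter ones
  min_word_len      = 1
  charset_type      = utf-8
  # No prefix/infix indexing, so no partial-word matching
  min_prefix_len    = 0
  min_infix_len     = 0
}

# Inherits every setting from "wikipedia", but is stored elsewhere
# and has English stemming turned on
index wikipediastemmed : wikipedia
{
  path              = /var/data/wikipediastemmed
  morphology        = stem_en
}

indexer
{
  # Upper bound on the RAM the indexer may use while building
  mem_limit         = 512M
}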

</code></pre>
<p>I left all the other options at their defaults.  Then I started the indexer and waited about <strong>2.5 hours</strong>.  Bear in mind that 2.5 hours isn&#8217;t all that long to index this much data, especially given the results I&#8217;m about to show you.</p>
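<p>Part of why searching is so quick is that the indexer does the heavy lifting up front: it turns each <code>(old_id, old_text)</code> row into an inverted index that maps every word to the documents containing it, so a query becomes a lookup instead of a scan over 16GB. Here&#8217;s a toy, in-memory sketch of that idea in Ruby. The sample rows are made up, and Sphinx&#8217;s real tokenizer, charset handling, and on-disk format are far more involved.</p>
<pre><code>
require "set"

# Toy inverted index: maps each word to the set of IDs of the
# documents that contain it.
class ToyIndex
  def initialize
    @postings = Hash.new { |hash, word| hash[word] = Set.new }
  end

  # Lowercase the text and split it on anything that isn't a letter or digit.
  def self.tokenize(text)
    text.downcase.scan(/[[:alnum:]]+/)
  end

  # Index one row, like an (old_id, old_text) pair returned by sql_query.
  def add(doc_id, text)
    ToyIndex.tokenize(text).each { |word| @postings[word].add(doc_id) }
    self
  end

  # Answering a one-word query is a hash lookup, not a scan of every row.
  # fetch avoids creating empty entries for unknown words.
  def lookup(word)
    @postings.fetch(word.downcase, Set.new)
  end
end

# Made-up rows for illustration.
index = ToyIndex.new
index.add(1, "Endothermic reactions absorb heat from their surroundings.")
index.add(2, "Exothermic reactions release heat.")
index.add(3, "Cold packs rely on an endothermic process.")

puts index.lookup("endothermic").to_a.sort.inspect # prints [1, 3]
</code></pre>
<p>The cost moves to indexing time: every row has to be tokenized once, which is where those 2.5 hours go, but afterwards each query only touches the entries for its own words.</p>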
<h2>Now it&#8217;s time to test this out!</h2>
<pre><code>

hank@rofl:/usr/local/etc$ time search endothermic
[... earlier matches trimmed ...]
= Sterling D. | title = Cold Fire® is a Hot Fire Extinguisher | publisher =
Company press release | date = Nov. 28, 2003 | url= http://www.greaterthings.com/News/ColdFire/pr031122.html | accessdate = August 21, 2006}}&lt;/ref&gt;
==References==
&lt;references/&gt;
== External links ==
* [http://www.firefreeze.com Fire Freeze Worldwide Inc.]

[[Category:Firefighting]]
        old_flags=utf-8
20. document=112594001, weight=1
        old_id=112594001
        old_text=#REDIRECT[[Endothermic]]
        old_flags=utf-8

words:
1. 'endothermic': 173 documents, 293 hits

real    0m0.831s
user    0m0.004s
sys     0m0.080s

hank@rofl:/usr/local/etc$ time search "hello &#038; world" &gt;/dev/null

real    0m0.659s
user    0m0.032s
sys     0m0.052s

</code></pre>
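<p>The <code>words:</code> line at the end of the output reports two different counts. <strong>Documents</strong> is how many rows contain the term at least once, and <strong>hits</strong> is how many times it occurs in total, so one article can contribute several hits. A small Ruby sketch of the difference, using made-up rows:</p>
<pre><code>
# Made-up rows standing in for (old_id, old_text) pairs.
ROWS = [
  [1, "Endothermic reactions absorb heat."],
  [2, "REDIRECT Endothermic"],
  [3, "Exothermic, not endothermic; endothermic is the opposite."],
  [4, "Nothing relevant here."]
].freeze

# Returns [documents, hits] for a lowercase term: how many rows contain
# it at least once, and how many times it occurs across all rows.
def term_stats(rows, term)
  per_doc = rows.map { |_id, text| text.downcase.scan(/[[:alnum:]]+/).count(term) }
  documents = per_doc.count { |n| n.positive? }
  hits = per_doc.sum
  [documents, hits]
end

documents, hits = term_stats(ROWS, "endothermic")
puts "'endothermic': #{documents} documents, #{hits} hits"
# prints 'endothermic': 3 documents, 4 hits
</code></pre>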
<h1>Look at that time!  <strong>0.8 seconds</strong> to search <strong>16GB of text</strong>!</h1>
<h2>Sphinx is indeed the master of the fulltexting.</h2>
<p>I&#8217;m very impressed.  I&#8217;m sure I will find a use for this soon.</p>
<h1>Update: It&#8217;s actually faster.</h1>
<p>Following a comment below from Sphinx&#8217;s author, I started a <code>searchd</code> instance, which gets rid of all the overhead of searching with the command-line <code>search</code> tool (a testing utility that opens the index itself every time it runs).</p>
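<p>Why does a long-running daemon help so much? A one-shot command-line tool has to load its index before it can answer anything, so every invocation pays that startup cost, while a daemon pays it once and keeps the index open between queries. Here&#8217;s a rough Ruby sketch of the effect. The corpus is made up and a plain hash stands in for the index, so the printed timings are only meaningful relative to each other.</p>
<pre><code>
require "set"

# Made-up corpus standing in for the text table.
DOCS = Array.new(5_000) { |i| [i + 1, i.even? ? "single mother raising a child" : "father and son"] }

# Stand-in for a tool's startup work: build the word index from scratch.
def load_index(docs)
  index = Hash.new { |hash, word| hash[word] = Set.new }
  docs.each do |id, text|
    text.downcase.scan(/[[:alnum:]]+/).each { |word| index[word].add(id) }
  end
  index
end

# Wall-clock seconds spent in the given block.
def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

QUERIES = %w[single mother father] * 3

# Like running a command-line tool once per query: reload every time.
cold = elapsed { QUERIES.each { |q| load_index(DOCS).fetch(q, Set.new).size } }

# Like a daemon: load once, then reuse the same index for every query.
index = load_index(DOCS)
warm = elapsed { QUERIES.each { |q| index.fetch(q, Set.new).size } }

puts format("reload per query: %.4fs, load once: %.6fs", cold, warm)
</code></pre>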
<p>Here are some results I got using the Ruby API that&#8217;s included with Sphinx:</p>
<pre><code>
irb(main):010:0&gt; t = Time.now; s.query('(Single &#038; mother) &#038; !father'); puts Time.now - t
0.016864
=&gt; nil
</code></pre>
<h2>It took only <strong>0.017 seconds</strong> to find every document in Wikipedia&#8217;s database that mentions both single and mother without mentioning father.</h2>
<p>This is indeed impressive.</p>
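<p>In the query above, <code>&#038;</code> means AND and <code>!</code> means NOT. Over posting lists like the ones an inverted index stores, that becomes a set intersection followed by a set difference. Here&#8217;s a minimal Ruby sketch with made-up document IDs. It shows only the set logic, not Sphinx&#8217;s query parser or ranking.</p>
<pre><code>
require "set"

# Made-up posting lists: each word maps to the IDs of documents containing it.
POSTINGS = {
  single: Set[1, 2, 3, 5, 8],
  mother: Set[2, 3, 5, 13],
  father: Set[3, 21]
}.freeze

# Case-insensitive lookup; unknown words match no documents.
def docs_for(word)
  POSTINGS.fetch(word.downcase.to_sym, Set.new)
end

# The query (Single AND mother) AND NOT father:
# AND intersects posting lists, NOT subtracts one from the result.
def single_mother_not_father
  docs_for("Single").intersection(docs_for("mother")) - docs_for("father")
end

p single_mother_not_father.to_a.sort # prints [2, 5]
</code></pre>
<p>Document 3 contains both <em>single</em> and <em>mother</em>, but it is dropped because it also mentions <em>father</em>.</p>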
]]></content:encoded>
			<wfw:commentRss>http://www.ralree.com/2007/09/15/fulltext-indexing-wikipedia-with-sphinx/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Sweet SQL queries for Amarok</title>
		<link>http://www.ralree.com/2007/05/11/sweet-sql-queries-for-amarok/</link>
		<comments>http://www.ralree.com/2007/05/11/sweet-sql-queries-for-amarok/#comments</comments>
		<pubDate>Fri, 11 May 2007 08:22:00 +0000</pubDate>
		<dc:creator>Erik</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[amarok]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[music]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[sql]]></category>

		<guid isPermaLink="false">http://www.ralree.info/2007/10/13/sweet-sql-queries-for-amarok</guid>
		<description><![CDATA[I was messing around writing some sweet SQL statements for amarok tonight. You can either run them using the MySQL console or using dcop (google &#8216;amarok dcop&#8217;). Here&#8217;s some examples: # List artists and their average rating and number of ratings ordered by favorite artists first SELECT a.name, avg(s.rating) avg, COUNT(s.rating) count FROM tags t, artist a, statistics s WHERE a.id=t.artist AND t.url=s.url GROUP BY a.name HAVING count &#62; 10 ORDER BY avg DESC;]]></description>
			<content:encoded><![CDATA[<p>I was messing around writing some sweet SQL statements for Amarok tonight.  You can run them either from the MySQL console (if your Amarok collection is stored in MySQL) or through DCOP (google &#8216;amarok dcop&#8217;).  Here&#8217;s an example:</p>
<pre><code>
# List artists with their average rating and number of ratings,
# favorite artists first (only artists with more than 10 ratings)
SELECT a.name, AVG(s.rating) AS avg_rating, COUNT(s.rating) AS num_ratings
FROM tags t
JOIN artist a ON a.id = t.artist
JOIN statistics s ON s.url = t.url
GROUP BY a.name
HAVING num_ratings &gt; 10
ORDER BY avg_rating DESC;
</code></pre>
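<p>If the <code>GROUP BY</code>/<code>HAVING</code> combination is unfamiliar, here&#8217;s what the query computes, sketched in plain Ruby over made-up <code>[artist, rating]</code> rows (roughly what the join produces before grouping). It mirrors the SQL step by step; it is not how MySQL executes it, and the artist names are hypothetical.</p>
<pre><code>
# Made-up data: one row per rated track, i.e. what the join of tags,
# artist and statistics produces before any grouping.
Row = Struct.new(:artist, :rating)

# GROUP BY artist, keep artists with more than min_ratings ratings
# (the HAVING clause), then sort by average rating, best first.
def favorite_artists(rows, min_ratings: 10)
  rows
    .group_by { |row| row.artist }
    .map { |artist, group|
      ratings = group.map { |row| row.rating }
      [artist, ratings.sum.fdiv(ratings.size), ratings.size]
    }
    .select { |_artist, _avg, count| count > min_ratings }
    .sort_by { |_artist, avg, _count| -avg }
end

rows = Array.new(12) { Row.new("Artist A", 8) } +
       Array.new(11) { Row.new("Artist B", 10) } +
       Array.new(3) { Row.new("Artist C", 10) }

favorite_artists(rows).each do |artist, avg, count|
  puts format("%-10s avg %.2f from %d ratings", artist, avg, count)
end
</code></pre>
<p>Artist C drops out because it has only three ratings, even though its average is perfect: the <code>HAVING</code> filter runs on the grouped counts, before the sort.</p>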
]]></content:encoded>
			<wfw:commentRss>http://www.ralree.com/2007/05/11/sweet-sql-queries-for-amarok/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>MySQL Capitalization Issue</title>
		<link>http://www.ralree.com/2007/02/24/mysql-capitalization-issue/</link>
		<comments>http://www.ralree.com/2007/02/24/mysql-capitalization-issue/#comments</comments>
		<pubDate>Sat, 24 Feb 2007 13:07:00 +0000</pubDate>
		<dc:creator>Erik</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[rails]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[sql]]></category>

		<guid isPermaLink="false">http://www.ralree.info/2007/10/13/mysql-capitalization-issue</guid>
		<description><![CDATA[So, a problem with MySQL (in my opinion) is that it is not case sensitive by default for VARCHAR fields. That makes getting rid of crappy entries like &#8217;ITALY&#8217; a bother. I mean, sure, I could just post-process it with ruby (see titleize), but what&#8217;s the fun in that. select distinct BINARY(lead_country) from countries; Ah, finally recognizes that ITALY is not Italy. One is definitely uglier than the other one. Now for the change. update countries set lead_country = 'Italy' [...]]]></description>
			<content:encoded><![CDATA[<p>So, a problem with MySQL (in my opinion) is that it is not case-sensitive by default for <strong>VARCHAR</strong> fields: the default collations compare strings case-insensitively, so a plain <code>SELECT DISTINCT</code> lumps &#8216;<strong>ITALY</strong>&#8217; together with &#8216;Italy&#8217;.  That makes finding and getting rid of crappy entries like &#8216;<strong>ITALY</strong>&#8217; a bother.  I mean, sure, I could just post-process it with Ruby (see <a href="http://api.rubyonrails.org/classes/Inflector.html#M001082">titleize</a>), but what&#8217;s the fun in that?</p>
<pre><code>
select distinct BINARY(lead_country) from countries;
</code></pre>
<p>Ah, now MySQL finally recognizes that <strong>ITALY</strong> is not <strong>Italy</strong>.  One is definitely uglier than the other.  Now for the change.</p>
<pre><code>
# The default case-insensitive collation makes this match 'italy' etc. too
update countries set lead_country = 'Italy' where lead_country = 'ITALY';
</code></pre>
<p>Much better.</p>
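<p>For comparison, here&#8217;s the same idea sketched in plain Ruby over a made-up list of values. The <code>downcase</code>-based <code>uniq</code> only approximates a case-insensitive collation (real collations may also treat accented letters as equal), and <code>String#capitalize</code> is a simplified stand-in for Rails&#8217; <code>titleize</code> that only works for one-word names.</p>
<pre><code>
# Made-up lead_country values, including case variants.
values = ["Italy", "ITALY", "France", "italy", "France"]

# Roughly SELECT DISTINCT under a case-insensitive collation:
# case variants collapse, so the ugly ones never show up.
case_insensitive = values.uniq { |v| v.downcase }

# Roughly SELECT DISTINCT BINARY(lead_country): exact comparison.
binary = values.uniq

# The cleanup: normalize capitalization, then de-duplicate.
cleaned = values.map { |v| v.capitalize }.uniq

puts "case-insensitive: #{case_insensitive.inspect}"
puts "binary:           #{binary.inspect}"
puts "cleaned:          #{cleaned.inspect}"
</code></pre>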
]]></content:encoded>
			<wfw:commentRss>http://www.ralree.com/2007/02/24/mysql-capitalization-issue/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

