<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Schadenfreude &#187; data</title>
	<atom:link href="http://www.ralree.com/tag/data/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.ralree.com</link>
	<description>Malicious enjoyment derived from observing someone else's misfortune</description>
	<lastBuildDate>Sun, 28 Feb 2010 04:18:37 +0000</lastBuildDate>
	
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Importing MySQL 1.4 Amarok data into Amarok 2.2 Nightly</title>
		<link>http://www.ralree.com/2009/09/28/importing-mysql-1-4-amarok-data-into-amarok-2-2-nightly/</link>
		<comments>http://www.ralree.com/2009/09/28/importing-mysql-1-4-amarok-data-into-amarok-2-2-nightly/#comments</comments>
		<pubDate>Mon, 28 Sep 2009 20:15:49 +0000</pubDate>
		<dc:creator>Erik</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[amarok]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[import]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[music]]></category>
		<category><![CDATA[processing]]></category>
		<category><![CDATA[ubuntu]]></category>

		<guid isPermaLink="false">http://www.ralree.com/?p=22704</guid>
		<description><![CDATA[I had a lot of trouble today importing my old MySQL Amarok database into the new nightly version of Amarok I installed.  The Amarok Wiki had a great section on how to convert a MySQL Amarok collection into an SQLite one.  This was the key to importing my old 1.4 collection into the new [...]]]></description>
			<content:encoded><![CDATA[<p>I had a lot of trouble today importing my old MySQL Amarok database into the new nightly version of Amarok I installed.  <a href="http://amarok.kde.org/wiki/MySQL_HowTo#Amarok_1.4.8_and_MySQL_5.0.45">The Amarok Wiki</a> had a great section on how to convert a MySQL Amarok collection into an SQLite one.  That conversion was the key to importing my old 1.4 collection into the new 2.2 nightly version of Amarok.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ralree.com/2009/09/28/importing-mysql-1-4-amarok-data-into-amarok-2-2-nightly/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Reading compressed files with postgres using named pipes</title>
		<link>http://www.ralree.com/2009/09/04/reading-compressed-files-with-postgres-using-named-pipes/</link>
		<comments>http://www.ralree.com/2009/09/04/reading-compressed-files-with-postgres-using-named-pipes/#comments</comments>
		<pubDate>Fri, 04 Sep 2009 06:38:55 +0000</pubDate>
		<dc:creator>Erik</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[awesome]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[databases]]></category>
		<category><![CDATA[dba]]></category>
		<category><![CDATA[dump]]></category>
		<category><![CDATA[postgres]]></category>

		<guid isPermaLink="false">http://www.ralree.com/?p=22661</guid>
		<description><![CDATA[Postgres can bulk-load files just as MySQL can, with a much nicer syntax.  LOAD DATA INFILE from MySQL is just COPY in postgres.  I decided to try having it read from a named pipe today, and it worked out nicely.

I started out making a test db and making a [...]]]></description>
			<content:encoded><![CDATA[<p>Postgres can bulk-load files just as MySQL can, with a much nicer syntax.  <code>LOAD DATA INFILE</code> from MySQL is just <code>COPY</code> in postgres.  I decided to try having it read from a named pipe today, and it worked out nicely.<br />
<span id="more-22661"></span><br />
I started by creating a test database and a small schema:</p>
<pre><code>
postgres@tardis:~$ createdb test
postgres@tardis:~$ psql test
psql (8.4.0)
Type "help" for help.

test=# CREATE TYPE rank AS ENUM ('general', 'sergeant', 'private');
CREATE TYPE
test=# CREATE TABLE military (id SERIAL PRIMARY KEY,
test(#   name VARCHAR(128),
test(#   rank rank);
NOTICE:  CREATE TABLE will create implicit sequence "military_id_seq" for serial column "military.id"
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "military_pkey" for table "military"
CREATE TABLE
</code></pre>
<p>Notice the use of <code>SERIAL</code>?  That&#8217;s basically postgres&#8217; <code>AUTO_INCREMENT</code>.  I like it better.  Next, it&#8217;s time to make a text file with some data and compress it.  Each line holds a rank and then a name.  Here&#8217;s what I put in the file (note that the gaps between the words are <code>TAB</code> characters, not spaces):</p>
<pre><code>
general Lee
sergeant  Hartman
private Pyle
</code></pre>
<p>Then compress it with <code>gzip</code> and check the result with <code>zcat</code>:</p>
<pre><code>
hank@tardis:/tmp$ gzip file
hank@tardis:/tmp$ zcat file.gz
general	Lee
sergeant	Hartman
private	Pyle
</code></pre>
<p>Now let&#8217;s actually make a named pipe for postgres to read from:</p>
<pre><code>
hank@tardis:/tmp$ mkfifo namedpipe
</code></pre>
<p>Now that we have our named pipe, let&#8217;s start reading from it.  The <code>COPY</code> will sit and wait until something writes to the pipe:</p>
<pre><code>
test=# COPY military (rank, name) FROM '/tmp/namedpipe' WITH DELIMITER E'\t';
</code></pre>
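<p>The <code>E'\t'</code> part is postgres&#8217; escape string syntax: backslash sequences inside the single-quoted string are interpreted, so <code>\t</code> becomes an actual tab character.  All that&#8217;s left is to feed the pipe with <code>zcat</code> from another terminal:</p>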
<pre><code>
hank@tardis:/tmp$ zcat file.gz > namedpipe
</code></pre>
<p>Immediately, there&#8217;s some output in the psql session:</p>
<pre><code>
COPY 3
</code></pre>
<p>So, postgres says it got 3 records successfully.  Yay!  Now, let&#8217;s display them:</p>
<pre><code>
test=# select * from military;
 id |  name   |   rank
----+---------+----------
  1 | Lee     | general
  2 | Hartman | sergeant
  3 | Pyle    | private
(3 rows)
</code></pre>
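<p>The handoff between <code>zcat</code> and <code>COPY</code> can be tried without a database.  The following is a minimal Python sketch, not part of the original session: a background thread stands in for <code>zcat</code> and a plain reader stands in for <code>COPY</code>.  The function names and the temporary directory are made up for illustration, and it assumes a POSIX system where <code>os.mkfifo</code> is available.</p>

```python
import gzip
import os
import tempfile
import threading


def write_gzip_to_fifo(gz_path, fifo_path):
    # Stands in for zcat writing into the pipe: opening the FIFO for
    # writing blocks until a reader has opened the other end.
    with gzip.open(gz_path, "rb") as src, open(fifo_path, "wb") as dst:
        dst.write(src.read())


def read_rows_from_fifo(fifo_path):
    # Stands in for COPY reading the pipe: reads until the writer closes
    # its end, then splits each line on TAB.
    with open(fifo_path, "r", encoding="utf-8") as pipe:
        return [tuple(line.rstrip("\n").split("\t")) for line in pipe]


def load_via_fifo(rows):
    # Hypothetical helper: builds a gzipped TAB-delimited file, then
    # streams it through a named pipe and returns what the reader saw.
    with tempfile.TemporaryDirectory() as workdir:
        gz_path = os.path.join(workdir, "file.gz")
        fifo_path = os.path.join(workdir, "namedpipe")
        with gzip.open(gz_path, "wt", encoding="utf-8") as gz:
            for rank, name in rows:
                gz.write(f"{rank}\t{name}\n")
        os.mkfifo(fifo_path)
        writer = threading.Thread(
            target=write_gzip_to_fifo, args=(gz_path, fifo_path)
        )
        writer.start()
        loaded = read_rows_from_fifo(fifo_path)
        writer.join()
        return loaded


sample = [("general", "Lee"), ("sergeant", "Hartman"), ("private", "Pyle")]
loaded_rows = load_via_fifo(sample)
print(len(loaded_rows), "rows read from the pipe")
```

<p>Opening a FIFO blocks until both ends are open, which is why the writer runs in its own thread here, much as <code>COPY</code> waits in the psql session until <code>zcat</code> starts writing.  The decompressed data never lands on disk.</p>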
<p>So, this is a pretty good way to read compressed files into postgres.  I&#8217;ve seen many articles that use similar methods with postgres dump files, but it&#8217;s just as useful for bulk loading delimited data, since it&#8217;s often prudent to keep bulk data files compressed rather than extracting them before loading.  See the postgres <a href="http://www.postgresql.org/docs/8.4/interactive/sql-copy.html">COPY</a> page for more information about this awesome command.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ralree.com/2009/09/04/reading-compressed-files-with-postgres-using-named-pipes/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Calculating Averages from a CSV with Perl</title>
		<link>http://www.ralree.com/2007/12/08/calculating-averages-from-a-csv-with-perl/</link>
		<comments>http://www.ralree.com/2007/12/08/calculating-averages-from-a-csv-with-perl/#comments</comments>
		<pubDate>Sat, 08 Dec 2007 18:09:00 +0000</pubDate>
		<dc:creator>Erik</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[bash]]></category>
		<category><![CDATA[crime]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[perl]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://www.ralree.info/2007/12/13/calculating-averages-from-a-csv-with-perl</guid>
		<description><![CDATA[Code
Here&#8217;s a quick one-liner using some UNIX utilities and Perl to construct some nice averages from CSV data:

for i in `seq 2 20`; do cat crim_rate_2005_by_state.csv &#124; cut -d , -f $i &#124; perl -e '$c=$d=0;$e;while(&#60;&#62;){if(/^\d/){$c+=$_;$d+=1}else{s/\s{2,}/ /g;s/"//g;chomp($e=$_);}} print $e, ": ", $c/$d, "\n"'; done

And now, the spaced out version:

#!/bin/bash
for i in `seq 2 20`; do
 [...]]]></description>
			<content:encoded><![CDATA[<h2>Code</h2>
<p>Here&#8217;s a quick one-liner using some UNIX utilities and Perl to construct some nice averages from CSV data:</p>
<pre><code>
for i in `seq 2 20`; do cat crim_rate_2005_by_state.csv | cut -d , -f $i | perl -e '$c=$d=0;$e;while(&lt;&gt;){if(/^\d/){$c+=$_;$d+=1}else{s/\s{2,}/ /g;s/"//g;chomp($e=$_);}} print $e, ": ", $c/$d, "\n"'; done
</code></pre>
<p>And now, the spaced out version:</p>
<pre><code>
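#!/bin/bash
# Average each numeric column (fields 2 through 20) of the CSV.
for i in `seq 2 20`; do
  cat crim_rate_2005_by_state.csv | \
  cut -d , -f $i | \
  perl -e '$c=$d=0;   # $c = running sum, $d = count of numeric lines
    $e;               # $e will hold the column label
    while(&lt;&gt;){
      if(/^\d/){      # data line: starts with a digit
        $c+=$_;
        $d+=1
      } else {        # any other line is taken as the column label
        s/\s{2,}/ /g; # collapse runs of whitespace
        s/"//g;       # strip double quotes
        chomp($e=$_);
      }
    }
    print $e, ": ", $c/$d, "\n"';
done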
</code></pre>
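<p>For comparison, here is a rough sketch of the same per-column averaging in Python.  It is not the script that produced the numbers in this post.  The function name and the inline sample data are made up for illustration, and it assumes the first row is the header and the first column is a label, which is simpler than the digit test the Perl uses.</p>

```python
import csv
import io


def column_averages(csv_text):
    # Assumption: row 0 is the header and column 0 is a row label,
    # so averaging starts at column 1 (field 2 in cut terms).
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    averages = {}
    for index in range(1, len(header)):
        values = []
        for row in data:
            try:
                values.append(float(row[index]))
            except (ValueError, IndexError):
                continue  # skip blank, missing, or non-numeric cells
        if values:
            # Collapse whitespace runs in the label, roughly what the
            # Perl substitution does for the header line.
            label = " ".join(header[index].split())
            averages[label] = sum(values) / len(values)
    return averages


# Made-up sample data, not the contents of crim_rate_2005_by_state.csv.
SAMPLE = '''State,"Violent  crime rate","Robbery rate"
Alpha,100,10
Beta,300,30
Gamma,,50
'''

for label, mean in column_averages(SAMPLE).items():
    print(f"{label}: {mean}")
```

<p>Using the <code>csv</code> module instead of <code>cut -d ,</code> means a quoted field that contains a comma stays in one piece, and a column with no numeric values is skipped instead of dividing by zero.</p>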
<h2>Output</h2>
<ul>
<li>Population: 5775431.88461538</li>
<li>Violent crime rate: 418.930769230769</li>
<li>Murder/manslaughter rate: 5.59038461538462</li>
<li>Forcible rape rate: 33.1634615384615</li>
<li>Robbery rate: 114.455769230769</li>
<li>Assault rate: 265.728846153846</li>
<li>Property crime rate: 3339.50961538462</li>
<li>Burglary rate: 685.671153846154</li>
<li>Larceny/theft rate: 2273.43269230769</li>
<li>Motor vehicle theft rate: 380.417307692308</li>
<li>Violent crime: 26928.3461538462</li>
<li>Murder and nonnegligent manslaughter: 335.730769230769</li>
<li>Forcible rape: 1809.67307692308</li>
<li>Robbery: 8128.30769230769</li>
<li>Aggravated assault: 16654.6346153846</li>
<li>Property crime: 196569.711538462</li>
<li>Burglary: 41756.0961538462</li>
<li>Larceny-theft: 130880.442307692</li>
<li>Motor vehicle theft: 23933.1730769231</li>
</ul>
<p>So, now we have our averages.  More work to be done.  The data file used is available here:</p>
<h3><a href="http://ralree.com/assets/2007/12/8/crim_rate_2005_by_state.csv">crim_rate_2005_by_state.csv</a></h3>
]]></content:encoded>
			<wfw:commentRss>http://www.ralree.com/2007/12/08/calculating-averages-from-a-csv-with-perl/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

