Reading compressed files with postgres using named pipes

Postgres can bulk-read files just like MySQL can, but with much nicer syntax: what MySQL calls LOAD DATA INFILE is simply COPY in postgres. Today I decided to try having it read from a named pipe, and it worked out nicely.
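For the curious, the two commands look roughly like this side by side (the table and file names here are made up for illustration):

-- MySQL
LOAD DATA INFILE '/tmp/data.txt' INTO TABLE mytable;

-- postgres
COPY mytable FROM '/tmp/data.txt';

Both default to tab-delimited input, which will come in handy below.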

I started out making a test db and making a nice little schema:


postgres@tardis:~$ createdb test
postgres@tardis:~$ psql test
psql (8.4.0)
Type "help" for help.

test=# CREATE TYPE rank AS ENUM ('general', 'sergeant', 'private');
CREATE TYPE
test=# CREATE TABLE military (id SERIAL PRIMARY KEY,
test(#   name VARCHAR(128),
test(#   rank rank);
NOTICE:  CREATE TABLE will create implicit sequence "military_id_seq" for serial column "military.id"
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "military_pkey" for table "military"
CREATE TABLE

Notice the use of SERIAL? That’s basically postgres’ AUTO_INCREMENT: behind the scenes it creates a sequence (that’s what the first NOTICE above is about) and uses it for the column’s default. I like it better.
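For anyone coming from MySQL, a rough equivalent of the same table there would be something like this (an illustrative sketch, not a drop-in translation):

CREATE TABLE military (
  id INT AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(128),
  rank ENUM('general', 'sergeant', 'private')
);

Next, it’s time to make a text file with some data and compress it. Here’s what I put in the file (note that the spaces between the words are TAB characters):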


general Lee
sergeant  Hartman
private Pyle

And compress it with gzip, making a nice little file:


hank@tardis:/tmp$ gzip file
hank@tardis:/tmp$ zcat file.gz 
general	Lee
sergeant	Hartman
private	Pyle

Now let’s actually make a named pipe for postgres to read from:


hank@tardis:/tmp$ mkfifo namedpipe
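If you want to confirm it worked, ls -l marks a FIFO with a p at the front of the mode string; the output should look roughly like this (timestamp column elided):

hank@tardis:/tmp$ ls -l namedpipe
prw-r--r-- 1 hank hank 0 namedpipe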

Now that we have our named pipe, let’s start reading from it:


test=# COPY military (rank, name) FROM '/tmp/namedpipe' WITH DELIMITER E'\t';

The E'\t' part is an escape string literal: the E prefix tells postgres to interpret backslash escapes inside the single-quoted string, so \t becomes an actual tab character. Also, don’t be alarmed when psql appears to hang after you run this: opening a FIFO for reading blocks until something opens the other end for writing.
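If you want to see escape strings on their own first, a quick SELECT demonstrates the idea; the result is the two words separated by a real tab:

test=# SELECT E'general\tLee';

With the COPY sitting there waiting on the pipe, all that we have to do now is use zcat: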


hank@tardis:/tmp$ zcat file.gz > namedpipe 

Immediately, there’s some output in the psql session:


COPY 3

So, postgres says it got 3 records successfully. Yay! Now, let’s display them:


test=# select * from military;
 id |  name   |   rank   
----+---------+----------
  1 | Lee     | general
  2 | Hartman | sergeant
  3 | Pyle    | private
(3 rows)

So, this is a pretty good method for reading compressed files into postgres. I’ve seen many articles that use similar tricks with postgres dump files, but it’s useful for bulk loading of delimited data too, since it’s often prudent to keep bulk data files compressed rather than extracting them before loading. See the postgres COPY page for more information about this awesome command.
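A couple of closing notes, which I haven’t tested above, so treat them as sketches. The trick isn’t gzip-specific: any compressor that can decompress to stdout (bzcat for bzip2, xzcat for xz) can stand in for zcat. And on newer postgres releases (9.3 and up), COPY can run the decompressor itself, no pipe required:

test=# COPY military (rank, name) FROM PROGRAM 'zcat /tmp/file.gz' WITH (DELIMITER E'\t');

There’s also a client-side variant that works on any version and never touches the server’s filesystem, streaming through COPY ... FROM STDIN:

hank@tardis:/tmp$ zcat file.gz | psql test -c "COPY military (rank, name) FROM STDIN WITH DELIMITER E'\t'"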


3 thoughts on “Reading compressed files with postgres using named pipes”

  1. jeh

    that’s pretty awesome. ur db can just sit there sucking on the pipe, and you can dump data in whenever you want. Can think of many ways that would be useful.

  2. Michael E Driscoll

    This is a feature that I’d always wished for in my MySQL days. It’s hard to overstate how valuable it is to avoid unnecessary disk writes with massive data sets. Glad to see it works in Postgres!
