PostgreSQL tsearch2 migration from 8.0 to 8.3 - how I did it successfully

Recently I was in the process of moving my site to a much better hosting situation (more on that later).  During the move I decided to upgrade from PostgreSQL-8.0 to PostgreSQL-8.3 as I was pretty far behind and I prefer to stay current.  This sort of upgrade isn’t a big deal, and I’ve done it many times.  So I did my usual process:

  1. Install the desired version of PostgreSQL (in this case 8.3)
  2. Scp my last backup (taken a few minutes after I lock down the site) to the new host
  3. Run a script that essentially creates a new db, and restores the backup to it

That’s when things went horribly wrong.  After a few searches I quickly learned that moving from 8.0 to 8.3 is a bit tricky when you have tsearch2 stuff in there.  It turned out to be really easy to upgrade, read on if you want to know how I did it.

The main problem turned out to be the movement of Tsearch2 into the core.  Previously it was a contributed project and thus you had to “install” it into any database that needed to use it.  The installation was really just a bunch of sql statements that created functions inside your database.  So the problem when you restore the db is that the functions installed conflict with the functions that now exist in the core (aka in PostgreSQL itself).

So here’s how I migrated my database successfully:

  1. Lock my site so I know nothing can write to the database
  2. Create a backup (not to be used, but in case I typo a subsequent step)
  3. Drop all of the tsearch2 objects from my database (I’ll provide the commands I used)
  4. Use pg_dump to dump the database per normal
  5. Create a fresh database on 8.3
  6. Restore the database per normal

So basically by doing those steps I was easily able to get my database ported over to PostgreSQL-8.3.  Before I move on to how my app had to change to accomidate things, here’s the sql script I used to purge Tsearch2 from my db:

-- Clean blogs
DROP INDEX idx_blog_tags_idx CASCADE;
ALTER TABLE blogs DROP COLUMN idx_tags CASCADE;
DROP TRIGGER tsvectorupdate ON blogs CASCADE;

-- Clean board_posts
DROP INDEX idx_tags_idx CASCADE;
ALTER TABLE board_posts DROP COLUMN idx_tags CASCADE;
DROP TRIGGER tsvectorupdate ON board_posts CASCADE;

-- Clean tsearch tables
DROP TABLE pg_ts_cfg CASCADE;
DROP TABLE pg_ts_cfgmap CASCADE;
DROP TABLE pg_ts_dict CASCADE;
DROP TABLE pg_ts_parser CASCADE;

-- Clean tsearch functions
DROP FUNCTION concat(tsvector, tsvector) CASCADE;
DROP FUNCTION dex_init(internal) CASCADE;
DROP FUNCTION dex_lexize(internal, internal, integer) CASCADE;
DROP FUNCTION exectsq(tsvector, tsquery) CASCADE;
DROP FUNCTION get_covers(tsvector, tsquery) CASCADE;
DROP FUNCTION _get_parser_from_curcfg() CASCADE;
DROP FUNCTION gtsvector_compress(internal) CASCADE;
DROP FUNCTION gtsvector_consistent(gtsvector, internal, integer) CASCADE;
DROP FUNCTION gtsvector_decompress(internal) CASCADE;
DROP FUNCTION gtsvector_in(cstring) CASCADE;
DROP FUNCTION gtsvector_out(gtsvector) CASCADE;
DROP FUNCTION gtsvector_penalty(internal, internal, internal) CASCADE;
DROP FUNCTION gtsvector_picksplit(internal, internal) CASCADE;
DROP FUNCTION gtsvector_same(gtsvector, gtsvector, internal) CASCADE;
DROP FUNCTION gtsvector_union(internal, internal) CASCADE;
DROP FUNCTION headline(oid, text, tsquery) CASCADE;
DROP FUNCTION headline(oid, text, tsquery, text) CASCADE;
DROP FUNCTION headline(text, text, tsquery) CASCADE;
DROP FUNCTION headline(text, text, tsquery, text) CASCADE;
DROP FUNCTION headline(text, tsquery) CASCADE;
DROP FUNCTION headline(text, tsquery, text) CASCADE;
DROP FUNCTION length(tsvector) CASCADE;
DROP FUNCTION lexize(oid, text) CASCADE;
DROP FUNCTION lexize(text) CASCADE;
DROP FUNCTION lexize(text, text) CASCADE;
DROP FUNCTION parse(oid, text) CASCADE;
DROP FUNCTION parse(text) CASCADE;
DROP FUNCTION parse(text, text) CASCADE;
DROP FUNCTION prsd_end(internal) CASCADE;
DROP FUNCTION prsd_getlexeme(internal, internal, internal) CASCADE;
DROP FUNCTION prsd_headline(internal, internal, internal) CASCADE;
DROP FUNCTION prsd_lextype(internal) CASCADE;
DROP FUNCTION prsd_start(internal, integer) CASCADE;
DROP FUNCTION querytree(tsquery) CASCADE;
DROP FUNCTION rank_cd(integer, tsvector, tsquery) CASCADE;
DROP FUNCTION rank_cd(integer, tsvector, tsquery, integer) CASCADE;
DROP FUNCTION rank_cd(tsvector, tsquery) CASCADE;
DROP FUNCTION rank_cd(tsvector, tsquery, integer) CASCADE;
DROP FUNCTION rank(real[], tsvector, tsquery) CASCADE;
DROP FUNCTION rank(real[], tsvector, tsquery, integer) CASCADE;
DROP FUNCTION rank(tsvector, tsquery) CASCADE;
DROP FUNCTION rank(tsvector, tsquery, integer) CASCADE;
DROP FUNCTION reset_tsearch() CASCADE;
DROP FUNCTION rexectsq(tsquery, tsvector) CASCADE;
DROP FUNCTION set_curcfg(integer) CASCADE;
DROP FUNCTION set_curcfg(text) CASCADE;
DROP FUNCTION set_curdict(integer) CASCADE;
DROP FUNCTION set_curdict(text) CASCADE;
DROP FUNCTION set_curprs(integer) CASCADE;
DROP FUNCTION set_curprs(text) CASCADE;
DROP FUNCTION setweight(tsvector, "char") CASCADE;
DROP FUNCTION show_curcfg() CASCADE;
DROP FUNCTION snb_en_init(internal) CASCADE;
DROP FUNCTION snb_lexize(internal, internal, integer) CASCADE;
DROP FUNCTION snb_ru_init(internal) CASCADE;
DROP FUNCTION spell_init(internal) CASCADE;
DROP FUNCTION spell_lexize(internal, internal, integer) CASCADE;
DROP FUNCTION stat(text) CASCADE;
DROP FUNCTION stat(text, text) CASCADE;
DROP FUNCTION "strip"(tsvector) CASCADE;
DROP FUNCTION syn_init(internal) CASCADE;
DROP FUNCTION syn_lexize(internal, internal, integer) CASCADE;
DROP FUNCTION token_type() CASCADE;
DROP FUNCTION token_type(integer) CASCADE;
DROP FUNCTION token_type(text) CASCADE;
DROP FUNCTION to_tsquery(oid, text) CASCADE;
DROP FUNCTION to_tsquery(text) CASCADE;
DROP FUNCTION to_tsquery(text, text) CASCADE;
DROP FUNCTION to_tsvector(oid, text) CASCADE;
DROP FUNCTION to_tsvector(text) CASCADE;
DROP FUNCTION to_tsvector(text, text) CASCADE;
DROP FUNCTION ts_debug(text) CASCADE;
DROP FUNCTION tsearch2() CASCADE;
DROP FUNCTION tsquery_in(cstring) CASCADE;
DROP FUNCTION tsquery_out(tsquery) CASCADE;
DROP FUNCTION tsvector_cmp(tsvector, tsvector) CASCADE;
DROP FUNCTION tsvector_eq(tsvector, tsvector) CASCADE;
DROP FUNCTION tsvector_ge(tsvector, tsvector) CASCADE;
DROP FUNCTION tsvector_gt(tsvector, tsvector) CASCADE;
DROP FUNCTION tsvector_in(cstring) CASCADE;
DROP FUNCTION tsvector_le(tsvector, tsvector) CASCADE;
DROP FUNCTION tsvector_lt(tsvector, tsvector) CASCADE;
DROP FUNCTION tsvector_ne(tsvector, tsvector) CASCADE;
DROP FUNCTION tsvector_out(tsvector) CASCADE;

The next thing I needed to do was update my code so that it would reference the new core Tsearch2 objects. This turned out to be super easy. Here are a few code diffs of the lines I had to change (in my Python code):

- coalesce = "COALESCE(RANK(idx_tags, TO_TSQUERY('%s')), 0)"
+ coalesce = "COALESCE(TS_RANK(TO_TSVECTOR(tags), TO_TSQUERY('%s')), 0)"

- filter = "AND idx_tags @@ TO_TSQUERY('%s')"
+ filter = "AND TO_TSVECTOR(tags) @@ TO_TSQUERY('%s')"

I had to rinse and repeat those two changes a few times, and I was all set.  Here’s the page I used as a reference to get me going.  Currently I’m running without either an index or a dedicated tsvector column and things are running great.  I’m soon going to be adding a GIN index to gain back the speed I had before (and potentially more).

If you have any questions or anything please speak up - I know plenty of people are having challenges getting onto [the awesome] 8.3.