R&D · Postgres optimization
Postgres Partitioning for 95% Faster Queries
I merged a 35 GB archive database into a 27 GB production one — over 56 million rows in a single table — with a migration built to be reversible and safe to re-run, then partitioned that table and proved, with EXPLAIN ANALYZE, that the queries people actually run came back at least 95% faster.
One overgrown table, split into many — by category, then by year.
Two databases into one — then make the big table fast.
Two Postgres databases had drifted apart — a 35 GB archive and a 27 GB production system that no longer shared the same schema or the same constraints — and the business needed them as one. Their combined size wouldn't sit comfortably on a single box, archived rows referenced parents that no longer existed, and the largest table had grown past fifty million rows, slow to query on exactly the filters people used most. The merge had to run against a copy of live production data, so the real constraint wasn't writing the SQL — it was doing it safely: a process I could re-run without duplicating data and roll back cleanly if anything looked wrong.
Merge safely, clean up, partition, then prove it.
Four beats, in order: reconcile two drifted schemas and merge the data reversibly, repair the integrity the merge exposed, partition the table that had grown too big to scan, and benchmark the result against the unpartitioned original instead of trusting that it helped.
Five stages, each with a clean input and output.
The migration was deliberately staged — line up the schemas, load only what's missing, repair what the merge breaks, partition the hot table, then measure. Each stage does one job and hands off the next.
The two databases had drifted, so I diffed them and generated each table's INSERT in Python — the columns both shared, NULL for the columns only production had, and a source tag on every migrated row — so a structural mismatch couldn't silently drop or shift a column.
Each table merged in one transaction, inserting only the rows missing from production with a left-join anti-join on the primary key. Re-running never duplicated data, and a clean rollback was a single DELETE on the source tag.
Triggers and downstream integration flows were switched off during the load. Foreign keys were dropped and recreated around the insert to sidestep ordering deadlocks, and rows pointing at parents that never existed in the archive resolved to NULL instead of failing the whole batch.
The 56-million-row table was partitioned declaratively — by a low-cardinality category with LIST, each category sub-partitioned by year with RANGE — with data and indexes copied across. Built as a parallel table beside the original, so nothing was destroyed.
A Python script generated a matrix of realistic queries across date ranges and categories, ran each against the partitioned and unpartitioned tables with EXPLAIN ANALYZE, and computed the speedup — turning “partitioning should help” into a measured number.
Every step of the migration had to be reversible.
Running against a copy of live production, the load was designed to be both idempotent and
invertible. Idempotent: it inserts only the rows missing from production, so a re-run after a
failure adds nothing twice. Invertible: every migrated row carries a source tag, so undoing the
whole merge is a single DELETE … WHERE src = 'ARC'. That one column is the
difference between a safe experiment you can run on a Friday and a one-way door.
List by category, then range by year.
The table is split two ways. The top level is a LIST partition on a low-cardinality category; inside each category, a RANGE partition by year, plus a DEFAULT to catch anything outside the declared bounds. A query that names a category and a date window touches only the partitions it has to.
Category first, year second — on purpose: the category filter is set more often, so it prunes the most partitions before a single row is read.
At least 95% faster on the queries that matter.
Each figure is an average speedup measured with EXPLAIN ANALYZE — the same queries run against the partitioned and unpartitioned tables, on identical data. Filtering on both partition keys is where the win is largest; even a single key cuts the work meaningfully.
Filtered by category and date
the common case
| Filter | Faster | Rows (avg) |
|---|---|---|
| 3 months | 98% | 350K |
| 6 months | 98% | 700K |
| 12 months | 98% | 1.4M |
| 12 months · spanning 2 years | 95% | 1.7M |
| 24 months | 96% | 3M |
Filtered by category only
no date bound
| Filter | Faster | Rows (avg) |
|---|---|---|
| Category A | 79% | 17M |
| Category B | 62% | 29M |
| Others | 37% | 473K |
Averaged across a generated query matrix · partitioned vs. unpartitioned · same data.
The calls that shaped it.
The interesting part of a migration like this isn't the SQL — it's the trade-offs. These are the ones I'd defend.
Inheritance partitioning needs trigger-routed inserts and extra overhead to control where rows land — control I didn't need. Declarative routing is native and faster, so it won.
The category filter is set far more often than the date filter, so it's the top-level partition key — the most common query prunes the most partitions before it touches a row.
Deferring constraints with INITIALLY DEFERRED didn't behave as expected, and walking the table dependency graph to order inserts was complex. Dropping the foreign keys around the load and recreating them after was the reliable path.
The partitioned table was built alongside the original, not in place of it — so the migration stayed reversible and the benchmark was a true A/B on identical data, not a before/after guess.
Schema-aware INSERTs generated in Python and executed in a single transaction per table — so a failure rolled the whole thing back instead of leaving a half-merged table behind.
Triggers and integration flows were disabled for the bulk insert and switched back on after, so automated downstream processing didn't fire on every migrated row.
Got a Postgres table that's outgrown its hardware?
Tell me what's slow and how big it's gotten, and I'll come back with whether partitioning (or something simpler) is the right fix and what a first step looks like.