PostgreSQL: Partitioning
Partitioning is the process of splitting a large table into smaller, more manageable pieces called partitions. Each partition holds a subset of the data, which improves query performance and simplifies data management.
Partitioning can provide several benefits:
Query performance can be improved dramatically particularly when most of the heavily accessed rows of the table are in a single partition or a small number of partitions.
Dropping an individual partition using
DROP TABLE
, or doingALTER TABLE DETACH PARTITION
, is far faster than a bulk operation. These commands also entirely avoid theVACUUM
overhead caused by a bulkDELETE
.Rarely used data can be migrated to cheaper and slower storage media.
Types of Partitioning in PostgreSQL
Range Partitioning: Data is divided based on a range of values (e.g., date ranges).
List Partitioning: Data is divided based on a specific list of values (e.g., regions, categories).
Hash Partitioning: Data is divided using a hash function (e.g., evenly distributing based on user IDs).
Range Partitioning
The command below will create partitions based on transaction_date
. One is for transactions happens in 2024 and the other is 2025.
With this when your query have where clause using transaction_date
it will prune the unnecessary partition.
Lets insert some dummy data for the table above and try different queries to see if it leverages partition or not.
First lets try selecting all the transactions.
As we can see above if we didn't use where clause using the partition key it will scan all partition.
Now lets try to query all transactions that happens in February 2025.
And we can see above that postgres prune unnecessary partition. In this case we only need data in the transactions_2025
partition so it will skip scanning on other partition. And as you can see the planning time and execution time is reduced as well.
List Partitioning
This command below will create 3 partition for ID
, JP
, and others respectively.
We need to create DEFAULT
partition because if we insert other countries other than ID
and JP
it will return error. So we need to define a partition where postgres can store countries other than ID
and JP
.
The query optimization concept is same with other partition type. If we include country
field in the where clause it will prune unnecessary partition.
Hash Partitioning
This command below will create 4 partition based on MODULUS 4
. If you want to create smaller or larger partition you can change the MODULUS
and REMAINDER
based on your need.
For every record that we insert, postgres will calculate the hash value for customer_id
and then put it in the correct partition.
When our query have customer_id
in the where clause, postgres will calculate the hash value and scan only the specific partition.
Example:
The hash value of customer_id = 104
is -1470666514
hence the remainder is 0
.
So postgres will only scan the partition transactions_p0
and ignore the other partition.
PostgreSQL Hash Functions
text, varchar
pg_catalog.hashtext()
int, bigint
pg_catalog.hashint4()
, pg_catalog.hashint8()
uuid
pg_catalog.hashuuid()
date, timestamp
pg_catalog.hashfloat8()
Best Practices
The most common query filters should align with the partition key to ensure partition pruning where postgres only scans relevant partitions, not the whole table.
Time-based queries → Use Range Partitioning on order_date.
Region-based searches → Use List Partitioning on country.
Random access (e.g., user_id) → Use Hash Partitioning on user_id.
Partitioning works best when data is evenly distributed across partitions. Imbalanced data (e.g., one partition being huge while others are small) can cause hotspots and reduce the benefit of partitioning.
Avoid frequent updates on the partition key because it triggers row movement.
References
Last updated