🦉
Programming Notes
  • My Programming Notes
  • CKA Exam Preparation
    • Certified Kubernetes Administrator
    • Setup Minikube
    • Network Design Principles
    • Role-Based Access Control (RBAC)
    • Namespace
    • Resource Quota
    • Pod
    • Deployment
    • Deployment: Rollout
    • ConfigMap
    • Service
    • Service: kubectl expose
    • Pod: Resources Management
    • Pod & Container: Quality of Service Class
    • Pod & Container: Probes
    • Limit Range
    • Scaling: Manual
    • Scaling: Horizontal Pod Autoscaler
    • Persistent Volume & Claim
    • Secret
    • Ingress: Routing
    • Ingress: TLS
    • Ingress: Rate Limit
    • Ingress: Basic Auth
    • Ingress: CRD (Custom Resource Definition)
    • Job
    • CronJob
    • Mutli-Node Cluster
  • Golang
    • Generics
    • Context
    • Goroutines and Channels in Go
    • Goroutine: Concurrency vs Parallelism
    • Goroutine: Performance & Tradeoffs
    • JSON: omitzero
  • Rust
    • Arrays & Slices
    • Closures
    • Generics & Traits
    • Iterators
    • Run Code Simultaneously
    • String vs &str
    • Tests
    • Rustlings Exercises
      • Variables
      • Functions
      • If
      • Primitive Types
      • Vectors
      • Move Semantics
      • Structs
      • Enums and Matching Pattern
      • Strings
      • Modules
      • Hashmaps
      • Options
      • Error Handling
      • Generics
      • Traits
      • Lifetimes
      • Tests
      • Iterators
      • Smart Pointers
      • Threads
      • Macros
      • Quiz 1
      • Quiz 2
      • Quiz 3
  • Software Engineering
    • CAP Theorem
    • Circuit Breaker
    • Decoupling
    • GraphQL: Query Caching
    • HMAC Signature
    • Idempotency
    • Monolith VS Microservice
    • OWASP Top 10 2021
    • PCI DSS
    • PostgreSQL: Partitioning
    • PostgreSQL: Replication
    • Protobuf & gRPC
    • Redis: Streams
    • Resource Scaling
    • Signed URL
    • SOLID
    • Stack VS Heap
    • Stateful VS Stateless
  • Site Reliability Engineering
    • Chaos Engineering
    • Distributed Tracing
    • Kubernetes (k8s)
    • SLA, SLO, and SLI Metrics
    • Site Reliability Engineer
  • Others
    • FFMPEG Cheat sheet
Powered by GitBook
On this page
  • Types of Partitioning in PostgreSQL
  • Range Partitioning
  • List Partitioning
  • Hash Partitioning
  • PostgreSQL Hash Functions
  • Best Practices
  • References
  1. Software Engineering

PostgreSQL: Partitioning

Partitioning is the process of splitting a large table into smaller, more manageable pieces called partitions. Each partition holds a subset of the data, which improves query performance and simplifies data management.

Partitioning can provide several benefits:

  • Query performance can be improved dramatically particularly when most of the heavily accessed rows of the table are in a single partition or a small number of partitions.

  • Dropping an individual partition using DROP TABLE, or doing ALTER TABLE DETACH PARTITION, is far faster than a bulk operation. These commands also entirely avoid the VACUUM overhead caused by a bulk DELETE.

  • Rarely used data can be migrated to cheaper and slower storage media.

Types of Partitioning in PostgreSQL

  • Range Partitioning: Data is divided based on a range of values (e.g., date ranges).

  • List Partitioning: Data is divided based on a specific list of values (e.g., regions, categories).

  • Hash Partitioning: Data is divided using a hash function (e.g., evenly distributing based on user IDs).

Range Partitioning

CREATE TABLE transactions (
    id                  SERIAL  NOT NULL,
    customer_id         INT     NOT NULL,
    country             TEXT    NOT NULL,
    transaction_date    DATE    NOT NULL,
    CONSTRAINT PK_transaction PRIMARY KEY ("id", transaction_date)
) PARTITION BY RANGE (transaction_date);

The command below will create partitions based on transaction_date. One is for transactions happens in 2024 and the other is 2025.

CREATE TABLE transactions_2024 PARTITION OF transactions
    FOR VALUES FROM ('2024-01-01') TO ('2024-12-31');

CREATE TABLE transactions_2025 PARTITION OF transactions
    FOR VALUES FROM ('2025-01-01') TO ('2025-12-31');

With this when your query have where clause using transaction_date it will prune the unnecessary partition.

Lets insert some dummy data for the table above and try different queries to see if it leverages partition or not.

INSERT INTO transactions (customer_id, country, transaction_date) 
VALUES 
    (101, 'ID', '2024-01-15'),
    (102, 'JP', '2024-03-22'),
    (103, 'ID', '2024-05-10'),
    (104, 'JP', '2024-07-30'),
    (105, 'ID', '2024-09-05'),
    (106, 'JP', '2025-02-18'),
    (107, 'ID', '2025-04-12'),
    (108, 'JP', '2025-06-25'),
    (109, 'ID', '2025-08-08'),
    (110, 'JP', '2025-11-14');

First lets try selecting all the transactions.

EXPLAIN ANALYZE 
SELECT * FROM transactions;
 Append  (cost=0.00..53.90 rows=2260 width=44) (actual time=0.027..0.049 rows=10 loops=1)
   ->  Seq Scan on transactions_2024 transactions_1  (cost=0.00..21.30 rows=1130 width=44) (actual time=0.025..0.027 rows=5 loops=1)
   ->  Seq Scan on transactions_2025 transactions_2  (cost=0.00..21.30 rows=1130 width=44) (actual time=0.019..0.019 rows=5 loops=1)
 Planning Time: 1.901 ms
 Execution Time: 0.076 ms

As we can see above if we didn't use where clause using the partition key it will scan all partition.

Now lets try to query all transactions that happens in February 2025.

EXPLAIN ANALYZE 
SELECT * FROM transactions 
WHERE transaction_date >= '2025-02-01' 
AND transaction_date <= '2025-02-28';
 Seq Scan on transactions_2025 transactions  (cost=0.00..26.95 rows=6 width=44) (actual time=0.015..0.018 rows=1 loops=1)
   Filter: ((transaction_date >= '2025-02-01'::date) AND (transaction_date <= '2025-02-28'::date))
   Rows Removed by Filter: 4
 Planning Time: 0.401 ms
 Execution Time: 0.039 ms

And we can see above that postgres prune unnecessary partition. In this case we only need data in the transactions_2025 partition so it will skip scanning on other partition. And as you can see the planning time and execution time is reduced as well.

List Partitioning

CREATE TABLE transactions (
    id                  SERIAL  NOT NULL,
    customer_id         INT     NOT NULL,
    country             TEXT    NOT NULL,
    transaction_date    DATE    NOT NULL,
    CONSTRAINT PK_transaction PRIMARY KEY ("id", country)
) PARTITION BY LIST (country);

This command below will create 3 partition for ID, JP, and others respectively.

CREATE TABLE transactions_id PARTITION OF transactions
    FOR VALUES IN ('ID');

CREATE TABLE transactions_jp PARTITION OF transactions
    FOR VALUES IN ('JP');

CREATE TABLE transactions_other PARTITION OF transactions
    DEFAULT; -- Catch-all for other countries

We need to create DEFAULT partition because if we insert other countries other than ID and JP it will return error. So we need to define a partition where postgres can store countries other than ID and JP.

The query optimization concept is same with other partition type. If we include country field in the where clause it will prune unnecessary partition.

Hash Partitioning

CREATE TABLE transactions (
    id                  SERIAL  NOT NULL,
    customer_id         INT     NOT NULL,
    country             TEXT    NOT NULL,
    transaction_date    DATE    NOT NULL,
    CONSTRAINT PK_transaction PRIMARY KEY ("id", customer_id)
) PARTITION BY HASH (customer_id);

This command below will create 4 partition based on MODULUS 4. If you want to create smaller or larger partition you can change the MODULUS and REMAINDER based on your need.

CREATE TABLE transactions_p0 PARTITION OF transactions
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);

CREATE TABLE transactions_p1 PARTITION OF transactions
    FOR VALUES WITH (MODULUS 4, REMAINDER 1);

CREATE TABLE transactions_p2 PARTITION OF transactions
    FOR VALUES WITH (MODULUS 4, REMAINDER 2);

CREATE TABLE transactions_p3 PARTITION OF transactions
    FOR VALUES WITH (MODULUS 4, REMAINDER 3);

For every record that we insert, postgres will calculate the hash value for customer_id and then put it in the correct partition.

When our query have customer_id in the where clause, postgres will calculate the hash value and scan only the specific partition.

Example:

The hash value of customer_id = 104 is -1470666514 hence the remainder is 0.

SELECT pg_catalog.hashint4(104);

  hashint4   
-------------
 -1470666514

So postgres will only scan the partition transactions_p0 and ignore the other partition.

EXPLAIN ANALYZE
SELECT * FROM transactions 
WHERE customer_id = 104;
 Seq Scan on transactions_p0 transactions  (cost=0.00..24.12 rows=6 width=44) (actual time=0.032..0.036 rows=1 loops=1)
   Filter: (customer_id = 104)
   Rows Removed by Filter: 3
 Planning Time: 0.230 ms
 Execution Time: 0.066 ms

PostgreSQL Hash Functions

Data Type
Hash Function

text, varchar

pg_catalog.hashtext()

int, bigint

pg_catalog.hashint4(), pg_catalog.hashint8()

uuid

pg_catalog.hashuuid()

date, timestamp

pg_catalog.hashfloat8()

Best Practices

  • The most common query filters should align with the partition key to ensure partition pruning where postgres only scans relevant partitions, not the whole table.

    • Time-based queries → Use Range Partitioning on order_date.

    • Region-based searches → Use List Partitioning on country.

    • Random access (e.g., user_id) → Use Hash Partitioning on user_id.

  • Partitioning works best when data is evenly distributed across partitions. Imbalanced data (e.g., one partition being huge while others are small) can cause hotspots and reduce the benefit of partitioning.

  • Avoid frequent updates on the partition key because it triggers row movement.

References

PreviousPCI DSSNextPostgreSQL: Replication

Last updated 3 months ago

https://www.postgresql.org/docs/current/ddl-partitioning.html