# PostgreSQL: Partitioning

Partitioning is the process of splitting a large table into smaller, more manageable pieces called partitions. Each partition holds a subset of the data, which improves query performance and simplifies data management.

Partitioning can provide several benefits:

* Query performance can be improved dramatically particularly when most of the heavily accessed rows of the table are in a single partition or a small number of partitions.
* Dropping an individual partition using `DROP TABLE`, or doing `ALTER TABLE DETACH PARTITION`, is far faster than a bulk operation. These commands also entirely avoid the `VACUUM` overhead caused by a bulk `DELETE`.
* Rarely used data can be migrated to cheaper and slower storage media.

## Types of Partitioning in PostgreSQL

* **Range Partitioning**: Data is divided based on a range of values (e.g., date ranges).
* **List Partitioning**: Data is divided based on a specific list of values (e.g., regions, categories).
* **Hash Partitioning**: Data is divided using a hash function (e.g., evenly distributing based on user IDs).

## Range Partitioning

```sql
CREATE TABLE transactions (
    id                  SERIAL  NOT NULL,
    customer_id         INT     NOT NULL,
    country             TEXT    NOT NULL,
    transaction_date    DATE    NOT NULL,
    CONSTRAINT PK_transaction PRIMARY KEY ("id", transaction_date)
) PARTITION BY RANGE (transaction_date);
```

The command below will create partitions based on `transaction_date`. One is for transactions happens in 2024 and the other is 2025.

```sql
CREATE TABLE transactions_2024 PARTITION OF transactions
    FOR VALUES FROM ('2024-01-01') TO ('2024-12-31');

CREATE TABLE transactions_2025 PARTITION OF transactions
    FOR VALUES FROM ('2025-01-01') TO ('2025-12-31');
```

With this when your query have where clause using `transaction_date` it will prune the unnecessary partition.

Lets insert some dummy data for the table above and try different queries to see if it leverages partition or not.

```sql
INSERT INTO transactions (customer_id, country, transaction_date) 
VALUES 
    (101, 'ID', '2024-01-15'),
    (102, 'JP', '2024-03-22'),
    (103, 'ID', '2024-05-10'),
    (104, 'JP', '2024-07-30'),
    (105, 'ID', '2024-09-05'),
    (106, 'JP', '2025-02-18'),
    (107, 'ID', '2025-04-12'),
    (108, 'JP', '2025-06-25'),
    (109, 'ID', '2025-08-08'),
    (110, 'JP', '2025-11-14');
```

First lets try selecting all the transactions.

```sql
EXPLAIN ANALYZE 
SELECT * FROM transactions;
```

```sql
 Append  (cost=0.00..53.90 rows=2260 width=44) (actual time=0.027..0.049 rows=10 loops=1)
   ->  Seq Scan on transactions_2024 transactions_1  (cost=0.00..21.30 rows=1130 width=44) (actual time=0.025..0.027 rows=5 loops=1)
   ->  Seq Scan on transactions_2025 transactions_2  (cost=0.00..21.30 rows=1130 width=44) (actual time=0.019..0.019 rows=5 loops=1)
 Planning Time: 1.901 ms
 Execution Time: 0.076 ms
```

As we can see above if we didn't use where clause using the partition key it will scan all partition.

Now lets try to query all transactions that happens in February 2025.

```sql
EXPLAIN ANALYZE 
SELECT * FROM transactions 
WHERE transaction_date >= '2025-02-01' 
AND transaction_date <= '2025-02-28';
```

```sql
 Seq Scan on transactions_2025 transactions  (cost=0.00..26.95 rows=6 width=44) (actual time=0.015..0.018 rows=1 loops=1)
   Filter: ((transaction_date >= '2025-02-01'::date) AND (transaction_date <= '2025-02-28'::date))
   Rows Removed by Filter: 4
 Planning Time: 0.401 ms
 Execution Time: 0.039 ms
```

And we can see above that postgres prune unnecessary partition. In this case we only need data in the `transactions_2025` partition so it will skip scanning on other partition. And as you can see the planning time and execution time is reduced as well.

## List Partitioning

```sql
CREATE TABLE transactions (
    id                  SERIAL  NOT NULL,
    customer_id         INT     NOT NULL,
    country             TEXT    NOT NULL,
    transaction_date    DATE    NOT NULL,
    CONSTRAINT PK_transaction PRIMARY KEY ("id", country)
) PARTITION BY LIST (country);
```

This command below will create 3 partition for `ID`, `JP`, and others respectively.

```sql
CREATE TABLE transactions_id PARTITION OF transactions
    FOR VALUES IN ('ID');

CREATE TABLE transactions_jp PARTITION OF transactions
    FOR VALUES IN ('JP');

CREATE TABLE transactions_other PARTITION OF transactions
    DEFAULT; -- Catch-all for other countries
```

We need to create `DEFAULT` partition because if we insert other countries other than `ID` and `JP` it will return error. So we need to define a partition where postgres can store countries other than `ID` and `JP`.

The query optimization concept is same with other partition type. If we include `country` field in the where clause it will prune unnecessary partition.

## Hash Partitioning

```sql
CREATE TABLE transactions (
    id                  SERIAL  NOT NULL,
    customer_id         INT     NOT NULL,
    country             TEXT    NOT NULL,
    transaction_date    DATE    NOT NULL,
    CONSTRAINT PK_transaction PRIMARY KEY ("id", customer_id)
) PARTITION BY HASH (customer_id);
```

This command below will create 4 partition based on `MODULUS 4`. If you want to create smaller or larger partition you can change the `MODULUS` and `REMAINDER` based on your need.

```sql
CREATE TABLE transactions_p0 PARTITION OF transactions
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);

CREATE TABLE transactions_p1 PARTITION OF transactions
    FOR VALUES WITH (MODULUS 4, REMAINDER 1);

CREATE TABLE transactions_p2 PARTITION OF transactions
    FOR VALUES WITH (MODULUS 4, REMAINDER 2);

CREATE TABLE transactions_p3 PARTITION OF transactions
    FOR VALUES WITH (MODULUS 4, REMAINDER 3);
```

For every record that we insert, postgres will calculate the hash value for `customer_id` and then put it in the correct partition.

When our query have `customer_id` in the where clause, postgres will calculate the hash value and scan only the specific partition.

**Example**:

The hash value of `customer_id = 104` is `-1470666514` hence the remainder is `0`.

```sql
SELECT pg_catalog.hashint4(104);

  hashint4   
-------------
 -1470666514
```

So postgres will only scan the partition `transactions_p0` and ignore the other partition.

```sql
EXPLAIN ANALYZE
SELECT * FROM transactions 
WHERE customer_id = 104;
```

```sql
 Seq Scan on transactions_p0 transactions  (cost=0.00..24.12 rows=6 width=44) (actual time=0.032..0.036 rows=1 loops=1)
   Filter: (customer_id = 104)
   Rows Removed by Filter: 3
 Planning Time: 0.230 ms
 Execution Time: 0.066 ms
```

### PostgreSQL Hash Functions

| Data Type       | Hash Function                                    |
| --------------- | ------------------------------------------------ |
| text, varchar   | `pg_catalog.hashtext()`                          |
| int, bigint     | `pg_catalog.hashint4()`, `pg_catalog.hashint8()` |
| uuid            | `pg_catalog.hashuuid()`                          |
| date, timestamp | `pg_catalog.hashfloat8()`                        |

## Best Practices

* The most common query filters should align with the partition key to ensure partition pruning where postgres only scans relevant partitions, not the whole table.
  * Time-based queries → Use Range Partitioning on order\_date.
  * Region-based searches → Use List Partitioning on country.
  * Random access (e.g., user\_id) → Use Hash Partitioning on user\_id.
* Partitioning works best when data is evenly distributed across partitions. Imbalanced data (e.g., one partition being huge while others are small) can cause hotspots and reduce the benefit of partitioning.
* Avoid frequent updates on the partition key because it triggers row movement.

## References

* <https://www.postgresql.org/docs/current/ddl-partitioning.html>


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://bagus-cahyono.gitbook.io/programming-notes/programming/postgre_partition.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
