29
.
04
.
2024
25
.
03
.
2024
Postgresql
Backend

Table partitioning types - Postgres Stories

Jarosław Kowalewski
Ruby Developer

In this article, I'll provide a quick recap on the types of partitioning available in PostgreSQL. I will cover the basics, including creating partitioned tables from scratch, which is, let’s be honest, almost never the case. Usually partitioning comes to the table when data set is already overloaded and performance issues need to be fixed by it. I will cover such case in another article, but it is nice to have some background before diving deeper into the topic.

Different ways of partitioning

Before PostgreSQL 10 was released, there was no native support for partitioned tables.The process was managed by using inheritance and creating child tables. Triggers were used to insert data into these child tables. Version 10 introduced native support for a feature known as Declarative Partitioning, which allows partitioning by range and value (also known as list partitioning). PostgreSQL 11 expanded this feature by adding the ability to partition by hash.

Range partitioning

Range partitioning divides data into segments based on a particular range of values in a column. It is especially beneficial for time-series data or any data with a natural order. Each partition corresponds to a unique range of values, and data within that range is stored in the respective partition.A good example of this is partitioning by the created_at column - here considered for users table:

CREATE TABLE IF NOT EXISTS public.users
(
    id bigint NOT NULL DEFAULT nextval('users_id_seq'::regclass),
    email character varying NOT NULL,
    first_name character varying,
    last_name character varying,
    username character varying NOT NULL,
    status integer DEFAULT 0,
    settings jsonb DEFAULT '{}'::jsonb,
    created_at timestamp(6) without time zone NOT NULL,
    updated_at timestamp(6) without time zone NOT NULL,
    CONSTRAINT users_pkey PRIMARY KEY (id)
) PARTITION BY RANGE(created_at);

Next, let’s create our partitions. Each partition's definition must specify bounds that match the partitioning method and partition key of the parent. To be clear - each range's bounds are inclusive at the start of range and exclusive at the end.

CREATE TABLE users_1 PARTITION OF users
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

CREATE TABLE users_2 PARTITION OF users
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

CREATE TABLE users_3 PARTITION OF users
    FOR VALUES FROM ('2024-03-01') TO ('2024-04-01');

Now, after adding a new user to the users table, it will be inserted into the appropriate partition according to its creation date.

If I attempt to create overlapping partitions, such as:

CREATE TABLE users_1 PARTITION OF users
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

CREATE TABLE users_2 PARTITION OF users
    FOR VALUES FROM ('2023-12-01') TO ('2024-02-01');

It will result in an error:

ERROR: partition "users_2" would overlap partition "users_1”

Additionally, considering the scenario where a value with a specific date doesn't fit into any range-partitioned tables, like this

INSERT INTO users (email, last_name, first_name, username, status,created_at, updated_at))
VALUES ('example@.example.com','Smith', 'John', 'Superman',0, '2025-05-01', '2025-05-01');

will return a message:

ERROR: no partition of relation "users" found for row DETAIL: Partition key of the failing row contains (created_at) = (2024-05-01 00:00:00).

The conclusion drawn from this type of error is that it may be beneficial to establish a cron job, such as pg_cron or another scheduler, which will periodically create new partitions for a specific time range. For range partitioning it is also possible to create default partition, which may hold values which don’t fit to any specified partition. It allows to avoid above error.

List partitioning

This type of partitioning relies on selecting specific values present in a column and dividing by them. This particular one allows to choose specific, not overlapping values instead of range of values - a good example would be partition by category or status columns.

CREATE TABLE IF NOT EXISTS users
(
    id bigint NOT NULL DEFAULT nextval('users_id_seq'::regclass),
    email character varying NOT NULL,
    first_name character varying,
    last_name character varying,
    username character varying NOT NULL,
    status integer DEFAULT 0,
    settings jsonb DEFAULT '{}'::jsonb,
    created_at timestamp(6) without time zone NOT NULL,
    updated_at timestamp(6) without time zone NOT NULL,
    CONSTRAINT users_pkey PRIMARY KEY (id)
) PARTITION BY LIST(status);

I assumed here that I am using integer as enum in application - it is possible to add additional constraints if needed.

CREATE TABLE users_1 PARTITION OF users
    FOR VALUES IN (0);

CREATE TABLE users_2 PARTITION OF users
    FOR VALUES IN (1);

CREATE TABLE users_3 PARTITION OF users
    FOR VALUES IN (2);

Now, I can create our insert statement to correctly add record to specific partition:

INSERT INTO users (email, last_name, first_name, username, status,created_at, updated_at))
VALUES ('example@.example.com','Smith', 'John', 'Superman',0, NOW(), NOW());

It is worth to mention what actually happens if you try to use status which is not defined in partition, something like:

INSERT INTO users (email, last_name, first_name, username, status,created_at, updated_at))
VALUES ('example@.example.com','Smith', 'John', 'Superman',4, NOW(), NOW());

As a result of such query error is received:

ERROR: no partition of relation "users" found for row DETAIL: Partition key of the failing row contains (status) = (4).

Similar case happens when I do not specify value of status at all:

ERROR: no partition of relation "users" found for row DETAIL: Partition key of the failing row contains (status) = (null).

This is crucial because without partitioning, I could smoothly add new values to the enum in our codebase. Before adding new status value, you need to define a new partition. Alternatively, define a default partition or partition for null value. Choose the right approach according to your business needs.

Hash partitioning

Hash partitioning is another method of table partitioning in PostgreSQL. Data is divided into partitions based on the hash value of a specific column. This approach is useful if you want to distribute data evenly across partitions, allowing for balanced data load. For this one, I’m going to distribute data over foreign key, like user_id or customer_id.

CREATE TABLE IF NOT EXISTS orders
(
    id bigint NOT NULL DEFAULT nextval('orders_id_seq'::regclass),
    user_id bigint,
    amount numeric(10,2) NOT NULL,
    created_at timestamp(6) without time zone NOT NULL,
    updated_at timestamp(6) without time zone NOT NULL,
    CONSTRAINT orders_pkey PRIMARY KEY (id),
    CONSTRAINT fk_rails_f868b47f6a FOREIGN KEY (user_id)
        REFERENCES users (id) MATCH SIMPLE
) PARTITION BY HASH(user_id);

Then, following partitions looks like that:

CREATE TABLE orders_1 PARTITION OF orders
    FOR VALUES WITH (MODULUS 5, REMAINDER 0);

CREATE TABLE orders_2 PARTITION OF orders
    FOR VALUES WITH (MODULUS 5, REMAINDER 1);

CREATE TABLE orders_3 PARTITION OF orders
    FOR VALUES WITH (MODULUS 5, REMAINDER 2);

CREATE TABLE orders_4 PARTITION OF orders
    FOR VALUES WITH (MODULUS 5, REMAINDER 3);

CREATE TABLE orders_5 PARTITION OF orders
    FOR VALUES WITH (MODULUS 5, REMAINDER 4);

And here comes example of inserting new records for each partition:

// orders_1
INSERT INTO orders (user_id, amount, created_at, updated_at))
VALUES (1,10, NOW(), NOW());
// orders_2
INSERT INTO orders (user_id, amount, created_at, updated_at))
VALUES (2,10, NOW(), NOW());
// orders_3
INSERT INTO orders (user_id, amount, created_at, updated_at))
VALUES (3,10, NOW(), NOW());
// orders_4
INSERT INTO orders (user_id, amount, created_at, updated_at))
VALUES (4,10, NOW(), NOW());
// orders_5
INSERT INTO orders (user_id, amount, created_at, updated_at))
VALUES (5,10, NOW(), NOW());

To clarify, which I found misleading during my research, PostgreSQL does not directly calculate modulo from the column's value (as, for example, hash partitioning is also possible for varchar column).

PostgreSQL uses a hash function to balance partitions as evenly as possible without any particular order. This approach does not make much sense for columns with low cardinality, as demonstrated by the example above, where I obtained certain results for only 5 different values:

orders_1

User ID Amount Created At Updated At
1 10
4 10
5 10

orders_2

User ID Amount Created At Updated At
2 10
3 10

Records were distributed among only two of the five partitions, without any specific order, leaving the rest of the partitions (orders3, orders4 and orders_5) empty. Therefore, for a column with a low variety of values, applying hash partitioning would be pointless since the records won't be distributed evenly.

Hash partitioning proves useful in a select number of scenarios. It is primarily used for I/O improvements, such as parallel writing, especially when the cardinality of the partitioning column is high. Additionally, it enhances the efficiency of autovacuuming, as vacuuming smaller tables is faster and more effective.

Last but not least - if our write statement contains null value in partitioning attribute, it is assigned to partition with reminder 0 by default. Also, for hash partitioning it is not possible to create default partition.

Wrapping up

To sum up - just as careful consideration is required when deciding on table partitioning, the selection of the appropriate partitioning method should also be well-thought-out. Shorty describing best cases for each - pick range partitioning for time-series data - it very natural way for any archiving purposes. List partitioning would be best for any specific values, or group of values. Hash partitioning would be intuitively least likely choice, as it doesn’t rely specifically on data structure, but for some scenarios with I/O improvements and parallel writing it could be a good alternative. Choose wisely!

Articles in this series

Jarosław Kowalewski
Ruby Developer

Check my Twitter

Check my Linkedin

Did you like it? 

Sign up To VIsuality newsletter

READ ALSO

Is Go Language the Right Choice for Your Next Project?

14
.
11
.
2023
Maciej Zdunek
Backend
Business

SXSW Tradeshow 2020: Get Your FREE Tickets and Meet Us

02
.
10
.
2024
Michał Krochecki
Ruby on Rails
Conferences
Frontend
Backend
Business

How to build effective website: simplicity & McDonald's

14
.
11
.
2023
Lukasz Jackiewicz
Ruby on Rails
Frontend
Design

Thermal Printer Protocols for Image and Text

14
.
11
.
2023
Burak Aybar
Backend
Tutorial
Software

WebUSB - Print Image and Text in Thermal Printers

14
.
11
.
2023
Burak Aybar
Backend
Tutorial
Software

What happened in Visuality in 2019

14
.
11
.
2023
Maciej Zdunek
Visuality
HR

Three strategies that work in board games and in real life

14
.
11
.
2023
Michał Łęcicki
Ruby on Rails

HR Wave - No Bullshit HR Conference 2019

14
.
11
.
2023
Alicja Gruszczyk
HR
Conferences

Lightning Talks in your company

14
.
11
.
2023
Jarosław Kowalewski
Ruby on Rails
Visuality

Stress in Project Management

02
.
10
.
2024
Wiktor De Witte
HR
Project Management

How to find good developers and keep them happy - Part 1

02
.
10
.
2024
Michał Krochecki
HR
Visuality

PKP Intercity - Redesign and case study of polish national carrier

14
.
11
.
2023
Katarzyna Szewc
Design
Business
Frontend

Let’s prepare for GITEX Dubai together!

14
.
11
.
2023
Michał Piórkowski
Conferences
Business

Ruby Quirks

14
.
11
.
2023
Jan Matusz
Ruby on Rails
Ruby

Visuality recognized as one of the Best Ruby on Rails Devs

14
.
11
.
2023
Maciej Zdunek
Ruby on Rails
Visuality
Business

Is the culture of the organization important?

14
.
11
.
2023
Alicja Gruszczyk
Conferences
Visuality

Between the devil and the deep blue sea

04
.
12
.
2023
Mateusz Wodyk
Project Management
Backend
HR

Let’s prototype!

14
.
11
.
2023
Michał Łęcicki
Ruby on Rails
Backend