29
.
04
.
2024
25
.
03
.
2024
Postgresql
Backend

Table partitioning types - Postgres Stories

Jarosław Kowalewski
Ruby Developer

In this article, I'll provide a quick recap on the types of partitioning available in PostgreSQL. I will cover the basics, including creating partitioned tables from scratch, which is, let’s be honest, almost never the case. Usually partitioning comes to the table when data set is already overloaded and performance issues need to be fixed by it. I will cover such case in another article, but it is nice to have some background before diving deeper into the topic.

Different ways of partitioning

Before PostgreSQL 10 was released, there was no native support for partitioned tables.The process was managed by using inheritance and creating child tables. Triggers were used to insert data into these child tables. Version 10 introduced native support for a feature known as Declarative Partitioning, which allows partitioning by range and value (also known as list partitioning). PostgreSQL 11 expanded this feature by adding the ability to partition by hash.

Range partitioning

Range partitioning divides data into segments based on a particular range of values in a column. It is especially beneficial for time-series data or any data with a natural order. Each partition corresponds to a unique range of values, and data within that range is stored in the respective partition.A good example of this is partitioning by the created_at column - here considered for users table:

CREATE TABLE IF NOT EXISTS public.users
(
    id bigint NOT NULL DEFAULT nextval('users_id_seq'::regclass),
    email character varying NOT NULL,
    first_name character varying,
    last_name character varying,
    username character varying NOT NULL,
    status integer DEFAULT 0,
    settings jsonb DEFAULT '{}'::jsonb,
    created_at timestamp(6) without time zone NOT NULL,
    updated_at timestamp(6) without time zone NOT NULL,
    CONSTRAINT users_pkey PRIMARY KEY (id)
) PARTITION BY RANGE(created_at);

Next, let’s create our partitions. Each partition's definition must specify bounds that match the partitioning method and partition key of the parent. To be clear - each range's bounds are inclusive at the start of range and exclusive at the end.

CREATE TABLE users_1 PARTITION OF users
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

CREATE TABLE users_2 PARTITION OF users
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

CREATE TABLE users_3 PARTITION OF users
    FOR VALUES FROM ('2024-03-01') TO ('2024-04-01');

Now, after adding a new user to the users table, it will be inserted into the appropriate partition according to its creation date.

If I attempt to create overlapping partitions, such as:

CREATE TABLE users_1 PARTITION OF users
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

CREATE TABLE users_2 PARTITION OF users
    FOR VALUES FROM ('2023-12-01') TO ('2024-02-01');

It will result in an error:

ERROR: partition "users_2" would overlap partition "users_1”

Additionally, considering the scenario where a value with a specific date doesn't fit into any range-partitioned tables, like this

INSERT INTO users (email, last_name, first_name, username, status,created_at, updated_at))
VALUES ('example@.example.com','Smith', 'John', 'Superman',0, '2025-05-01', '2025-05-01');

will return a message:

ERROR: no partition of relation "users" found for row DETAIL: Partition key of the failing row contains (created_at) = (2024-05-01 00:00:00).

The conclusion drawn from this type of error is that it may be beneficial to establish a cron job, such as pg_cron or another scheduler, which will periodically create new partitions for a specific time range. For range partitioning it is also possible to create default partition, which may hold values which don’t fit to any specified partition. It allows to avoid above error.

List partitioning

This type of partitioning relies on selecting specific values present in a column and dividing by them. This particular one allows to choose specific, not overlapping values instead of range of values - a good example would be partition by category or status columns.

CREATE TABLE IF NOT EXISTS users
(
    id bigint NOT NULL DEFAULT nextval('users_id_seq'::regclass),
    email character varying NOT NULL,
    first_name character varying,
    last_name character varying,
    username character varying NOT NULL,
    status integer DEFAULT 0,
    settings jsonb DEFAULT '{}'::jsonb,
    created_at timestamp(6) without time zone NOT NULL,
    updated_at timestamp(6) without time zone NOT NULL,
    CONSTRAINT users_pkey PRIMARY KEY (id)
) PARTITION BY LIST(status);

I assumed here that I am using integer as enum in application - it is possible to add additional constraints if needed.

CREATE TABLE users_1 PARTITION OF users
    FOR VALUES IN (0);

CREATE TABLE users_2 PARTITION OF users
    FOR VALUES IN (1);

CREATE TABLE users_3 PARTITION OF users
    FOR VALUES IN (2);

Now, I can create our insert statement to correctly add record to specific partition:

INSERT INTO users (email, last_name, first_name, username, status,created_at, updated_at))
VALUES ('example@.example.com','Smith', 'John', 'Superman',0, NOW(), NOW());

It is worth to mention what actually happens if you try to use status which is not defined in partition, something like:

INSERT INTO users (email, last_name, first_name, username, status,created_at, updated_at))
VALUES ('example@.example.com','Smith', 'John', 'Superman',4, NOW(), NOW());

As a result of such query error is received:

ERROR: no partition of relation "users" found for row DETAIL: Partition key of the failing row contains (status) = (4).

Similar case happens when I do not specify value of status at all:

ERROR: no partition of relation "users" found for row DETAIL: Partition key of the failing row contains (status) = (null).

This is crucial because without partitioning, I could smoothly add new values to the enum in our codebase. Before adding new status value, you need to define a new partition. Alternatively, define a default partition or partition for null value. Choose the right approach according to your business needs.

Hash partitioning

Hash partitioning is another method of table partitioning in PostgreSQL. Data is divided into partitions based on the hash value of a specific column. This approach is useful if you want to distribute data evenly across partitions, allowing for balanced data load. For this one, I’m going to distribute data over foreign key, like user_id or customer_id.

CREATE TABLE IF NOT EXISTS orders
(
    id bigint NOT NULL DEFAULT nextval('orders_id_seq'::regclass),
    user_id bigint,
    amount numeric(10,2) NOT NULL,
    created_at timestamp(6) without time zone NOT NULL,
    updated_at timestamp(6) without time zone NOT NULL,
    CONSTRAINT orders_pkey PRIMARY KEY (id),
    CONSTRAINT fk_rails_f868b47f6a FOREIGN KEY (user_id)
        REFERENCES users (id) MATCH SIMPLE
) PARTITION BY HASH(user_id);

Then, following partitions looks like that:

CREATE TABLE orders_1 PARTITION OF orders
    FOR VALUES WITH (MODULUS 5, REMAINDER 0);

CREATE TABLE orders_2 PARTITION OF orders
    FOR VALUES WITH (MODULUS 5, REMAINDER 1);

CREATE TABLE orders_3 PARTITION OF orders
    FOR VALUES WITH (MODULUS 5, REMAINDER 2);

CREATE TABLE orders_4 PARTITION OF orders
    FOR VALUES WITH (MODULUS 5, REMAINDER 3);

CREATE TABLE orders_5 PARTITION OF orders
    FOR VALUES WITH (MODULUS 5, REMAINDER 4);

And here comes example of inserting new records for each partition:

// orders_1
INSERT INTO orders (user_id, amount, created_at, updated_at))
VALUES (1,10, NOW(), NOW());
// orders_2
INSERT INTO orders (user_id, amount, created_at, updated_at))
VALUES (2,10, NOW(), NOW());
// orders_3
INSERT INTO orders (user_id, amount, created_at, updated_at))
VALUES (3,10, NOW(), NOW());
// orders_4
INSERT INTO orders (user_id, amount, created_at, updated_at))
VALUES (4,10, NOW(), NOW());
// orders_5
INSERT INTO orders (user_id, amount, created_at, updated_at))
VALUES (5,10, NOW(), NOW());

To clarify, which I found misleading during my research, PostgreSQL does not directly calculate modulo from the column's value (as, for example, hash partitioning is also possible for varchar column).

PostgreSQL uses a hash function to balance partitions as evenly as possible without any particular order. This approach does not make much sense for columns with low cardinality, as demonstrated by the example above, where I obtained certain results for only 5 different values:

orders_1

User ID Amount Created At Updated At
1 10
4 10
5 10

orders_2

User ID Amount Created At Updated At
2 10
3 10

Records were distributed among only two of the five partitions, without any specific order, leaving the rest of the partitions (orders3, orders4 and orders_5) empty. Therefore, for a column with a low variety of values, applying hash partitioning would be pointless since the records won't be distributed evenly.

Hash partitioning proves useful in a select number of scenarios. It is primarily used for I/O improvements, such as parallel writing, especially when the cardinality of the partitioning column is high. Additionally, it enhances the efficiency of autovacuuming, as vacuuming smaller tables is faster and more effective.

Last but not least - if our write statement contains null value in partitioning attribute, it is assigned to partition with reminder 0 by default. Also, for hash partitioning it is not possible to create default partition.

Wrapping up

To sum up - just as careful consideration is required when deciding on table partitioning, the selection of the appropriate partitioning method should also be well-thought-out. Shorty describing best cases for each - pick range partitioning for time-series data - it very natural way for any archiving purposes. List partitioning would be best for any specific values, or group of values. Hash partitioning would be intuitively least likely choice, as it doesn’t rely specifically on data structure, but for some scenarios with I/O improvements and parallel writing it could be a good alternative. Choose wisely!

Articles in this series

Jarosław Kowalewski
Ruby Developer

Check my Twitter

Check my Linkedin

Did you like it? 

Sign up To VIsuality newsletter

READ ALSO

NASA 1st global hackaton in Poland? Visuality Created it!

14
.
11
.
2023
Rafał Maliszewski
Ruby on Rails

Berlin StartupCamp 2016 summary

14
.
11
.
2023
Michał Piórkowski
Conferences
Startups

Investment Days for productivity

14
.
11
.
2023
Rafał Maliszewski
Visuality
HR

Happy new year

14
.
11
.
2023
Michał Piórkowski
Visuality

Does Norway need Polish software development?

14
.
11
.
2023
Rafał Maliszewski
Ruby on Rails

Visuality is 8 years old

14
.
11
.
2023
Michał Piórkowski
Visuality
Backend

Use less javascript plugins

14
.
11
.
2023
Michał Młoźniak
Frontend

Front-Trends 2015

14
.
11
.
2023
Adam Król
Frontend

Automatic door opener controlled through slack

14
.
11
.
2023
Sakir Temel
Backend
Software
Tutorial

Wolves Summit

14
.
11
.
2023
Michał Piórkowski
Conferences
Business

Berlin Startup Camp

14
.
11
.
2023
Michał Piórkowski
Ruby on Rails
Conferences

Why you shouldn't work at Visuality

14
.
11
.
2023
Michał Piórkowski
Visuality
HR

SaaS Meetup #People

14
.
11
.
2023
Michał Piórkowski
Conferences

Startup Safary Berlin 2015

14
.
11
.
2023
Michał Piórkowski
Conferences

Emmet makes HTML and CSS easier

14
.
11
.
2023
Michał Piórkowski
Frontend
Tutorial

Optional dependencies in gems

14
.
11
.
2023
Karol Słuszniak
Ruby on Rails
Backend

Text messaging with textris gem

14
.
11
.
2023
Karol Słuszniak
Ruby on Rails
Backend