In this article, I'll provide a quick recap on the types of partitioning available in PostgreSQL. I will cover the basics, including creating partitioned tables from scratch, which is, let’s be honest, almost never the case. Usually partitioning comes to the table when data set is already overloaded and performance issues need to be fixed by it. I will cover such case in another article, but it is nice to have some background before diving deeper into the topic.
Different ways of partitioning
Before PostgreSQL 10 was released, there was no native support for partitioned tables.The process was managed by using inheritance and creating child tables. Triggers were used to insert data into these child tables. Version 10 introduced native support for a feature known as Declarative Partitioning, which allows partitioning by range and value (also known as list partitioning). PostgreSQL 11 expanded this feature by adding the ability to partition by hash.
Range partitioning
Range partitioning divides data into segments based on a particular range of values in a column. It is especially beneficial for time-series data or any data with a natural order. Each partition corresponds to a unique range of values, and data within that range is stored in the respective partition.A good example of this is partitioning by the created_at
column - here considered for users
table:
CREATE TABLE IF NOT EXISTS public.users
(
id bigint NOT NULL DEFAULT nextval('users_id_seq'::regclass),
email character varying NOT NULL,
first_name character varying,
last_name character varying,
username character varying NOT NULL,
status integer DEFAULT 0,
settings jsonb DEFAULT '{}'::jsonb,
created_at timestamp(6) without time zone NOT NULL,
updated_at timestamp(6) without time zone NOT NULL,
CONSTRAINT users_pkey PRIMARY KEY (id)
) PARTITION BY RANGE(created_at);
Next, let’s create our partitions. Each partition's definition must specify bounds that match the partitioning method and partition key of the parent. To be clear - each range's bounds are inclusive at the start of range and exclusive at the end.
CREATE TABLE users_1 PARTITION OF users
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE users_2 PARTITION OF users
FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
CREATE TABLE users_3 PARTITION OF users
FOR VALUES FROM ('2024-03-01') TO ('2024-04-01');
Now, after adding a new user to the users
table, it will be inserted into the appropriate partition according to its creation date.
If I attempt to create overlapping partitions, such as:
CREATE TABLE users_1 PARTITION OF users
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE users_2 PARTITION OF users
FOR VALUES FROM ('2023-12-01') TO ('2024-02-01');
It will result in an error:
ERROR: partition "users_2" would overlap partition "users_1”
Additionally, considering the scenario where a value with a specific date doesn't fit into any range-partitioned tables, like this
INSERT INTO users (email, last_name, first_name, username, status,created_at, updated_at))
VALUES ('example@.example.com','Smith', 'John', 'Superman',0, '2025-05-01', '2025-05-01');
will return a message:
ERROR: no partition of relation "users" found for row
DETAIL: Partition key of the failing row contains (created_at) = (2024-05-01 00:00:00).
The conclusion drawn from this type of error is that it may be beneficial to establish a cron job, such as pg_cron or another scheduler, which will periodically create new partitions for a specific time range. For range partitioning it is also possible to create default partition, which may hold values which don’t fit to any specified partition. It allows to avoid above error.
List partitioning
This type of partitioning relies on selecting specific values present in a column and dividing by them. This particular one allows to choose specific, not overlapping values instead of range of values - a good example would be partition by category
or status
columns.
CREATE TABLE IF NOT EXISTS users
(
id bigint NOT NULL DEFAULT nextval('users_id_seq'::regclass),
email character varying NOT NULL,
first_name character varying,
last_name character varying,
username character varying NOT NULL,
status integer DEFAULT 0,
settings jsonb DEFAULT '{}'::jsonb,
created_at timestamp(6) without time zone NOT NULL,
updated_at timestamp(6) without time zone NOT NULL,
CONSTRAINT users_pkey PRIMARY KEY (id)
) PARTITION BY LIST(status);
I assumed here that I am using integer as enum in application - it is possible to add additional constraints if needed.
CREATE TABLE users_1 PARTITION OF users
FOR VALUES IN (0);
CREATE TABLE users_2 PARTITION OF users
FOR VALUES IN (1);
CREATE TABLE users_3 PARTITION OF users
FOR VALUES IN (2);
Now, I can create our insert statement to correctly add record to specific partition:
INSERT INTO users (email, last_name, first_name, username, status,created_at, updated_at))
VALUES ('example@.example.com','Smith', 'John', 'Superman',0, NOW(), NOW());
It is worth to mention what actually happens if you try to use status which is not defined in partition, something like:
INSERT INTO users (email, last_name, first_name, username, status,created_at, updated_at))
VALUES ('example@.example.com','Smith', 'John', 'Superman',4, NOW(), NOW());
As a result of such query error is received:
ERROR: no partition of relation "users" found for row
DETAIL: Partition key of the failing row contains (status) = (4).
Similar case happens when I do not specify value of status at all:
ERROR: no partition of relation "users" found for row
DETAIL: Partition key of the failing row contains (status) = (null).
This is crucial because without partitioning, I could smoothly add new values to the enum in our codebase. Before adding new status
value, you need to define a new partition. Alternatively, define a default partition or partition for null
value. Choose the right approach according to your business needs.
Hash partitioning
Hash partitioning is another method of table partitioning in PostgreSQL. Data is divided into partitions based on the hash value of a specific column. This approach is useful if you want to distribute data evenly across partitions, allowing for balanced data load. For this one, I’m going to distribute data over foreign key, like user_id
or customer_id
.
CREATE TABLE IF NOT EXISTS orders
(
id bigint NOT NULL DEFAULT nextval('orders_id_seq'::regclass),
user_id bigint,
amount numeric(10,2) NOT NULL,
created_at timestamp(6) without time zone NOT NULL,
updated_at timestamp(6) without time zone NOT NULL,
CONSTRAINT orders_pkey PRIMARY KEY (id),
CONSTRAINT fk_rails_f868b47f6a FOREIGN KEY (user_id)
REFERENCES users (id) MATCH SIMPLE
) PARTITION BY HASH(user_id);
Then, following partitions looks like that:
CREATE TABLE orders_1 PARTITION OF orders
FOR VALUES WITH (MODULUS 5, REMAINDER 0);
CREATE TABLE orders_2 PARTITION OF orders
FOR VALUES WITH (MODULUS 5, REMAINDER 1);
CREATE TABLE orders_3 PARTITION OF orders
FOR VALUES WITH (MODULUS 5, REMAINDER 2);
CREATE TABLE orders_4 PARTITION OF orders
FOR VALUES WITH (MODULUS 5, REMAINDER 3);
CREATE TABLE orders_5 PARTITION OF orders
FOR VALUES WITH (MODULUS 5, REMAINDER 4);
And here comes example of inserting new records for each partition:
// orders_1
INSERT INTO orders (user_id, amount, created_at, updated_at))
VALUES (1,10, NOW(), NOW());
// orders_2
INSERT INTO orders (user_id, amount, created_at, updated_at))
VALUES (2,10, NOW(), NOW());
// orders_3
INSERT INTO orders (user_id, amount, created_at, updated_at))
VALUES (3,10, NOW(), NOW());
// orders_4
INSERT INTO orders (user_id, amount, created_at, updated_at))
VALUES (4,10, NOW(), NOW());
// orders_5
INSERT INTO orders (user_id, amount, created_at, updated_at))
VALUES (5,10, NOW(), NOW());
To clarify, which I found misleading during my research, PostgreSQL does not directly calculate modulo from the column's value (as, for example, hash partitioning is also possible for varchar column).
PostgreSQL uses a hash function to balance partitions as evenly as possible without any particular order. This approach does not make much sense for columns with low cardinality, as demonstrated by the example above, where I obtained certain results for only 5 different values:
orders_1
User ID | Amount | Created At | Updated At |
---|---|---|---|
1 | 10 | … | … |
4 | 10 | … | … |
5 | 10 | … | … |
orders_2
User ID | Amount | Created At | Updated At |
---|---|---|---|
2 | 10 | … | … |
3 | 10 | … | … |
Records were distributed among only two of the five partitions, without any specific order, leaving the rest of the partitions (orders3, orders4 and orders_5) empty. Therefore, for a column with a low variety of values, applying hash partitioning would be pointless since the records won't be distributed evenly.
Hash partitioning proves useful in a select number of scenarios. It is primarily used for I/O improvements, such as parallel writing, especially when the cardinality of the partitioning column is high. Additionally, it enhances the efficiency of autovacuuming, as vacuuming smaller tables is faster and more effective.
Last but not least - if our write statement contains null value in partitioning attribute, it is assigned to partition with reminder 0 by default. Also, for hash partitioning it is not possible to create default partition.
Wrapping up
To sum up - just as careful consideration is required when deciding on table partitioning, the selection of the appropriate partitioning method should also be well-thought-out. Shorty describing best cases for each - pick range partitioning for time-series data - it very natural way for any archiving purposes. List partitioning would be best for any specific values, or group of values. Hash partitioning would be intuitively least likely choice, as it doesn’t rely specifically on data structure, but for some scenarios with I/O improvements and parallel writing it could be a good alternative. Choose wisely!
Articles in this series
- Table partitioning in Rails, part 1
- Table partitioning in Rails, part 2
- Table partitioning type
- Indexing partitioned table