29
.
04
.
2024
29
.
04
.
2024
Backend
Postgresql
Ruby on Rails

Table partitioning in Rails, part 2 - Postgres Stories

Jarosław Kowalewski
Ruby Developer

Hello again! In the previous part about partitioning tables in Rails, I introduced you to the basics of how partitioning works and how you can implement it on a freshly created table. Also, I promised that I would show you how to handle it for an already existing table which holds some (probably much) data. So, let’s dive into the topic.

Migrating existing table

Partitioning primarily focuses on large datasets, which is seldom a consideration at the beginning of a project. It's rare to plan table partitioning upfront.

Typically, the situation involves hitting the performance threshold and deciding to partition an existing large table. If this table is huge it likely plays a significant role in the system. Given its widespread use, continuous availability is crucial. Any downtime would be unpleasant to the end-user.

Meanwhile, we aim to migrate or modify a large dataset, a process that often involves locking or shutting down the table for longer period. That's why it's important to find a solution that minimizes potential unavailability. This is particularly significant since most PostgreSQL documentation and other tutorials focus on development of partitioning from scratch.

The basic approach is simple:

  • Create a new, partitioned table.
  • Create partitions.
  • Use INSERT INTO/SELECT or COPY to transfer data into the partitioned table.
  • Carry out any additional steps like recreating indexes/triggers, etc.
  • Switch or rename the old table and the partitioned table.
  • Delete the old table.

At first glance, this method seems effective, but it complicates write operations. Despite having triggers on update on the old table, maintaining complete integrity between the old and new tables is a challenge. Furthermore, this approach requires at least twice the storage for the operation.

Partitioning a large table with no down time

Let’s see alternative, better approach - I will present it for 2 different types of partitioning - range and list. More about such types you can read in separate article about partitioning types

Range partitioning

This is the first part of transforming our table into a partitioned one, which includes the following steps:

  • Rename the old table
  • Create a new partitioned table
  • Attach the old table as the default partition for all historical data
  • Create partitions for future use
BEGIN;
    ALTER TABLE users RENAME TO users_temp;

    CREATE TABLE users
    ...
    PARTITION BY RANGE(created_at)

    ALTER TABLE users ATTACH PARTITION users_temp DEFAULT;

// here we assume that our current data is ending before 2023-12-01

    CREATE TABLE users_2023_12 PARTITION OF users
        FOR VALUES FROM ('2023-12-01') TO ('2024-01-01');

    CREATE TABLE users_2024_01 PARTITION OF users
        FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

// add indexes/triggers for new partitions
COMMIT;

The second part will involve transforming historical data into separate partitions:

The steps described below are as follows:

  • Detach the default partition
  • Create a new partition for a portion of the historical data
  • Remove data from the historical table and insert it into the new partition
  • Reattach the old table
BEGIN;
    ALTER TABLE users DETACH PARTITION users_temp;

    CREATE TABLE users_2023_11 PARTITION OF users
        FOR VALUES FROM ('2023-11-01') TO ('2023-12-01');

    // add indexes/triggers

    WITH copied_data AS (
        DELETE FROM users_temp
        WHERE created_at >= '2023-11-01' AND created_at < '2023-12-01'
        RETURNING *
    )
    INSERT INTO users
    SELECT * FROM copied_data;

    ALTER TABLE users ATTACH PARTITION users_temp DEFAULT;
COMMIT;

We detach the default partition because it's impossible to add a new partition if records in the default one belong to the new one.

This example pertains to range partitioning. An additional example for list partitioning is the next one, which follows a similar process.

List partitioning

BEGIN;
    ALTER TABLE users RENAME TO users_temp;

    CREATE TABLE users
    ...
    PARTITION BY LIST(status)

    ALTER TABLE users ATTACH PARTITION users_temp DEFAULT;
COMMIT;

In this case, additional partitions aren't created by default. This is because we assume that all statuses are in use. Similarly to range partitioning, we cannot create partitions for common values.

BEGIN;
    ALTER TABLE users DETACH PARTITION users_temp;

    CREATE TABLE users_0 PARTITION OF users
        FOR VALUES IN (0)

    //add indexes/triggers

    WITH copied_data AS (
        DELETE FROM users_temp
        WHERE status = 0
        RETURNING *
    )
    INSERT INTO users
    SELECT * FROM copied_data;

    ALTER TABLE users ATTACH PARTITION users_temp DEFAULT;
COMMIT;

Once again, we detach the default partition, create a new one, transfer data to it, and reattach the default partition.

Using Views and Triggers to introduce Partitioning

The last approach that I found handy is to use different mechanisms of Postgres - views and triggers. Long story short - this one has following steps:

  • Create partitioned table along with partitions/indexes
  • Rename old table
  • Create view with UNION ALL both tables
  • Create proper triggers for writing operations - INSERT is executed only on partitioned table, UPDATE/DELETE are proceeded on both tables
  • Move data from old table to new one in batches
  • Remove view and old table

You can read more about this approach in this article. The benefit of this method is that it allows us to easily manage the size of batches we want to transfer into partitions. In the previous method, batch sizes were defined by the new partition's size (either by the range of values or by a specific value in list partitioning). This restricts our ability to control transaction duration lengths and the locks they caused.

Partitioning large tables in Rails

Now, its time to prepare migration for partitioning within Rails app. Exactly as in previous article, I am going to utilize https://github.com/rkrage/pg_party gem. In this case, I’ve chosen list partitioning by the status field. The first migration will initialize the partitioned table and attach the old table as the default partition.

class MigrateExistingUsersTable < ActiveRecord::Migration[7.1]
    def up
        rename_table :users, :users_temp

    create_list_partition :users, partition_key: :status, template: false do |t|
      t.string :email, null: false
      t.string :first_name
      t.string :last_name
      t.string :username, null: false
      t.integer :status, default: 0
      t.jsonb :settings, default: {}
      t.timestamps
    end

    attach_default_partition :users, :users_temp
    end

    def down
        detach_partition :users, :users_temp
        drop_table :users
        rename_table :users_temp, :users
    end
end

The following steps outline the process I've used in SQL:

  1. Rename the old table as temporary
  2. Create a new partitioned table
  3. Attach the old table as default

Next, I will create the first list partition for the status value equal to 0. To simplify data migration, I will also create a temporary model. Alternatively, if you need to, you can also persist such a model directly in a migration.

class UsersTemp < ApplicationRecord; end

Following migration:

class MigrateExistingUsersTable < ActiveRecord::Migration[7.1]
    def up
        detach_partition :users, :users_temp
    create_list_partition_of :users_0, :users, values: 0

    ActiveRecord::Base.transaction do
      UsersTemp.where(status: 0).find_in_batches(batch_size: 1000) do |batch|
        User.insert_all!(batch.map { |user_temp| user_temp.attributes.except("id") })
      end

      UsersTemp.where(status: 0).delete_all
    end

    attach_default_partition :users, :users_temp
    end

    def down
        detach_partition :users, :users_temp

        ActiveRecord::Base.transaction do
            User.where(status: 0).find_in_batches(batch_size: 1000) do |batch|
        UsersTemp.insert_all!(batch.map { |user| user.attributes.except("id") })
      end
        end

        detach_partition :users, :users_0
        drop_table :users_0
        attach_default_partition :users, :users_temp
    end
end

Again, I’ve tried to reproduce steps from SQL:

  1. Detach default partition
  2. Create a new partition for specific value of status = 0
  3. Migrate data from default detached partition
  4. Attach default partition again

Such migration should be prepared for every possible value of status field. I assumed, that amount of status values are limited.

In the revert migration, I have the option to detach the users_0 partition and directly copy its data into the users partitioned table. This would automatically assign the data to the default partition, and then I could simply drop this partition.

To keep with the ActiveRecord approach, this might require creating a temporary model like Users0 or correctly assigning the table_name. Alternatively, writing raw SQL with ActiveRecord::Base.connection.execute for this data migration could allow us to avoid avoid the need for temporary models to access our data. This approach should be considered depending on the scenario.

Summary

In general, partitioning existing tables is always a significant challenge that may demand even weeks of proper planning and choosing the right approach for the specific case. It's important to highlight a few issues that warrant concern:

  • Maintaining data consistency between tables
  • Minimizing lock duration and downtimes
  • Keeping an eye on possible storage usage

Following these points should be a core part of preparing such a migration and should lead to a happy ending. With that, I'll wrap up this quite exciting (I trust not only for me) journey through partitioning in PostgreSQL. I’m happy that I took part in explaining you the details of this topic and I hope you enjoyed it!

Articles in this series

Jarosław Kowalewski
Ruby Developer

Check my Twitter

Check my Linkedin

Did you like it? 

Sign up To VIsuality newsletter

READ ALSO

Investment Days for productivity

14
.
11
.
2023
Rafał Maliszewski
Visuality
HR

Happy new year

14
.
11
.
2023
Michał Piórkowski
Visuality

Does Norway need Polish software development?

14
.
11
.
2023
Rafał Maliszewski
Ruby on Rails

Visuality is 8 years old

14
.
11
.
2023
Michał Piórkowski
Visuality
Backend

Use less javascript plugins

14
.
11
.
2023
Michał Młoźniak
Frontend

Front-Trends 2015

14
.
11
.
2023
Adam Król
Frontend

Automatic door opener controlled through slack

14
.
11
.
2023
Sakir Temel
Backend
Software
Tutorial

Wolves Summit

14
.
11
.
2023
Michał Piórkowski
Conferences
Business

Berlin Startup Camp

14
.
11
.
2023
Michał Piórkowski
Ruby on Rails
Conferences

Why you shouldn't work at Visuality

14
.
11
.
2023
Michał Piórkowski
Visuality
HR

SaaS Meetup #People

14
.
11
.
2023
Michał Piórkowski
Conferences

Startup Safary Berlin 2015

14
.
11
.
2023
Michał Piórkowski
Conferences

Emmet makes HTML and CSS easier

14
.
11
.
2023
Michał Piórkowski
Frontend
Tutorial

Optional dependencies in gems

14
.
11
.
2023
Karol Słuszniak
Ruby on Rails
Backend

Text messaging with textris gem

14
.
11
.
2023
Karol Słuszniak
Ruby on Rails
Backend