29
.
04
.
2024
29
.
04
.
2024
Backend
Postgresql
Ruby on Rails

Table partitioning in Rails, part 2 - Postgres Stories

Jarosław Kowalewski
Ruby Developer

Hello again! In the previous part about partitioning tables in Rails, I introduced you to the basics of how partitioning works and how you can implement it on a freshly created table. Also, I promised that I would show you how to handle it for an already existing table which holds some (probably much) data. So, let’s dive into the topic.

Migrating existing table

Partitioning primarily focuses on large datasets, which is seldom a consideration at the beginning of a project. It's rare to plan table partitioning upfront.

Typically, the situation involves hitting the performance threshold and deciding to partition an existing large table. If this table is huge it likely plays a significant role in the system. Given its widespread use, continuous availability is crucial. Any downtime would be unpleasant to the end-user.

Meanwhile, we aim to migrate or modify a large dataset, a process that often involves locking or shutting down the table for longer period. That's why it's important to find a solution that minimizes potential unavailability. This is particularly significant since most PostgreSQL documentation and other tutorials focus on development of partitioning from scratch.

The basic approach is simple:

  • Create a new, partitioned table.
  • Create partitions.
  • Use INSERT INTO/SELECT or COPY to transfer data into the partitioned table.
  • Carry out any additional steps like recreating indexes/triggers, etc.
  • Switch or rename the old table and the partitioned table.
  • Delete the old table.

At first glance, this method seems effective, but it complicates write operations. Despite having triggers on update on the old table, maintaining complete integrity between the old and new tables is a challenge. Furthermore, this approach requires at least twice the storage for the operation.

Partitioning a large table with no down time

Let’s see alternative, better approach - I will present it for 2 different types of partitioning - range and list. More about such types you can read in separate article about partitioning types

Range partitioning

This is the first part of transforming our table into a partitioned one, which includes the following steps:

  • Rename the old table
  • Create a new partitioned table
  • Attach the old table as the default partition for all historical data
  • Create partitions for future use
BEGIN;
    ALTER TABLE users RENAME TO users_temp;

    CREATE TABLE users
    ...
    PARTITION BY RANGE(created_at)

    ALTER TABLE users ATTACH PARTITION users_temp DEFAULT;

// here we assume that our current data is ending before 2023-12-01

    CREATE TABLE users_2023_12 PARTITION OF users
        FOR VALUES FROM ('2023-12-01') TO ('2024-01-01');

    CREATE TABLE users_2024_01 PARTITION OF users
        FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

// add indexes/triggers for new partitions
COMMIT;

The second part will involve transforming historical data into separate partitions:

The steps described below are as follows:

  • Detach the default partition
  • Create a new partition for a portion of the historical data
  • Remove data from the historical table and insert it into the new partition
  • Reattach the old table
BEGIN;
    ALTER TABLE users DETACH PARTITION users_temp;

    CREATE TABLE users_2023_11 PARTITION OF users
        FOR VALUES FROM ('2023-11-01') TO ('2023-12-01');

    // add indexes/triggers

    WITH copied_data AS (
        DELETE FROM users_temp
        WHERE created_at >= '2023-11-01' AND created_at < '2023-12-01'
        RETURNING *
    )
    INSERT INTO users
    SELECT * FROM copied_data;

    ALTER TABLE users ATTACH PARTITION users_temp DEFAULT;
COMMIT;

We detach the default partition because it's impossible to add a new partition if records in the default one belong to the new one.

This example pertains to range partitioning. An additional example for list partitioning is the next one, which follows a similar process.

List partitioning

BEGIN;
    ALTER TABLE users RENAME TO users_temp;

    CREATE TABLE users
    ...
    PARTITION BY LIST(status)

    ALTER TABLE users ATTACH PARTITION users_temp DEFAULT;
COMMIT;

In this case, additional partitions aren't created by default. This is because we assume that all statuses are in use. Similarly to range partitioning, we cannot create partitions for common values.

BEGIN;
    ALTER TABLE users DETACH PARTITION users_temp;

    CREATE TABLE users_0 PARTITION OF users
        FOR VALUES IN (0)

    //add indexes/triggers

    WITH copied_data AS (
        DELETE FROM users_temp
        WHERE status = 0
        RETURNING *
    )
    INSERT INTO users
    SELECT * FROM copied_data;

    ALTER TABLE users ATTACH PARTITION users_temp DEFAULT;
COMMIT;

Once again, we detach the default partition, create a new one, transfer data to it, and reattach the default partition.

Using Views and Triggers to introduce Partitioning

The last approach that I found handy is to use different mechanisms of Postgres - views and triggers. Long story short - this one has following steps:

  • Create partitioned table along with partitions/indexes
  • Rename old table
  • Create view with UNION ALL both tables
  • Create proper triggers for writing operations - INSERT is executed only on partitioned table, UPDATE/DELETE are proceeded on both tables
  • Move data from old table to new one in batches
  • Remove view and old table

You can read more about this approach in this article. The benefit of this method is that it allows us to easily manage the size of batches we want to transfer into partitions. In the previous method, batch sizes were defined by the new partition's size (either by the range of values or by a specific value in list partitioning). This restricts our ability to control transaction duration lengths and the locks they caused.

Partitioning large tables in Rails

Now, its time to prepare migration for partitioning within Rails app. Exactly as in previous article, I am going to utilize https://github.com/rkrage/pg_party gem. In this case, I’ve chosen list partitioning by the status field. The first migration will initialize the partitioned table and attach the old table as the default partition.

class MigrateExistingUsersTable < ActiveRecord::Migration[7.1]
    def up
        rename_table :users, :users_temp

    create_list_partition :users, partition_key: :status, template: false do |t|
      t.string :email, null: false
      t.string :first_name
      t.string :last_name
      t.string :username, null: false
      t.integer :status, default: 0
      t.jsonb :settings, default: {}
      t.timestamps
    end

    attach_default_partition :users, :users_temp
    end

    def down
        detach_partition :users, :users_temp
        drop_table :users
        rename_table :users_temp, :users
    end
end

The following steps outline the process I've used in SQL:

  1. Rename the old table as temporary
  2. Create a new partitioned table
  3. Attach the old table as default

Next, I will create the first list partition for the status value equal to 0. To simplify data migration, I will also create a temporary model. Alternatively, if you need to, you can also persist such a model directly in a migration.

class UsersTemp < ApplicationRecord; end

Following migration:

class MigrateExistingUsersTable < ActiveRecord::Migration[7.1]
    def up
        detach_partition :users, :users_temp
    create_list_partition_of :users_0, :users, values: 0

    ActiveRecord::Base.transaction do
      UsersTemp.where(status: 0).find_in_batches(batch_size: 1000) do |batch|
        User.insert_all!(batch.map { |user_temp| user_temp.attributes.except("id") })
      end

      UsersTemp.where(status: 0).delete_all
    end

    attach_default_partition :users, :users_temp
    end

    def down
        detach_partition :users, :users_temp

        ActiveRecord::Base.transaction do
            User.where(status: 0).find_in_batches(batch_size: 1000) do |batch|
        UsersTemp.insert_all!(batch.map { |user| user.attributes.except("id") })
      end
        end

        detach_partition :users, :users_0
        drop_table :users_0
        attach_default_partition :users, :users_temp
    end
end

Again, I’ve tried to reproduce steps from SQL:

  1. Detach default partition
  2. Create a new partition for specific value of status = 0
  3. Migrate data from default detached partition
  4. Attach default partition again

Such migration should be prepared for every possible value of status field. I assumed, that amount of status values are limited.

In the revert migration, I have the option to detach the users_0 partition and directly copy its data into the users partitioned table. This would automatically assign the data to the default partition, and then I could simply drop this partition.

To keep with the ActiveRecord approach, this might require creating a temporary model like Users0 or correctly assigning the table_name. Alternatively, writing raw SQL with ActiveRecord::Base.connection.execute for this data migration could allow us to avoid avoid the need for temporary models to access our data. This approach should be considered depending on the scenario.

Summary

In general, partitioning existing tables is always a significant challenge that may demand even weeks of proper planning and choosing the right approach for the specific case. It's important to highlight a few issues that warrant concern:

  • Maintaining data consistency between tables
  • Minimizing lock duration and downtimes
  • Keeping an eye on possible storage usage

Following these points should be a core part of preparing such a migration and should lead to a happy ending. With that, I'll wrap up this quite exciting (I trust not only for me) journey through partitioning in PostgreSQL. I’m happy that I took part in explaining you the details of this topic and I hope you enjoyed it!

Articles in this series

Jarosław Kowalewski
Ruby Developer

Check my Twitter

Check my Linkedin

Did you like it? 

Sign up To VIsuality newsletter

READ ALSO

CSS Modules in Rails

14
.
11
.
2023
Adam Król
Ruby on Rails
Tutorial
Backend
Frontend

How to choose a software house.

14
.
11
.
2023
Michał Piórkowski
Ruby on Rails
Business
Visuality

JSON API versus the NIH syndrome

14
.
11
.
2023
Nadia Miętkiewicz
Backend
Frontend
Tutorial

From Idea to Concept

02
.
10
.
2024
Michał Krochecki
Ruby on Rails
Business
Startups

Styling React Components

14
.
11
.
2023
Umit Naimian
Ruby on Rails
Frontend
Tutorial

How good design can help your business grow

14
.
11
.
2023
Lukasz Jackiewicz
Design
Business
Marketing

TODO not. Do, or do not.

29
.
11
.
2023
Stanisław Zawadzki
Ruby on Rails
Software

CS Lessons #003: Density map in three ways

14
.
11
.
2023
Michał Młoźniak
Ruby
Backend
Tutorial
Software

Clean code for the win

14
.
11
.
2023
Michał Piórkowski
Ruby on Rails
Backend
Frontend
Business

Crowd-operated Christmas Lights

14
.
11
.
2023
Nadia Miętkiewicz
Ruby on Rails
Backend

How to startup and be mature about it

14
.
11
.
2023
Rafał Maliszewski
Ruby on Rails
Startups
Business

A journey of a thousand miles begins with workshops

14
.
11
.
2023
Michał Piórkowski
Software
HR

CS Lessons #002: Data structures

14
.
11
.
2023
Michał Młoźniak
Ruby
Software

Summary of Phoenix workshop at Visuality

14
.
11
.
2023
Karol Słuszniak
Ruby on Rails
Visuality
Backend

CS Lessons #000: Introduction and motivation

14
.
11
.
2023
Michał Młoźniak
Ruby
Software

CS Lessons #001: Working with binary files

14
.
11
.
2023
Michał Młoźniak
Ruby
Software

Working with 40-minute intervals

14
.
11
.
2023
Sakir Temel
Software
HR

THE MATURE TECH STARTUP DILEMMA: WHAT'S NEXT

14
.
11
.
2023
Susanna Romantsova
Startups

Win MVP workshop!

14
.
11
.
2023
Susanna Romantsova
Startups

FINTECH WEEK IN OSLO: WHATs & WHYs

14
.
11
.
2023
Susanna Romantsova
Conferences

MY FIRST MONTH AT VISUALITY

14
.
11
.
2023
Susanna Romantsova
Visuality
HR

NASA 1st global hackaton in Poland? Visuality Created it!

14
.
11
.
2023
Rafał Maliszewski
Ruby on Rails

Berlin StartupCamp 2016 summary

14
.
11
.
2023
Michał Piórkowski
Conferences
Startups

Investment Days for productivity

14
.
11
.
2023
Rafał Maliszewski
Visuality
HR