29.04.2024
Backend
Postgresql
Ruby on Rails

Table partitioning in Rails, part 2 - Postgres Stories

Jarosław Kowalewski
Ruby Developer

Hello again! In the previous part about partitioning tables in Rails, I introduced you to the basics of how partitioning works and how you can implement it on a freshly created table. I also promised that I would show you how to handle it for an already existing table that holds some (probably a lot of) data. So, let's dive into the topic.

Migrating existing table

Partitioning primarily targets large datasets, which are seldom a consideration at the beginning of a project. It's rare to plan table partitioning upfront.

Typically, the situation involves hitting a performance threshold and deciding to partition an existing large table. If this table is huge, it likely plays a significant role in the system. Given its widespread use, continuous availability is crucial, and any downtime would be unpleasant for the end user.

Meanwhile, we aim to migrate or modify a large dataset, a process that often involves locking or shutting down the table for a longer period. That's why it's important to find a solution that minimizes potential unavailability. This is particularly significant since most PostgreSQL documentation and other tutorials focus on setting up partitioning from scratch.

The basic approach is simple:

  • Create a new, partitioned table.
  • Create partitions.
  • Use INSERT INTO/SELECT or COPY to transfer data into the partitioned table.
  • Carry out any additional steps like recreating indexes/triggers, etc.
  • Switch or rename the old table and the partitioned table.
  • Delete the old table.
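
For illustration, a minimal sketch of this basic approach could look as follows (column definitions are omitted, and the users_partitioned / users_old names are just placeholders):

BEGIN;
    CREATE TABLE users_partitioned
    ...
    PARTITION BY RANGE(created_at);

    CREATE TABLE users_partitioned_2023_12 PARTITION OF users_partitioned
        FOR VALUES FROM ('2023-12-01') TO ('2024-01-01');
    -- ...remaining partitions, indexes and triggers...

    -- copy the data over (this is where the old table effectively has to be frozen)
    INSERT INTO users_partitioned SELECT * FROM users;

    -- swap the tables and drop the old one
    ALTER TABLE users RENAME TO users_old;
    ALTER TABLE users_partitioned RENAME TO users;
    DROP TABLE users_old;
COMMIT;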

At first glance, this method seems effective, but it complicates write operations. Even with update triggers on the old table that mirror changes to the new one, maintaining complete integrity between the old and new tables is a challenge. Furthermore, this approach requires at least twice the storage for the operation.

Partitioning a large table with no downtime

Let's look at an alternative, better approach. I will present it for two different types of partitioning: range and list. You can read more about these types in a separate article about partitioning types.

Range partitioning

This is the first part of transforming our table into a partitioned one, which includes the following steps:

  • Rename the old table
  • Create a new partitioned table
  • Attach the old table as the default partition for all historical data
  • Create partitions for future use
BEGIN;
    ALTER TABLE users RENAME TO users_temp;

    CREATE TABLE users
    ...
    PARTITION BY RANGE(created_at);

    ALTER TABLE users ATTACH PARTITION users_temp DEFAULT;

    -- here we assume that all of our current data was created before 2023-12-01

    CREATE TABLE users_2023_12 PARTITION OF users
        FOR VALUES FROM ('2023-12-01') TO ('2024-01-01');

    CREATE TABLE users_2024_01 PARTITION OF users
        FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

    -- add indexes/triggers for new partitions
COMMIT;

The second part involves moving historical data into separate partitions. The steps are as follows:

  • Detach the default partition
  • Create a new partition for a portion of the historical data
  • Remove data from the historical table and insert it into the new partition
  • Reattach the old table
BEGIN;
    ALTER TABLE users DETACH PARTITION users_temp;

    CREATE TABLE users_2023_11 PARTITION OF users
        FOR VALUES FROM ('2023-11-01') TO ('2023-12-01');

    -- add indexes/triggers

    WITH copied_data AS (
        DELETE FROM users_temp
        WHERE created_at >= '2023-11-01' AND created_at < '2023-12-01'
        RETURNING *
    )
    INSERT INTO users
    SELECT * FROM copied_data;

    ALTER TABLE users ATTACH PARTITION users_temp DEFAULT;
COMMIT;

We detach the default partition first because PostgreSQL will not let us create a new partition while the default partition still contains rows that belong to the new partition's range.
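
For illustration, here is what the failure looks like if we skip the detach step (a sketch - the exact error text may differ between PostgreSQL versions):

-- users_temp is still attached as DEFAULT and still holds November rows
CREATE TABLE users_2023_11 PARTITION OF users
    FOR VALUES FROM ('2023-11-01') TO ('2023-12-01');
-- ERROR: updated partition constraint for default partition "users_temp"
--        would be violated by some row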

This example covers range partitioning. The next one, for list partitioning, follows a similar process.

List partitioning

BEGIN;
    ALTER TABLE users RENAME TO users_temp;

    CREATE TABLE users
    ...
    PARTITION BY LIST(status);

    ALTER TABLE users ATTACH PARTITION users_temp DEFAULT;
COMMIT;

In this case, no additional partitions are created upfront, because we assume that all status values are already in use. Just as with range partitioning, we cannot create a partition for values that are already present in the default partition.

BEGIN;
    ALTER TABLE users DETACH PARTITION users_temp;

    CREATE TABLE users_0 PARTITION OF users
        FOR VALUES IN (0);

    -- add indexes/triggers

    WITH copied_data AS (
        DELETE FROM users_temp
        WHERE status = 0
        RETURNING *
    )
    INSERT INTO users
    SELECT * FROM copied_data;

    ALTER TABLE users ATTACH PARTITION users_temp DEFAULT;
COMMIT;

Once again, we detach the default partition, create a new one, transfer data to it, and reattach the default partition.

Using Views and Triggers to introduce Partitioning

The last approach that I found handy uses other Postgres mechanisms: views and triggers. Long story short, it consists of the following steps:

  • Create a partitioned table along with partitions/indexes
  • Rename the old table
  • Create a view that combines both tables with UNION ALL
  • Create proper triggers for write operations - INSERT is executed only on the partitioned table, while UPDATE/DELETE run against both tables
  • Move data from the old table to the new one in batches
  • Remove the view and the old table

You can read more about this approach in this article. The benefit of this method is that it allows us to easily manage the size of the batches we want to transfer into partitions. In the previous method, batch sizes were defined by the new partition's size (either by the range of values or by a specific value in list partitioning), which restricts our ability to control transaction durations and the locks they cause.
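
A minimal sketch of the view and the INSERT trigger, assuming the partitioned table is named users_partitioned and the old table was renamed to users_old (both names are placeholders), could look like this; UPDATE/DELETE triggers and the remaining batches follow the same pattern:

CREATE VIEW users AS
    SELECT * FROM users_partitioned
    UNION ALL
    SELECT * FROM users_old;

CREATE FUNCTION users_view_insert() RETURNS trigger AS $$
BEGIN
    -- new rows go only to the partitioned table
    INSERT INTO users_partitioned VALUES (NEW.*);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER users_insert
    INSTEAD OF INSERT ON users
    FOR EACH ROW EXECUTE FUNCTION users_view_insert();

-- move one batch of historical rows; repeat until users_old is empty
WITH batch AS (
    DELETE FROM users_old
    WHERE id IN (SELECT id FROM users_old LIMIT 1000)
    RETURNING *
)
INSERT INTO users_partitioned SELECT * FROM batch;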

Partitioning large tables in Rails

Now, it's time to prepare a migration for partitioning within a Rails app. Exactly as in the previous article, I am going to use the https://github.com/rkrage/pg_party gem. In this case, I've chosen list partitioning by the status field. The first migration will initialize the partitioned table and attach the old table as the default partition.

class MigrateExistingUsersTable < ActiveRecord::Migration[7.1]
  def up
    rename_table :users, :users_temp

    create_list_partition :users, partition_key: :status, template: false do |t|
      t.string :email, null: false
      t.string :first_name
      t.string :last_name
      t.string :username, null: false
      t.integer :status, default: 0
      t.jsonb :settings, default: {}
      t.timestamps
    end

    attach_default_partition :users, :users_temp
  end

  def down
    detach_partition :users, :users_temp
    drop_table :users
    rename_table :users_temp, :users
  end
end

The following steps outline the process I've used in SQL:

  1. Rename the old table to a temporary name
  2. Create a new partitioned table
  3. Attach the old table as the default partition

Next, I will create the first list partition for the status value equal to 0. To simplify the data migration, I will also create a temporary model. Alternatively, if you need to, you can also define such a model directly inside the migration.

class UsersTemp < ApplicationRecord; end

The next migration looks as follows:

class CreateUsersStatusZeroPartition < ActiveRecord::Migration[7.1]
  def up
    # detach the default partition so the new partition can be created
    detach_partition :users, :users_temp
    create_list_partition_of :users_0, :users, values: 0

    # move rows with status = 0 from the default partition into the new partition
    ActiveRecord::Base.transaction do
      UsersTemp.where(status: 0).find_in_batches(batch_size: 1000) do |batch|
        User.insert_all!(batch.map { |user_temp| user_temp.attributes.except("id") })
      end

      UsersTemp.where(status: 0).delete_all
    end

    attach_default_partition :users, :users_temp
  end

  def down
    detach_partition :users, :users_temp

    ActiveRecord::Base.transaction do
      User.where(status: 0).find_in_batches(batch_size: 1000) do |batch|
        UsersTemp.insert_all!(batch.map { |user| user.attributes.except("id") })
      end
    end

    detach_partition :users, :users_0
    drop_table :users_0
    attach_default_partition :users, :users_temp
  end
end

Again, I've tried to reproduce the steps from SQL:

  1. Detach the default partition
  2. Create a new partition for the specific value status = 0
  3. Migrate data from the detached default partition
  4. Attach the default partition again

Such a migration should be prepared for every possible value of the status field; I assumed that the number of status values is limited.
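
As a hedged sketch, the same steps could also be wrapped in a single migration that loops over every known status - here I assume statuses 0 through 3 and a made-up migration name, neither of which comes from the original example:

class CreateUsersStatusPartitions < ActiveRecord::Migration[7.1]
  def up
    (0..3).each do |status|
      # repeat the detach / create partition / move data / reattach cycle per status
      detach_partition :users, :users_temp
      create_list_partition_of :"users_#{status}", :users, values: status

      ActiveRecord::Base.transaction do
        UsersTemp.where(status: status).find_in_batches(batch_size: 1000) do |batch|
          User.insert_all!(batch.map { |user| user.attributes.except("id") })
        end

        UsersTemp.where(status: status).delete_all
      end

      attach_default_partition :users, :users_temp
    end
  end
end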

In the revert migration, I have the option to detach the users_0 partition and directly copy its data into the users partitioned table. This would automatically assign the data to the default partition, and then I could simply drop the users_0 table.

To keep with the ActiveRecord approach, this might require creating a temporary model like Users0 or correctly assigning the table_name. Alternatively, writing raw SQL with ActiveRecord::Base.connection.execute for this data migration could let us avoid the need for temporary models to access our data. Which approach to use should be decided based on the scenario.
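
Here is a minimal sketch of that raw SQL alternative as a replacement down method for the migration above, assuming the users_0 and users_temp tables from the earlier examples:

def down
  # detach users_0 so its rows can be re-routed through the parent table
  detach_partition :users, :users_0

  # users_temp is still attached as the default partition,
  # so the copied rows land there automatically
  ActiveRecord::Base.connection.execute(<<~SQL)
    INSERT INTO users SELECT * FROM users_0;
  SQL

  drop_table :users_0
end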

Summary

In general, partitioning existing tables is always a significant challenge, one that may demand weeks of proper planning and choosing the right approach for the specific case. It's important to highlight a few issues that warrant concern:

  • Maintaining data consistency between tables
  • Minimizing lock duration and downtimes
  • Keeping an eye on possible storage usage

Following these points should be a core part of preparing such a migration and should lead to a happy ending. With that, I'll wrap up this quite exciting (I trust not only for me) journey through partitioning in PostgreSQL. I'm happy that I could walk you through the details of this topic, and I hope you enjoyed it!
