18
.
02
.
2024
14
.
12
.
2023
Ruby
Ruby on Rails
Postgresql

ILIKE vs LIKE/LOWER - Postgres Stories

Jarosław Kowalewski
Ruby Developer
Jarek Kowalewski - ILIKE vs LIKE/LOWER - Postgres Stories

In web development, the search feature is commonplace. This can be implemented in numerous ways, ranging from simple where conditions in your database to powerful search engines like Elasticsearch, Solr, or Sphinx. In between, there are database tools - LIKE, ILIKE, regular expression queries (potentially enhanced with Trigram Indexes), or Full Text Search, which in Rails is well supported by the pg_search gem.

In this post, I'll examine the ILIKE query and a workaround for improving its efficiency - the use of the LIKE query with the LOWER function. I'll begin with a brief introduction to both.

ILIKE operator

So, in simple terms, ILIKE allows case-insensitive pattern matching. Similarly to LIKE, it can be used with a leading and/or trailing wildcard to find a match:

  • aBc% - strings that start with 'aBc'
  • %aBc - strings that end with 'aBc'
  • %aBc% - strings that contain 'aBc'

Remember that, not as LIKE, ILIKE is not part of the SQL standard - it is only available in PostgreSQL. In other databases such as MySQL or SQLite, you can replicate this functionality by using the COLLATE keyword and choosing the correct collation.

LOWER function

The LOWER function essentially converts all capital letters in text to lowercase. For the purpose of this article, it allows us to substitute an ILIKE query with the combined use of LIKE and LOWER.

Thus, the following two queries should yield equivalent search results:


User.where('last_name ILIKE ?', "%aBc%")

User.where('lower(last_name) LIKE lower(?)', "%aBc%")

Test cases

Let's determine which option is more efficient. For this, I created a users table with approximately 1.5 million records and a string(varchar) column named last_name. This was populated with random English surnames using the Faker gem. My local PostgreSQL version is 13. I will test it with different types of wildcards. To measure the results, I will use the explain analyze option which lets me track the Postgres planner and execution time.

I've developed some methods for the User model in Ruby to simplify query usage in the Rails console. In the first example, I’ve used a leading wildcard. Please note that I have adjusted phrases with correct % in next examples for trailing and leading/trailing wildcards.It is apparent in each plan presented below.


class User < ApplicationRecord
  def self.ilike_search(exp)
    where('last_name ILIKE ?', exp)
  end

  def self.lower_search(exp)
    where('LOWER(last_name) LIKE LOWER(?)', exp)
  end
end

Leading wildcard

I will choose a random phrase that predicts the start of the last name (e.g., 'and' to find 'Anderson', etc.):

User.ilike_search('and%').explain(:analyze)
User.lower_search('and%').explain(:analyze)

The Postgres planner provides a comprehensive plan. The most relevant aspect here is the execution time, especially since I didn't apply any indexes and the query itself is straightforward and brief. To cut to the chase, LIKE/LOWER is faster than ILIKE. This is evident when comparing execution times: 157.634 ms for LOWER/LIKE and 225.187 ms for ILIKE.

Trailing wildcard

For trailing wildcard searches, I'm looking for last names ending with 'Re'. This should also find names ending in 're' (such as D'Amore, Moore, etc.):

User.ilike_search('%Re').explain(:analyze)
User.lower_search('%Re').explain(:analyze)

For the trailing wildcard, the story is similar. LIKE/LOWER takes 171.703 ms, while ILIKE takes 261.146 ms. Thus, LIKE/LOWER wins again.

Combined wildcard

In this step, I will search for all last names containing the phrase 'Son', which also includes 'son'.

User.ilike_search('%Son%').explain(:analyze)
User.lower_search('%Son%').explain(:analyze)

For this example, the story is similar. LIKE/LOWER proved to be faster, taking 195.931 ms compared to 257.449 ms for ILIKE.

In many cases I found, using LOWER with LIKE is faster than ILIKE. However, I will demonstrate an example where it is actually slower. This is an example for combined wildcards, using the phrase ‘So’.

User.ilike_search('%So%').explain(:analyze)
User.lower_search('%So%').explain(:analyze)

Here's an example where LIKE/LOWER is slower (416.909 ms) than ILIKE (255.328 ms). From my tests, this only occurs with combined wildcards which are resource-intensive and time-consuming. Due to their pattern comparison method for each string chunk, combined wildcards are highly sensitive to data distribution. For instance, if many records match the substring 'So,' the case-insensitive ILIKE might perform better as it doesn't require a LOWER operation on every row.

This might suggest that the longer the search term is, which in general yields fewer results, the faster LOWER becomes. Let’s check it by measuring average execution times for specific wildcards.

Average results

To get comprehensive results I will loop each query 100 times with random strings with length from 1 to 5 characters and extract execution time from its planner results to get final outcome. I will manage it with script to calculate average execution time. Obviously, I will also correctly replace wildcards in execution_benchmark method.


def self.random_string(min_length, max_length)
  raise ArgumentError, "Invalid range: #{min_length} to #{max_length}" if min_length > max_length

  characters = ('a'..'z').to_a + ('A'..'Z').to_a
  length = rand(min_length..max_length)

  Array.new(length) { characters.sample }.join
end

def self.execution_benchmark
  total_execution_time = [0,0]
  num_runs = 100
  num_runs.times do
    string = User.random_string(1, 5)
    ilike_result = User.ilike_search("#{string}%").explain(:analyze)
    lower_result = User.lower_search("#{string}%").explain(:analyze)
    ilike_execution_time = ilike_result.split("\n")[-2].match(/Execution Time: (\d+\.\d+) ms/)[1].to_f
    lower_execution_time = lower_result.split("\n")[-2].match(/Execution Time: (\d+\.\d+) ms/)[1].to_f

    total_execution_time[0] += ilike_execution_time
        total_execution_time[1] += lower_execution_time
  end

  average_ilike_time = total_execution_time[0]/num_runs
    average_lower_time = total_execution_time[1]/num_runs

    puts average_ilike_time
    puts average_lower_time
end

Leading wildcard

User.execution_benchmark for leading wildcard:


User Load (218.5ms)  SELECT "users".* FROM "users" WHERE (last_name ILIKE 'RsuXJ%')
User Load (142.9ms)  SELECT "users".* FROM "users" WHERE (LOWER(last_name) LIKE LOWER('RsuXJ%'))
User Load (217.8ms)  SELECT "users".* FROM "users" WHERE (last_name ILIKE 'rGPS%')
User Load (146.6ms)  SELECT "users".* FROM "users" WHERE (LOWER(last_name) LIKE LOWER('rGPS%'))
User Load (228.7ms)  SELECT "users".* FROM "users" WHERE (last_name ILIKE 'EFJN%')
User Load (143.9ms)  SELECT "users".* FROM "users" WHERE (LOWER(last_name) LIKE LOWER('EFJN%'))

...

Average output for ILIKE: 244.70 ms
Average output for LOWER: 173.55 ms
Percentage difference(approximately): LOWER is ****40.1% faster.

Trailing wildcard

User.execution_benchmark for trailing wildcard:


User Load (416.6ms)  SELECT "users".* FROM "users" WHERE (last_name ILIKE '%GWvK')
User Load (146.0ms)  SELECT "users".* FROM "users" WHERE (LOWER(last_name) LIKE LOWER('%GWvK'))
User Load (219.0ms)  SELECT "users".* FROM "users" WHERE (last_name ILIKE '%vlDk')
User Load (145.3ms)  SELECT "users".* FROM "users" WHERE (LOWER(last_name) LIKE LOWER('%vlDk'))
User Load (342.3ms)  SELECT "users".* FROM "users" WHERE (last_name ILIKE '%l')
User Load (581.5ms)  SELECT "users".* FROM "users" WHERE (LOWER(last_name) LIKE LOWER('%l'))

...

Average output for ILIKE: 246.33 ms
Average output for LOWER: 212.78 ms
Percentage difference(approximately): LOWER is 15.8% faster.

Combined wildcard

In this case, it's beneficial to consider three scenarios involving different pattern lengths. The first scenario will address short patterns, second - medium, while the third one will focus on long patterns. I will replace line string = User.random_string(1, 5) with correct ranges.

User.execution_benchmark for combined wildcard and string lengths ranging from 1 to 2:


User Load (278.4ms)  SELECT "users".* FROM "users" WHERE (last_name ILIKE '%BS%')
User Load (457.3ms)  SELECT "users".* FROM "users" WHERE (LOWER(last_name) LIKE LOWER('%BS%'))
User Load (220.4ms)  SELECT "users".* FROM "users" WHERE (last_name ILIKE '%LX%')
User Load (416.8ms)  SELECT "users".* FROM "users" WHERE (LOWER(last_name) LIKE LOWER('%LX%'))
User Load (1924.9ms)  SELECT "users".* FROM "users" WHERE (last_name ILIKE '%C%')
User Load (2084.6ms)  SELECT "users".* FROM "users" WHERE (LOWER(last_name) LIKE LOWER('%C%'))
User Load (2641.5ms)  SELECT "users".* FROM "users" WHERE (last_name ILIKE '%i%')
User Load (3047.9ms)  SELECT "users".* FROM "users" WHERE (LOWER(last_name) LIKE LOWER('%i%'))
...

Average output for ILIKE: 542.89 ms
Average output for LOWER: 559.93 ms
Percentage difference(approximately): LOWER is 3% slower.

User.execution_benchmark for combined wildcard and string lengths ranging from 2 to 5:


User Load (221.3ms)  SELECT "users".* FROM "users" WHERE (last_name ILIKE '%xP%')
User Load (422.2ms)  SELECT "users".* FROM "users" WHERE (LOWER(last_name) LIKE LOWER('%xP%'))
User Load (223.3ms)  SELECT "users".* FROM "users" WHERE (last_name ILIKE '%zXwt%')
User Load (155.8ms)  SELECT "users".* FROM "users" WHERE (LOWER(last_name) LIKE LOWER('%zXwt%'))
User Load (217.3ms)  SELECT "users".* FROM "users" WHERE (last_name ILIKE '%QC%')
User Load (411.0ms)  SELECT "users".* FROM "users" WHERE (LOWER(last_name) LIKE LOWER('%QC%'))
User Load (220.5ms)  SELECT "users".* FROM "users" WHERE (last_name ILIKE '%PpFt%')
User Load (147.1ms)  SELECT "users".* FROM "users" WHERE (LOWER(last_name) LIKE LOWER('%PpFt%'))

Average output for ILIKE: 237.03 ms
Average output for LOWER: 228.27 ms
Percentage difference(approximately): LOWER is 3.8% faster.

User.execution_benchmark for combined wildcard and string lengths ranging from 3 to 7:


User Load (235.6ms)  SELECT "users".* FROM "users" WHERE (last_name ILIKE '%LcdFatK%')
User Load (162.5ms)  SELECT "users".* FROM "users" WHERE (LOWER(last_name) LIKE LOWER('%LcdFatK%'))
User Load (246.4ms)  SELECT "users".* FROM "users" WHERE (last_name ILIKE '%zCSXH%')
User Load (153.7ms)  SELECT "users".* FROM "users" WHERE (LOWER(last_name) LIKE LOWER('%zCSXH%'))
User Load (238.3ms)  SELECT "users".* FROM "users" WHERE (last_name ILIKE '%Luf%')
User Load (151.4ms)  SELECT "users".* FROM "users" WHERE (LOWER(last_name) LIKE LOWER('%Luf%'))
User Load (233.9ms)  SELECT "users".* FROM "users" WHERE (last_name ILIKE '%wUPxBw%')
User Load (150.7ms)  SELECT "users".* FROM "users" WHERE (LOWER(last_name) LIKE LOWER('%wUPxBw%'))

Average output for ILIKE: 230.97 ms
Average output for LOWER: 163.00 ms
Percentage difference(approximately): LOWER is 41% faster.

Conclusion

In conclusion, the LIKE/LOWER combination is usually faster than using ILIKE in PostgreSQL, especially for leading and trailing wildcard searches. However, with combined wildcards, the efficiency can vary based on the length of the search term and the distribution of the data. Therefore, it's crucial to test different scenarios in your production environment to determine the most efficient option. To measure directly on the database server, you might consider using such tool like pganalyze. Alternatively, popular web performance monitoring applications such as Sentry, Rollbar, or Airbrake are also good options. Each one provides tools to measure the speed of query execution in your application and/or tracking for most time consuming queries, enabling you to choose the best fitting option.

Jarosław Kowalewski
Ruby Developer

Check my Twitter

Check my Linkedin

Did you like it? 

Sign up To VIsuality newsletter

READ ALSO

LLM Embeddings in Ruby - Paweł Strzałkowski

LLM Embeddings in Ruby

17
.
03
.
2024
Paweł Strzałkowski
Ruby
LLM
Embeddings
ChatGPT
Ollama
Handling Errors in Concurrent Ruby, Michał Łęcicki

Handling Errors in Concurrent Ruby

14
.
11
.
2023
Michał Łęcicki
Ruby
Ruby on Rails
Tutorial
Recap of Friendly.rb 2024 conference

Insights and Inspiration from Friendly.rb: A Ruby Conference Recap

02
.
10
.
2024
Kaja Witek
Conferences
Ruby on Rails

Covering indexes - Postgres Stories

14
.
11
.
2023
Jarosław Kowalewski
Ruby on Rails
Postgresql
Backend
Ula Sołogub - SQL Injection in Ruby on Rails

The Deadly Sins in RoR security - SQL Injection

14
.
11
.
2023
Urszula Sołogub
Backend
Ruby on Rails
Software
Michal - Highlights from Ruby Unconf 2024

Highlights from Ruby Unconf 2024

14
.
11
.
2023
Michał Łęcicki
Conferences
Visuality
Cezary Kłos - Optimizing Cloud Infrastructure by $40 000 Annually

Optimizing Cloud Infrastructure by $40 000 Annually

14
.
11
.
2023
Cezary Kłos
Backend
Ruby on Rails

Smooth Concurrent Updates with Hotwire Stimulus

14
.
11
.
2023
Michał Łęcicki
Hotwire
Ruby on Rails
Software
Tutorial

Freelancers vs Software house

02
.
10
.
2024
Michał Krochecki
Visuality
Business

Table partitioning in Rails, part 2 - Postgres Stories

14
.
11
.
2023
Jarosław Kowalewski
Backend
Postgresql
Ruby on Rails

N+1 in Ruby on Rails

14
.
11
.
2023
Katarzyna Melon-Markowska
Ruby on Rails
Ruby
Backend

Turbo Streams and current user

29
.
11
.
2023
Mateusz Bilski
Hotwire
Ruby on Rails
Backend
Frontend

Showing progress of background jobs with Turbo

14
.
11
.
2023
Michał Łęcicki
Ruby on Rails
Ruby
Hotwire
Frontend
Backend

Table partitioning in Rails, part 1 - Postgres Stories

14
.
11
.
2023
Jarosław Kowalewski
Postgresql
Backend
Ruby on Rails

Table partitioning types - Postgres Stories

14
.
11
.
2023
Jarosław Kowalewski
Postgresql
Backend

Indexing partitioned table - Postgres Stories

14
.
11
.
2023
Jarosław Kowalewski
Backend
Postgresql
SQL Views in Ruby on Rails

SQL views in Ruby on Rails

14
.
11
.
2023
Jan Grela
Backend
Ruby
Ruby on Rails
Postgresql
Design your bathroom in React

Design your bathroom in React

14
.
11
.
2023
Bartosz Bazański
Frontend
React
Lazy Attributes in Ruby - Krzysztof Wawer

Lazy attributes in Ruby

14
.
11
.
2023
Krzysztof Wawer
Ruby
Software

Exporting CSV files using COPY - Postgres Stories

14
.
11
.
2023
Jarosław Kowalewski
Postgresql
Ruby
Ruby on Rails
Michał Łęcicki - From Celluloid to Concurrent Ruby

From Celluloid to Concurrent Ruby: Practical Examples Of Multithreading Calls

14
.
11
.
2023
Michał Łęcicki
Backend
Ruby
Ruby on Rails
Software

Super Slide Me - Game Written in React

14
.
11
.
2023
Antoni Smoliński
Frontend
React
Jarek Kowalewski - ILIKE vs LIKE/LOWER - Postgres Stories

ILIKE vs LIKE/LOWER - Postgres Stories

14
.
11
.
2023
Jarosław Kowalewski
Ruby
Ruby on Rails
Postgresql