Escape function for regular expression or LIKE patterns Escape function for regular expression or LIKE patterns postgresql postgresql

Escape function for regular expression or LIKE patterns


To address the question at the top:

Regular expression escape function

Let's start with a complete list of characters with special meaning in regular expression patterns:

!$()*+.:<=>?[\]^{|}-

Wrapped in a bracket expression most of them lose their special meaning - with a few exceptions:

  • - needs to be first or last or it signifies a range of characters.
  • ] and \ have to be escaped with \ (in the replacement, too).

After adding capturing parentheses for the back reference below we get this regexp pattern:

([!$()*+.:<=>?[\\\]^{|}-])

Using it, this function escapes all special characters with a backslash (\) - thereby removing the special meaning:

CREATE OR REPLACE FUNCTION f_regexp_escape(text)  RETURNS text  LANGUAGE sql IMMUTABLE STRICT PARALLEL SAFE AS$func$SELECT regexp_replace($1, '([!$()*+.:<=>?[\\\]^{|}-])', '\\\1', 'g')$func$;

Add PARALLEL SAFE (because it is) in Postgres 10 or later to allow parallelism for queries using it.

Demo

SELECT f_regexp_escape('test(1) > Foo*');

Returns:

test\(1\) \> Foo\*

And while:

SELECT 'test(1) > Foo*' ~ 'test(1) > Foo*';

returns FALSE, which may come as a surprise to naive users,

SELECT 'test(1) > Foo*' ~ f_regexp_escape('test(1) > Foo*');

Returns TRUE as it should now.

LIKE escape function

For completeness, the pendant for LIKE patterns, where only three characters are special:

\%_

The manual:

The default escape character is the backslash but a different one can be selected by using the ESCAPE clause.

This function assumes the default:

CREATE OR REPLACE FUNCTION f_like_escape(text)  RETURNS text  LANGUAGE sql IMMUTABLE STRICT PARALLEL SAFE AS$func$SELECT replace(replace(replace($1         , '\', '\\')  -- must come 1st         , '%', '\%')         , '_', '\_');$func$;

We could use the more elegant regexp_replace() here, too, but for the few characters, a cascade of replace() functions is faster.

Again, PARALLEL SAFE in Postgres 10 or later.

Demo

SELECT f_like_escape('20% \ 50% low_prices');

Returns:

20\% \\ 50\% low\_prices


how about trying something like this, substituting var_name for my hard-coded 'John Bernard':

create table my_table(name text primary key);insert into my_table(name) values ('John Bernard'),                                   ('John Bernard (1)'),                                   ('John Bernard (2)'),                                   ('John Bernard (3)');select max(regexp_replace(substring(name, 13), ' |\(|\)', '', 'g')::integer+1) from my_table where substring(name, 1, 12)='John Bernard'       and substring(name, 13)~'^ \([1-9][0-9]*\)$'; max-----   4(1 row)

one caveat: I am assuming single-user access to the database while this process is running (and so are you in your approach). If that is not the case then the max(n)+1 approach will not be a good one.


Are you at liberty to change the schema? I think the problem would go away if you could use a composite primary key:

name text not null,number integer not null,primary key (name, number)

It then becomes the duty of the display layer to display Fred #0 as "Fred", Fred #1 as "Fred (1)", &c.

If you like, you can create a view for this duty. Here's the data:

=> select * from foo;  name  | number --------+-------- Fred   |      0 Fred   |      1 Barney |      0 Betty  |      0 Betty  |      1 Betty  |      2(6 rows)

The view:

create or replace view foo_view asselect *,case  when number = 0 then    name  else    name || ' (' || number || ')'end as name_and_numberfrom foo;

And the result:

=> select * from foo_view;  name  | number | name_and_number --------+--------+----------------- Fred   |      0 | Fred Fred   |      1 | Fred (1) Barney |      0 | Barney Betty  |      0 | Betty Betty  |      1 | Betty (1) Betty  |      2 | Betty (2)(6 rows)