Unescape a string with escaped newlines and carriage returns Unescape a string with escaped newlines and carriage returns database database

Unescape a string with escaped newlines and carriage returns


To "unescape" a string, you have to "execute" it - literally. Use the EXECUTE command in plpgsql.
You can wrap this into a function. Naive approach:

CREATE OR REPLACE FUNCTION f_unescape(text, OUT _t text) AS$func$BEGIN   EXECUTE 'SELECT E''' || $1 || ''''   INTO _t;END$func$  LANGUAGE plpgsql STABLE;

Call:

SELECT f_unescape('\r\nabcdef\t0123\x123\n');

This naive form is vulnerable to single quotes in the original string, which need to be escaped. But that's a bit tricky. Single quotes can be escaped in two ways in a posix escape string syntax: \' or ''. But we could also have \\' etc. Basics:

We could enclose the string in dollar quoting, but that does not work for posix escape string syntax. E'\'' cannot be replaced with E$$\'$$. We could add SET standard_conforming_strings = off to the function, then we wouldn't have to prepend strings with E. But that would disable function inlining and interpret escapes everywhere in the function body.

Instead, escape all ' and all (optionally) leading \ with regexp_replace():

regexp_replace($1, '(\\*)(\''+)', '\1\1\2\2', 'g')

(\\*) .. 0 or more leading \
(\''+) .. capture 1 or more '
'\1\1\2\2' .. double up each match
'g' .. replace all occurrences, not just the first

Safe function

CREATE OR REPLACE FUNCTION f_unescape(IN text, OUT _t text)  RETURNS text AS$BODY$BEGIN   EXECUTE $$SELECT E'$$ || regexp_replace($1, '(\\*)(\''+)', '\1\1\2\2', 'g') || $$'$$   INTO _t;END$BODY$  LANGUAGE plpgsql STABLE;

I assume you are aware that the operation cannot be reversed reliable. There is not way to tell which special character was escaped before and which wasn't. You can escape all or none. Or do it manually like before. But if the same character was included in literal and escape form, you cannot tell them apart any more.

Test case:

SELECT t, f_unescape(t)FROM  (   VALUES     ($$'$$)   , ($$''$$)   , ($$'''$$)   , ($$\'$$)   , ($$\\'$$)   , ($$\\\'$$)   , ($$\\\'''$$)   , ($$\r\\'nabcdef\\\t0123\x123\\\\\'''\n$$)   ) v(t);