Humanized or natural number sorting of mixed word-and-number strings

sql postgresql sorting collation natural-sort

Building on your test data, but this works with arbitrary data. This works with any number of elements in the string.

CREATE TYPE ai AS (a text, i int);

The trick is to form an array of ai from each value in the column.

regexp_matches() with the pattern (\D*)(\d*) and the g option returns one row for every combination of letters and numbers. Plus one irrelevant dangling row with two empty strings '{"",""}' Filtering or suppressing it would just add cost. Aggregate this into an array, after replacing empty strings ('') with 0 in the integer component (as '' cannot be cast to integer).

NULL values sort first - or you have to special case them - or use the whole shebang in a STRICT function like @Craig proposes.

Postgres 9.4 or later

SELECT dataFROM   alnumORDER  BY ARRAY(SELECT ROW(x[1], CASE x[2] WHEN '' THEN '0' ELSE x[2] END)::ai                FROM regexp_matches(data, '(\D*)(\d*)', 'g') x)        , data;

db<>fiddle here

Postgres 9.1 (original answer)

Tested with PostgreSQL 9.1.5, where regexp_replace() had a slightly different behavior.

SELECT dataFROM  (    SELECT ctid, data, regexp_matches(data, '(\D*)(\d*)', 'g') AS x    FROM   alnum    ) xGROUP  BY ctid, data   -- ctid as stand-in for a missing pkORDER  BY regexp_replace (left(data, 1), '[0-9]', '0')        , array_agg(ROW(x[1], CASE x[2] WHEN '' THEN '0' ELSE x[2] END)::ai)        , data         -- for special case of trailing 0

Add regexp_replace (left(data, 1), '[1-9]', '0') as first ORDER BY item to take care of leading digits and empty strings.

~~If special characters like {}()"', can occur, you'd have to escape those accordingly.~~
@Craig's suggestion to use a ROW expression takes care of that.

~~BTW, this won't execute in sqlfiddle, but it does in my db cluster. JDBC is not up to it. sqlfiddle complains:~~

~~Method org.postgresql.jdbc3.Jdbc3Array.getArrayImpl(long,int,Map) is not yet implemented.~~

This has since been fixed: http://sqlfiddle.com/#!17/fad6e/1

sql postgresql sorting collation natural-sort

I faced this same problem, and I wanted to wrap the solution in a function so I could re-use it easily. I created the following function to achieve a 'human style' sort order in Postgres.

CREATE OR REPLACE FUNCTION human_sort(text)  RETURNS text[] AS$BODY$     /* Split the input text into contiguous chunks where no numbers appear,     and contiguous chunks of only numbers. For the numbers, add leading      zeros to 20 digits, so we can use one text array, but sort the      numbers as if they were big integers.       For example, human_sort('Run 12 Miles') gives            {'Run ', '00000000000000000012', ' Miles'}  */  select array_agg(    case      when a.match_array[1]::text is not null         then a.match_array[1]::text               else lpad(a.match_array[2]::text, 20::int, '0'::text)::text                                          end::text)    from (      select regexp_matches(        case when $1 = '' then null else $1 end, E'(\\D+)|(\\d+)', 'g'      ) AS match_array          ) AS a  $BODY$  LANGUAGE sql IMMUTABLE;

tested to work on Postgres 8.3.18 and 9.3.5

No recursion, should be faster than recursive solutions
Can be used in just the order by clause, don't have to deal with primary key or ctid
Works for any select (don't even need a PK or ctid)
Simpler than some other solutions, should be easier to extend and maintain
Suitable for use in a functional index to improve performance
Works on Postgres v8.3 or higher
Allows an unlimited number of text/number alternations in the input
Uses just one regex, should be faster than versions with multiple regexes
Numbers longer than 20 digits are ordered by their first 20 digits

Here's an example usage:

select * from (values   ('Books 1', 9),  ('Book 20 Chapter 1', 8),  ('Book 3 Suffix 1', 7),  ('Book 3 Chapter 20', 6),  ('Book 3 Chapter 2', 5),  ('Book 3 Chapter 1', 4),  ('Book 1 Chapter 20', 3),  ('Book 1 Chapter 3', 2),  ('Book 1 Chapter 1', 1),  ('', 0),  (null::text, 0)) as a(name, sort)order by human_sort(a.name)-----------------------------|name               |  sort |-----------------------------|                   |   0   ||                   |   0   ||Book 1 Chapter 1   |   1   ||Book 1 Chapter 3   |   2   ||Book 1 Chapter 20  |   3   ||Book 3 Chapter 1   |   4   ||Book 3 Chapter 2   |   5   ||Book 3 Chapter 20  |   6   ||Book 3 Suffix 1    |   7   ||Book 20 Chapter 1  |   8   ||Books 1            |   9   |-----------------------------

sql postgresql sorting collation natural-sort

Adding this answer late because it looked like everyone else was unwrapping into arrays or some such. Seemed excessive.

CREATE FUNCTION rr(text,int) RETURNS text AS $$SELECT regexp_replace(    regexp_replace($1, '[0-9]+', repeat('0',$2) || '\&', 'g'),     '[0-9]*([0-9]{' || $2 || '})',     '\1',     'g')$$ LANGUAGE sql;SELECT t,rr(t,9) FROM mixed ORDER BY t;      t       |             rr              --------------+----------------------------- AAA02free    | AAA000000002free AAA10bob     | AAA000000010bob AAA2bbb03boo | AAA000000002bbb000000003boo AAA2bbb3baa  | AAA000000002bbb000000003baa AAA2fred     | AAA000000002fred(5 rows)(reverse-i-search)`OD': SELECT crypt('richpass','$2$08$aJ9ko0uKa^C1krIbdValZ.dUH8D0R0dj8mqte0Xw2FjImP5B86ugC');richardh=> richardh=> SELECT t,rr(t,9) FROM mixed ORDER BY rr(t,9);      t       |             rr              --------------+----------------------------- AAA2bbb3baa  | AAA000000002bbb000000003baa AAA2bbb03boo | AAA000000002bbb000000003boo AAA2fred     | AAA000000002fred AAA02free    | AAA000000002free AAA10bob     | AAA000000010bob(5 rows)

I'm not claiming two regexps are the most efficient way to do this, but rr() is immutable (for fixed length) so you can index it. Oh - this is 9.1

Of course, with plperl you could just evaluate the replacement to pad/trim it in one go. But then with perl you've always got just-one-more-option (TM) than any other approach :-)

CodeHunter

Humanized or natural number sorting of mixed word-and-number strings

Postgres 9.4 or later

Postgres 9.1 (original answer)

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last