Retrieving full hierarchy sorted by a column under PostgreSQL's Ltree module Retrieving full hierarchy sorted by a column under PostgreSQL's Ltree module database database

Retrieving full hierarchy sorted by a column under PostgreSQL's Ltree module


You were on the right track with WITH RECURSIVE.

Solution with recursive CTE

WITH RECURSIVE t AS (    SELECT t.votes         , t.path         , 1::int AS lvl         , to_char(t2.votes, 'FM0000000')  AS sort    FROM   tbl t    JOIN   tbl t2 ON t2.path = subltree(t.path, 0, 1)    UNION ALL    SELECT t.votes         , t.path         , t.lvl + 1         , t.sort || to_char(t2.votes, 'FM0000000')    FROM   t    JOIN   tbl t2 ON t2.path = subltree(t.path, 0, t.lvl + 1)    WHERE  nlevel(t.path) > t.lvl    )SELECT votes, path, max(sort) AS sortFROM   tGROUP  BY 1, 2ORDER  BY max(sort), path;

Major points

  • The crucial part is to replace every level of the path with the value of votes. Thereby we assemble one column we can ORDER BY at the end. This is necessary, because the path has an unknown depth and we cannot order by an unknown number of expressions in static SQL.

  • In order to get a stable sort, I convert votes to a string with leading zeroes using to_char(). I use seven digits in the demo, which works for vote values below 10.000.000. Adjust according to your maximum vote count.

  • In the final SELECT I exclude all intermediary states to eliminate duplicates. Only the last step with max(sort) remains.

  • This works in standard SQL with a recursive CTE, but is not very efficient for large trees. A plpgsql function that recursively updates the sort path in a temporary table without creating temporary dupes might perform better.

  • Only works with the additional module ltree installed, which provides the functions subltree() and nlevel(), as well as the ltree data type.

My test setup, for review convenience:

CREATE TEMP TABLE tbl(votes int, path ltree);INSERT INTO tbl VALUES  (1, '1'), (2, '1.1'), (4, '1.2'), (1, '1.2.1'), (3, '2'), (1, '2.1'), (2, '2.1.1'), (4, '2.1.2'), (1, '2.1.3'), (2, '3'), (17, '3.3'), (99, '3.2'), (10, '3.1.1'), (2345, '3.1.2'), (1, '3.1.3');

PL/pgSQL table function doing the same

Should be faster with huge trees.

CREATE OR REPLACE FUNCTION f_sorted_ltree()  RETURNS TABLE(votes int, path ltree)  LANGUAGE plpgsql VOLATILE AS$func$DECLARE   lvl integer := 0;BEGIN   CREATE TEMP TABLE t ON COMMIT DROP AS   SELECT tbl.votes        , tbl.path        , ''::text AS sort        , nlevel(tbl.path) AS depth   FROM   tbl;   -- CREATE INDEX t_path_idx ON t (path);   -- beneficial for huge trees   -- CREATE INDEX t_path_idx ON t (depth);   LOOP      lvl := lvl + 1;      UPDATE t SET sort = t.sort || to_char(v.votes, 'FM0000000')      FROM  (         SELECT t2.votes, t2.path         FROM   t t2         WHERE  t2.depth = lvl         ) v      WHERE  v.path = subltree(t.path, 0 ,lvl);      EXIT WHEN NOT FOUND;   END LOOP;   -- Return sorted rows   RETURN QUERY   SELECT t.votes, t.path   FROM   t   ORDER  BY t.sort;END$func$;

Call:

SELECT * FROM f_sorted_ltree();

Read in the manual about setting temp_buffers.

I would be interested which performs faster with your real life data.


create table comments (  id serial,  parent_id int,  msg text,  primary key (id));insert into comments (id, parent_id, msg) values (1, null, 'msg 1');insert into comments (id, parent_id, msg) values (2, null, 'msg 2');insert into comments (id, parent_id, msg) values (3, 1, 'msg 1 / ans 1');insert into comments (id, parent_id, msg) values (4, null, 'msg 3');insert into comments (id, parent_id, msg) values (5, 2, 'msg 2 / ans 1');insert into comments (id, parent_id, msg) values (6, 2, 'msg 2 / ans 2');insert into comments (id, parent_id, msg) values (7, 2, 'msg 2 / ans 3');

desc

WITH RECURSIVE q AS(  SELECT  id, msg, 1 as level, ARRAY[id] as path  FROM  comments c  WHERE parent_id is null  UNION ALL  SELECT  sub.id, sub.msg, level + 1, path || sub.id  FROM  q    JOIN  comments sub      ON  sub.parent_id = q.id)SELECT  id, msg, levelFROM    qorder by path || array_fill(100500, ARRAY[8 - level]) desc;

results in

4,"msg 3",12,"msg 2",17,"msg 2 / ans 3",26,"msg 2 / ans 2",25,"msg 2 / ans 1",21,"msg 1",13,"msg 1 / ans 1",2

asc

WITH RECURSIVE q AS(  SELECT  id, msg, 1 as level, ARRAY[id] as path  FROM  comments c  WHERE parent_id is null  UNION ALL  SELECT  sub.id, sub.msg, level + 1, path || sub.id  FROM  q    JOIN  comments sub      ON  sub.parent_id = q.id)SELECT  id, msg, levelFROM    q--order by path || array_fill(100500, ARRAY[8 - level]) desc;order by path;

results in

1,"msg 1",13,"msg 1 / ans 1",22,"msg 2",15,"msg 2 / ans 1",26,"msg 2 / ans 2",27,"msg 2 / ans 3",24,"msg 3",1