PostgreSQL - tree organization PostgreSQL - tree organization postgresql postgresql

PostgreSQL - tree organization


retrieve category and its subcategories

If you only have a limited depth of subitems, you can do this using a self-join, eg. two levels deep:

SELECT *FROM categories AS childLEFT JOIN categories AS parent ON parent.id=child.parentLEFT JOIN categories AS grandparent ON grandparent.id=parent.parentWHERE child.id=(id) OR parent.id=(id) OR grandparent.id=(id);

You can't do this for an arbitrary-depth hierarchy using standard SQL over a ‘parent-id-foreign-key’ type schema.

Some DBMSs provide non-standard hierarchical tools that allow something like this in various ways, but if you want to stick to cross-DBMS-compatible code you'll need to rejig your schema to one of the better models of representing hierarchies. The two popular ones are:

  • Nested Set. Stores a linear ordering representing a depth-first search of the tree in two columns of the target table (one of which you'll already have if your target has explicit ordering).

  • Adjacency Relation. Stores each ancestor/descendent pair in a separate join table.

There are advantages and disadvantages to each approach, and numerous variants (eg. sparse nested set numbering, ‘distance’ in AR) which can affect how expensive various types of add/delete/move-position operations are. Personally I tend to gravitate towards a simplified nested set model by default as it contains less redundancy than AR.


I've been playing around with ltree, which is a PostgreSQL contrib module to see if it is a good fit for threaded comments. You create a column in your table that stores the path and create an ltree index on it.. You can then perform queries like this:

 ltreetest=# select path from test where path ~ '*.Astronomy.*';                     path                      ----------------------------------------------- Top.Science.Astronomy Top.Science.Astronomy.Astrophysics Top.Science.Astronomy.Cosmology Top.Collections.Pictures.Astronomy Top.Collections.Pictures.Astronomy.Stars Top.Collections.Pictures.Astronomy.Galaxies Top.Collections.Pictures.Astronomy.Astronauts

I haven't played around with it enough to determine how well it performs with things like inserts, updates or deletes. I assume a delete would look like:

DELETE FROM test WHERE path ~ '*.Astronomy.*';

I'm thinking, a threaded comment table might look like:

CREATE SEQUENCE comment_id_seq  INCREMENT 1  MINVALUE 1  MAXVALUE 9223372036854775807  START 78616  CACHE 1;CREATE TABLE comments (comment_id int PRIMARY KEY,path ltree,comment text);CREATE INDEX comments_path_idx ON comments USING gist (path);

An insert would crudely (and untested-ly) look like:

CREATE FUNCTION busted_add_comment(text the_comment, int parent_comment_id) RETURNS void AS$BODY$DECLARE    INT _new_comment_id; -- our new comment_id    TEXT _parent_path;   -- the parent pathBEGIN    _new_comment_id := nextval('comment_id_seq'::regclass);    SELECT path INTO _parent_path FROM comments WHERE comment_id = parent_comment_id;    -- this is probably busted SQL, but you get the idea... this comment's path looks like    --   the.parent.path.US    --    -- eg (if parent_comment_id was 5 and our new comment_id is 43):    --  3.5.43    INSERT INTO comments (comment_id, comment, path) VALUES (_new_comment_id, the_comment, CONCAT(_parent_path, '.', _new_comment_id));END;$BODY$LANGUAGE 'plpgsql' VOLATILE;

Or something. Basically the path is just a hierarchy made up of all the primary keys.