Sorting tree with a materialized path? Sorting tree with a materialized path? postgresql postgresql

Sorting tree with a materialized path?


I believe your materialized path is not right.

What logic do you get to sort things like this

11.211.5

Why is the second 1 not together with the first one?

If you had

11.222.5

This would be trivial.

EDIT:I have looked at your example and you are not storing materialized path of a row, but you are storing a materialized path of the parent row.Here's how the materialized path of the row actually should look like. Sorting directly on matpath would work if you would not have more than 9 branches if you stored it as:

 id | parent_id | matpath   |          created----+-----------+-----------+----------------------------  2 |         1 | 1.2       | 2010-05-08 15:18:37.987544  6 |         2 | 1.2.6     | 2010-05-08 17:50:43.288759  8 |         6 | 1.2.6.8   | 2010-05-09 14:01:17.632695  3 |         1 | 1.3       | 2010-05-08 17:38:14.125377  4 |         1 | 1.4       | 2010-05-08 17:38:57.26743  5 |         1 | 1.5       | 2010-05-08 17:43:28.211708  9 |         5 | 1.5.9     | 2010-05-09 14:02:43.818646  7 |         1 | 1.7       | 2010-05-08 18:18:11.849735

otherwise (>9) you would have to turn the matpath into something like

001.002.006001.002.006.008

that would support up to 999 branches.

Please note

  • even the approach with 4 fixed digits, such as 0001.0002.0006 would give you a field that is shorter then in the accepted answer
  • you could parse matpath an produce sorting value on the fly with a user function
  • you could directly store matpath in this format (it has some other nice properties, too)


I typically create an additional columnn for this, called something like SortPath. It would contain the data that you need to sort by, concatenated together. That column would be of type varchar, and get sorted as a string. Something like this:

id | parent_id | matpath |          created            |                   sortpath---+-----------+---------+-----------------------------+-------------------------------------------------------------------------------------- 2 |         1 | 1       | 2010-05-08 15:18:37.987544  | 2010-05-08 15:18:37.987544-2 6 |         2 | 1.2     | 2010-05-08 17:50:43.288759  | 2010-05-08 15:18:37.987544-2.2010-05-08 17:50:43.288759-6 8 |         6 | 1.2.6   | 2010-05-09 14:01:17.632695  | 2010-05-08 15:18:37.987544-2.2010-05-08 17:50:43.288759-6.2010-05-09 14:01:17.632695-8 3 |         1 | 1       | 2010-05-08 17:38:14.125377  | 2010-05-08 17:38:14.125377-3 4 |         1 | 1       | 2010-05-08 17:38:57.26743   | 2010-05-08 17:38:57.267430-4  5 |         1 | 1       | 2010-05-08 17:43:28.211708  | 2010-05-08 17:43:28.211708-5 9 |         5 | 1.5     | 2010-05-09 14:02:43.818646  | 2010-05-08 17:43:28.211708-5.2010-05-09 14:02:43.818646-9 7 |         1 | 1       | 2010-05-08 18:18:11.849735  | 2010-05-08 18:18:11.849735-7

A couple of things to note here:

  • sortpath will be sorted as a string, so it is important all dates have the same length for it to correctly sort. E.g., observe how 2010-05-08 17:38:57.26743 has an extra zero added in the sortpath column.
  • I have appended the PK of each node to the end of its date. This is so that if you happen to have two rows with the exact same date, they will always get returned in the same order due to the additional data we are appending.
  • To me, the data looks asymmetrical the way I have written it, because we are showing the current node's date in sortpath, but it is not in matpath. I would prefer to see it in both.
  • You may want to put the date of node ID 1 at the beginning of each sortcolumn as well. This is so that if you ever want to query for more than one forum at a time (you probably won't), then it will still sort correctly.


I'm not sure I understand why the accepted solution makes any sense at all. It works, but it is even less normalized and less efficient (more disk space, more indexes, etc) than @Unreason's solution (to just pad the ID's in the materialized path).

The whole scenario that the OP faces seems to stem from the fact that, as @Unreason correctly points out, the implementation of the materialized path (MP) is incorrect. The OP has provided the MP to the parent, not to the current node. In the accepted solution the SortPath column corrects this by providing the materialized path to the current node (this time using dates -- why? -- instead of the primary key).

For reference consider the following excerpt:

Materialized Path

In this approach each record stores the whole path to the root. In our previous example, lets assume that KING is a root node. Then, the record with ename = 'SCOTT' is connected to the root via the path SCOTT->JONES->KING. Modern databases allow representing a list of nodes as a single value, but since materialized path has been invented long before then, the convention stuck to plain character string of nodes concatenated with some separator; most often '.' or '/'.