Sorting tree with a materialized path?
I believe your materialized path is not right.
What logic do you get to sort things like this
11.211.5
Why is the second 1 not together with the first one?
If you had
11.222.5
This would be trivial.
EDIT:I have looked at your example and you are not storing materialized path of a row, but you are storing a materialized path of the parent row.Here's how the materialized path of the row actually should look like. Sorting directly on matpath would work if you would not have more than 9 branches if you stored it as:
id | parent_id | matpath | created----+-----------+-----------+---------------------------- 2 | 1 | 1.2 | 2010-05-08 15:18:37.987544 6 | 2 | 1.2.6 | 2010-05-08 17:50:43.288759 8 | 6 | 1.2.6.8 | 2010-05-09 14:01:17.632695 3 | 1 | 1.3 | 2010-05-08 17:38:14.125377 4 | 1 | 1.4 | 2010-05-08 17:38:57.26743 5 | 1 | 1.5 | 2010-05-08 17:43:28.211708 9 | 5 | 1.5.9 | 2010-05-09 14:02:43.818646 7 | 1 | 1.7 | 2010-05-08 18:18:11.849735
otherwise (>9) you would have to turn the matpath
into something like
001.002.006001.002.006.008
that would support up to 999 branches.
Please note
- even the approach with 4 fixed digits, such as
0001.0002.0006
would give you a field that is shorter then in the accepted answer - you could parse matpath an produce sorting value on the fly with a user function
- you could directly store matpath in this format (it has some other nice properties, too)
I typically create an additional columnn for this, called something like SortPath
. It would contain the data that you need to sort by, concatenated together. That column would be of type varchar
, and get sorted as a string. Something like this:
id | parent_id | matpath | created | sortpath---+-----------+---------+-----------------------------+-------------------------------------------------------------------------------------- 2 | 1 | 1 | 2010-05-08 15:18:37.987544 | 2010-05-08 15:18:37.987544-2 6 | 2 | 1.2 | 2010-05-08 17:50:43.288759 | 2010-05-08 15:18:37.987544-2.2010-05-08 17:50:43.288759-6 8 | 6 | 1.2.6 | 2010-05-09 14:01:17.632695 | 2010-05-08 15:18:37.987544-2.2010-05-08 17:50:43.288759-6.2010-05-09 14:01:17.632695-8 3 | 1 | 1 | 2010-05-08 17:38:14.125377 | 2010-05-08 17:38:14.125377-3 4 | 1 | 1 | 2010-05-08 17:38:57.26743 | 2010-05-08 17:38:57.267430-4 5 | 1 | 1 | 2010-05-08 17:43:28.211708 | 2010-05-08 17:43:28.211708-5 9 | 5 | 1.5 | 2010-05-09 14:02:43.818646 | 2010-05-08 17:43:28.211708-5.2010-05-09 14:02:43.818646-9 7 | 1 | 1 | 2010-05-08 18:18:11.849735 | 2010-05-08 18:18:11.849735-7
A couple of things to note here:
sortpath
will be sorted as a string, so it is important all dates have the same length for it to correctly sort. E.g., observe how2010-05-08 17:38:57.26743
has an extra zero added in thesortpath
column.- I have appended the PK of each node to the end of its date. This is so that if you happen to have two rows with the exact same date, they will always get returned in the same order due to the additional data we are appending.
- To me, the data looks asymmetrical the way I have written it, because we are showing the current node's date in
sortpath
, but it is not inmatpath
. I would prefer to see it in both. - You may want to put the date of node ID 1 at the beginning of each
sortcolumn
as well. This is so that if you ever want to query for more than one forum at a time (you probably won't), then it will still sort correctly.
I'm not sure I understand why the accepted solution makes any sense at all. It works, but it is even less normalized and less efficient (more disk space, more indexes, etc) than @Unreason's solution (to just pad the ID's in the materialized path).
The whole scenario that the OP faces seems to stem from the fact that, as @Unreason correctly points out, the implementation of the materialized path (MP) is incorrect. The OP has provided the MP to the parent, not to the current node. In the accepted solution the SortPath
column corrects this by providing the materialized path to the current node (this time using dates -- why? -- instead of the primary key).
For reference consider the following excerpt:
Materialized Path
In this approach each record stores the whole path to the root. In our previous example, lets assume that KING is a root node. Then, the record with ename = 'SCOTT' is connected to the root via the path SCOTT->JONES->KING. Modern databases allow representing a list of nodes as a single value, but since materialized path has been invented long before then, the convention stuck to plain character string of nodes concatenated with some separator; most often '.' or '/'.