How To Develop A Database Schema For A Tree Structure(Directed acyclic graph) How To Develop A Database Schema For A Tree Structure(Directed acyclic graph) php php

How To Develop A Database Schema For A Tree Structure(Directed acyclic graph)


Your db structure doesn't seem to be a tree,it's just graph.

I suggest you to throw out relational database for this structure and take look at some of Graph Database like Neo4j ,OrientDB and Infinite Graph.

But if your forced to use MySQL you can use FlockDB which can be used to traverse MySQL Node(see row as node) but with some limitation. or you can test another MySQL engine like OQGRAPH which provide graph engine for MySQL and MariaDB.


Let me restate the question just to make sure I understand it right. You have Nodes and two kind of relations - arrows and vertical lines. Then given a Node N, you want to generate a subset S(N) of Nodes with the following recursive rules:

  • Rule 0: N is in S(N).
  • Rule 1: If Node M is connected with N via vertical relation, it is also in S(N).
  • Rule 2: If Node M is in S(N), then any Node with arrow pointing to M is also in S(N).

The set S(N) is the minimal set of Nodes satisfying those rules.

The example your give with N being Y, seem to confirm these rules.

However, there is another different but (at least to me) more natural set of Rules,where Rule 1 above is replaced by

  • Rule 1': If Node M is in S(N), then any Node connected with M via vertical relation is also in S(N).

In the sequel I will assume Rules 0,1,2 confirmed by your example, but similar approach can be used for Rules 0,1',2 or any modification.


Also I understand there is a mistake in your table in row 7, it should be:

7    B2    B    B1,B3 (not B2,B3)

Now to the proposed solution.

I would first slightly modify your table structure:Since "id" is your primary key, the rule for a foreign key is to point to the primary key of the related entry. That is, in your case, I would replace "node_id" by "node_name" or something like that just not to confuse with "id", and replace entries of "node_parent_id" and "cross_ref" by their "id"s. For instance, the row number 15 would look like that:

15    'Y'    [13]    [14,16]

Alternatively, if you prefer for readability reasons, you can use A, B, X, Y etc. as primary keys, provided they are unique, of course, then your table would remain the same, with exception of the "id" field that is not needed. I will assume the first case, but you can adapt it to the second with simple replacement.

That is all you need, as far as the table goes.


Now you need a recursive function to generate the subgraph S(N) for each given Node N.

I will implement the set S as array $set of all 'id's of its Nodes.Then I will define two functions - one to expand the set initially based on Rules 1,2and the other to expand the set subsequently based on Rule 2 only.

For simplicity, I will assume your table is imported into memory as associative array $rows of rows, such that $rows[$id] represents the row with 'id' equal to $id as, again, associative array. So

$rows[15] = array('id'=>15,                   'node_name'=>'Y',                   'node_parent_id'=>array(13),                   'cross_ref'=>array(14,16));

Here is the code for the functions:

function initial_expand_set ($node_id) {    global($rows); // to access the array from outside    $set = array($node_id);    // Rule 0    $row = $rows[$node_id];    // Get current Node    $parents = $row['node_parent_id'];  // Get parents of the Node    $set = $parents ? array_merge ($set, $parents) : $set;   // Join parents to the set    $vert_brothers = $row['cross_ref'];  // Get vertical relations    $set = $vert_brothers ? array_merge ($set, $vert_brothers) : $set;    $set = expand_set($set);  // Recursive function defined below    return $set;}

And the recursive function:

// iterate over nodes inside the set, expand each node, and merge all togetherfunction expand_set (array $set) {    global($rows);    $expansion = $set;  //  Initially $expansion is the same as $set    foreach ($set as $node_id) {        $row = $rows[$node_id];    // Get Node         // Get the set of parents of the Node - this is our new set        $parents = $row['node_parent_id'];  // Get parents of the Node        // apply the recursion        $parents_expanded = expand_set($parents);        // merge with previous expansion        $expansion = array_merge($expansion, $parents_expanded);    }    // after the loop is finished getting rid of redundant entries    // array_unique generates associative array, for which I only collect values    $expansion = array_values( array_unique($expansion) );    return $expansion;}

Hope it works for you. :)

If any further details or explanations needed, I will be happy to assist.

PS. For the pedants among readers note that I use space before '(' for function definition and no space for function calls, as recommended by Douglas Crokford.


Your database structure is not normalised, because you have multiple ids in both node_parent_id and cross_refer. You should separate this information out into separate tables.

So, you would have your nodes table, plus a second table to describe the parent-child relationships; this table would have a child node id and a parent node id.

The cross-references should be in a third table, which again has two node id columns, but there are two ways you can do this, because the cross-references are bi-directional. One way is to store each cross-reference only once, which means when you query the table, you have to check both possibilities (a cross reference between X and Y could be stored with X in the first column and Y in the second column, or the other way around, so to find X you would have to check both columns). The other way is to store each cross-reference twice, once for each direction. This makes querying simpler but it is storing redundant data and can lead to inconsistencies, e.g. if for some reason one reference is deleted but the other is not.

Finding paths with this structure becomes a much simpler matter because you don't have to additionally parse comma-separated strings, which is both more complex and less efficient.

You can also use this to ensure referential integrity, so that, for example, a node doesn't have a parent id that doesn't actually exist in the database.

For more information, research "database normalisation". (Or you can spell it with a "z" if you are so inclined ;-P)