How to store Linked Tree with children in Relational Database? How to store Linked Tree with children in Relational Database? database database

How to store Linked Tree with children in Relational Database?


i would've commented asking for more information (particularly sample data & the rules of your hierarchy) so forgive this first cut for being overly general

i've made the following assumptions:

  • that your structure is greater than three levels deep - if this is not the case then i wouldn't do it this way, because it wouldn't be worth it

  • your load is read-heavy and writes to portions of the tree are concurrent but not conflicting, while conflicting writes are rare or non-existent

  • you don't require parallel, partial and shared-nothing or lock-free access to the tree to build or process it (in this case you need to signal deletes, which you could do by pointing to the node you replaced ie. superseded)

my proposed data model would look something like:

create table treemodel ( `node` int not null  , `parent` int not null  , `principal` int not null  , `state` smallint unsigned not null  , ... , `supersedes` int    /*version instead of lossy update or delete*/ , `supersededby` int) engine = innodb;alter table treemodel add primary key (`principal`, `node`) using btree; 
  1. the treemodel table would hold structural identifiers only: i would hold node data in a separate table but i would not join to get at it, rather, i would perform a second select ... where node in (...) - this is basically saying that 'the structure of my data is independent of my data'

  2. this model is designed to limit the number of round-trips to the database and may seem counter-intuitive, but will allow you to read or lock portions of the tree in a single database instruction with no joins

  3. this would run contrary to your data model, as you don't store an additional 'principal' node with your nested children - but if you can change this structure then you could take advantage of this proposal to avoid queries within loops, ie. multiple selects, or reentrant querying or self/ unary joins

  4. ... but your business logic would need to support the notion of what i have called the 'principal' node

  5. it's up to your use case as to what gets put in the principal node - i have used this model to store the causal relationship between observation records and their derivations, regardless of the parent child relationships below this point - another example might be: 1) a client raises a support case, or 2) a new message is sent, or 3) a new file is created, ...

  6. it would make no sense to store the principal as the actual root node in your tree structure (ie. node id '1', or 'data directory', or whatever) - instead you would store the 'next level down' for argument's sake, ie. 'user root directory' or 'first level below user root directory' - this is where knowing your use case & business rules will help

  7. ... and your java code will need to be updated to find or copy and store this concept - it will always be a copy from the parent node on any insert within a given branch of the tree - and if you are moving a branch and your logic requires a change to this number it is an update ... set principal=y where principal=x and update ... set parent=newid where node=current_principal

  8. ... and having said all of that, i wouldn't update the rows per se, only insert on change and recreate the whole tree branch (which explains the state field, ie. CURRENT, DELETED, ... where the root node of a deleted branch still points to its current parent node, eg. for 'deleted items')

  9. you still keep your pointers to each adjacent node in prev & next - ordered inserts of nodes into a branch of the tree would, in the worst case, require a select ... where principal=x for update, but probably only a select ... where (node=x or prev=x or next=x) for update

edits:

  • primary key/ clustered index must be unique - you could also partition on principal to ensure fast parallel access

  • recreating instead of updating branches


I had same requirements a year ago and then I explored a lot of options. Then I decided to go with graph database. Try out Neo4j, a graph database.