SQL - How to store and navigate hierarchies?
I like the Modified Preorder Tree Traversal Algorithm. This technique makes it very easy to query the tree.
But here is a list of links about the topic which I copied from the Zend Framework (PHP) contributors webpage (posted there by Posted by Laurent Melmoux at Jun 05, 2007 15:52).
Many of the links are language agnostic:
There is 2 main representations and algorithms to represent hierarchical structures with databases :
- nested set also known as modified preorder tree traversal algorithm
- adjacency list model
It's well explained here:
- http://www.sitepoint.com/article/hierarchical-data-database
- Managing Hierarchical Data in MySQL
- http://www.evolt.org/article/Four_ways_to_work_with_hierarchical_data/17/4047/index.html
Here are some more links that I've collected:
- http://en.wikipedia.org/wiki/Tree_%28data_structure%29
- http://en.wikipedia.org/wiki/Category:Trees_%28structure%29
adjacency list model
nested set
- http://www.sqlsummit.com/AdjacencyList.htm
- http://www.edutech.ch/contribution/nstrees/index.php
- http://www.phpriot.com/d/articles/php/application-design/nested-trees-1/
- http://www.dbmsmag.com/9604d06.html
- http://en.wikipedia.org/wiki/Tree_traversal
- http://www.cosc.canterbury.ac.nz/mukundan/dsal/BTree.html (applet java montrant le fonctionnement )
Graphes
Classes :
Nested Sets DB Tree Adodb
Visitation Model ADOdb
PEAR::DB_NestedSet
- http://pear.php.net/package/DB_NestedSet
- utilisation : https://www.entwickler.com/itr/kolumnen/psecom,id,26,nodeid,207.html
PEAR::Tree
- http://pear.php.net/package/Tree/download/0.3.0/
- http://www.phpkitchen.com/index.php?/archives/337-PEARTree-Tutorial.html
nstrees
The definitive pieces on this subject have been written by Joe Celko, and he has worked a number of them into a book called Joe Celko's Trees and Hierarchies in SQL for Smarties.
He favours a technique called directed graphs. An introduction to his work on this subject can be found here
What's the best way to represent a hierachy in a SQL database? A generic, portable technique?
Let's assume the hierachy is mostly read, but isn't completely static. Let's say it's a family tree.
Here's how not to do it:
create table person (person_id integer autoincrement primary key,name varchar(255) not null,dob date,mother integer,father integer);
And inserting data like this:
person_id name dob mother father 1 Pops 1900/1/1 null null 2 Grandma 1903/2/4 null null 3 Dad 1925/4/2 2 1 4 Uncle Kev 1927/3/3 2 15 Cuz Dave 1953/7/8 null 46 Billy 1954/8/1 null 3
Instead, split your nodes and your relationships into two tables.
create table person (person_id integer autoincrement primary key,name varchar(255) not null,dob date);create table ancestor (ancestor_id integer,descendant_id integer,distance integer);
Data is created like this:
person_id name dob 1 Pops 1900/1/1 2 Grandma 1903/2/4 3 Dad 1925/4/2 4 Uncle Kev 1927/3/35 Cuz Dave 1953/7/8 6 Billy 1954/8/1 ancestor_id descendant_id distance1 1 02 2 03 3 04 4 05 5 06 6 01 3 12 3 11 4 12 4 11 5 22 5 24 5 11 6 22 6 23 6 1
you can now run arbitary queries that don't involve joining the table back on itself, which would happen if you have the heirachy relationship in the same row as the node.
Who has grandparents?
select * from person where person_id in (select descendant_id from ancestor where distance=2);
All your descendants:
select * from person where person_id in (select descendant_id from ancestor where ancestor_id=1 and distance>0);
Who are uncles?
select decendant_id uncle from ancestor where distance=1 and ancestor_id in (select ancestor_id from ancestor where distance=2 and not exists (select ancestor_id from ancestor where distance=1 and ancestor_id=uncle) )
You avoid all the problems of joining a table to itself via subqueries, a common limitation is 16 subsuqeries.
Trouble is, maintaining the ancestor table is kind of hard - best done with a stored procedure.