Help:Tree Structures

Comparisons: Hierarchical Data Storage Technique Comparisons

Main Source: http://stackoverflow.com/questions/4048151/what-are-the-options-for-storing-hierarchical-data-in-a-relational-database

http://www.evoio.org/wiki/Phylotastic/Datastore

http://www.dbmonster.com/Uwe/Forum.aspx/db-theory/986/Nested-Sets-vs-Nested-Intervals

Options for Storing Hierarchical Data in a Relational Database: http://www.cnblogs.com/ttltry-air/archive/2012/08/10/2633164.html

Trees in the Database: Advanced Data Structures: http://www.alberton.info/talks/show/id/14

Modeling trees with Nested Sets and Nested Intervals: http://www.postgresql.org/message-id/200604070028.15476.db@kavod.com

Using Arrays as Materialized Paths in Postgres: http://justcramer.com/2012/04/08/using-arrays-as-materialized-paths-in-postgres/

Model Tree Structures in MongoDB: http://docs.mongodb.org/manual/tutorial/model-tree-structures/

http://en.wikipedia.org/wiki/Trie

http://www.evoio.org/wiki/Phylotastic/Datastore

PostgreSQL Tree Structure and Recursion: http://stackoverflow.com/questions/13184334/tree-structure-and-recursion

Create large persistent data structures that also persists on disk automatically.: https://github.com/ningke/PersistDS

ETE is a python programming toolkit that assists in the automated manipulation, analysis and visualization of any type of hierarchical trees.: https://github.com/jhcepas/ete

(python) A simple, demonstrative implementation of trees of nodes.: https://github.com/ejones/nodetree

https://github.com/topher200/data_structures/blob/master/tree.py

Python program that finds the path associated to a rational number in the Calkin-Wilf tree.: https://github.com/bharathdandala/Calkin-Wilf-Tree

ruby: https://github.com/the-teacher/the_sortable_tree

Tree Implementation in python: simple to use for you.: https://github.com/caesar0301/pyTree

A Python 3 implementation of Tree data structure: https://github.com/yoyzhou/pyTree

A list-derived tree data structure library for Python.: https://github.com/errantlinguist/PyTree

A tree structure for Mongoid documents using the materialized path pattern: https://github.com/benedikt/mongoid-tree

Trigger Trees

http://search.cpan.org/~mlawren/SQL-Tree-0.04/bin/sqltree

http://www.depesz.com/2008/04/11/my-take-on-trees-in-sql/

http://search.cpan.org/~mlawren/SQL-Tree-0.04/lib/SQL/Tree.pm

Trees in SQL : an approach based on materialized paths and normalization for MySQL: http://www.cloudconnected.fr/2009/05/26/trees-in-sql-an-approach-based-on-materialized-paths-and-normalization-for-mysql/

http://www.postgresql.org/docs/9.2/static/ltree.html

Nested Sets design AKA "Modified Pre-order Tree Traversal" (left and right columns for numbering):

Nested Sets AKA Modified Pre-order Tree Traversal


 * Columns: Left, Right


 * First described by Joe Celko - covered in depth in his book Trees and Hierarchies in SQL for Smarties


 * - Finding descendants is easy and relatively efficient (i.e. proportional to the size of the subtree)


 * - Finding ancestors is easy but inefficient


 * - Finding node's children as all the descendants restricted to the next level is inefficient (e.g. consider root node)


 * - Finding node's parent as ancestor on the previous level is inefficient due to inefficiency of ancestors search


 * - Aggregate queries are relatively slow (except counting, which is super fast)!


 * - Tree reorg is hard


 * Cheap level, ancestry, descendants


 * Compared to Adjacency List, moves, inserts, deletes more expensive.


 * Requires a specific sort order (e.g. created). So sorting all descendants in a different order requires additional work.


 * easy to retrieve data, hard to add data


 * Moving nodes is difficult because it requires renumbering all other nodes to the right (can use -number trick to speed up a bit)


 * Efficient to retrieve a fully ordered hierarchical tree, Typically, a read/traverse/retrieve operation requires a single simple SQL statement


 * Model very well adapted to implement tree structures with databases (whatever that means... indexing.. views?)


 * Allows for compact and elegant implementations with nice code symmetries


 * Inefficient model to modify tree structure. Typically, an insert/reorder operation requires an SQL statement that modifies n/2 nodes at the average, or n nodes in the worst case (n = # nodes in a tree)


 * Non-intuitive database values. A simple db query of a table doesn't intuitively reveal the tree structure. As a consequence, nested set trees need to be handled and interpreted programmatically

Nested Intervals AKA Nested Sets using Indexed Rational Numbers AKA Matrix Encoding AKA Farey Fraction Encoding:

Nested Intervals AKA Matrix Encoding


 * Columns: Left/Right as floating point numbers


 * Combination of Nested Sets and Materialized Path where left/right columns are floating point decimals instead of integers and encode the path information. In the later development of this idea nested intervals gave rise to matrix encoding.


 * - Same as MP: Finding descendants is easy and and relatively efficient (i.e. proportional to the size of the subtree)


 * - Same as MP: Finding ancestors is tricky but efficient


 * - Same as AR: Finding node's children is trivial


 * - Same as AR: Finding node's parent is trivial


 * - All aggregate queries are relatively slow


 * - Tree reorg is easy (but not as simple as in AR)

Adjacency Relation AKA Adjacency List Design AKA Recursion (columns like parent_id and level):

Adjacency List Design AKA Recursion


 * Columns: ID, ParentID


 * - Have to use proprietory SQL extensions for finding ancestors and descendants; although the queries are efficient


 * - Finding descendants is relatively efficient (i.e. proportional to the size of the subtree)


 * - Finding node's children is trivial


 * - Finding node's parent is trivial


 * - Aggregate queries are relatively slow


 * - Tree reorg is very simple


 * Easy to implement.


 * Cheap node moves, inserts, and deletes.


 * Expensive to find level (can store as a computed column), ancestry & descendants (Bridge Hierarchy combined with level column can solve), path (Lineage Column can solve).


 * Use Common Table Expressions in those databases that support them to traverse.


 * hard to retrieve data, easy to add data


 * you have both an ID column and a Parent ID column in a table. This is easy to understand and maintain but can be difficult or inefficient when you want to retrieve hierarchies of records.


 * hard to retrieve a fully ordered hierarchical tree


 * more-intuitive database values (parent id, depth)

Materialized Path AKA Legal Numbering AKA Lineage Column:

Materialized Path


 * Column: lineage (e.g. /parent/child/grandchild/etc...)


 * - Finding descendants is easy and relatively efficient (i.e. proportional to the size of the subtree)


 * - Finding ancestors is tricky but efficient


 * - Finding node's children as descendants on next level is inefficient


 * - Finding node's parent as ancestor on the previous level is efficient


 * - All aggregate queries are relatively slow


 * - Tree reorg is easy


 * Limit to how deep the hierarchy can be.


 * Descendants cheap (e.g. LEFT(lineage, #) = '/enumerated/path')


 * Ancestry tricky (database specific queries)

Closure Table (aka "Bridge Table" aka "Adjacency Relation"):

Closure Table


 * Columns: ancestor, descendant


 * Stands apart from table it describes.


 * Can include some nodes in more than one hierarchy.


 * Cheap ancestry and descendants (albeit not in what order)


 * For complete knowledge of a hierarchy needs to be combined with another option.

Flat Table Design

Flat Table


 * Columns: ID, ParentID, Level/Rank


 * A modification of the Adjacency List that adds a Level and Rank (e.g. ordering) column to each record.


 * Expensive move and delete


 * Cheap ancestry and descendants


 * Good Use: threaded discussion - forums / blog comments

XML DB's:

http://basex.org/

http://en.wikipedia.org/wiki/Sedna_%28database%29

http://xml.apache.org/xindice/

NoSQL:

http://redis.io/topics/introduction

http://www.mongodb.org/display/DOCS/Trees+in+MongoDB

Top DBA Shell Scripts for Monitoring the Database: https://communities.bmc.com/communities/docs/DOC-9942

IMS: https://communities.bmc.com/communities/docs/DOC-9908 Efficient lineage tracking for scientific workflows: http://portal.acm.org/citation.cfm?doid=1376616.1376716