Help:Tree Structures
Comparisons: Hierarchical Data Storage Technique Comparisons
http://www.evoio.org/wiki/Phylotastic/Datastore
http://www.dbmonster.com/Uwe/Forum.aspx/db-theory/986/Nested-Sets-vs-Nested-Intervals
Options for Storing Hierarchical Data in a Relational Database: http://www.cnblogs.com/ttltry-air/archive/2012/08/10/2633164.html
Trees in the Database: Advanced Data Structures: http://www.alberton.info/talks/show/id/14
Modeling trees with Nested Sets and Nested Intervals: http://www.postgresql.org/message-id/200604070028.15476.db@kavod.com
Using Arrays as Materialized Paths in Postgres: http://justcramer.com/2012/04/08/using-arrays-as-materialized-paths-in-postgres/
Model Tree Structures in MongoDB: http://docs.mongodb.org/manual/tutorial/model-tree-structures/
http://en.wikipedia.org/wiki/Trie
http://www.evoio.org/wiki/Phylotastic/Datastore
PostgreSQL Tree Structure and Recursion: http://stackoverflow.com/questions/13184334/tree-structure-and-recursion
Create large persistent data structures that also persists on disk automatically.: https://github.com/ningke/PersistDS
ETE is a python programming toolkit that assists in the automated manipulation, analysis and visualization of any type of hierarchical trees.: https://github.com/jhcepas/ete
(python) A simple, demonstrative implementation of trees of nodes.: https://github.com/ejones/nodetree
https://github.com/topher200/data_structures/blob/master/tree.py
Python program that finds the path associated to a rational number in the Calkin-Wilf tree.: https://github.com/bharathdandala/Calkin-Wilf-Tree
ruby: https://github.com/the-teacher/the_sortable_tree
Tree Implementation in python: simple to use for you.: https://github.com/caesar0301/pyTree
A Python 3 implementation of Tree data structure: https://github.com/yoyzhou/pyTree
A list-derived tree data structure library for Python.: https://github.com/errantlinguist/PyTree
A tree structure for Mongoid documents using the materialized path pattern: https://github.com/benedikt/mongoid-tree
http://search.cpan.org/~mlawren/SQL-Tree-0.04/bin/sqltree
http://www.depesz.com/2008/04/11/my-take-on-trees-in-sql/
http://search.cpan.org/~mlawren/SQL-Tree-0.04/lib/SQL/Tree.pm
Trees in SQL : an approach based on materialized paths and normalization for MySQL: http://www.cloudconnected.fr/2009/05/26/trees-in-sql-an-approach-based-on-materialized-paths-and-normalization-for-mysql/
http://www.postgresql.org/docs/9.2/static/ltree.html
Nested Sets design AKA "Modified Pre-order Tree Traversal" (left and right columns for numbering):
Nested Sets AKA Modified Pre-order Tree Traversal
- Columns: Left, Right
- First described by Joe Celko - covered in depth in his book Trees and Hierarchies in SQL for Smarties
- - Finding descendants is easy and relatively efficient (i.e. proportional to the size of the subtree)
- - Finding ancestors is easy but inefficient
- - Finding node's children as all the descendants restricted to the next level is inefficient (e.g. consider root node)
- - Finding node's parent as ancestor on the previous level is inefficient due to inefficiency of ancestors search
- - Aggregate queries are relatively slow (except counting, which is super fast)!
- - Tree reorg is hard
- Cheap level, ancestry, descendants
- Compared to Adjacency List, moves, inserts, deletes more expensive.
- Requires a specific sort order (e.g. created). So sorting all descendants in a different order requires additional work.
- easy to retrieve data, hard to add data
- Moving nodes is difficult because it requires renumbering all other nodes to the right (can use -number trick to speed up a bit)
- Efficient to retrieve a fully ordered hierarchical tree, Typically, a read/traverse/retrieve operation requires a single simple SQL statement
- Model very well adapted to implement tree structures with databases (whatever that means... indexing.. views?)
- Allows for compact and elegant implementations with nice code symmetries
- Inefficient model to modify tree structure. Typically, an insert/reorder operation requires an SQL statement that modifies n/2 nodes at the average, or n nodes in the worst case (n = # nodes in a tree)
- Non-intuitive database values. A simple db query of a table doesn't intuitively reveal the tree structure. As a consequence, nested set trees need to be handled and interpreted programmatically
Nested Intervals AKA Nested Sets using Indexed Rational Numbers AKA Matrix Encoding AKA Farey Fraction Encoding:
Nested Intervals AKA Matrix Encoding
- Columns: Left/Right as floating point numbers
- Combination of Nested Sets and Materialized Path where left/right columns are floating point decimals instead of integers and encode the path information. In the later development of this idea nested intervals gave rise to matrix encoding.
- - Same as MP: Finding descendants is easy and and relatively efficient (i.e. proportional to the size of the subtree)
- - Same as MP: Finding ancestors is tricky but efficient
- - Same as AR: Finding node's children is trivial
- - Same as AR: Finding node's parent is trivial
- - All aggregate queries are relatively slow
- - Tree reorg is easy (but not as simple as in AR)
Adjacency Relation AKA Adjacency List Design AKA Recursion (columns like parent_id and level):
Adjacency List Design AKA Recursion
- Columns: ID, ParentID
- - Have to use proprietory SQL extensions for finding ancestors and descendants; although the queries are efficient
- - Finding descendants is relatively efficient (i.e. proportional to the size of the subtree)
- - Finding node's children is trivial
- - Finding node's parent is trivial
- - Aggregate queries are relatively slow
- - Tree reorg is very simple
- Easy to implement.
- Cheap node moves, inserts, and deletes.
- Expensive to find level (can store as a computed column), ancestry & descendants (Bridge Hierarchy combined with level column can solve), path (Lineage Column can solve).
- Use Common Table Expressions in those databases that support them to traverse.
- hard to retrieve data, easy to add data
- you have both an ID column and a Parent ID column in a table. This is easy to understand and maintain but can be difficult or inefficient when you want to retrieve hierarchies of records.
- hard to retrieve a fully ordered hierarchical tree
- more-intuitive database values (parent id, depth)
Materialized Path AKA Legal Numbering AKA Lineage Column:
- Column: lineage (e.g. /parent/child/grandchild/etc...)
- - Finding descendants is easy and relatively efficient (i.e. proportional to the size of the subtree)
- - Finding ancestors is tricky but efficient
- - Finding node's children as descendants on next level is inefficient
- - Finding node's parent as ancestor on the previous level is efficient
- - All aggregate queries are relatively slow
- - Tree reorg is easy
- Limit to how deep the hierarchy can be.
- Descendants cheap (e.g. LEFT(lineage, #) = '/enumerated/path')
- Ancestry tricky (database specific queries)
Closure Table (aka "Bridge Table" aka "Adjacency Relation"):
- Columns: ancestor, descendant
- Stands apart from table it describes.
- Can include some nodes in more than one hierarchy.
- Cheap ancestry and descendants (albeit not in what order)
- For complete knowledge of a hierarchy needs to be combined with another option.
Flat Table Design
- Columns: ID, ParentID, Level/Rank
- A modification of the Adjacency List that adds a Level and Rank (e.g. ordering) column to each record.
- Expensive move and delete
- Cheap ancestry and descendants
- Good Use: threaded discussion - forums / blog comments
XML DB's:
http://en.wikipedia.org/wiki/Sedna_%28database%29
http://xml.apache.org/xindice/
NoSQL:
http://redis.io/topics/introduction
http://www.mongodb.org/display/DOCS/Trees+in+MongoDB
Top DBA Shell Scripts for Monitoring the Database: https://communities.bmc.com/communities/docs/DOC-9942
IMS: https://communities.bmc.com/communities/docs/DOC-9908
Efficient lineage tracking for scientific workflows: http://portal.acm.org/citation.cfm?doid=1376616.1376716