Help:Tree Structures

From 911datasets
Jump to: navigation, search

Comparisons: Hierarchical Data Storage Technique Comparisons

Main Source: http://stackoverflow.com/questions/4048151/what-are-the-options-for-storing-hierarchical-data-in-a-relational-database

http://www.evoio.org/wiki/Phylotastic/Datastore

http://www.dbmonster.com/Uwe/Forum.aspx/db-theory/986/Nested-Sets-vs-Nested-Intervals

Options for Storing Hierarchical Data in a Relational Database: http://www.cnblogs.com/ttltry-air/archive/2012/08/10/2633164.html

Trees in the Database: Advanced Data Structures: http://www.alberton.info/talks/show/id/14

Modeling trees with Nested Sets and Nested Intervals: http://www.postgresql.org/message-id/200604070028.15476.db@kavod.com

Using Arrays as Materialized Paths in Postgres: http://justcramer.com/2012/04/08/using-arrays-as-materialized-paths-in-postgres/

Model Tree Structures in MongoDB: http://docs.mongodb.org/manual/tutorial/model-tree-structures/

http://en.wikipedia.org/wiki/Trie

http://www.evoio.org/wiki/Phylotastic/Datastore

PostgreSQL Tree Structure and Recursion: http://stackoverflow.com/questions/13184334/tree-structure-and-recursion

Create large persistent data structures that also persists on disk automatically.: https://github.com/ningke/PersistDS

ETE is a python programming toolkit that assists in the automated manipulation, analysis and visualization of any type of hierarchical trees.: https://github.com/jhcepas/ete

(python) A simple, demonstrative implementation of trees of nodes.: https://github.com/ejones/nodetree

https://github.com/topher200/data_structures/blob/master/tree.py

Python program that finds the path associated to a rational number in the Calkin-Wilf tree.: https://github.com/bharathdandala/Calkin-Wilf-Tree

ruby: https://github.com/the-teacher/the_sortable_tree

Tree Implementation in python: simple to use for you.: https://github.com/caesar0301/pyTree

A Python 3 implementation of Tree data structure: https://github.com/yoyzhou/pyTree

A list-derived tree data structure library for Python.: https://github.com/errantlinguist/PyTree

A tree structure for Mongoid documents using the materialized path pattern: https://github.com/benedikt/mongoid-tree


Trigger Trees

http://search.cpan.org/~mlawren/SQL-Tree-0.04/bin/sqltree

http://www.depesz.com/2008/04/11/my-take-on-trees-in-sql/

http://search.cpan.org/~mlawren/SQL-Tree-0.04/lib/SQL/Tree.pm

Trees in SQL : an approach based on materialized paths and normalization for MySQL: http://www.cloudconnected.fr/2009/05/26/trees-in-sql-an-approach-based-on-materialized-paths-and-normalization-for-mysql/

http://www.postgresql.org/docs/9.2/static/ltree.html


Nested Sets design AKA "Modified Pre-order Tree Traversal" (left and right columns for numbering):

Nested Sets AKA Modified Pre-order Tree Traversal

  • Columns: Left, Right
  • First described by Joe Celko - covered in depth in his book Trees and Hierarchies in SQL for Smarties
  • - Finding descendants is easy and relatively efficient (i.e. proportional to the size of the subtree)
  • - Finding ancestors is easy but inefficient
  • - Finding node's children as all the descendants restricted to the next level is inefficient (e.g. consider root node)
  • - Finding node's parent as ancestor on the previous level is inefficient due to inefficiency of ancestors search
  • - Aggregate queries are relatively slow (except counting, which is super fast)!
  • - Tree reorg is hard
  • Cheap level, ancestry, descendants
  • Compared to Adjacency List, moves, inserts, deletes more expensive.
  • Requires a specific sort order (e.g. created). So sorting all descendants in a different order requires additional work.
  • easy to retrieve data, hard to add data
  • Moving nodes is difficult because it requires renumbering all other nodes to the right (can use -number trick to speed up a bit)
  • Efficient to retrieve a fully ordered hierarchical tree, Typically, a read/traverse/retrieve operation requires a single simple SQL statement
  • Model very well adapted to implement tree structures with databases (whatever that means... indexing.. views?)
  • Allows for compact and elegant implementations with nice code symmetries
  • Inefficient model to modify tree structure. Typically, an insert/reorder operation requires an SQL statement that modifies n/2 nodes at the average, or n nodes in the worst case (n = # nodes in a tree)
  • Non-intuitive database values. A simple db query of a table doesn't intuitively reveal the tree structure. As a consequence, nested set trees need to be handled and interpreted programmatically


Nested Intervals AKA Nested Sets using Indexed Rational Numbers AKA Matrix Encoding AKA Farey Fraction Encoding:

Nested Intervals AKA Matrix Encoding

  • Columns: Left/Right as floating point numbers
  • Combination of Nested Sets and Materialized Path where left/right columns are floating point decimals instead of integers and encode the path information. In the later development of this idea nested intervals gave rise to matrix encoding.
  • - Same as MP: Finding descendants is easy and and relatively efficient (i.e. proportional to the size of the subtree)
  • - Same as MP: Finding ancestors is tricky but efficient
  • - Same as AR: Finding node's children is trivial
  • - Same as AR: Finding node's parent is trivial
  • - All aggregate queries are relatively slow
  • - Tree reorg is easy (but not as simple as in AR)



Adjacency Relation AKA Adjacency List Design AKA Recursion (columns like parent_id and level):

Adjacency List Design AKA Recursion

  • Columns: ID, ParentID
  • - Have to use proprietory SQL extensions for finding ancestors and descendants; although the queries are efficient
  • - Finding descendants is relatively efficient (i.e. proportional to the size of the subtree)
  • - Finding node's children is trivial
  • - Finding node's parent is trivial
  • - Aggregate queries are relatively slow
  • - Tree reorg is very simple
  • Easy to implement.
  • Cheap node moves, inserts, and deletes.
  • Expensive to find level (can store as a computed column), ancestry & descendants (Bridge Hierarchy combined with level column can solve), path (Lineage Column can solve).
  • Use Common Table Expressions in those databases that support them to traverse.
  • hard to retrieve data, easy to add data
  • you have both an ID column and a Parent ID column in a table. This is easy to understand and maintain but can be difficult or inefficient when you want to retrieve hierarchies of records.
  • hard to retrieve a fully ordered hierarchical tree
  • more-intuitive database values (parent id, depth)


Materialized Path AKA Legal Numbering AKA Lineage Column:

Materialized Path

  • Column: lineage (e.g. /parent/child/grandchild/etc...)
  • - Finding descendants is easy and relatively efficient (i.e. proportional to the size of the subtree)
  • - Finding ancestors is tricky but efficient
  • - Finding node's children as descendants on next level is inefficient
  • - Finding node's parent as ancestor on the previous level is efficient
  • - All aggregate queries are relatively slow
  • - Tree reorg is easy
  • Limit to how deep the hierarchy can be.
  • Descendants cheap (e.g. LEFT(lineage, #) = '/enumerated/path')
  • Ancestry tricky (database specific queries)


Closure Table (aka "Bridge Table" aka "Adjacency Relation"):

Closure Table

  • Columns: ancestor, descendant
  • Stands apart from table it describes.
  • Can include some nodes in more than one hierarchy.
  • Cheap ancestry and descendants (albeit not in what order)
  • For complete knowledge of a hierarchy needs to be combined with another option.



Flat Table Design

Flat Table

  • Columns: ID, ParentID, Level/Rank
  • A modification of the Adjacency List that adds a Level and Rank (e.g. ordering) column to each record.
  • Expensive move and delete
  • Cheap ancestry and descendants
  • Good Use: threaded discussion - forums / blog comments



XML DB's:

http://basex.org/

http://en.wikipedia.org/wiki/Sedna_%28database%29

http://xml.apache.org/xindice/

NoSQL:

http://redis.io/topics/introduction

http://www.mongodb.org/display/DOCS/Trees+in+MongoDB


Top DBA Shell Scripts for Monitoring the Database: https://communities.bmc.com/communities/docs/DOC-9942

IMS: https://communities.bmc.com/communities/docs/DOC-9908

Efficient lineage tracking for scientific workflows: http://portal.acm.org/citation.cfm?doid=1376616.1376716

Personal tools
Namespaces

Variants
Actions
Navigation
Wiki HOWTO
Old Forum
Toolbox