Help:Tree Structures

From 911datasets
Revision as of 06:59, 1 August 2012 by Jkeogh (Talk | contribs)

Jump to: navigation, search

Comparisons: Hierarchical Data Storage Technique Comparisons

Main Source: http://stackoverflow.com/questions/4048151/what-are-the-options-for-storing-hierarchical-data-in-a-relational-database

http://www.evoio.org/wiki/Phylotastic/Datastore

http://www.dbmonster.com/Uwe/Forum.aspx/db-theory/986/Nested-Sets-vs-Nested-Intervals


Nested Sets AKA Modified Pre-order Tree Traversal

Nested Sets design AKA "Modified Pre-order Tree Traversal" (left and right columns for numbering):

  • Columns: Left, Right
  • First described by Joe Celko - covered in depth in his book Trees and Hierarchies in SQL for Smarties
  • - Finding descendants is easy and relatively efficient (i.e. proportional to the size of the subtree)
  • - Finding ancestors is easy but inefficient
  • - Finding node's children as all the descendants restricted to the next level is inefficient (e.g. consider root node)
  • - Finding node's parent as ancestor on the previous level is inefficient due to inefficiency of ancestors search
  • - Aggregate queries are relatively slow (except counting, which is super fast)!
  • - Tree reorg is hard
  • Cheap level, ancestry, descendants
  • Compared to Adjacency List, moves, inserts, deletes more expensive.
  • Requires a specific sort order (e.g. created). So sorting all descendants in a different order requires additional work.
  • easy to retrieve data, hard to add data
  • Moving nodes is difficult because it requires renumbering all other nodes to the right (can use -number trick to speed up a bit)
  • Efficient to retrieve a fully ordered hierarchical tree, Typically, a read/traverse/retrieve operation requires a single simple SQL statement
  • Model very well adapted to implement tree structures with databases (whatever that means... indexing.. views?)
  • Allows for compact and elegant implementations with nice code symmetries
  • Inefficient model to modify tree structure. Typically, an insert/reorder operation requires an SQL statement that modifies n/2 nodes at the average, or n nodes in the worst case (n = # nodes in a tree)
  • Non-intuitive database values. A simple db query of a table doesn't intuitively reveal the tree structure. As a consequence, nested set trees need to be handled and interpreted programmatically


Nested Intervals AKA Nested Sets using Indexed Rational Numbers AKA Matrix Encoding AKA Farey Fraction Encoding:

Nested Intervals AKA Matrix Encoding

  • Columns: Left/Right as floating point numbers
  • Combination of Nested Sets and Materialized Path where left/right columns are floating point decimals instead of integers and encode the path information. In the later development of this idea nested intervals gave rise to matrix encoding.
  • - Same as MP: Finding descendants is easy and and relatively efficient (i.e. proportional to the size of the subtree)
  • - Same as MP: Finding ancestors is tricky but efficient
  • - Same as AR: Finding node's children is trivial
  • - Same as AR: Finding node's parent is trivial
  • - All aggregate queries are relatively slow
  • - Tree reorg is easy (but not as simple as in AR)


Adjacency List Design AKA Recursion

Adjacency Relation AKA Adjacency List Design AKA Recursion (columns like parent_id and level):

  • Columns: ID, ParentID
  • - Have to use proprietory SQL extensions for finding ancestors and descendants; although the queries are efficient
  • - Finding descendants is relatively efficient (i.e. proportional to the size of the subtree)
  • - Finding node's children is trivial
  • - Finding node's parent is trivial
  • - Aggregate queries are relatively slow
  • - Tree reorg is very simple
  • Easy to implement.
  • Cheap node moves, inserts, and deletes.
  • Expensive to find level (can store as a computed column), ancestry & descendants (Bridge Hierarchy combined with level column can solve), path (Lineage Column can solve).
  • Use Common Table Expressions in those databases that support them to traverse.
  • hard to retrieve data, easy to add data
  • you have both an ID column and a Parent ID column in a table. This is easy to understand and maintain but can be difficult or inefficient when you want to retrieve hierarchies of records.
  • hard to retrieve a fully ordered hierarchical tree
  • more-intuitive database values (parent id, depth)


Materialized Path AKA Legal Numbering AKA Lineage Column:

Materialized Path

  • Column: lineage (e.g. /parent/child/grandchild/etc...)
  • - Finding descendants is easy and relatively efficient (i.e. proportional to the size of the subtree)
  • - Finding ancestors is tricky but efficient
  • - Finding node's children as descendants on next level is inefficient
  • - Finding node's parent as ancestor on the previous level is efficient
  • - All aggregate queries are relatively slow
  • - Tree reorg is easy
  • Limit to how deep the hierarchy can be.
  • Descendants cheap (e.g. LEFT(lineage, #) = '/enumerated/path')
  • Ancestry tricky (database specific queries)


Closure Table (aka "Bridge Table" aka "Adjacency Relation"):

Closure Table

  • Columns: ancestor, descendant
  • Stands apart from table it describes.
  • Can include some nodes in more than one hierarchy.
  • Cheap ancestry and descendants (albeit not in what order)
  • For complete knowledge of a hierarchy needs to be combined with another option.



Flat Table Design

Flat Table

  • Columns: ID, ParentID, Level/Rank
  • A modification of the Adjacency List that adds a Level and Rank (e.g. ordering) column to each record.
  • Expensive move and delete
  • Cheap ancestry and descendants
  • Good Use: threaded discussion - forums / blog comments



XML DB's:

http://basex.org/

http://en.wikipedia.org/wiki/Sedna_%28database%29

http://xml.apache.org/xindice/

NoSQL:

http://redis.io/topics/introduction

http://www.mongodb.org/display/DOCS/Trees+in+MongoDB


Top DBA Shell Scripts for Monitoring the Database: https://communities.bmc.com/communities/docs/DOC-9942

IMS: https://communities.bmc.com/communities/docs/DOC-9908

Efficient lineage tracking for scientific workflows: http://portal.acm.org/citation.cfm?doid=1376616.1376716

Personal tools
Namespaces

Variants
Actions
Navigation
Wiki HOWTO
Old Forum
Toolbox