YouTip LogoYouTip

Dsa Advanced Tree

Advanced Tree Structures

After mastering basic tree structures such as binary trees and binary search trees (BSTs), we are about to enter a broader and more efficient domainβ€”advanced tree structures.

If basic trees are ordinary roads in the world of data structures, then advanced tree structures are meticulously designed expressways and interchanges. Through clever balancing rules and storage strategies, they ensure excellent performance even when dealing with large amounts of data or frequent operations.

This section will introduce four classic advanced tree structures: AVL trees, Red-Black trees, B-trees, and B+ trees.


Why Do We Need Advanced Tree Structures?

Before diving into the details, let's ask a question: With efficient binary search trees (BSTs) available, why do we need more complex trees?

Imagine you run a library and use BSTs to store books sorted by title. If you insert books one by one in alphabetical order (e.g., `A...`, `B...`, `C...`), the tree may degenerate into a linked list. In this case, the time complexity for searching a book degrades from the ideal `O(log n)` to the worst-case `O(n)`, resulting in extremely low efficiency.

Image 1

Figure: A diagram showing how a binary search tree degenerates into a linked list when inserting ordered data, leading to degraded search performance.

Advanced tree structures (self-balancing trees) were created specifically to solve this problem. They automatically adjust the tree structure during node insertion or deletion to maintain balance, ensuring that operations like search, insertion, and deletion consistently remain at `O(log n)` complexity.


AVL Tree: The Guardian of Height Balance

AVL trees are the earliest self-balancing binary search trees, named after their inventors, Adelson-Velsky and Landis.

The core idea of AVL trees is simple: For any node in the tree, the height difference between its left and right subtrees must not exceed 1.

Core Concept: Balance Factor

Each node has a balance factor, calculated as: Balance Factor = Height of Left Subtree - Height of Right Subtree. In AVL trees, each node’s balance factor can only be -1, 0, or 1.

Imbalance and Rotation Adjustment

After inserting or deleting a node, if a node’s balance factor becomes 2 or -2, the tree becomes unbalanced. To restore balance, AVL trees use four types of rotations:

  • Right rotation: Handles "right-right" imbalance.
  • Left rotation: Handles "left-left" imbalance.
  • Left-right rotation: First left rotate, then right rotateβ€”handles "left-right" imbalance.
  • Right-left rotation: First right rotate, then left rotateβ€”handles "right-left" imbalance.

Example

class AVLNode:
    def __init__(self, key):
        self.key = key
        self.height = 1  # Node height, initialized to 1
        self.left = None
        self.right = None

def get_height(node):
    """Get node height (handles null nodes)"""
    return node.height if node else 0

def get_balance_factor(node):
    """Calculate node's balance factor"""
    if not node:
        return 0
    return get_height(node.left) - get_height(node.right)

def right_rotate(y):
    """
    Perform right rotation on subtree rooted at y
    y   x
     \ / \
      x T4  β†’ Right rotate (y) z  y
     / \ - - - - - - - - - - - / \ / \
    z  T3  T1 T2 T3 T4
   / \
  T1  T2
    """
    x = y.left
    T3 = x.right

    # Perform rotation
    x.right = y
    y.left = T3

    # Update heights (must update y first, then x)
    y.height = 1 + max(get_height(y.left), get_height(y.right))
    x.height = 1 + max(get_height(x.left), get_height(x.right))

    # Return new root node
    return x

# Left rotate function is symmetric to right rotate; implementation omitted here

Summary of AVL Tree Characteristics

Feature Description
Strict Balance Strict balance condition (height difference ≀ 1); tree height closest to log n, excellent search performance.
Frequent Adjustments Insertion/deletion may require multiple rotations upward to restore balance.
Use Cases Suitable for scenarios with more queries and fewer insertions/deletions, e.g., intermediate layers of database indexes or in-memory lookup tables.

Red-Black Tree: The King of Engineering Practice

Red-Black trees are another type of self-balancing binary search tree. They make some "compromises" compared to AVL trees. Instead of pursuing absolute balance like AVL trees, they use looser rules to achieve approximate balance, thereby reducing the number of rotations required during insertion/deletion and achieving better overall engineering performance. Java's `TreeMap`, `TreeSet`, and C++ STL's `map`, `set` all use Red-Black trees under the hood.

Five Core Rules of Red-Black Trees

Red-Black trees maintain balance by adding a color attribute (red or black) to nodes and adhering to the following rules:

  • Every node is either red or black.
  • The root node is black.
  • All leaf nodes (NIL null nodes) are black.
  • The two children of any red node must be black (i.e., no consecutive red nodes).
  • For any node, all paths from it to its descendant leaves contain the same number of black nodes.

Rules 4 and 5 are critical for Red-Black tree balance. Rule 5 ensures no path is more than twice as long as any other, thus achieving approximate balance.

Insertion Adjustment Strategy

When inserting a new node, we always initially color it red (to avoid violating Rule 5). If this causes a conflict (mainly violating Rule 2 or Rule 4), we resolve it through recoloring and rotation. The core idea is to propagate the conflict (red) upward toward the root until it can be resolved via recoloring or rotation.

Example

class RBNode:
    RED = "RED"
    BLACK = "BLACK"

    def __init__(self, key):
        self.key = key
        self.color = self.RED  # New node starts as red
        self.left = None
        self.right = None
        self.parent = None  # Red-black trees require parent pointers

def insert_fixup(tree, node):
    """Fix Red-Black tree properties after insertion (simplified logic description)"""
    while node.parent and node.parent.color == RBNode.RED:
        # Case 1: Uncle node is red β†’ Recolor
        # Case 2 & 3: Uncle node is black β†’ Resolve using rotation and recoloring
        # ... (specific implementation involves extensive pointer manipulation; this is a logical description)
        pass

    tree.root.color = RBNode.BLACK  # Ensure root is black

Red-Black Tree vs AVL Tree

Feature AVL Tree Red-Black Tree
Balancing Criterion Strict (height difference ≀ 1) Loose (approximate balance via color rules)
Search Performance Better (more balanced tree) Slightly worse, but still O(log n))
Insert/Delete May require more rotations Fewer rotations, higher efficiency
Use Cases Scenarios requiring frequent lookups High overall performance requirements, e.g., language standard libraries, file systems

B-tree and B+ tree: The Engine for Disk Storage

When data volume is too large to fit entirely in memory, it must be stored on disk. Disk I/O (read/write) is significantly slower than memory access. B-trees and B+ trees are multi-way balanced search trees specifically designed to reduce disk I/O operations, and they are widely used in database and file system indexes.

B-tree: Balanced Multi-Way Search Tree

B-trees no longer store just one data item and two pointers per node. A node in an m-order B-tree has the following characteristics:

  • Each node can have at most m child nodes.
  • Each non-root, non-leaf node must have at least ⌈m/2βŒ‰ child nodes.
  • The root node must have at least 2 child nodes (unless it is itself a leaf).
  • All leaf nodes reside at the same level.
  • If a non-leaf node has k child nodes, it contains kβˆ’1 keys, which partition the value ranges of its subtrees.

Image 2

Figure: Diagram of a 3rd-order B-tree node structure. It contains two keys and three pointers, dividing the data range into three intervals.

Advantages of B-trees lie in their "short and wide" tree shape. Since a single node can store multiple keys and have multiple children, the tree height is greatly reduced. Consequently, fewer disk pages (nodes) need to be loaded when searching for data, improving efficiency.

B+ tree: The Enhanced Version of B-tree

B+ trees represent a key improvement over B-trees and are the de facto standard for modern database indexes.

Core differences between B+ trees and B-trees:

  1. Data Storage Location: In B+ trees, all data records (or pointers to records) are stored only in leaf nodes. Internal (non-leaf) nodes store only keys for routing purposes.
  2. Leaf Node Linked List: All leaf nodes are connected into an ordered doubly-linked list via pointers.

Image 3

Figure: B+ tree diagram. Note that data resides only in leaf nodes, and leaf nodes form an ordered linked list.

Significant Advantages of B+ Trees

Advantage Explanation
More Stable Query Performance Any search must reach a leaf node, so path lengths are equal, ensuring consistent performance.
Higher Space Utilization Internal nodes store no data, allowing more keys per node, resulting in a shorter tree and fewer I/O operations.
Strong Range Query Support Using the linked list of leaf nodes, range queries like `WHERE age BETWEEN 20 AND 30` can be executed efficiently, whereas B-trees require complex inorder traversal.
Better Suitability for Disk Prefetching Disk reads/writes occur in pages (blocks). B+ tree nodes are typically designed to fit exactly one page size, so a single I/O operation loads more keys, further reducing I/O overhead.

Summary

Let’s summarize these four advanced tree structures in a table:

Tree Structure Core Objective Key Features Typical Applications
AVL Tree Maximize query speed Strict height balance, frequent rotation adjustments Scenarios requiring fast lookups in memory
Red-Black Tree Balance overall performance Approximate balance via color rules, high efficiency for insertions/deletions Language standard libraries (Map/Set), process scheduling
B-tree Minimize disk I/O Multi-way balanced tree, data can reside in internal nodes Early file systems, certain database indexes
B+ tree Optimize database indexing Data stored only in leaves, leaves form a linked list, strong range query support Modern relational databases (MySQL, PostgreSQL) indexes
← Dsa SortingDsa Greedy Algorithm β†’