Advanced Tree Structures
After mastering basic tree structures such as binary trees and binary search trees (BSTs), we are about to enter a broader and more efficient domainβadvanced tree structures.
If basic trees are ordinary roads in the world of data structures, then advanced tree structures are meticulously designed expressways and interchanges. Through clever balancing rules and storage strategies, they ensure excellent performance even when dealing with large amounts of data or frequent operations.
This section will introduce four classic advanced tree structures: AVL trees, Red-Black trees, B-trees, and B+ trees.
Why Do We Need Advanced Tree Structures?
Before diving into the details, let's ask a question: With efficient binary search trees (BSTs) available, why do we need more complex trees?
Imagine you run a library and use BSTs to store books sorted by title. If you insert books one by one in alphabetical order (e.g., `A...`, `B...`, `C...`), the tree may degenerate into a linked list. In this case, the time complexity for searching a book degrades from the ideal `O(log n)` to the worst-case `O(n)`, resulting in extremely low efficiency.

Figure: A diagram showing how a binary search tree degenerates into a linked list when inserting ordered data, leading to degraded search performance.
Advanced tree structures (self-balancing trees) were created specifically to solve this problem. They automatically adjust the tree structure during node insertion or deletion to maintain balance, ensuring that operations like search, insertion, and deletion consistently remain at `O(log n)` complexity.
AVL Tree: The Guardian of Height Balance
AVL trees are the earliest self-balancing binary search trees, named after their inventors, Adelson-Velsky and Landis.
The core idea of AVL trees is simple: For any node in the tree, the height difference between its left and right subtrees must not exceed 1.
Core Concept: Balance Factor
Each node has a balance factor, calculated as: Balance Factor = Height of Left Subtree - Height of Right Subtree. In AVL trees, each nodeβs balance factor can only be -1, 0, or 1.
Imbalance and Rotation Adjustment
After inserting or deleting a node, if a nodeβs balance factor becomes 2 or -2, the tree becomes unbalanced. To restore balance, AVL trees use four types of rotations:
- Right rotation: Handles "right-right" imbalance.
- Left rotation: Handles "left-left" imbalance.
- Left-right rotation: First left rotate, then right rotateβhandles "left-right" imbalance.
- Right-left rotation: First right rotate, then left rotateβhandles "right-left" imbalance.
Example
class AVLNode:
def __init__(self, key):
self.key = key
self.height = 1 # Node height, initialized to 1
self.left = None
self.right = None
def get_height(node):
"""Get node height (handles null nodes)"""
return node.height if node else 0
def get_balance_factor(node):
"""Calculate node's balance factor"""
if not node:
return 0
return get_height(node.left) - get_height(node.right)
def right_rotate(y):
"""
Perform right rotation on subtree rooted at y
y x
\ / \
x T4 β Right rotate (y) z y
/ \ - - - - - - - - - - - / \ / \
z T3 T1 T2 T3 T4
/ \
T1 T2
"""
x = y.left
T3 = x.right
# Perform rotation
x.right = y
y.left = T3
# Update heights (must update y first, then x)
y.height = 1 + max(get_height(y.left), get_height(y.right))
x.height = 1 + max(get_height(x.left), get_height(x.right))
# Return new root node
return x
# Left rotate function is symmetric to right rotate; implementation omitted here
Summary of AVL Tree Characteristics
| Feature | Description |
|---|---|
| Strict Balance | Strict balance condition (height difference β€ 1); tree height closest to log n, excellent search performance. |
| Frequent Adjustments | Insertion/deletion may require multiple rotations upward to restore balance. |
| Use Cases | Suitable for scenarios with more queries and fewer insertions/deletions, e.g., intermediate layers of database indexes or in-memory lookup tables. |
Red-Black Tree: The King of Engineering Practice
Red-Black trees are another type of self-balancing binary search tree. They make some "compromises" compared to AVL trees. Instead of pursuing absolute balance like AVL trees, they use looser rules to achieve approximate balance, thereby reducing the number of rotations required during insertion/deletion and achieving better overall engineering performance. Java's `TreeMap`, `TreeSet`, and C++ STL's `map`, `set` all use Red-Black trees under the hood.
Five Core Rules of Red-Black Trees
Red-Black trees maintain balance by adding a color attribute (red or black) to nodes and adhering to the following rules:
- Every node is either red or black.
- The root node is black.
- All leaf nodes (NIL null nodes) are black.
- The two children of any red node must be black (i.e., no consecutive red nodes).
- For any node, all paths from it to its descendant leaves contain the same number of black nodes.
Rules 4 and 5 are critical for Red-Black tree balance. Rule 5 ensures no path is more than twice as long as any other, thus achieving approximate balance.
Insertion Adjustment Strategy
When inserting a new node, we always initially color it red (to avoid violating Rule 5). If this causes a conflict (mainly violating Rule 2 or Rule 4), we resolve it through recoloring and rotation. The core idea is to propagate the conflict (red) upward toward the root until it can be resolved via recoloring or rotation.
Example
class RBNode:
RED = "RED"
BLACK = "BLACK"
def __init__(self, key):
self.key = key
self.color = self.RED # New node starts as red
self.left = None
self.right = None
self.parent = None # Red-black trees require parent pointers
def insert_fixup(tree, node):
"""Fix Red-Black tree properties after insertion (simplified logic description)"""
while node.parent and node.parent.color == RBNode.RED:
# Case 1: Uncle node is red β Recolor
# Case 2 & 3: Uncle node is black β Resolve using rotation and recoloring
# ... (specific implementation involves extensive pointer manipulation; this is a logical description)
pass
tree.root.color = RBNode.BLACK # Ensure root is black
Red-Black Tree vs AVL Tree
| Feature | AVL Tree | Red-Black Tree |
|---|---|---|
| Balancing Criterion | Strict (height difference β€ 1) | Loose (approximate balance via color rules) |
| Search Performance | Better (more balanced tree) | Slightly worse, but still O(log n)) |
| Insert/Delete | May require more rotations | Fewer rotations, higher efficiency |
| Use Cases | Scenarios requiring frequent lookups | High overall performance requirements, e.g., language standard libraries, file systems |
B-tree and B+ tree: The Engine for Disk Storage
When data volume is too large to fit entirely in memory, it must be stored on disk. Disk I/O (read/write) is significantly slower than memory access. B-trees and B+ trees are multi-way balanced search trees specifically designed to reduce disk I/O operations, and they are widely used in database and file system indexes.
B-tree: Balanced Multi-Way Search Tree
B-trees no longer store just one data item and two pointers per node. A node in an m-order B-tree has the following characteristics:
- Each node can have at most m child nodes.
- Each non-root, non-leaf node must have at least βm/2β child nodes.
- The root node must have at least 2 child nodes (unless it is itself a leaf).
- All leaf nodes reside at the same level.
- If a non-leaf node has k child nodes, it contains kβ1 keys, which partition the value ranges of its subtrees.

Figure: Diagram of a 3rd-order B-tree node structure. It contains two keys and three pointers, dividing the data range into three intervals.
Advantages of B-trees lie in their "short and wide" tree shape. Since a single node can store multiple keys and have multiple children, the tree height is greatly reduced. Consequently, fewer disk pages (nodes) need to be loaded when searching for data, improving efficiency.
B+ tree: The Enhanced Version of B-tree
B+ trees represent a key improvement over B-trees and are the de facto standard for modern database indexes.
Core differences between B+ trees and B-trees:
- Data Storage Location: In B+ trees, all data records (or pointers to records) are stored only in leaf nodes. Internal (non-leaf) nodes store only keys for routing purposes.
- Leaf Node Linked List: All leaf nodes are connected into an ordered doubly-linked list via pointers.

Figure: B+ tree diagram. Note that data resides only in leaf nodes, and leaf nodes form an ordered linked list.
Significant Advantages of B+ Trees
| Advantage | Explanation |
|---|---|
| More Stable Query Performance | Any search must reach a leaf node, so path lengths are equal, ensuring consistent performance. |
| Higher Space Utilization | Internal nodes store no data, allowing more keys per node, resulting in a shorter tree and fewer I/O operations. |
| Strong Range Query Support | Using the linked list of leaf nodes, range queries like `WHERE age BETWEEN 20 AND 30` can be executed efficiently, whereas B-trees require complex inorder traversal. |
| Better Suitability for Disk Prefetching | Disk reads/writes occur in pages (blocks). B+ tree nodes are typically designed to fit exactly one page size, so a single I/O operation loads more keys, further reducing I/O overhead. |
Summary
Letβs summarize these four advanced tree structures in a table:
| Tree Structure | Core Objective | Key Features | Typical Applications |
|---|---|---|---|
| AVL Tree | Maximize query speed | Strict height balance, frequent rotation adjustments | Scenarios requiring fast lookups in memory |
| Red-Black Tree | Balance overall performance | Approximate balance via color rules, high efficiency for insertions/deletions | Language standard libraries (Map/Set), process scheduling |
| B-tree | Minimize disk I/O | Multi-way balanced tree, data can reside in internal nodes | Early file systems, certain database indexes |
| B+ tree | Optimize database indexing | Data stored only in leaves, leaves form a linked list, strong range query support | Modern relational databases (MySQL, PostgreSQL) indexes |
YouTip