2-3 tree

Idea:
  • all items are stored in a ternary tree
  • each node contains 1 or 2 keys, all the non-leaves have 2 or 3 children (respectively; hence the name)
  • all the leaves are at the same level
  • when a node overflows during insertion, it is split in two and the middle element is inserted to parent (a cascade of splits may occur)
  • when a node underflows during deletion, it may borrow some elements from neighbouring nodes or, if the nodes are small, they may be merged
Advantages:
  • guaranteed \(O(\log n)\) time (in the worst case) for each dictionary operation; the height of a 2-3 tree is between \(\log_2 n\) and \(\log_3 n\)
Variations:
  • 2-3+ tree: items are only stored at leaves; the keys in internal vertices only serve as guides during the search
[You can download the application for offline viewing.]

Structure and balancing

In a 2-3 tree, every node stores either 1 or 2 keys and has either 2 or 3 children, respectively (unless it is a leaf). We call them accordingly, 2-nodes and 3-nodes:

All leaves must have the same depth - they have to be at the same level. This is achieved by splitting nodes that overflow and joining nodes that underflow.

Consequently, a 2-3 tree of height \(h\) stores at least \(2^h-1\) keys (if all nodes are 2-nodes) and at most \((3^h-1)/2\) keys (if all nodes are 3-nodes). This implies height between \(\log_3 (2n+1)\approx 0.631\log_2 n\) and \(\log_2(n+1)\).

Insertion

When we insert a new key into a 2-node, we just get a 3-node, which is OK.

However, when we insert a key into a 3-node, it overflows and we need to split it:

The procedure continues by inserting \(y\) to the parent node. This may start a cascade of overflown 3-nodes which ends either when the parent has enough capacity or we get to the top of the tree and create a new root (at this point the height of the whole tree increases by 1).

Deletion

Assume we are deleting a key from leaf. (If not, we replace the key by its successor, which is in a leaf - the same trick we used in BST). When we delete from a 3-node, we just get a 2-node, which is OK.

A problem occurs, when deleting from a 2-node which only has a single key, so after deletion, the node is empty and this is not allowed. There are two cases:

If a sibling next to it is a 3-node, we can borrow a key from it:

Otherwise the adjacent sibling is a 2-node. We join it with a key from the parent node and recursively deal with deleting from the parent node (this may start a cascade of joins):

Thus, the entire delete operation may involve a series of joins. It ends either with a share, or with join when parent is a 3-node, or continues all the way up to the root. In that case the final join deletes root and height of the tree decreases by 1.