B-tree

Invented by Bayer, McCreight (1972)
Idea:
  • a generalization of 2-3 and 2-3-4 trees: internal nodes of a B-tree of order t have between \(t/2\) and \(t\) children (for \(t=3\) and 4, we get 2-3 and 2-3-4 trees)
  • B-trees are often used as an external data structure, i.e., for data stored on disk
  • since hard disks are 1000 – 100 000-times slower than RAM, it is crucial to minimize the number of disk accesses; this is achieved by choosing very large \(t\) (usually so that one node fills one disk page)
  • when a node overflows, it is split in two and the middle element is inserted to parent (a cascade of splits may occur)
  • when a node underflows, it may borrow some elements from neighbouring nodes or, if the nodes are small, they may be merged
Advantages:
  • great for storing large amounts of data which don’t fit into the main memory, while it minimizes the number of disk accesses
Variations:
  • B+ tree: items are only stored at leaves; the keys in internal vertices only serve as guides during the search; there are two advantages: 1. since internal nodes do not store any additional data, this allows for higher degree, smaller height, and less disk accesses; 2. data may be easily traversed sequentially from left to right (contrast this with in-order traversal of a B-tree)
  • pre-emptive splitting and merging: split/merge nodes on the way down the tree; advantage: no need to return (and thus less disk accesses); disadvantage: sometimes nodes are split even when it was not necessary (which leads to slightly increased height and memory consumption)
  • B* tree: non-root nodes have to be at least 2/3 full; advantage: better use of memory; disadvantage: more splitting and merging is needed
[You can download the application for offline viewing.]

References

  • R. Bayer and E.M. McCreight. Organization and maintenance of large ordered indexes. Acta informatica, 1(3):173-189, 1972. [bib] [pdf]
  • D. Comer. Ubiquitous B-tree. ACM Computing Surveys (CSUR), 11(2):121-137, 1979. [bib]
  • A.C.C. Yao. On random 2-3 trees. Acta Informatica, 9(2):159-170, 1978. [bib]
  • R.A. Baeza-Yates. Fringe analysis revisited. ACM Computing Surveys (CSUR), 27(1):109-119, 1995. [bib] [pdf]