Union-find

Imagine we have a graph (e.g., a network) like this

and we are interested in reachability queries. For example

Is g reachable from a?   —   NO
Is there a path between i and h?   —   YES

However, to make things more interesting, our graph is not static
and we can link two nodes.

Link f and j:

Is there a path between e and j?   —   YES
Is there a path between c and i?   —   NO

Link b and g:

Is there a path between c and i now?   —   YES

Link a and h:

Link a and d:

Is there a path between e and h?   —   NO
Is there a path between d and g?   —   YES

This problem can be formulated a little more abstractly: given a collection of disjoint sets, we would like to support the following two operations:

  • union(A,B) – replaces sets A and B with their union
  • find(x) – find set X where x belongs.

In our network application, the nodes reachable from each other form disjoint sets (called connected components). By linking two nodes from two different components, we get one big component – their union. Nodes x and y are connected by a path if and only if they belong to the same component, i.e., if find(x)=find(y).

Link f and j: union C and E

Link b and g: union A and D

Link a and h; link a and d

Applications

Kruskal’s algorithm

Unification

Other:
Gilbert et al. (1994) An efficient algorithm to compute row and column counts for sparse Cholesky factorization. SIAM Journal on Matrix Analysis and Applications, 15:1075.
Gabow, H. and Tarjan, R. (1985). A linear-time algorithm for a special case of disjoint set union. Journal of computer and system sciences, 30(2):209–221.