Union-Find, Explained Simply
Union-Find (also called Disjoint Set Union, DSU) is a tiny toolkit for tracking which items belong together. Think of it as a label maker for groups: it can quickly tell you whether two people are in the same “friend circle,” and it can glue two circles together.
What We’re Managing: Disjoint Sets
Imagine n
people split into separate circles. Two rules:
- No overlap. Each person is in exactly one circle. If Sarah is in
S1
, she isn’t inS2
. Two questions we care about.
- Find: “Which circle is Sarah in?”
- Union: “Can we merge
S1
andS2
into one circle?”
Example circles:
S1 = {Alice, Bob, Carol}
S2 = {Dave, Eve}
S3 = {Grace, Helen, Frank}
They don’t overlap, so they’re perfect for Union-Find.
The Two Moves
Find(x) Returns the representative (the label) of x’s circle. Example:
Find(Bob) → S1
.Union(a, b) Merges the circles that contain
a
andb
. Example:Union(S1, S2)
→{Alice, Bob, Carol, Dave, Eve}
becomes one circle.
That’s the whole API.
How It Works Under the Hood
Internally, each circle is a tiny tree:
- Every person points to a “parent.”
- The root of the tree is the circle’s representative.
So:
- Find(x) walks parent pointers until it reaches the root (the label of x’s circle).
- Union(a, b) connects one root under the other, turning two trees into one.
Speed Tricks (Why It’s So Fast)
Path Compression (during Find) After we climb to the root, we flatten the path: everyone we touched points directly to the root. Future Finds get almost instant.
Union by Rank/Size Always attach the smaller tree under the larger tree’s root. This keeps trees shallow, which keeps Finds quick.
With both tricks, operations are effectively near O(1) on average—even for huge datasets.
Quick Mental Model
- Label maker:
Find
reads the label;Union
combines labels. - Family tree: the root is the “head of the family.”
Find
walks up to the head;Union
marries two families by making one head point to the other. - Autotune: path compression and union-by-size keep the trees short so everything stays fast.