DSU/Union Find Fundamentals - Advanced Data Structures / Disjoint Set Union | Union Find

https://algo.monster/problems/dsu_intro

another implementation that tracks counts of disconnected component, size of each component and does path compression

@dataclass
class Counter:
key: int = 0

def union_find(graph: Dict[int, List[int]]):
def find(x: int) → int:
while parent[x] != x:
parent[x] = parent[parent[x]] # 1. path compression
x = parent[x]
return x

def union(x: int, y: int, counter: Counter):
    root_x = find(x)
    root_y = find(y)
    if root_x == root_y:
        return

    # 2. connect smaller subtree to bigger tree
    if size[root_x] > size[root_y]:
        parent[root_y] = root_x
        size[root_x] += size[root_y]
    else:
        parent[root_x] = root_y
        size[root_y] += size[root_x]

    counter.key -= 1

count = Counter()  # maintains count of disconnected component
parent, size = {}, {}  # size ~~ "weight"
for v in graph:
    parent[v] = v
    size[v] = 1
    count.key += 1

# run UF over each edge
for v1, v1_edges in graph.items():
    logger.debug(f"{v1} -> {v1_edges}")
    for v2 in v1_edges:
        union(v1, v2, count)

For the first implementation of UnionFind, it says that: “The following code accomplishes a Union operation in O(1)”. But how does it do it in O(1) if it uses “find” which is O(n) in the worst case?

Within the rank choice, why do we continuously call find(x) and find(y) for each if statement. Wouldn’t we just call it once and then do an analysis through the if conditions?

You can do that too. But since the find() is being called a constant number of times, the overall complexity will be O( constant * log(n)) which is the same as log(n). Hence it doesn’t matter whether you call find() once, or a constant number of times.

When we talk about the merge operation or insert operations in general, we assume that the location or the index at which the value is to be inserted is already known. So any complexity spent to find that insertion index is not considered in the insert operation. Insertion in a list is O(n) because even after we find the index to insert the value at beforehand, we still need to move the rest of the elements in the list by 1 to make an empty spot for the new value. In union find, there is no need to move the elements as we are simply mapping the node to a different parent which is a simple assignment operation that can be done in O(1) time.

this should specify that a quick union implementation is used first over a quick find.

 self.id[self.find(y)] = self.find(x)
28            # if rank is the same then we update x rank and increment by 1
29            # we make y's parent equal to x's parent, so x has increased depth
30            if self.rank[self.find(x)] == self.rank[self.find(y)]:

is this right? line 30 will always return true, because self.id[self.find(y)] = self.find(x), which means y and x are pointing to the same parent already. Do i miss something?

Typo in Implementation section:

How do we construct the tree strcture to maintain disjoint sets?