Merge User Accounts - Advanced Data Structures / Disjoint Set Union | Union Find

algo.monster · November 24, 2022, 3:54pm

https://algo.monster/problems/dsu_account_merge

patutinskiy · November 24, 2022, 3:54pm

I think lexicographical sorting works differently for different languages.
For example, for C# I am expected to get an answer
Jimmy beastboi2@funmail.com beastboi@funmail.com jimmyj@company.com johnson@fastmail.com

while default sorting in C# produces the answer
Jimmy beastboi@funmail.com beastboi2@funmail.com jimmyj@company.com johnson@fastmail.com.

This is because ASCI code of the “@” comes before numbers codes (https://yorktown.cbe.wwu.edu/sandvig/shared/asciicodes.aspx).
Could you clarify the rules of sorting expected for this question? For lexicographical sorting should “@” come before numbers of after?

am1 · December 1, 2022, 5:12am

C# is indeed a little different than other languages, the answers should be sorted in lexicographical order (i.e. ‘2’ comes before ‘@’). The default C# sorting does not sort lexicographically, you should specify the string comparer in the built in sorting function to StringComparer.Ordinalfor lexicographical sorting.

Yijie_Yang · August 18, 2023, 12:27am

A more detailed explanation:
#use (username, email) as set id
# collect all (username,email) and first put those from same account into one set
# use a dict[ancestor] to house an account’s all (user, email)
# Iterate all (username,email) set and add them to dict[ancestor] - >remove duplicate
# for each ancestor
# the dict[ancester], sort and extract all emails and put them in list [ancestor[0], ancestor[1], email[2]]
# final res = [sorted for each ancestor list]

Yijie_Yang · August 18, 2023, 1:17am

My python solution:

from collections import defaultdict

class Union_find():
def init(self):
self.id = {}

def find(self,x):
    y = self.id.get(x,x)
    if y != x:
        y = self.find(y)
        self.id[x] = y
    return y

def merge(self,x,y):
    self.id[self.find(x)] = self.find(y)

def merge_accounts(accounts: List[List[str]]) → List[List[str]]:
# merge account = set assignment
# since username needs to be the same, use username as an id，(username, email)
# collect all (username,email) and first put those from same account into one set
# use a dict[ancestor] to house an account’s all (user, email)
# Iterate all (username,email) set and add them to dict[ancestor] - >remove duplicate
# for each ancestor
# the dict[ancester], sort and extract all emails and put them in list [ancestor[0], ancestor[1], email[2]]
# final res = [sorted for each ancestor]
# Time: O(nlogn) Space: O(n)

uf = Union_find()
res = []
all_user_emails = set()

for info in accounts:
    username = info[0]
    set_heart = None
    for email in info[1:]:
        pair = (username, email)
        all_user_emails.add(pair)
        if set_heart is None:
            set_heart = pair
        else:
            uf.merge(pair, set_heart)

user_dict = defaultdict(list)

for pair in all_user_emails:
    ancestor = uf.find(pair)
    user_dict[ancestor].append(pair)
    
# sort the res
for heart_pair, all_pairs in user_dict.items():   
    username = heart_pair[0]
    user_info = [username]
    for pair in sorted(all_pairs):
        user_info.append(pair[1])
    res.append(user_info)

res.sort(key = lambda x: (x[0],x[1]))    
return res