Could you add what’s the time complexity of this approach? Define n to be number of nodes in the graph and m the number of edges. Since we are using a priority queue instead of a regular queue, each operation takes O(log(queue size)) instead of O(1). So my guess is total time complexity is O( (n+m)*log(n+m))?
Please add test case for [“abc”, “ab”]. output should be “”
It sounds like you are assuming the characters in a single word are lexographically sorted, e.g. if the word is sell then “s” must come before “e” and “e” must come before “l”. I made that assumption and initially drew out a graph based on that and saw the cycles with no node to start from (all nodes had incoming edges).
I think we can only tell order by comparing letters in different words, not by comparing letters within a single word. For example, “seashell, seashore” tells us that “e” comes before “o” because these words share the same prefix “seash” and so the order is decided by the next characters after that common prefix, which are “e” and “o”. The order of the characters in the “seash” prefix is irrelevant. Also, the order of the words with a common prefix tells us nothing about the letters following the “e” and “o” because the “e” and “o” are what decides the order, not the “ll” following the “e” or the “re” following the “o”.
I think the answer to your question is in this statement:
“There may be multiple valid orders. If that’s the case, return the smallest in normal lexicographical order.”
The Test Case #3
she sell seashell seashore seahorse on a
does NOT tell us what the order of “n” and “r” are. Note that it does NOT tell us the order of “l” or “s” either! So we could technically put these letters in any order relative to each other:
nrlsheoa, rnlsheoa, nsrlheoa, rsnlheoa, lnrsheoa, lrnsheoa, etc…
But if we apply the statement above, then we should order n, r, l and s based on NORMAL lexicographical order, which I assume they mean to be ENGLISH lexicographical order. With the provided words we know the order of letters, other than n, r, l or s, is h->e->o->a. Since we need an order between n, r, l and s in order to come to a specific, single solution, we apply English lexicographical order to these letters only which gives us additional edges of l->n->r->s. With the provided words, we also know s has to come before h, o and a, so if we combine the two orderings above we get l->n->r->s->h->e->o->a, which is the answer to Test Case #3: lnrsheoa.
Note this also seems to be true for Test Case #4
stdlib stl scanf sscanf printf
If you look at the answer “abdfilnrtcsp” you can see that a, b, d are all in the words but the words don’t tell us what order they should be in, so in the answer these letters are ordered according to the English alphabet: a, b, (c), d, etc.
The edges that the words give us are: d->l and t->c->s->p. Since we don’t know the order between d and t, we order them according to english alphabet (“normal lexicographical order”), so d->l->t->c->s->p so that d comes before t (note this is only one possible ordering, d->t->l->c->s->p would also work at this point). Since we don’t know the ordering of abfinr, we also order them according to english alphabet, a->b->f->i->n->r. Combining the two, we get a->b-d->f->i->l->n->r->t->c->s->p. Why not “abdfinrtclsp”? Because having the l follow i will keep it in alphabetic order while also maintaining the the d->l order. If the alphabetic order of l doesn’t matter, then the following would also be valid outputs: “abdfinrtlcsp”, “abdfinrltcsp”, “abdfinrtcspl”, etc. and we know there is only 1 valid output.
^^ I really needed to be pointed to req 4 on this, Thanks @Mod!
How does a min heap (priority queue initialized with natural ordering of characters ) return “wertf” as the smallest natural order for Test Input #1?
During topo sort for this example, w is inserted first into the q, then goes e. Lexicographically e is smaller than w.
Am I missing something fundamental here?
yes, it’s been added
I had a similar question:
We’re accustomed to popping a queue level-by-level to get a topological sort. In the solution to this problem, we pop from the entire heap, not just the same level. In other words, if a child is lexicographically smaller than elements already in the heap from a prior level, it could be added to the solution before those from a prior level. To me this seemed like a child could be placed in the solution incorrectly, before preceding letters in the alien alphabet.
My guess is that this is impossible because of the constraints of the problem. But I’m having a hard time proving it. Does anyone see why this issue does not come up?
Yes I have the same question.
I think I figured it out - the constraint I was worried about violating was a child getting added to the result before a direct ancestor. For example, a child “a” and its ancestor “z” ending up in the heap at the same time. “a” would be added to the solution before “z”, which would be incorrect according to the alien dictionary. As it turns out, this scenario is impossible. In this solution, a child can only be added to the heap once all its direct ancestors have been added to the string.
However, mixing of letters from different levels is still possible. So why is this not a problem? In short, because the order of letters between different levels can be ambiguous. We can look at an example provided by “crimson” in a comment below: aaa abb abm azb yay yak.
“a” is clearly the first letter in the dictionary. Once “a” is added to the solution, we add “b” and “y” to the heap. “b” precedes “m” and “z”, and y precedes “k”. However, and this is the important part, it is ambiguous whether “b” precedes “y”. This is where the English alphabetical order comes in. We add “b” to the solution first, which adds “m” and “z” to the heap. Again, it is ambiguous whether “m” and/or “z” precede y. Therefore the English alphabetical order rule comes into play. Fortunately, all three letters are currently in the heap. We pop “m”, and then “y”. Adding “y” to the solution adds “k” to the heap. (Note again that a child, “k”, will not precede “y”, its parent.) It follows that we add “k” then “z” to the solution. The final string becomes “abmykz”. Note that this satisfies every rule in the dictionary (e.g. “b” precedes “m” and “z”, and “y” precedes “k”), as well as the rule that letters be order lexicographically when the dictionary is ambiguous. The solution produces this answer. (Checked using the “Custom Input” tool.)
In sum, allowing letters from lower levels before letters from higher levels in the solution is a feature, not a bug! In the future I will keep an eye out for additional rules that clarify ambiguity, since it’s possible that additional rules imply that levels of the graph may mix in the solution.
test case #5 is wrong. it should be “adeglnfjhitsbu” instead of empty string???
Can someone explain test case 6 ? Why did the output came as “lnaeoiprts” ? How was the position of P determined ?
I have the same question as crimson. But I think I understand it now.
Any node put in the queue have 0 in_degree, even they may be pushed in at different time (different levels). It does NOT matter which one is popped first, the result will always be a topological order. For our case, it requires to use the NORMAL lexicographical order, using heap to guarantee the order.
I think the solution is wrong the use of minHeap is not correct. the minHeep is sorted based on what ???. The other issue is the test cases they don’t make sense to me
For example
she sell seashell seashore seahorse on a
should result in : sheolarn not lnrsheoa why n and l before s . that’s wrong for sure
Since the
graph = {‘s’: [‘h’, ‘o’], ‘h’: [‘e’], ‘e’: [‘o’], ‘l’: [‘a’], ‘a’: , ‘o’: [‘a’], ‘r’: , ‘n’: }
in_degree = {‘s’: 0, ‘h’: 1, ‘e’: 1, ‘l’: 0, ‘a’: 2, ‘o’: 2, ‘r’: 0, ‘n’: 0}
Here is my solution that I think its the correct way to go about it
from typing import List
from collections import deque
def alien_order(words: List[str]) -> str:
# initialize graph and indegree
in_degree = {}
graph = {}
for word in words:
for ch in word:
in_degree[ch] = 0
graph[ch] = []
# build the graph and indegree
for i in range(1, len (words)):
n1 = len(words[i -1])
n2 = len(words[i])
n = max(n1, n2)
first_diff_found = False
for j in range(n):
if j < n1 and j < n2:
if first_diff_found: # No need to compare the adjacent words anymore
break
elif words[i -1][j] != words[i][j]:
first_diff_found = True
in_degree[words[i][j]] += 1
graph[words[i -1][j]].append(words[i][j])
# Go deep for each of the start nodes and append topology to the result. we don't want to start will the 0 indegree nodes at the same time as we want to do this level by level
start_nodes = [k for k in graph if in_degree[k] == 0]
result = []
for node in start_nodes:
q = deque([node])
while q:
node_out = q.popleft()
result.append(node_out)
neighbors = graph[node_out]
for neighbor in neighbors:
in_degree[neighbor] -= 1
if in_degree[neighbor] == 0:
q.append(neighbor)
return ''.join(result) if len(result) == len(graph) else ''