Task Scheduling 2 - Graph / Directed Graph / Topological Sort

I can confirm that the correct answer for test case #3 is 101.
The reason is that, tasks {a,b,c, and d} can be completed while task g is being worked on, because tasks {a,b,c, and d} are independent of task g. And therefore, minimum time to complete tasks {a, b, c, d, AND g} is the time it takes to complete task g plus the time it takes to complete task f (since task f can only be completed after task g has been completed).
The minimum time to complete ALL tasks is the maximum between the time it takes to complete tasks {a, b, c, d, g, and f} and the maximum time in the times array.

sorry,
I meant to say, "And therefore, minimum time to complete tasks {a, b, c, d, g, and f} is the time it takes to complete task g plus the time it takes to complete task f.

At first, I got confused as well and I thought case #3 was wrong… but now I see it is not. Try to draw the graph and you will see: It is clear that F must be started after B and G. However, B will get “free” to be executed in just 1 unit of time (because A, D and C) can be executed at the same time and all of them demand 1 single unit of time. G demands 100 units of time… more than enough for B get finished. So, in summary, tasks A, C, D, and G has no dependencies, so it all can start at the same time. B depends on A, C and D, so it will have to wait for 1 unit of time before starting. F depends on G and B… however, as it was said, B and G can still execute concomitantly, even thou B starts after G, because G takes so much more time. This way, to total time is indeed 101, which is basically the time for G + the time for F.

Used BFS and had some hard time to accommodate the times in the graph nodes

from typing import List, Dict
from collections import deque

def task_scheduling_2(tasks: List[str], times: List[int], requirements: List[List[str]]) → int:
# WRITE YOUR BRILLIANT CODE HERE

def count_parents(graph):
    
    counts = {node: 0 for node in graph}
    for parent in graph:
        for node in graph[parent]:
            counts[node] += 1
            
    return counts
    
def topo_sort(graph, task_times):
    total_time = 0
    q = deque()
    counts = count_parents(graph)
    
    # define a time_exec dictionary for task and its time.
    # and initiaize it to 0
    time_exec: Dict[str, int] = dict() 
    for node in graph:
        time_exec[node] = 0
    
    for node in counts:
        if counts[node] == 0:
            q.append(node)
            # calculating time for task node andits total time 
            time_exec[node] = task_times[node]
            total_time = max(total_time, time_exec[node])
                             
    
    while(len(q) > 0):
        
        node = q.popleft()
        for child in graph[node]:
            counts[child] -= 1
            # calculating time for task node andits total time 
            time_exec[child] = max(time_exec[child], time_exec[node] + task_times[child])                 
            total_time = max(total_time, time_exec[child])
                             
            if counts[child] == 0:
                q.append(child)
                             
    return total_time             
        
# Create and init graph and task_times
graph: Dict[str, List[str]] = dict()
task_times: Dict[str, int] = dict()
                             
# Populate the hash map graph with tasks, 
# and the hash map task_times with tasks and thier time
for i in range(len(tasks)):
    graph[tasks[i]] = list()
    task_times[tasks[i]] = times[i]

# Populate the requirements in the graph
for req in requirements:
    graph[req[0]].append(req[1])

return topo_sort(graph, task_times)

I think this could be a stupid doubt… why do we find the maximal path? Isn’t the question asking for minimum time? Why use max function?

Same question

It sounds confusing, but max time here means how much we have to wait for all siblings to complete before we can move forward. If nodes A thru C take 1 unit and D takes 10 units and all A, B, C, D are linked to E, we cannot go to E sooner than in 10 units.

The Java solution loc on updating child distance should be further simplfiied, why Math.max() is needed here? To avoid cases where user inputs a negative taskTimes? the loc I refer to is:

dis.put(child, Math.max(dis.get(child), dis.get(node) + taskTimes.get(child)));

nvm, I saw it wrong, one is dis.gret(child), one is dis.get(node)

Since we build and store the graph in memory, wouldn’t the space complexity be O(n + m)? Since we store each vertice (task) and their edges (all dependents for each of the tasks)

Can someone point me a simiilar leetcode problem?
Thanks in advance.

My solution using Kahn’s algorithm template. I have introduced current_time which accounts the maximum time for independently running tasks and appends it with the maximum time taken.

Tried to add comments to my best

from typing import List
from collections import deque

def find_indegree(graph):
    counts = {node: 0 for node in graph}
    for parent in graph:
        for node in graph[parent]:
            counts[node] += 1
    return counts

def topo_sort(graph, task_times):
    res = 0
    queue = deque()

    # set the initial distance as 0
    distances = { node: 0 for node in graph }
    indegree = find_indegree(graph)

    # fetching the node with 0 dependency
    for node in indegree:
        if indegree[node] == 0:
            queue.append(node)

            # update the distance for this node
            # distance is beleived to be the initial time taken for the task
            distances[node] = task_times[node]
            res = max(res, distances[node])
    max_time = 0
    while queue:
        N = len(queue)
        current_time = 0
        
        # as we reach this stage, we have all independent nodes running at same time
        # so the node with maximum time will be current_time
        for _ in range(N):

            node = queue.popleft()
            current_time = max(current_time, task_times[node])
            
            for neighbor in graph[node]:
                # As we have visited this node, we need to dedude the indegree from the neighbor
                indegree[neighbor] -= 1

                if indegree[neighbor] == 0:
                    queue.append(neighbor)
        # we just append the max_time with the current time
        max_time += current_time
    return max_time


def task_scheduling_2(tasks: List[str], times: List[int], requirements: List[List[str]]) -> int:
    graph, task_times = {}, {}
    for i in range(len(tasks)):
        graph[tasks[i]] = []
        task_times[tasks[i]] = times[i]

    for req in requirements:
        graph[req[0]].append(req[1])

    return topo_sort(graph, task_times)

My solution building a graph with a tuple containing the time for each task and neighbors:

def task_scheduling_with_time(tasks, times, requirements):
    def build_graph():
        graph = {tasks[i]: (times[i], []) for i in range(len(tasks))}  # type: ignore
        for task, dependant in requirements:
            graph[task][1].append(dependant)
        return graph

    def get_in_degrees(graph):
        in_degrees = {node: 0 for node in graph}
        for _, dependant in requirements:
            in_degrees[dependant] += 1
        return in_degrees

    graph = build_graph()
    in_degrees = get_in_degrees(graph)
    queue = deque()
    for node, in_degree in in_degrees.items():
        if in_degree == 0:
            queue.append(node)

    result = 0
    temp = 0
    while queue:
        for _ in range(len(queue)):
            node = queue.popleft()
            temp = max(temp, graph[node][0])
            for neighbor in graph[node][1]:
                in_degrees[neighbor] -= 1
                if in_degrees[neighbor] == 0:
                    queue.append(neighbor)
        result += temp
        temp = 0
    return result
from collections import defaultdict, deque

def task_scheduling_2(tasks: List[str], times: List[int], requirements: List[List[str]]) -> int:
    graph = dict()
    task_to_indegree = {}
    task_to_time = {}
    for task, time in zip(tasks, times):
        graph[task] = {"time": time, "children": set()}
        task_to_indegree[task] = 0
        task_to_time[task] = time
    for req in requirements:
        graph[req[0]]['children'].add(req[1])
        task_to_indegree[req[1]] += 1

    queue = deque([(task, time) for task, time  in zip(
        tasks, times) if task_to_indegree[task] == 0])
    max_time = 0
    max_parent_time = defaultdict(int)
    while queue:
        task, time = queue.popleft()
        max_time = max(max_time, time)
        for child_task in graph[task]['children']:
            task_to_indegree[child_task] -= 1
            max_parent_time[child_task] = max(time, max_parent_time[child_task])
            child_time = task_to_time[child_task]
            if task_to_indegree[child_task] == 0:
                queue.append((child_task, max_parent_time[child_task] + child_time))
    return max_time

@algo.monster any updates on whether test case #3 is correct or not?
To me it seems the answer should be 102, whereas the test case says it’s 101.

@watermelon999
I think if you draw it out. You will see why the answer is actually 101. At the beginning we see that nodes a, c, d and g have an in degree = 0 so we can start them all at the same time. Node b is dependent on a, c, d finishing and they all take a time of 1. The problem states we can work on any number of nodes concurrently as long as the dependency is met. So essentially what’s happening is that we start to process tasks [a, c, d, g] at time 0. By time 1, [a, c, d] are finished while task g is still processing. But since b’s dependent tasks are now finished we can start task b at time 1 and it will finish at time 2. Then all we have left in our graph is g and its child f. So at time 100 finally task g finishes and then we can do task f which takes 1 unit of time. So the full time it took was simply time of task G plus time of task f = 100 + 1. It’s a very tricky edge case. And I’m unsatisfied with how the template does not seem to fit super well into this problem. I wish @algo.monster took more time to explain the changes that need to be made to the template.

Khan’s Algorithm wont wok nicely here. We are going deeper here “DFS not BFS”. The objective to find the max rout from the root node all the way to the end.

There is no topology is needed for each level in the graph. A simple approach would be be identifying the start graph then do a DFS . My solution is simplest and shortest :slight_smile:

def task_scheduling_2(tasks: List[str], times: List[int], requirements: List[List[str]]) -> int:
  
  task_times = {tasks[x]: times[x] for x in range(len(tasks))}
  graph = {x: set() for x in tasks}
  in_degree = {x: 0 for x in tasks}
  
  for a, b in requirements:
      graph[a].add(b)
      in_degree[b] += 1
  
  result = 0

  graph_start = [k for k, v in in_degree.items() if v == 0]
      
  def dfs(item):
      
      result = task_times[item]
      
      result +=  max(dfs(x) for x in graph[item]) if graph[item] else 0 
  
      return result
  
  return max([dfs(x) for x in graph_start]) if graph_start else 0
  

@algo.monster can you please draw the graph of test case 3? Why is the answer not 5623749 ? Even if aggregate starts at time=0, it will finish at 5623749 which is way after others.

Particularly, in code, dis.get(node) + taskTimes.get(child) does not make sense. Why are we adding the whole time a child might need when it can overlap with the node execution?

dis.put(child, Math.max(dis.get(child), dis.get(node) + taskTimes.get(child)));

brad cad dag ethereum forget aggregate
20 190 2930 2375 123 5623749
5
forget ethereum
ethereum dag
dag cad
cad brad
brad aggregate