Newspapers - Binary Search / Advanced

Time complexity is O(log(n) * n).

Why? Let sum of all read times be totalReadTime. It’s easy to see method feasible’s time complexity is O(n). And binary search itself takes O(log(totalReadTime)) so overall time complexity is O(log(totalReadTime) * n). Given read time is less than 10^5, in worse case, O(log(totalReadTime) * n) would be O(log(n * 10^5) * n) . When n is large enough, 10^5 can be ignored. So from Big O notation, time complexity is O(log(n) * n).

I’d like to share my solutions here.

public static int newspapersSplit(List<Integer> newspapersReadTimes, int numCoworkers) {
    // WRITE YOUR BRILLIANT CODE HERE
    int left = 0; // Min time needed
    int right = 0; // Max time needed
    for(int readTime : newspapersReadTimes) {
        left = Math.max(left, readTime);
        right += readTime;
    }
    
    int result = 0;
    while(left <= right) {
        int mid = left + (right - left) / 2;
        if(isReadTimeEnough(mid, newspapersReadTimes, numCoworkers)) {
            result = mid;
            right = mid - 1;
        }
        else {
            left = mid + 1;
        }
    }
    
    return result;
}

private static boolean isReadTimeEnough(int allocatedReadTime, List<Integer> newspapersReadTimes, int numCoworkers) {
    int numCoworkersNeeded = 1;
    int totalReadTime = 0;
    for(int paperReadTime : newspapersReadTimes) {
        totalReadTime += paperReadTime;
        if(totalReadTime > allocatedReadTime) {
            numCoworkersNeeded++;
            totalReadTime = paperReadTime;
            if(numCoworkersNeeded > numCoworkers) {
                return false;
            }
        }
    }
    
    return true;
}

I’m curious if anyone can provide a good explanation for why the greedy solution to take as many newspapers as possible without passing the limit in the feasible function for a worker provides the best solution. I realize it works by testing it but curious how to develop the intuition that this could be solved through that method.

1 Like

Since it metioned in the question:

“The newspapers are carefully laid out in a line in a particular order that must not be broken when assigning the newspapers. You cannot pick and choose newspapers randomly from the line to assign to a co-worker. Instead, you must take newspapers from a particular subsection of the line, make a pile and give that to a co-worker.”

This is what makes it work.

My solution is slightly different from the given one:

bool are_readable(std::vector newspapers_read_times, int num_coworkers, int read_time) {
int rt = read_time;
int cw = 1;

for (int i = 0; i < newspapers_read_times.size(); ++i) {
    if (newspapers_read_times[i] <= rt) {
        rt -= newspapers_read_times[i];
    } else {
        rt = read_time - newspapers_read_times[i];
        ++cw;
    }
}

return cw <= num_coworkers;

}

int newspapers_split(std::vector newspapers_read_times, int num_coworkers) {
int min_time = *std::max_element(newspapers_read_times.begin(), newspapers_read_times.end());
int max_time = std::accumulate(newspapers_read_times.begin(), newspapers_read_times.end(), 0);

while (min_time < max_time) {
    const int mid = (min_time + max_time)/2;
    if (are_readable(newspapers_read_times, num_coworkers, mid)) {
        max_time = mid;
    } else {
        min_time = mid + 1;
    }
}

return max_time;

}

To use binary search to find optimal solution, two components should be considered
(1) search of the optimal solution – it is the O(log n) when the solution space is a polynomial function N.
(2) verification of the optimal solution.

So, i think when we try to solve a problem, we can ask if
(1) we already know the solution space – min and max
(2) the time complexity when combining the search and verification of the optimal solution is better than the brute force algorithm.

In this example, given m workers and n newspapers, there are nCm ways to split the newspapers into m sets. The time complexity of this is O(n!) when n >> m. So, using binary search to search for optimal solution worth a try.

1 Like

I don’t know why feasible() function in python is written differently but it is equivalent to the one in the previous problem.

Structured programming JS Solution:

function newspapersSplit(readTimes, numCoworkers) {
    let min = Math.max(...readTimes);
    let max = readTimes.reduce((a, b) => a + b);
    let result = -1;
                               
    while (min <= max) {
        const mid = Math.floor((min + max) / 2);
        
        if (isFeasible(readTimes, mid, numCoworkers)) {
            result = mid;
            max = mid - 1;
        } else {
            min = mid + 1;   
        }
    }
    
    return result;
}

function isFeasible(readTimes, mid, numCoworkers) {    
    let workerCount = 1;
    let accumulatedTime = 0;
    let result = true
    
    for (let i = 0; i < readTimes.length && result; i++) {
        const time = readTimes[i];
        
        accumulatedTime += time;
        
        if (accumulatedTime > mid) {
            workerCount += 1;
            accumulatedTime = time;
            
            if (workerCount > numCoworkers) {
                result = false;
            }
        }
    }
    
    return result;
}

Isn’t this question same as the previous question?

C++
int newspapers_split(std::vector newspapers_read_times, int num_coworkers) {
int max_time = std::accumulate(newspapers_read_times.begin(), newspapers_read_times.end(), 0);
int min_time = *std::max_element(newspapers_read_times.begin(), newspapers_read_times.end());
int res = max_time;
while(min_time <= max_time) {
int mid = min_time + (max_time - min_time)/2;
int i = 0;
int mid_min = mid;
int coworker = 1;
while(i < newspapers_read_times.size()) {
if(newspapers_read_times[i] <= mid_min) {
mid_min = mid_min - newspapers_read_times[i];
i++;
}
else {
mid_min = mid;
coworker++;
}
}

    if(coworker <= num_coworkers) {
        res = mid;
        max_time = mid -1;
    }
    else {
        min_time = mid + 1;
    }
    
}
return res;

}

When the number of coworkers are greater or equal to the number of newspaper, we can return the lower bound of the possible time without ever entering the while loop.

if max_coworkers >= len(newspaper_read_times):
return max(newspaper_read_times)

Cool problem!

can there by a possibility where some of the coworkers do not need to do a work?

The provided solution is invalid. Input of

1 2 4 7 8 
2

returns 14, but should return 11.

For those who are having a hard time understanding the problem, let’s use newspapers_read_times = [7,2,5,10,8], num_coworkers = 2 as an example:

  • Each worker can only be distributed newspapers once. This means distributing [7,2] to Worker#1, [5,10] to Worker#2, and [8] to Worker#1 again is not allowed, because that way Worker#1 would have been distributed newspapers twice.
  • Newspapers must be distributed consecutively. This means we cannot distribute [7,2,8] to Worker#1 and [5,10] to Worker#2, because [7,2] and 8 are not consecutive. On the contrary, [7,2] and 5 are consecutive.
  • Newspapers are read in parallel by workers. If we distribute [7,2,5] to Worker#1 and [10,8] to Worker#2, Worker#1 would have finished reading all of his newspapers in 12 minutes (7+2+5=12), while Worker#2 would need 6 more minutes before finishing his newspaper (10+8-12=6). In other words, by the time Worker#2 finishes reading 10, Worker#1 would have finished reading 7 and 2, and has started reading 5 for 1 minute (so he needs 4 more minutes until finish).
  • Our goal is to find the most optimized way to distribute these newspapers, so that workers can finish reading them in the shortest time. For example, the distribution of [7,2] and [5,10,8] would take Worker#1 9 minutes (7+2=9) to read, and take Worker#2 23 minutes (5+10+8=23) to read. But if we go [7,2,5] and [10,8] it would take Worker#1 12 minutes (7+2+5=12) to read, and take Worker#2 18 minutes (10+8=18) to read, which is the most optimized distribution in this case.
1 Like

14 is correct based on the constraints of the original problem:

Additionally, the newspapers came in a particular order, and you must not disarrange the original ordering when distributing the newspapers. In other words, you cannot pick and choose newspapers randomly from the whole pile to assign to a co-worker, but rather you must take a subsection of consecutive newspapers from the whole pile.

So we have to split the array into 2 consecutive subsections, we cannot do

[1 2 8] [4 7]

Because that breaks the original ordering of

1 2 4 7 8

Splitting the array while preserving the ordering would look as follows:

[1, 2, 4, 7] [8]

Minimum amount of time it will take is 1+2+4+7 = 14.

1 Like

Maybe you can explicitly point out (what is to me, at least) the most salient aspect of this problem and its solution: specifically, that we’re not performing binary search on the data provided in the input array, but on an implicit array whose values are a meta property of the data in the input array.

I.e. [minPossibleReadTime, … mid, …, maxPossibleReadTime]

Once I realized that, and made it part of my mental model, the explanation made sense right away. But without it, the explanation just confused me. Maybe thinking of it like that will help others? I’m also guessing that the general tactic applies to other problems, so it might be helpful to point it out. Thanks!

def newspapers_split(newspapers_read_times: List[int], num_coworkers: int) → int:
# WRITE YOUR BRILLIANT CODE HERE
minTime,maxTime = max(newspapers_read_times) , sum(newspapers_read_times)
while (minTime <= maxTime):
numWorker = 1
timeWorker = 0
mid = (minTime+ maxTime)//2
for time in newspapers_read_times:
if(timeWorker + time <= mid):
timeWorker += time
else:
timeWorker = time
numWorker+= 1
if numWorker <= num_coworkers:
maxTime = mid - 1
else:
minTime = mid + 1
return minTime

You don’t actually need the edge case check in the python solution. If you do

return num_workers < num_coworkers

instead of

if read_time != 0:
    num_workers += 1
return num_workers <= num_coworkers

the behavior is identical.

To explain: we almost always have the situation where N workers are fully occupied and one worker is partially occupied. If N == num_coworkers, there will always be leftover time because in the code above we increment num_workers only if we detect a spillover into the next worker.

Therefore, if num_workers is equal to num_coworkers, there’s extra time that needs to be handled by an extra worker somewhere.

We should only return true if num_workers is strictly less than num_coworkers. That lets us skip the check for extra time, because it’s implicit in the strictly < comparison.

// edge case to check if we needed an extra coworker at the end
40        if (time != 0) {
41            numWorkers++;
42        }

why don’t need this part of the feasible method if we delare numWorkers = 1 at the beginning of the feasible method, right?