First Occurrence - Binary Search / Sorted Array

https://algo.monster/problems/binary_search_duplicates

Requesting adding test case:
10 11 12 13
10

expected: 0

Why do you need to separate these 2 conditions?

    if arr[mid] == target:
        ans = mid
        r = mid - 1
    elif arr[mid] < target:
        l = mid + 1

def find_first_occurrence(arr: List[int], target: int) → int:
left = 0
right = len(arr) -1
new_target = -1
while left <= right:
mid = (left + right) // 2
if arr[mid] == target:
while arr[mid] == target and mid > -1:
new_target = mid
mid = mid - 1
if arr[mid] < left:
left = mid + 1
else:
right = mid - 1
return new_target

There must be 2 separate actions if it is equal to or greater than. If mid is equal to the target, that’s when you should update the boundary index. If it is simply greater than the target, you do not want to update the boundary, but you DO want to throw away the right half of the search area. Test Case #4 shows you why; if the array contains numbers in sorted order, but it does not contain the target number, you never want to update the boundary and return -1. That’s why you need the 3 separate conditionals.

Not sure what you mean by this. All test cases passed by changing Line 8 from “First Element Not Smaller Than Target” solution to be “if arr[mid] == target:” instead of >=… What is the specific test case that would cause the failure and the need for 3 different test cases?

This is an invalid test case. Problem states that the input list is sorted, your input list is not.

Test case 4. If you only use 2 conditions for >= and then an else, it will update the boundary to index 5, which is 20, which is greater than the target. You do not want to update the boundary if its greater than the value, you only want to update the boundary when it’s exactly the value you’re looking for. They both result in right = mid - 1, but the difference is you only update the boundary if its ==, not when its >, hence they have to be broken out.

I don’t think this explanation is great because it doesn’t really explain the difference between this question and the last one. Technically elements that are > 3 aren’t necessarily true for this problem.

I came up with the same solution as for the original “Finding the Boundary with Binary Search”, and I still did not get why the extra comparisons they make in this code…

public static int findBoundary(boolean[] arr) {
    
    if (arr[0]) return 0;
    
    int lo = 0;
    int hi = arr.length - 1;
    int boundary = -1;
    
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (arr[mid]) {
            boundary = mid;
            hi = mid - 1;                
        } else {
            lo = mid + 1;
        }
    }
            
    return boundary;           
    
}

The solution splits arr[mid] >= target to two conditions, one is arr[mid]==target, another is the extra arr[mid] > target. This is used for test case where the array doesn’t contain the target value.

E.g. arr = [4 6 7 7 7 20], target =8, it should return -1, not 5 (corresponds to value 20).

So when this happens, if you still want to use the same “finding boundary” code here, you need to add a patch after the while left<=right loop ends:

if (arr[res] != target) {
res = -1;
}

Pretty simple solution:

def find_first_occurrence(arr: List[int], target: int) -> int:
    start, end, index = 0, len(arr)-1, -1
    
    
    while start <= end:
        mid = (start+end)//2
        
        if target <= arr[mid]:
            if target == arr[mid]:
                index = mid
            end = mid - 1
        else:
            start = mid + 1
        
        
    return index
Footer

share my Java codes as the following:

public static int findFirstOccurrence(List<Integer> arr, int target) {
    // WRITE YOUR BRILLIANT CODE HERE
    int left = 0;
    int right = arr.size();
    int mid = 0;
    int index = -1;
    while (left <= right) {
        mid = left + (right - left) / 2;
        if (arr.get(mid) > target) {
            right = mid - 1;
        } else if (arr.get(mid) < target) {
            left = mid + 1;
        } else if (arr.get(mid) == target) {
            index = mid;
            if (mid > 0 && arr.get(mid -1) > target) {
                return mid;
            }
            right = mid - 1;
        } 
    }
    return index;
}

Why can’t the feasible function be arr[mid] <= target ? We need to find the first occurrence, and when arr[mid] == target, then I could either move left or right - since it says ‘first’ occurrence, I consider moving toward left of the array.
Please help.

yes, that works too. it reduces to a TTTFFF array. We use >=3 so it’s FFFTTT to make it easier to understand cuz it’s the same as the template problem “First Tre in a Sorted Boolean Array”.

Thank you. But what’s interesting is that the corresponding tweak in the code (to update the boundary Index when arr[mid] == target followed by high = mid-1) isn’t successful - it fails all test cases where there are duplicates as it returns the last occurrence instead of the first occurence. I am unable to understand why that’s the case. Please help.

I think I understand a bit more now. In the template to find first true in a sequence of FFFTTT we discard the right and move towards the left when the feasible function returns true, so whenever we update the boundary index, we choose to perform high = mid-1. Discarding the right, seems intuitive to find the first occurrence.
Since I am moving towards the right in my code, I always end up getting the last occurrence as my answer.

:+1:

def find_first_occurrence(arr: List[int], target: int) → int:
final_range = [-1, -1]
altered_binary_search(arr, 0, len(arr) - 1, target, final_range, True)
altered_binary_search(arr, 0, len(arr) - 1, target, final_range, False)
return final_range[0]

def altered_binary_search(array, left, right, target, final_range, go_left):
while left <= right:
mid = left + (right - left) // 2
if array[mid] < target:
left = mid + 1
elif array[mid] > target:
right = mid - 1
else:
if go_left:
if mid == 0 or array[mid - 1] != target:
final_range[0] = mid
return
else:
right = mid - 1
else:
if mid == len(array) - 1 or array[mid + 1] == target:
final_range[1] = mid
return
else:
left = mid + 1
Time: O(log(n)) Space: O(1)

my commented code almost the same as First Element Not Smaller.

public static int findFirstOccurrence(List<Integer> arr, int target) {
    // WRITE YOUR BRILLIANT CODE HERE
    int start=0;
    int end=arr.size()-1;
    while (start<=end)
    {
        // prevent stackovertFlow
        int midle=end+ (start-end)/ 2;
        // we found our possible answer
        if(arr.get(midle)==target)
        {
            // this is our answer
            if(midle>0 && arr.get(midle-1)!=target)
            {
                return midle;
            }// there stilll ocurrences of the number
            else if(midle>0 && arr.get(midle-1)==target)
            {
                end=midle-1;
            }// corner case
            else if(midle==0)
            {
                return midle;
            }
        }else if(arr.get(midle)>target)
        {
            end=midle-1;
        }else {
            start=midle+1;
        }
    }
    return -1;
}