Median of Data Stream - Priority Queue / Heap / Multiple Heaps

https://algo.monster/problems/median_of_data_stream

I believe Comparator.reverseOrder() lets you effectively initialize a max heap in Java.

You can always pass the number to larger half, and only rebalance if the length of the larger half exceeds smaller half
class MedianOfStream:
def init(self):
self.larger_half=[] #min_heap
self.smaller_half=[]#max_heap
def add_number(self, num: float) → None:
# WRITE YOUR BRILLIANT CODE HERE
heappush(self.larger_half, num)
if len(self.larger_half) > len(self.smaller_half):
heappush(self.smaller_half, -1*heappop(self.larger_half))

def get_median(self) -> float:
    # ALSO HERE
    if len(self.larger_half) == len(self.smaller_half):
        return (self.larger_half[0]-self.smaller_half[0])/2
    else:
        return -1*self.smaller_half[0]

This is my solution, rebalance only happens when size of big pile is bigger than the size of the small pile, passed the test but failed i leetcode test case

["MedianFinder","addNum","findMedian","addNum","findMedian","addNum","findMedian","addNum","findMedian","addNum","findMedian"]
[[],[-1],[],[-2],[],[-3],[],[-4],[],[-5],[]]
```, 
not sure what I missed, 
my solution:

class MedianOfStream:
def init(self):
self.min_h = [] # big pile
self.max_h = [] # small pile

def add_number(self, num: float) -> None:
    # WRITE YOUR BRILLIANT CODE HERE
    if len(self.min_h) == 0 or num > self.min_h[0]:
        heappush(self.min_h, num)
    else:
        heappush(self.max_h, -num)
    if len(self.min_h) > len(self.max_h):
        self._rebalance()
        
def _rebalance(self):
    # pop the smallest from big pile and push it to the small pile
    smallest = heappop(self.min_h)
    heappush(self.max_h, -smallest)
    

def get_median(self) -> float:
    # ALSO HERE
    res = -1
    if len(self.min_h) == len(self.max_h):
        smallest = self.min_h[0]
        biggest = -self.max_h[0]
        res = (smallest + biggest) / 2
    else:
        res = -self.max_h[0]
    return res

I don’t understand the result of test cases. 6,1,2,3 leads to [2,1] and [3,6] heaps, median is (2+3)/2 => 2.5.
Then we add 4 leads to [3,2,1] and [4,6], median is 3.
Same calculations for the second test it also should be 2.5 and 3.
Why tests want it to be 2 and 2.5 and why java code from example shows 2 and 2.5?

Why this can’t be solved using a single heap and maths.

A heap does not give you any information about other elements than the min/max element.

the number in the first line is a number of lines to be parsed containing the test case

You are only rebalancing when min heap size is bigger. You need to consider cases where max_heap increases in size. e.g. input starting like 1000 1 2 ...

The single heap approach can’t be used because you need access to the middle two values in the case of an even number of items. You can get one of the middle values with a single heap, but the other can be nested deeply in the heap because it’s only “somewhat” sorted. If you manually looked into that heap, you would see that it would take an O(log(n)) algorithm to find that second value, which would break the restriction of finding the median in O(1) time.

I think to simplify this solution we can just immediately add every new element into our minHeap(bigger numbers).
And only if len(minHeap) > len(maxHeap) then pop minHeap’s min element into maxHeap.
This will always guarantee our two sides are balanced and by pushing to the right side portion first we will never have an issue where our left side portion(maxHeap) has an element that is greater than the min element in the minHeap.

Sorry, guys I was wrong. It won’t guarantee that we push the correct values to our sides.

So, if we wanted to push to our maxHeap we would have to push to their minHeap FIRST, so that when we pop from the minHeap we have the TRUE smallest number to push into our maxHeap

Similarly, if we wanted to push to our minHeap we would have to push to our maxHeap FIRST, so that when we pop from the maxHeap we are guaranteed to have the TRUE greatest number to push into our minHeap.

The time complexity is confusing and possibly inaccurate. The problem statement is asking for an O(1) solution, which seems impossible if we’re talking about worse-case time complexity. The solution claims O(q log q) time complexity. Shouldn’t it be O(q log n) instead? Or, more precisely, O(log n) per query and per insertion?

For positive numbers, can we use a single heap to track just the median and the number(s) directly to the right:

class MedianOfStream:
    h = []
    even = True
    def add_number(self, num: float) -> None:
        heappush(self.h, num)
        self.even = not self.even

        if not self.even and len(self.h) != 1:
            heappop(self.h)
        return

    def get_median(self) -> float:
        if not self.even:
            return self.h[0]
        return (self.h[0] + self.h[1])/2