This series of post is for the homework assignment of Algorithm Design of my college, which require us to post at least one solution for an arbitrary problem on Leetcode every week.
I’m not very good at algorithms now, and usually don’t come up with a best solution for a relatively simple question. But I’ll try to learn from every single problem as much as I can.
Here’s the problem of this week.
There are two sorted arrays nums1 and nums2 of size m and n respectively.
Find the median of the two sorted arrays. The overall run time complexity should be
O(log(m+n))
.
My accepted codes (in C++) can be found in my github repository
An naive attempt
It looks like an ordinary binary search problem, but has many subtleties and complexities involved.
My basic thought is rather similar to the solution of “Kth Largest element” problem, that is we find the (m + n + 1) / 2
th largest element in two arrays. I should have a way to rule out some irrelevent element very quickly, here’s my first dummy attempt.
input: Two sorted arrays v1, v2
1. initialize range for v1, [low1, high1], v2, [low2, high2], and the target index "target"
1. pick the index m1 of middle element of v1 in [low1, high1]
2. find the index m2 of the last element of v2 in [low2, high2] that is less than or equal to v1[m1], if no such element exists, m2 = low2 - 1
// now two arrays should be divide by v1[m1] appropriately (not really, but very close)
3. adjust the range of v1 and v2, target index according to the amount of elements of two halves.
4. back to step 1
Now I should think about the termination condition of recursion. If the “target” index becomes 0, then return min(v1[low1], v2[low2]). If one the the ranges shrink to null, then we index the result directly from the other array.
Finally I solve the median problem in an naive way, if total amount of elements is even, then I repeat the process twice and compute the average.
Simple as this method is, it still takes me hours to debug the program and settle all the sutleties correctly. Time complexity of the algorithm would be O(log(m)log(n)) for worst case. (Binary search of v2 always shrink the range by 1), which is not quite satisfying.
The code of this method is under class Solution
a (much) improved algorithm
I’m not satisfied with my stupid algorithm at all. So I turn to the solution for a better algorithm, which is indeed neat and clean.
The algorithm basically say, since we know the index of element we are finding, that is if we partition two arrays into two halves with the median number of elements on the left and right are both known. And for each half, if we know how many element is originally in v1
, the rest must be in v2
. Since both arrays are sorted in the first place, we can partition two arrays simultaneously by giving only one partition point of either v1
or v2
.
And we examine what condition of (4) partitions of two array should satisfy. That is max(elements in left halves) <= min(elements in right halves). If we find the partition correctly, we can find the median in no time.
input: Two sorted arrays v1, v2
1. if v1.length > v2.length, swap v1, v2
2. initialize the binary search range on v1 [low, high]
3. for m1 = (low + high) / 2, calculate m2 to partition v2 so that total elements in left halves = total elements in right halves
4. check the correct partition condition, if satisfied, return appropriate median, otherwise adjust range of the binary search.
5. back to step 3
The only subtleties here is what if the partition point doesn’t exist in one array, that is we may fail the binary search with no correct partition point found. But thankfully during the process, if in one array no partition point could be found, the partition point in the other can still be set correctly, what we should do is only do bound checking when ever we want to access some element in the array and stop the binary search in an appropriate time. And all the details could be found in codes.
The time complexity of this algorithm is simply O(log(min(m, n))), since there are only one binary search on the shorter array.
Last
This is my first “record” about solving an algorithm problem, I didn’t explain every detail in the article but only express my basic thoughts or understanding of the problem.
To be honest I don’t think the reason that this problem took me so much time to think, to code, to debug, to think again is for the “Hard” label in the title, but for my “long-time-no-use” rusty brain. Even the simplest binary search took me a lot of time to settle the “off by 1” problem.
GRRR, I NEED PRACTICES BADLY.