I solve three string matching problem using similar searching technique (DFS :P). Problem 10 and problem 44 are basically the same problem, so between them I prefer talking about problem 44, since the solution is more concise and reflect the searching idea better.
97. Interleaving String
Given s1, s2, s3, find whether s3 is formed by the interleaving of s1 and s2.
For example, Given: s1 = “aabcc”, s2 = “dbbca”,
When s3 = “aadbbcbcac”, return true. When s3 = “aadbbbaccc”, return false.
both accepted codes can be found here
the DFS solution
This is the first problem I solve. The outline of the searching idea is relatively straightforward, here I express the algorithm using recursion:
isInterleave(s1: string, s2: string, s3: string):
if isEmpty(s1) && isEmpty(s2) then return True
else if isEmpty(s1) then return s2 == s3
else if isEmpty(s2) then return s1 == s3
else
let try1 = if s1[0] == s3[0]
then isInterleave(substr(s1, 1), s2, substr(s3, 1))
else False
let try2 = if s2[0] == s3[0]
then isInterleave(s1, substr(s2, 1), substr(s3, 1))
else False
return try1 || try2
The problem of this simple search algorithm is that the time complexity grows exponentially on worst case. Since the algorithm may compute the same “state” multiple times, that is we may compute isInterleave with same s1, s2, s3 for multiple time, similar to the traditional recursive method computing fibonacci number.
The solution is similar too, after we reach some state, we check whether we have reached here before, if so, it means that it was definitely a fail search, knowing that we can return false immediately, if not, we record that we have reached here.
So the final algorithm:
isInterleave(s1: string, s2: string, s3: string):
if length(s1) + length(s2) != length(s3) then return false
else let visited = bool[length(s1)][length(s2)]
fill(visited, false)
search(0, 0, 0)
search(i1: int, i2: int, i3: int):
if i1 == length(s1) && i2 == length(s2) then return true
else if i1 == length(s1) return substr(s2, i2) == substr(s3, i3)
else if i2 == length(s2) return substr(s1, i1) == substr(s3, i3)
else if visited[i1][i2] then return false
else
visited[i1][i2] = true
let try1 = if s1[i1] == s3[i3]
then search(i1 + 1, i2, i3 + 1)
else false
let try2 = if s2[i2] == s3[i3]
then search(i1, i2 + 1, i3)
else false
return try1 || try2
We won’t repeat the searching detail in this detail below :P.
The time complexity is about O(n * m)
in worst case (visit every state).
Dynamic Programming solution
After I work out the dfs solution. I found a neater solution in DP in discussion. Although the time complexity isn’t improved a lot, but both code taste and idea are better I think. So I mimic his(her) idea and implement myself.
the dp equation is that
let dp = bool[length(s1) + 1][length(s2) + 1]
dp[0][0] = true // empty strings
for every other cells
dp[i][j] =
(if i <= 0 then False
else dp[i - 1][j] && s1[i - 1] == s3[i - 1 + j])
||
(if j <= 0 then False
eles dp[i][j - 1] && s2[j - 1] == s3[i + j - 1])
result = dp[length(s1)][length(s2)]
Every cell dp[i][j] in dp array means the first (i + j) chars of s3 is the interleave string of first i chars in s1 and j chars in s2. And the dp equation is easy to understand, the key idea behind is equivalent to the searching algorithm but here is actually breadth first.
The time complexity is O(m * n)
44. Wildcard Matching
’?’ Matches any single character. ‘*’ Matches any sequence of characters (including the empty sequence).
The matching should cover the entire input string (not partial).
The function prototype should be: bool isMatch(const char *s, const char *p)
Some examples:
isMatch(“aa”,”a”) → false
isMatch(“aa”,”aa”) → true
isMatch(“aaa”,”aa”) → false
isMatch(“aa”, “*”) → true
isMatch(“aa”, “a*”) → true
isMatch(“ab”, “?*”) → true
isMatch(“aab”, “cab”) → false
accepted code can be found here
And now we focus on the searching logic and leave the memorizing process aside.
Dealing with ?
is simple, simply let it match everything and we are good. *
is a little bit tricky. And we have three options
- match a single letter, advance both cursors
- match the text letter, advance text cursor only
- stop matching, advance pattern cursor only (skip the
*
)
And another subtlety lies in if we run out of text but not pattern, it doesn’t necessarily means a “not match”, if the letter under current pattern cursor is *
, we can match nothing on that, that is "abc*"
may match an "abc"
and here is the algorithm, memorizing process omitted:
isMatch(t: string, p: string):
search(0, 0)
search(i1: int, i2: int):
if i1 == length(t) && i2 == length(p) then return false
else if i2 == length(p) return false
if i1 == length(t) then
if p[i2] == '*' then return isMatch(i1, i2 + 1)
else return false
if p[i2] != '*' then
if match(t[i1], p[i2]) then return isMatch(i1 + 1, i2 + 1)
else return false
else
return search(i1 + 1, i2)
|| search(i1 + 1, i2 + 1)
|| search(i1, i2 + 1)
match(a: char, b: char):
return a == b || b == '?'
time complexity is O(m * n)
Conclusion
This week I practice the dfs memorizing search and a little dynamic programming techniques. And I found it handy in pattern matching problem. Although the algorithm don’t beat all people in time complexity, but it works reasonably well :)
So that’s it for this week.