当前位置:网站首页>Classical dynamic programming: longest common subsequence

Classical dynamic programming: longest common subsequence

2020-11-08 19:51:16 labuladong,

After reading this article , You can go and get the following questions :

1143. Longest common subsequence

-----------

Longest common subsequence (Longest Common Subsequence, abbreviation LCS) Is a very classic interview topic , Because its solution is a typical two-dimensional dynamic programming , Most of the more difficult string problems are similar to this problem , For example, editing distance . and , This algorithm can be used to solve other problems with a little modification , So LCS The algorithm is worth learning .

The title is to let us find two strings of LCS length :

 Input : str1 = "abcde", str2 = "ace" 
 Output : 3  
 explain :  The longest common subsequence is  "ace", Its length is  3

There must be readers who will ask , Why is this problem solved by dynamic programming ? Because of the type of subsequence , It's not easy to enumerate all possible outcomes , And the dynamic programming algorithm is exhaustive + prune , They're made for each other . So it can be said that as long as the subsequence problem is involved , Nine times out of ten, dynamic planning is needed to solve , It's right to think about this .

Now let's analyze it by hand , How to solve this problem with dynamic programming techniques .

One 、 Dynamic planning ideas

First step , Be clear dp The meaning of array . For the dynamic programming of two strings , The routine is universal .

For example, for Strings s1 and s2, Generally speaking, it is necessary to construct such a DP table:

For the convenience of understanding this table , For the time being, we think the index is from 1 At the beginning , Just make a little adjustment in the code later . among ,dp[i][j] The meaning is : about s1[1..i] and s2[1..j], Their LCS The length is dp[i][j].

For example, in the example above ,d[2][4] That means : about "ac" and "babc", Their LCS The length is 2. The final answer we want to get is dp[3][6].

The second step , Definition base case.

We've got the index to be 0 Rows and empty columns ,dp[0][..] and dp[..][0] Should be initialized to 0, This is it. base case.

for instance , According to just now dp Definition of array ,dp[0][3]=0 The meaning is : For strings "" and "bab", Its LCS The length of is 0. Because a string is empty , The length of their longest common subsequence should obviously be 0.

The third step , Find the state transition equation .

This is the most difficult step in dynamic planning , Fortunately, this string problem has the same pattern , Let's talk about the way to deal with this kind of problem .

State transition is simply about making choices , Let's say this question , Is to seek s1 and s2 The longest common subsequence of , Let's call this subsequence lcs. So for s1 and s2 Each character in , What's the choice ? It's simple , Two options , Either in lcs in , Or not .

This 「 stay 」 and 「 be not in 」 Is to choose , The key is , How to choose ? This needs a little brain work : If a character should be in lcs in , So this character must exist at the same time s1 and s2 in , because lcs It's the longest public Subsequences . So the idea of this question is like this :

With two Pointers i and j Go back and forth s1 and s2, If s1[i]==s2[j], So this character It must be lcs in ; Otherwise ,s1[i] and s2[j] These two characters At least one of them is not here lcs in , One needs to be discarded . Let's take a look at recursion , Easier to understand :

def longestCommonSubsequence(str1, str2) -> int:
    def dp(i, j):
        #  Empty string of  base case
        if i == -1 or j == -1:
            return 0
        if str1[i] == str2[j]:
            #  I found one here  lcs  The elements of , Keep looking for 
            return dp(i - 1, j - 1) + 1
        else:
            #  Who can make  lcs  The longest , Just listen to who 
            return max(dp(i-1, j), dp(i, j-1))
        
    # i  and  j  Initialize to the last index 
    return dp(len(str1)-1, len(str2)-1)

For the first case , Find one lcs The characters in , At the same time i j Move one bit forward , And give lcs Add one... To the length of ; For the latter , Then try two situations , Take a bigger result .

PS: I've seriously written about 100 Multiple original articles , Hand brush 200 Daoli is the subject , All published in labuladong A copy of the algorithm , Continuous updating . Recommended collection , Write the title in the order of my article , Master all kinds of algorithm set, then put into the sea of questions, like fish .

In fact, this code is a brute force solution , We can go through memos or DP table To optimize time complexity , For example, as described above DP table To solve :

def longestCommonSubsequence(str1, str2) -> int:
    m, n = len(str1), len(str2)
    #  structure  DP table  and  base case
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    #  Make a state transition 
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if str1[i - 1] == str2[j - 1]:
                #  Find one  lcs  The characters in 
                dp[i][j] = 1 + dp[i-1][j-1]
            else:
                dp[i][j] = max(dp[i-1][j], dp[i][j-1])
        
    return dp[-1][-1]

Two 、 Troubleshooting

about s1[i] and s2[j] Unequal situation , At least one The character is not in lcs in , Is it possible that neither character is there ? For example, the following situation :

So whether the code should consider this situation , To such :

if str1[i - 1] == str2[j - 1]:
    # ...
else:
    dp[i][j] = max(dp[i-1][j], 
                   dp[i][j-1],
                   dp[i-1][j-1])

I had this suspicion at the beginning , In fact, it can be changed in this way , You can also get the right answer , But it's unnecessary , because dp[i-1][j-1] Always the smallest of the three ,max It's impossible to get it at all .

The reason is that we are very concerned about dp Definition of array : about s1[1..i] and s2[1..j], Their LCS The length is dp[i][j].

Look at it like this , obviously dp[i-1][j-1] Corresponding lcs The length cannot be larger than the first two cases , So there's no need to participate in the comparison .

3、 ... and 、 summary

For the dynamic programming of two strings , Generally speaking, it is defined as in this article DP table, Because there's a benefit to this definition , It's easy to write state transition equations ,dp[i][j] The state of can be derived from the previous state :

The way to find the state transition equation is , Think about what each state has 「 choice 」, As long as we can make the right choice with the right logic , The algorithm works correctly .

_____________

my Online e-books Yes 100 Original articles , Hands with brushes 200 Daoli is the subject , Recommended collection ! Corresponding GitHub Algorithm Repository We've got it 70k star, Welcome to mark star !

版权声明
本文为[labuladong,]所创,转载请带上原文链接,感谢