# Classical dynamic programming: longest common subsequence

2020-11-08 19:51:16

1143. Longest common subsequence

-----------

Longest common subsequence （Longest Common Subsequence, abbreviation LCS） Is a very classic interview topic , Because its solution is a typical two-dimensional dynamic programming , Most of the more difficult string problems are similar to this problem , For example, editing distance . and , This algorithm can be used to solve other problems with a little modification , So LCS The algorithm is worth learning .

The title is to let us find two strings of LCS length ：

`````` Input : str1 = "abcde", str2 = "ace"
Output : 3
explain :  The longest common subsequence is  "ace", Its length is  3
``````

There must be readers who will ask , Why is this problem solved by dynamic programming ？ Because of the type of subsequence , It's not easy to enumerate all possible outcomes , And the dynamic programming algorithm is exhaustive + prune , They're made for each other . So it can be said that as long as the subsequence problem is involved , Nine times out of ten, dynamic planning is needed to solve , It's right to think about this .

Now let's analyze it by hand , How to solve this problem with dynamic programming techniques .

### One 、 Dynamic planning ideas

First step , Be clear `dp` The meaning of array . For the dynamic programming of two strings , The routine is universal .

For example, for Strings `s1` and `s2`, Generally speaking, it is necessary to construct such a DP table： For the convenience of understanding this table , For the time being, we think the index is from 1 At the beginning , Just make a little adjustment in the code later . among ,`dp[i][j]` The meaning is ： about `s1[1..i]` and `s2[1..j]`, Their LCS The length is `dp[i][j]`.

For example, in the example above ,d That means ： about `"ac"` and `"babc"`, Their LCS The length is 2. The final answer we want to get is `dp`.

The second step , Definition base case.

We've got the index to be 0 Rows and empty columns ,`dp[..]` and `dp[..]` Should be initialized to 0, This is it. base case.

for instance , According to just now dp Definition of array ,`dp=0` The meaning is ： For strings `""` and `"bab"`, Its LCS The length of is 0. Because a string is empty , The length of their longest common subsequence should obviously be 0.

The third step , Find the state transition equation .

This is the most difficult step in dynamic planning , Fortunately, this string problem has the same pattern , Let's talk about the way to deal with this kind of problem .

State transition is simply about making choices , Let's say this question , Is to seek `s1` and `s2` The longest common subsequence of , Let's call this subsequence `lcs`. So for `s1` and `s2` Each character in , What's the choice ？ It's simple , Two options , Either in `lcs` in , Or not . This 「 stay 」 and 「 be not in 」 Is to choose , The key is , How to choose ？ This needs a little brain work ： If a character should be in `lcs` in , So this character must exist at the same time `s1` and `s2` in , because `lcs` It's the longest public Subsequences . So the idea of this question is like this ：

With two Pointers `i` and `j` Go back and forth `s1` and `s2`, If `s1[i]==s2[j]`, So this character It must be `lcs` in ; Otherwise ,`s1[i]` and `s2[j]` These two characters At least one of them is not here `lcs` in , One needs to be discarded . Let's take a look at recursion , Easier to understand ：

``````def longestCommonSubsequence(str1, str2) -> int:
def dp(i, j):
#  Empty string of  base case
if i == -1 or j == -1:
return 0
if str1[i] == str2[j]:
#  I found one here  lcs  The elements of , Keep looking for
return dp(i - 1, j - 1) + 1
else:
#  Who can make  lcs  The longest , Just listen to who
return max(dp(i-1, j), dp(i, j-1))

# i  and  j  Initialize to the last index
return dp(len(str1)-1, len(str2)-1)
``````

For the first case , Find one `lcs` The characters in , At the same time `i` `j` Move one bit forward , And give `lcs` Add one... To the length of ; For the latter , Then try two situations , Take a bigger result .

PS： I've seriously written about 100 Multiple original articles , Hand brush 200 Daoli is the subject , All published in labuladong A copy of the algorithm , Continuous updating . Recommended collection , Write the title in the order of my article , Master all kinds of algorithm set, then put into the sea of questions, like fish .

In fact, this code is a brute force solution , We can go through memos or DP table To optimize time complexity , For example, as described above DP table To solve ：

``````def longestCommonSubsequence(str1, str2) -> int:
m, n = len(str1), len(str2)
#  structure  DP table  and  base case
dp = [ * (n + 1) for _ in range(m + 1)]
#  Make a state transition
for i in range(1, m + 1):
for j in range(1, n + 1):
if str1[i - 1] == str2[j - 1]:
#  Find one  lcs  The characters in
dp[i][j] = 1 + dp[i-1][j-1]
else:
dp[i][j] = max(dp[i-1][j], dp[i][j-1])

return dp[-1][-1]
``````

### Two 、 Troubleshooting

about `s1[i]` and `s2[j]` Unequal situation , At least one The character is not in `lcs` in , Is it possible that neither character is there ？ For example, the following situation ： So whether the code should consider this situation , To such ：

``````if str1[i - 1] == str2[j - 1]:
# ...
else:
dp[i][j] = max(dp[i-1][j],
dp[i][j-1],
dp[i-1][j-1])
``````

I had this suspicion at the beginning , In fact, it can be changed in this way , You can also get the right answer , But it's unnecessary , because `dp[i-1][j-1]` Always the smallest of the three ,max It's impossible to get it at all .

The reason is that we are very concerned about dp Definition of array ： about `s1[1..i]` and `s2[1..j]`, Their LCS The length is `dp[i][j]`. Look at it like this , obviously `dp[i-1][j-1]` Corresponding `lcs` The length cannot be larger than the first two cases , So there's no need to participate in the comparison .

### 3、 ... and 、 summary

For the dynamic programming of two strings , Generally speaking, it is defined as in this article DP table, Because there's a benefit to this definition , It's easy to write state transition equations ,`dp[i][j]` The state of can be derived from the previous state ： The way to find the state transition equation is , Think about what each state has 「 choice 」, As long as we can make the right choice with the right logic , The algorithm works correctly .

＿＿＿＿＿＿＿＿＿＿＿＿＿

my Online e-books Yes 100 Original articles , Hands with brushes 200 Daoli is the subject , Recommended collection ！ Corresponding GitHub Algorithm Repository We've got it 70k star, Welcome to mark star ！