LCS and SCS¶
- LCS stands for Longest Common Subsequence and SCS stands for Shortest Common Supersequence of 2 strings.
- It's similar to LCM and HCF of 2 numbers.
Likea * b = LCM(a,b) * HCF(a,b),|str1| + |str2| = |LCS(str1, str2)| + |SCS(str1, str2)|.
Practical usescases¶
DNA / Genome Sequencing¶
- LCS helps compare two DNA sequences (A, C, G, T).
- Used to find similarities between genes, detect mutations, and measure evolutionary distance.
File Comparison Tools (diff utilities)¶
- Tools like
git diff,Linux diff, and version control systems internally use LCS. - Helps identify:
- Lines added
- Lines removed
- Lines unchanged
- LCS tells the largest set of lines that remain the same between two file versions.
Plagiarism Detection¶
- Used to measure similarity between two documents or code files.
- LCS finds longest matching sequences of tokens, characters, or words.
- If two submissions share long subsequences → high similarity score.
Spell Checkers & Smart Text Correction¶
- When comparing a typed word with dictionary words:
- LCS helps judge how similar the strings are.
- Used as part of edit-distance or correction ranking.
Algorithm to generate DP table¶
- Let
string1have lengthnandstring2have lengthm. We create a DP tablememoof size(m + 1) × (n + 1). - Each cell
memo[i][j]represents the solution to the subproblem: what is the LCS length of the firsticharacters ofstring2and the firstjcharacters ofstring1? - Initialize the entire first row and first column to 0. If either string length is 0, then
|LCS| = 0. - Now fill the table row by row (top $\rightarrow$ bottom) and column by column (left $\rightarrow$ right).
- For each cell
(i, j):- If the characters match:
string2[i-1] == string1[j-1],
then the LCS extends by 1, and we move diagonally to the smaller subproblem:memo[i][j] = 1 + memo[i-1][j-1]. - If the characters do not match, we choose the better of the two possibilities:
- Skip the last character of
string2$\rightarrow$ move up(i - 1, j). - Skip the last character of
string1$\rightarrow$ move left(i, j - 1).
dp[i][j] = max(dp[i-1][j], dp[i][j-1]) - Skip the last character of
- If the characters match:
- Continue filling the table until reaching the last cell. The length of LCS of both
stringandstring2is located at the last cell.
In [1]:
class Solution:
def __init__(self, text1: str, text2: str):
self.memo = []
self.resultLCS = ""
self.resultSCS = ""
self.string1, self.string2 = text1, text2
def build_dp(self):
n = len(self.string1)
m = len(self.string2)
# row: string2, col: string1
self.memo = [[0]*(n+1) for _ in range(m+1)]
for i in range(1, m+1): # string2
for j in range(1, n+1): # string1
if self.string2[i-1] == self.string1[j-1]:
self.memo[i][j] = 1 + self.memo[i-1][j-1]
else:
self.memo[i][j] = max(self.memo[i-1][j], self.memo[i][j-1])
return self.memo
Algorithm to generate LCS¶
- Start from the bottom-right corner of the LCS memo table
(rowIndex, columnIndex). - If either index becomes negative, stop the recursion (base case).
- If the characters at the current indices match:
- Add this character to
self.result. - Move diagonally to
(rowIndex - 1, columnIndex - 1)— because a match means this character is part of the LCS.
- Add this character to
- If the characters do not match
- Compare:
- the value above:
memo[rowIndex - 1][columnIndex] - the value to the left:
memo[rowIndex][columnIndex - 1]
- the value above:
- If the value above is larger, move upward $\rightarrow$
(rowIndex - 1, columnIndex) - Otherwise, move left $\rightarrow$
(rowIndex, columnIndex - 1) - This moves in the direction from which the optimal LCS length came.
- Compare:
- Repeat the process until either indices reduces to 0.
In [2]:
def generateLCS(self):
# if result already calculated, return it ensuring idempotency
if len(self.resultLCS) > 0:
return self.resultLCS
i = len(self.string2) # rows
j = len(self.string1) # columns
while i > 0 and j > 0:
if self.string2[i-1] == self.string1[j-1]:
# MATCH: include this char
self.resultLCS = self.string1[j-1] + self.resultLCS
i -= 1; j -= 1
else:
# Move in the direction of bigger value
if self.memo[i-1][j] > self.memo[i][j-1]:
i -= 1 # move UP
else:
j -= 1 # move LEFT
return self.resultLCS
Algorithm to generate SCS¶
Start from the bottom-right corner of the LCS DP table
(rowIndex, columnIndex)— these represent the lengths of the two strings.If either index becomes 0, append the remaining characters of the other string directly to the SCS (because one string is already exhausted).
If the characters at the current indices match (
string1[columnIndex - 1] == string2[rowIndex - 1]):- Add this character to
self.resultSCS. - Move diagonally to
(rowIndex - 1, columnIndex - 1)— because this character is part of both strings and only appears once in SCS.
- Add this character to
- If the characters do not match, compare the values used to construct the LCS:
Check:
- the value above →
memo[rowIndex - 1][columnIndex] - the value to the left →
memo[rowIndex][columnIndex - 1]
- the value above →
If the value above is greater, move up (towards the better LCS path):
- Add
string2[rowIndex - 1]to SCS - Move to
(rowIndex - 1, columnIndex)
- Add
Otherwise (left is greater or equal), move left:
- Add
string1[columnIndex - 1]to SCS - Move to
(rowIndex, columnIndex - 1)
- Add
This ensures we always take the character from the direction that leads to the shortest possible supersequence while still allowing reconstruction of the full LCS.
- Continue this process until both indices reach 0.
In [3]:
def generateSCS(self):
# if result already calculated, return it ensuring idempotency
if len(self.resultSCS) > 0:
return self.resultSCS
i = len(self.string2)
j = len(self.string1)
while i > 0 and j > 0:
if self.string2[i-1] == self.string1[j-1]:
# MATCH: include once
self.resultSCS = self.string1[j-1] + self.resultSCS
i -= 1; j -= 1
else:
# choose direction of larger dp value
if self.memo[i-1][j] > self.memo[i][j-1]:
self.resultSCS = self.string2[i-1] + self.resultSCS # from string2
i -= 1
else:
self.resultSCS = self.string1[j-1] + self.resultSCS # from string1
j -= 1
# If any one string still left, append all remaining chars
while i > 0:
self.resultSCS = self.string2[i-1] + self.resultSCS
i -= 1
while j > 0:
self.resultSCS = self.string1[j-1] + self.resultSCS
j -= 1
return self.resultSCS
Binding the functions to the Solution class¶
In [4]:
Solution.generateSCS = generateSCS
Solution.generateLCS = generateLCS
Driver code¶
In [5]:
obj = Solution("AGGTAB", "GXTXAYB")
obj.build_dp()
lcs = obj.generateLCS()
scs = obj.generateSCS()
lcs, scs
Out[5]:
('GTAB', 'AGXGTXAYB')
Verification of the result¶
In [6]:
l1 = len(obj.string1) + len(obj.string2)
l2 = len(lcs) + len(scs)
l1, l2
Out[6]:
(13, 13)
Visualizing the DP table¶
In [7]:
from IPython.display import HTML, display
# Build HTML table
html = "<table border='1' style='border-collapse: collapse; text-align: center;'>"
# Header row
html += "<tr><th></th><th></th>"
for ch in obj.string1:
html += f"<th>{ch}</th>"
html += "</tr>"
# DP rows
for i, row in enumerate(obj.memo):
# First column header (string2 chars)
if i == 0:
html += "<tr><th></th>"
else:
html += f"<tr><th>{obj.string2[i-1]}</th>"
# Data cells
for val in row:
html += f"<td>{val}</td>"
html += "</tr>"
html += "</table>"
display(HTML(html))
| A | G | G | T | A | B | ||
|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
| G | 0 | 0 | 1 | 1 | 1 | 1 | 1 |
| X | 0 | 0 | 1 | 1 | 1 | 1 | 1 |
| T | 0 | 0 | 1 | 1 | 2 | 2 | 2 |
| X | 0 | 0 | 1 | 1 | 2 | 2 | 2 |
| A | 0 | 1 | 1 | 1 | 2 | 3 | 3 |
| Y | 0 | 1 | 1 | 1 | 2 | 3 | 3 |
| B | 0 | 1 | 1 | 1 | 2 | 3 | 4 |