LCS and SCS¶

  • LCS stands for Longest Common Subsequence and SCS stands for Shortest Common Supersequence of 2 strings.
  • It's similar to LCM and HCF of 2 numbers.
    Like a * b = LCM(a,b) * HCF(a,b), |str1| + |str2| = |LCS(str1, str2)| + |SCS(str1, str2)|.

Practical usescases¶

DNA / Genome Sequencing¶

  • LCS helps compare two DNA sequences (A, C, G, T).
  • Used to find similarities between genes, detect mutations, and measure evolutionary distance.

File Comparison Tools (diff utilities)¶

  • Tools like git diff, Linux diff, and version control systems internally use LCS.
  • Helps identify:
    • Lines added
    • Lines removed
    • Lines unchanged
  • LCS tells the largest set of lines that remain the same between two file versions.

Plagiarism Detection¶

  • Used to measure similarity between two documents or code files.
  • LCS finds longest matching sequences of tokens, characters, or words.
  • If two submissions share long subsequences → high similarity score.

Spell Checkers & Smart Text Correction¶

  • When comparing a typed word with dictionary words:
  • LCS helps judge how similar the strings are.
  • Used as part of edit-distance or correction ranking.

Algorithm to generate DP table¶

  • Let string1 have length n and string2 have length m. We create a DP table memo of size (m + 1) × (n + 1).
  • Each cell memo[i][j] represents the solution to the subproblem: what is the LCS length of the first i characters of string2 and the first j characters of string1?
  • Initialize the entire first row and first column to 0. If either string length is 0, then |LCS| = 0.
  • Now fill the table row by row (top $\rightarrow$ bottom) and column by column (left $\rightarrow$ right).
  • For each cell (i, j):
    • If the characters match: string2[i-1] == string1[j-1],
      then the LCS extends by 1, and we move diagonally to the smaller subproblem: memo[i][j] = 1 + memo[i-1][j-1].
    • If the characters do not match, we choose the better of the two possibilities:
      • Skip the last character of string2 $\rightarrow$ move up (i - 1, j).
      • Skip the last character of string1 $\rightarrow$ move left (i, j - 1).
         dp[i][j] = max(dp[i-1][j], dp[i][j-1])
      
  • Continue filling the table until reaching the last cell. The length of LCS of both string and string2 is located at the last cell.

In [1]:
class Solution:
    def __init__(self, text1: str, text2: str):
        self.memo = []
        self.resultLCS = ""
        self.resultSCS = ""
        self.string1, self.string2 = text1, text2

    def build_dp(self):
        n = len(self.string1)
        m = len(self.string2)
    
        # row: string2, col: string1
        self.memo = [[0]*(n+1) for _ in range(m+1)]
    
        for i in range(1, m+1):       # string2
            for j in range(1, n+1):   # string1
                if self.string2[i-1] == self.string1[j-1]:
                    self.memo[i][j] = 1 + self.memo[i-1][j-1]
                else:
                    self.memo[i][j] = max(self.memo[i-1][j], self.memo[i][j-1])
    
        return self.memo

Algorithm to generate LCS¶

  • Start from the bottom-right corner of the LCS memo table (rowIndex, columnIndex).
  • If either index becomes negative, stop the recursion (base case).
  • If the characters at the current indices match:
    • Add this character to self.result.
    • Move diagonally to (rowIndex - 1, columnIndex - 1) — because a match means this character is part of the LCS.
  • If the characters do not match
    • Compare:
      • the value above: memo[rowIndex - 1][columnIndex]
      • the value to the left: memo[rowIndex][columnIndex - 1]
    • If the value above is larger, move upward $\rightarrow$ (rowIndex - 1, columnIndex)
    • Otherwise, move left $\rightarrow$ (rowIndex, columnIndex - 1)
    • This moves in the direction from which the optimal LCS length came.
  • Repeat the process until either indices reduces to 0.

In [2]:
def generateLCS(self):
    # if result already calculated, return it ensuring idempotency
    if len(self.resultLCS) > 0:
        return self.resultLCS
    
    i = len(self.string2)   # rows
    j = len(self.string1)   # columns
    
    while i > 0 and j > 0:
        if self.string2[i-1] == self.string1[j-1]:
            # MATCH: include this char
            self.resultLCS = self.string1[j-1] + self.resultLCS
            i -= 1; j -= 1
        else:
            # Move in the direction of bigger value
            if self.memo[i-1][j] > self.memo[i][j-1]:
                i -= 1     # move UP
            else:
                j -= 1     # move LEFT

    return self.resultLCS

Algorithm to generate SCS¶

  • Start from the bottom-right corner of the LCS DP table (rowIndex, columnIndex) — these represent the lengths of the two strings.

  • If either index becomes 0, append the remaining characters of the other string directly to the SCS (because one string is already exhausted).


  • If the characters at the current indices match (string1[columnIndex - 1] == string2[rowIndex - 1]):

    • Add this character to self.resultSCS.
    • Move diagonally to (rowIndex - 1, columnIndex - 1) — because this character is part of both strings and only appears once in SCS.

  • If the characters do not match, compare the values used to construct the LCS:
    • Check:

      • the value above → memo[rowIndex - 1][columnIndex]
      • the value to the left → memo[rowIndex][columnIndex - 1]
    • If the value above is greater, move up (towards the better LCS path):

      • Add string2[rowIndex - 1] to SCS
      • Move to (rowIndex - 1, columnIndex)
    • Otherwise (left is greater or equal), move left:

      • Add string1[columnIndex - 1] to SCS
      • Move to (rowIndex, columnIndex - 1)
    • This ensures we always take the character from the direction that leads to the shortest possible supersequence while still allowing reconstruction of the full LCS.


  • Continue this process until both indices reach 0.

In [3]:
def generateSCS(self):
    # if result already calculated, return it ensuring idempotency
    if len(self.resultSCS) > 0:
        return self.resultSCS
    
    i = len(self.string2)
    j = len(self.string1)

    while i > 0 and j > 0:
        if self.string2[i-1] == self.string1[j-1]:
            # MATCH: include once
            self.resultSCS = self.string1[j-1] + self.resultSCS
            i -= 1; j -= 1
        else:
            # choose direction of larger dp value
            if self.memo[i-1][j] > self.memo[i][j-1]:
                self.resultSCS = self.string2[i-1] + self.resultSCS   # from string2
                i -= 1
            else:
                self.resultSCS = self.string1[j-1] + self.resultSCS   # from string1
                j -= 1

    # If any one string still left, append all remaining chars
    while i > 0:
        self.resultSCS = self.string2[i-1] + self.resultSCS
        i -= 1

    while j > 0:
        self.resultSCS = self.string1[j-1] + self.resultSCS
        j -= 1

    return self.resultSCS

Binding the functions to the Solution class¶

In [4]:
Solution.generateSCS = generateSCS
Solution.generateLCS = generateLCS

Driver code¶

In [5]:
obj = Solution("AGGTAB", "GXTXAYB")
obj.build_dp()
lcs = obj.generateLCS()
scs = obj.generateSCS()

lcs, scs
Out[5]:
('GTAB', 'AGXGTXAYB')

Verification of the result¶

In [6]:
l1 = len(obj.string1) + len(obj.string2)
l2 = len(lcs) + len(scs)

l1, l2
Out[6]:
(13, 13)

Visualizing the DP table¶

In [7]:
from IPython.display import HTML, display

# Build HTML table
html = "<table border='1' style='border-collapse: collapse; text-align: center;'>"

# Header row
html += "<tr><th></th><th></th>"
for ch in obj.string1:
    html += f"<th>{ch}</th>"
html += "</tr>"

# DP rows
for i, row in enumerate(obj.memo):
    # First column header (string2 chars)
    if i == 0:
        html += "<tr><th></th>"
    else:
        html += f"<tr><th>{obj.string2[i-1]}</th>"
    # Data cells
    for val in row:
        html += f"<td>{val}</td>"
    html += "</tr>"

html += "</table>"

display(HTML(html))
AGGTAB
0000000
G0011111
X0011111
T0011222
X0011222
A0111233
Y0111233
B0111234