CYK Algorithm¶

  • This is a membership algorithm, i.e, it helps us to check if a given string belong to a particular grammar.
  • It is a dynamic programming approach.
  • Limitation of this algorithm is that it only works on CNF (Chomsky Normal Form).
  • If we're given a string $w$ of length $|w|$ then then we create a half-table of $|w|\times |w|$ dimensions.
  • When we're filling the cell $(u,v)$ of the parse table, we put in the value that derives the string $w[u:v]$.
  • If the first cell of this parse table contains the starting symbol (say $S$) then the string $w$ belongs to the grammar.

Steps to calculate the values of a cell¶

  • Determine all the possible partitions (if any) of the index $(u,v)$. Example:

    • $(1,3)$ can be divided into $(1,1).(2,3)$ and $(1,2).(3,3)$
    • $(1,4)$ can be divided into $(1,1).(2,4)$, $(1,2).(3,4)$ and $(1,3).(4,4)$

    Note: $.$ symbol indicates concatenation.

  • The value of the cell $(u,v)$ is the set of all the non-terminal symbols that derives any of the values resulting from the concatenation.


Function do display the parse table¶

In [1]:
from typing import List
from IPython.display import display, HTML

def displayTable(arr: List[List[int]]):
    n = len(arr[0])
    headers = f'''<thead><tr><td>&nbsp;</td>{''.join([f"<td>{n - i}</td>" for i in range(n)])}</tr></thead>'''
    htmlData = "<tbody>"
    for i in range(n):
        data = f"<tr><td>{i + 1}</td>"
        for cell in arr[i]:
            t = cell
            if cell == 0:
                t = "&nbsp;"
            if cell == '':
                t = "$\phi$"
            data += f"<td>{t}</td>"
        data += "</tr>"
        htmlData += data
    htmlData += "</tbody>"
    display(HTML(
        f"<table>{headers}{htmlData}</table><hr/>"
    ))

Initializing the solution class and implementing the algorithm¶

In [2]:
from typing import List, Tuple, Set

class Parser:
    def __init__(self, string_: str):
        self.string = string_
        self.parse_table = []
        self.table_size = len(string_)

        # initialize the parse-table
        k = self.table_size
        for i in range(self.table_size):
            arr = []
            for j in range(k):
                arr.append(0)
            k -= 1
            self.parse_table.append(arr[:])

        # function to map index to table
        self.map_index = lambda x, y: (x - 1, self.table_size - y)

    # setter function
    def setValue(self, x_: int, y_: int, val_: str):
        x, y = self.map_index(x_, y_)
        self.parse_table[x][y] = val_

    # getter function
    def getValue(self, x_: int, y_: int) -> str:
        x, y = self.map_index(x_, y_)
        return self.parse_table[x][y]

    # function to split an interval
    def split_interval(self, i: int, j: int, verbose: bool) -> List[Tuple[int]]:
        if verbose:
            print(f"\nSplitting ({i}, {j})")
        splits = []
        for k in range(i, j):
            split = ((i, k), (k + 1, j))
            if verbose:
                print(f"{split[0]} - {split[1]}")
            splits.append(split)
        return splits

    def all_concatenations(self, set1: str, set2: str, verbose: bool) -> Set[str]:
        symbol_set1 = set1.split(", ")
        symbol_set2 = set2.split(", ")
        result = set()
        for x in symbol_set1:
            for y in symbol_set2:
                result.add(x + y)
        if verbose:
            print(f"Concatenating :{set1} and {set2} = {result}")
        return result

    def createValue(self, x_: int, y_: int, grammar, verbose: bool):
        if x_ > y_ or not self.getValue(x_, y_) == 0:
            return

        if x_ == y_:
            value = ""
            t = self.string[x_ - 1]
            for production in grammar.keys():
                if t in grammar[production]:
                    value += f"{production}, "
            self.setValue(x_, y_, value[0: -2])
            return

        splits = self.split_interval(x_, y_, verbose)
        symbols = set()
        for split in splits:
            cell1, cell2 = split[0], split[1]
            concatenation = self.all_concatenations(
                self.getValue(cell1[0], cell1[1]),
                self.getValue(cell2[0], cell2[1]), verbose
            )
            for production in grammar.keys():
                intersection = set(grammar[production]) & concatenation
                if len(intersection) > 0:
                    symbols.add(production)
                    if verbose:
                        print(f"{production} derives {intersection}")
        self.setValue(x_, y_, ", ".join(list(symbols)))

    def __call__(self, grammar, verbose = False) -> bool:
        for k in range(self.table_size):
            if verbose:
                display(HTML(f"<h2>Step:- {k + 1}</h2>"))
            for i in range(1, self.table_size + 1):
                start, end = i, i + k
                if start > self.table_size or end > self.table_size:
                    continue
                self.createValue(start, end, grammar, verbose)
            if verbose:
                displayTable(self.parse_table)

        # assuming LHS of the first production rule is the start symbol
        return list(grammar.keys())[0] in self.parse_table[0][0]

Grammar and strings to test the algorithm¶

Grammar 1¶

$S\rightarrow AB$
$A\rightarrow BB / a$
$B\rightarrow AB / b$

String: $abbb$

Grammar 2¶

$S\rightarrow AB / BC$
$A\rightarrow BA/a$
$B\rightarrow CC/b$
$C \rightarrow AB/a$

String: $baaba$


In [3]:
grammar1 = {
    "S": ["AB"],
    "A": ["BB", "a"],
    "B": ["AB", "b"]
}
string1 = "abbb"

grammar2 = {
    "S": ["AB", "BC"],
    "A": ["BA", "a"],
    "B": ["CC", "b"],
    "C": ["AB", "a"]
}
string2 = "baaba"

Driver code¶

In [4]:
parser = Parser(string1)
res = parser(grammar1, verbose = True)
print(f'''The word '{string1}' {("belongs" if res else "doesn't belong")} to the grammar.''')

Step:- 1

 4321
1   A
2  B
3 B
4B

Step:- 2

Splitting (1, 2)
(1, 1) - (2, 2)
Concatenating :A and B = {'AB'}
S derives {'AB'}
B derives {'AB'}

Splitting (2, 3)
(2, 2) - (3, 3)
Concatenating :B and B = {'BB'}
A derives {'BB'}

Splitting (3, 4)
(3, 3) - (4, 4)
Concatenating :B and B = {'BB'}
A derives {'BB'}
 4321
1  B, SA
2 AB
3AB
4B

Step:- 3

Splitting (1, 3)
(1, 1) - (2, 3)
(1, 2) - (3, 3)
Concatenating :A and A = {'AA'}
Concatenating :B, S and B = {'BB', 'SB'}
A derives {'BB'}

Splitting (2, 4)
(2, 2) - (3, 4)
(2, 3) - (4, 4)
Concatenating :B and A = {'BA'}
Concatenating :A and B = {'AB'}
S derives {'AB'}
B derives {'AB'}
 4321
1 AB, SA
2B, SAB
3AB
4B

Step:- 4

Splitting (1, 4)
(1, 1) - (2, 4)
(1, 2) - (3, 4)
(1, 3) - (4, 4)
Concatenating :A and B, S = {'AB', 'AS'}
S derives {'AB'}
B derives {'AB'}
Concatenating :B, S and A = {'BA', 'SA'}
Concatenating :A and B = {'AB'}
S derives {'AB'}
B derives {'AB'}
 4321
1B, SAB, SA
2B, SAB
3AB
4B

The word 'abbb' belongs to the grammar.
In [5]:
parser = Parser(string2)
res = parser(grammar2, verbose = True)
print(f'''The word '{string2}' {("belongs" if res else "doesn't belong")} to the grammar.''')

Step:- 1

 54321
1    B
2   A, C
3  A, C
4 B
5A, C

Step:- 2

Splitting (1, 2)
(1, 1) - (2, 2)
Concatenating :B and A, C = {'BA', 'BC'}
S derives {'BC'}
A derives {'BA'}

Splitting (2, 3)
(2, 2) - (3, 3)
Concatenating :A, C and A, C = {'AC', 'AA', 'CA', 'CC'}
B derives {'CC'}

Splitting (3, 4)
(3, 3) - (4, 4)
Concatenating :A, C and B = {'AB', 'CB'}
S derives {'AB'}
C derives {'AB'}

Splitting (4, 5)
(4, 4) - (5, 5)
Concatenating :B and A, C = {'BA', 'BC'}
S derives {'BC'}
A derives {'BA'}
 54321
1   A, SB
2  BA, C
3 C, SA, C
4A, SB
5A, C

Step:- 3

Splitting (1, 3)
(1, 1) - (2, 3)
(1, 2) - (3, 3)
Concatenating :B and B = {'BB'}
Concatenating :A, S and A, C = {'AC', 'AA', 'SA', 'SC'}

Splitting (2, 4)
(2, 2) - (3, 4)
(2, 3) - (4, 4)
Concatenating :A, C and C, S = {'AC', 'AS', 'CS', 'CC'}
B derives {'CC'}
Concatenating :B and B = {'BB'}

Splitting (3, 5)
(3, 3) - (4, 5)
(3, 4) - (5, 5)
Concatenating :A, C and A, S = {'CS', 'AS', 'AA', 'CA'}
Concatenating :C, S and A, C = {'SA', 'CC', 'CA', 'SC'}
B derives {'CC'}
 54321
1  $\phi$A, SB
2 BBA, C
3BC, SA, C
4A, SB
5A, C

Step:- 4

Splitting (1, 4)
(1, 1) - (2, 4)
(1, 2) - (3, 4)
(1, 3) - (4, 4)
Concatenating :B and B = {'BB'}
Concatenating :A, S and C, S = {'AC', 'AS', 'SC', 'SS'}
Concatenating : and B = {'B'}

Splitting (2, 5)
(2, 2) - (3, 5)
(2, 3) - (4, 5)
(2, 4) - (5, 5)
Concatenating :A, C and B = {'AB', 'CB'}
S derives {'AB'}
C derives {'AB'}
Concatenating :B and A, S = {'BA', 'BS'}
A derives {'BA'}
Concatenating :B and A, C = {'BA', 'BC'}
S derives {'BC'}
A derives {'BA'}
 54321
1 $\phi$$\phi$A, SB
2A, C, SBBA, C
3BC, SA, C
4A, SB
5A, C

Step:- 5

Splitting (1, 5)
(1, 1) - (2, 5)
(1, 2) - (3, 5)
(1, 3) - (4, 5)
(1, 4) - (5, 5)
Concatenating :B and A, C, S = {'BA', 'BS', 'BC'}
S derives {'BC'}
A derives {'BA'}
Concatenating :A, S and B = {'AB', 'SB'}
S derives {'AB'}
C derives {'AB'}
Concatenating : and A, S = {'A', 'S'}
Concatenating : and A, C = {'A', 'C'}
 54321
1A, C, S$\phi$$\phi$A, SB
2A, C, SBBA, C
3BC, SA, C
4A, SB
5A, C

The word 'baaba' belongs to the grammar.