CYK Algorithm¶
- This is a membership algorithm, i.e, it helps us to check if a given string belong to a particular grammar.
- It is a dynamic programming approach.
- Limitation of this algorithm is that it only works on CNF (Chomsky Normal Form).
- If we're given a string $w$ of length $|w|$ then then we create a half-table of $|w|\times |w|$ dimensions.
- When we're filling the cell $(u,v)$ of the parse table, we put in the value that derives the string $w[u:v]$.
- If the first cell of this parse table contains the starting symbol (say $S$) then the string $w$ belongs to the grammar.
Steps to calculate the values of a cell¶
Determine all the possible partitions (if any) of the index $(u,v)$. Example:
- $(1,3)$ can be divided into $(1,1).(2,3)$ and $(1,2).(3,3)$
- $(1,4)$ can be divided into $(1,1).(2,4)$, $(1,2).(3,4)$ and $(1,3).(4,4)$
Note: $.$ symbol indicates concatenation.
The value of the cell $(u,v)$ is the set of all the non-terminal symbols that derives any of the values resulting from the concatenation.
Function do display the parse table¶
In [1]:
from typing import List
from IPython.display import display, HTML
def displayTable(arr: List[List[int]]):
n = len(arr[0])
headers = f'''<thead><tr><td> </td>{''.join([f"<td>{n - i}</td>" for i in range(n)])}</tr></thead>'''
htmlData = "<tbody>"
for i in range(n):
data = f"<tr><td>{i + 1}</td>"
for cell in arr[i]:
t = cell
if cell == 0:
t = " "
if cell == '':
t = "$\phi$"
data += f"<td>{t}</td>"
data += "</tr>"
htmlData += data
htmlData += "</tbody>"
display(HTML(
f"<table>{headers}{htmlData}</table><hr/>"
))
Initializing the solution class and implementing the algorithm¶
In [2]:
from typing import List, Tuple, Set
class Parser:
def __init__(self, string_: str):
self.string = string_
self.parse_table = []
self.table_size = len(string_)
# initialize the parse-table
k = self.table_size
for i in range(self.table_size):
arr = []
for j in range(k):
arr.append(0)
k -= 1
self.parse_table.append(arr[:])
# function to map index to table
self.map_index = lambda x, y: (x - 1, self.table_size - y)
# setter function
def setValue(self, x_: int, y_: int, val_: str):
x, y = self.map_index(x_, y_)
self.parse_table[x][y] = val_
# getter function
def getValue(self, x_: int, y_: int) -> str:
x, y = self.map_index(x_, y_)
return self.parse_table[x][y]
# function to split an interval
def split_interval(self, i: int, j: int, verbose: bool) -> List[Tuple[int]]:
if verbose:
print(f"\nSplitting ({i}, {j})")
splits = []
for k in range(i, j):
split = ((i, k), (k + 1, j))
if verbose:
print(f"{split[0]} - {split[1]}")
splits.append(split)
return splits
def all_concatenations(self, set1: str, set2: str, verbose: bool) -> Set[str]:
symbol_set1 = set1.split(", ")
symbol_set2 = set2.split(", ")
result = set()
for x in symbol_set1:
for y in symbol_set2:
result.add(x + y)
if verbose:
print(f"Concatenating :{set1} and {set2} = {result}")
return result
def createValue(self, x_: int, y_: int, grammar, verbose: bool):
if x_ > y_ or not self.getValue(x_, y_) == 0:
return
if x_ == y_:
value = ""
t = self.string[x_ - 1]
for production in grammar.keys():
if t in grammar[production]:
value += f"{production}, "
self.setValue(x_, y_, value[0: -2])
return
splits = self.split_interval(x_, y_, verbose)
symbols = set()
for split in splits:
cell1, cell2 = split[0], split[1]
concatenation = self.all_concatenations(
self.getValue(cell1[0], cell1[1]),
self.getValue(cell2[0], cell2[1]), verbose
)
for production in grammar.keys():
intersection = set(grammar[production]) & concatenation
if len(intersection) > 0:
symbols.add(production)
if verbose:
print(f"{production} derives {intersection}")
self.setValue(x_, y_, ", ".join(list(symbols)))
def __call__(self, grammar, verbose = False) -> bool:
for k in range(self.table_size):
if verbose:
display(HTML(f"<h2>Step:- {k + 1}</h2>"))
for i in range(1, self.table_size + 1):
start, end = i, i + k
if start > self.table_size or end > self.table_size:
continue
self.createValue(start, end, grammar, verbose)
if verbose:
displayTable(self.parse_table)
# assuming LHS of the first production rule is the start symbol
return list(grammar.keys())[0] in self.parse_table[0][0]
In [3]:
grammar1 = {
"S": ["AB"],
"A": ["BB", "a"],
"B": ["AB", "b"]
}
string1 = "abbb"
grammar2 = {
"S": ["AB", "BC"],
"A": ["BA", "a"],
"B": ["CC", "b"],
"C": ["AB", "a"]
}
string2 = "baaba"
Driver code¶
In [4]:
parser = Parser(string1)
res = parser(grammar1, verbose = True)
print(f'''The word '{string1}' {("belongs" if res else "doesn't belong")} to the grammar.''')
Step:- 1
| 4 | 3 | 2 | 1 | |
| 1 | A | |||
| 2 | B | |||
| 3 | B | |||
| 4 | B |
Step:- 2
Splitting (1, 2)
(1, 1) - (2, 2)
Concatenating :A and B = {'AB'}
S derives {'AB'}
B derives {'AB'}
Splitting (2, 3)
(2, 2) - (3, 3)
Concatenating :B and B = {'BB'}
A derives {'BB'}
Splitting (3, 4)
(3, 3) - (4, 4)
Concatenating :B and B = {'BB'}
A derives {'BB'}
| 4 | 3 | 2 | 1 | |
| 1 | B, S | A | ||
| 2 | A | B | ||
| 3 | A | B | ||
| 4 | B |
Step:- 3
Splitting (1, 3)
(1, 1) - (2, 3)
(1, 2) - (3, 3)
Concatenating :A and A = {'AA'}
Concatenating :B, S and B = {'BB', 'SB'}
A derives {'BB'}
Splitting (2, 4)
(2, 2) - (3, 4)
(2, 3) - (4, 4)
Concatenating :B and A = {'BA'}
Concatenating :A and B = {'AB'}
S derives {'AB'}
B derives {'AB'}
| 4 | 3 | 2 | 1 | |
| 1 | A | B, S | A | |
| 2 | B, S | A | B | |
| 3 | A | B | ||
| 4 | B |
Step:- 4
Splitting (1, 4)
(1, 1) - (2, 4)
(1, 2) - (3, 4)
(1, 3) - (4, 4)
Concatenating :A and B, S = {'AB', 'AS'}
S derives {'AB'}
B derives {'AB'}
Concatenating :B, S and A = {'BA', 'SA'}
Concatenating :A and B = {'AB'}
S derives {'AB'}
B derives {'AB'}
| 4 | 3 | 2 | 1 | |
| 1 | B, S | A | B, S | A |
| 2 | B, S | A | B | |
| 3 | A | B | ||
| 4 | B |
The word 'abbb' belongs to the grammar.
In [5]:
parser = Parser(string2)
res = parser(grammar2, verbose = True)
print(f'''The word '{string2}' {("belongs" if res else "doesn't belong")} to the grammar.''')
Step:- 1
| 5 | 4 | 3 | 2 | 1 | |
| 1 | B | ||||
| 2 | A, C | ||||
| 3 | A, C | ||||
| 4 | B | ||||
| 5 | A, C |
Step:- 2
Splitting (1, 2)
(1, 1) - (2, 2)
Concatenating :B and A, C = {'BA', 'BC'}
S derives {'BC'}
A derives {'BA'}
Splitting (2, 3)
(2, 2) - (3, 3)
Concatenating :A, C and A, C = {'AC', 'AA', 'CA', 'CC'}
B derives {'CC'}
Splitting (3, 4)
(3, 3) - (4, 4)
Concatenating :A, C and B = {'AB', 'CB'}
S derives {'AB'}
C derives {'AB'}
Splitting (4, 5)
(4, 4) - (5, 5)
Concatenating :B and A, C = {'BA', 'BC'}
S derives {'BC'}
A derives {'BA'}
| 5 | 4 | 3 | 2 | 1 | |
| 1 | A, S | B | |||
| 2 | B | A, C | |||
| 3 | C, S | A, C | |||
| 4 | A, S | B | |||
| 5 | A, C |
Step:- 3
Splitting (1, 3)
(1, 1) - (2, 3)
(1, 2) - (3, 3)
Concatenating :B and B = {'BB'}
Concatenating :A, S and A, C = {'AC', 'AA', 'SA', 'SC'}
Splitting (2, 4)
(2, 2) - (3, 4)
(2, 3) - (4, 4)
Concatenating :A, C and C, S = {'AC', 'AS', 'CS', 'CC'}
B derives {'CC'}
Concatenating :B and B = {'BB'}
Splitting (3, 5)
(3, 3) - (4, 5)
(3, 4) - (5, 5)
Concatenating :A, C and A, S = {'CS', 'AS', 'AA', 'CA'}
Concatenating :C, S and A, C = {'SA', 'CC', 'CA', 'SC'}
B derives {'CC'}
| 5 | 4 | 3 | 2 | 1 | |
| 1 | $\phi$ | A, S | B | ||
| 2 | B | B | A, C | ||
| 3 | B | C, S | A, C | ||
| 4 | A, S | B | |||
| 5 | A, C |
Step:- 4
Splitting (1, 4)
(1, 1) - (2, 4)
(1, 2) - (3, 4)
(1, 3) - (4, 4)
Concatenating :B and B = {'BB'}
Concatenating :A, S and C, S = {'AC', 'AS', 'SC', 'SS'}
Concatenating : and B = {'B'}
Splitting (2, 5)
(2, 2) - (3, 5)
(2, 3) - (4, 5)
(2, 4) - (5, 5)
Concatenating :A, C and B = {'AB', 'CB'}
S derives {'AB'}
C derives {'AB'}
Concatenating :B and A, S = {'BA', 'BS'}
A derives {'BA'}
Concatenating :B and A, C = {'BA', 'BC'}
S derives {'BC'}
A derives {'BA'}
| 5 | 4 | 3 | 2 | 1 | |
| 1 | $\phi$ | $\phi$ | A, S | B | |
| 2 | A, C, S | B | B | A, C | |
| 3 | B | C, S | A, C | ||
| 4 | A, S | B | |||
| 5 | A, C |
Step:- 5
Splitting (1, 5)
(1, 1) - (2, 5)
(1, 2) - (3, 5)
(1, 3) - (4, 5)
(1, 4) - (5, 5)
Concatenating :B and A, C, S = {'BA', 'BS', 'BC'}
S derives {'BC'}
A derives {'BA'}
Concatenating :A, S and B = {'AB', 'SB'}
S derives {'AB'}
C derives {'AB'}
Concatenating : and A, S = {'A', 'S'}
Concatenating : and A, C = {'A', 'C'}
| 5 | 4 | 3 | 2 | 1 | |
| 1 | A, C, S | $\phi$ | $\phi$ | A, S | B |
| 2 | A, C, S | B | B | A, C | |
| 3 | B | C, S | A, C | ||
| 4 | A, S | B | |||
| 5 | A, C |
The word 'baaba' belongs to the grammar.