CYK algorithm

In computer science, the Cocke–Younger–Kasami algorithm (alternatively called CYK, or CKY) is a parsing algorithm for context-free grammars, named after its inventors, John Cocke, Daniel Younger and Tadao Kasami. It employs bottom-up parsing and dynamic programming.Allows to recover the most probable parse given the probabilities of all productions. In computer science, the Cocke–Younger–Kasami algorithm (alternatively called CYK, or CKY) is a parsing algorithm for context-free grammars, named after its inventors, John Cocke, Daniel Younger and Tadao Kasami. It employs bottom-up parsing and dynamic programming. The standard version of CYK operates only on context-free grammars given in Chomsky normal form (CNF). However any context-free grammar may be transformed to a CNF grammar expressing the same language (Sipser 1997). The importance of the CYK algorithm stems from its high efficiency in certain situations. Using Big O notation, the worst case running time of CYK is O ( n 3 ⋅ | G | ) {displaystyle {mathcal {O}}left(n^{3}cdot left|G ight| ight)} , where n {displaystyle n} is the length of the parsed string and | G | {displaystyle left|G ight|} is the size of the CNF grammar G {displaystyle G} (Hopcroft & Ullman 1979, p. 140). This makes it one of the most efficient parsing algorithms in terms of worst-case asymptotic complexity, although other algorithms exist with better average running time in many practical scenarios. The dynamic programming algorithm requires the context-free grammar to be rendered into Chomsky normal form (CNF), because it tests for possibilities to split the current sequence in half. Any context-free grammar that does not generate the empty string can be represented in CNF using only production rules of the forms A → α {displaystyle A ightarrow alpha } and A → B C {displaystyle A ightarrow BC} . The algorithm in pseudocode is as follows: In informal terms, this algorithm considers every possible substring of the input string and sets P [ l , s , v ] {displaystyle P} to be true if the substring of length l {displaystyle l} starting from s {displaystyle s} can be generated from nonterminal variable R v {displaystyle R_{v}} . Once it has considered substrings of length 1, it goes on to substrings of length 2, and so on. For substrings of length 2 and greater, it considers every possible partition of the substring into two parts, and checks to see if there is some production P → Q R {displaystyle P o Q;R} such that Q {displaystyle Q} matches the first part and R {displaystyle R} matches the second part. If so, it records P {displaystyle P} as matching the whole substring. Once this process is completed, the sentence is recognized by the grammar if the substring containing the entire input string is matched by the start symbol.

Parent Topic

Child Topic

No Parent Topic