language-icon Old Web
English
Sign In

CYK algorithm

In computer science, the Cocke–Younger–Kasami algorithm (alternatively called CYK, or CKY) is a parsing algorithm for context-free grammars, named after its inventors, John Cocke, Daniel Younger and Tadao Kasami. It employs bottom-up parsing and dynamic programming.Allows to recover the most probable parse given the probabilities of all productions. In computer science, the Cocke–Younger–Kasami algorithm (alternatively called CYK, or CKY) is a parsing algorithm for context-free grammars, named after its inventors, John Cocke, Daniel Younger and Tadao Kasami. It employs bottom-up parsing and dynamic programming. The standard version of CYK operates only on context-free grammars given in Chomsky normal form (CNF). However any context-free grammar may be transformed to a CNF grammar expressing the same language (Sipser 1997). The importance of the CYK algorithm stems from its high efficiency in certain situations. Using Big O notation, the worst case running time of CYK is O ( n 3 ⋅ | G | ) {displaystyle {mathcal {O}}left(n^{3}cdot left|G ight| ight)} , where n {displaystyle n} is the length of the parsed string and | G | {displaystyle left|G ight|} is the size of the CNF grammar G {displaystyle G} (Hopcroft & Ullman 1979, p. 140). This makes it one of the most efficient parsing algorithms in terms of worst-case asymptotic complexity, although other algorithms exist with better average running time in many practical scenarios. The dynamic programming algorithm requires the context-free grammar to be rendered into Chomsky normal form (CNF), because it tests for possibilities to split the current sequence in half. Any context-free grammar that does not generate the empty string can be represented in CNF using only production rules of the forms A → α {displaystyle A ightarrow alpha } and A → B C {displaystyle A ightarrow BC} . The algorithm in pseudocode is as follows: In informal terms, this algorithm considers every possible substring of the input string and sets P [ l , s , v ] {displaystyle P} to be true if the substring of length l {displaystyle l} starting from s {displaystyle s} can be generated from nonterminal variable R v {displaystyle R_{v}} . Once it has considered substrings of length 1, it goes on to substrings of length 2, and so on. For substrings of length 2 and greater, it considers every possible partition of the substring into two parts, and checks to see if there is some production P → Q R {displaystyle P o Q;R} such that Q {displaystyle Q} matches the first part and R {displaystyle R} matches the second part. If so, it records P {displaystyle P} as matching the whole substring. Once this process is completed, the sentence is recognized by the grammar if the substring containing the entire input string is matched by the start symbol.

[ "L-attributed grammar" ]
Parent Topic
Child Topic
    No Parent Topic