
Formal grammar

In formal language theory, a grammar (when the context is not given, often called a formal grammar for clarity) is a set of production rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context; it describes only their form.

Formal language theory, the discipline that studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic, and other areas.

A formal grammar is a set of rules for rewriting strings, along with a 'start symbol' from which rewriting starts. Therefore, a grammar is usually thought of as a language generator. However, it can also sometimes be used as the basis for a 'recognizer', a function in computing that determines whether a given string belongs to the language or is grammatically incorrect. To describe such recognizers, formal language theory uses separate formalisms, known as automata theory. One of the interesting results of automata theory is that it is not possible to design a recognizer for certain formal languages.

Parsing is the process of recognizing an utterance (a string in natural languages) by breaking it down to a set of symbols and analyzing each one against the grammar of the language.
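The distinction between a generator and a recognizer can be made concrete with a small Python sketch. The toy language used here, the strings consisting of n copies of a followed by n copies of b, is an assumption chosen for illustration and is not taken from the text; the function names `generate` and `recognize` are likewise hypothetical.

```python
from itertools import islice
from typing import Iterator


def generate() -> Iterator[str]:
    """Generator view: yield a^n b^n for n = 0, 1, 2, ... without end."""
    n = 0
    while True:
        yield "a" * n + "b" * n
        n += 1


def recognize(s: str) -> bool:
    """Recognizer view: return True only if s has the form a^n b^n."""
    half, remainder = divmod(len(s), 2)
    return remainder == 0 and s == "a" * half + "b" * half


# The first four strings produced by the generator (illustrative only).
first_four = list(islice(generate(), 4))
```

The generator enumerates members of the language one by one, while the recognizer answers a yes/no membership question for a single input. For this toy language both are easy to write; as noted above, that is not the case for every formal language.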
Most languages have the meanings of their utterances structured according to their syntax, a practice known as compositional semantics. As a result, the first step to describing the meaning of an utterance in language is to break it down part by part and look at its analyzed form (known as its parse tree in computer science, and as its deep structure in generative grammar).

Pāṇini's treatise Aṣṭādhyāyī gives formal production rules and definitions to describe the formal grammar of Sanskrit. There are different uses of 'form' and 'formalism', which have changed over time, depending on the fields the relevant author was in contact with. A historical overview of the concept is given in

A grammar mainly consists of a set of rules for transforming strings. (If it only consisted of these rules, it would be a semi-Thue system.) To generate a string in the language, one begins with a string consisting of only a single start symbol. The production rules are then applied in any order, until a string that contains neither the start symbol nor designated nonterminal symbols is produced. A production rule is applied to a string by replacing one occurrence of the production rule's left-hand side in the string by that production rule's right-hand side (cf. the operation of the theoretical Turing machine). The language formed by the grammar consists of all distinct strings that can be generated in this manner. Any particular sequence of production rules applied to the start symbol yields a string in the language. If there are essentially different ways of generating the same single string, the grammar is said to be ambiguous.

For example, assume the alphabet consists of a and b, the start symbol is S, and we have the following production rules:

1. $S \rightarrow aSb$
2. $S \rightarrow ba$

Then we start with S, and can choose a rule to apply to it. If we choose rule 1, we obtain the string aSb. If we then choose rule 1 again, we replace S with aSb and obtain the string aaSbb.
If we now choose rule 2, we replace S with ba and obtain the string aababb, and are done. We can write this series of choices more briefly, using symbols: $S \Rightarrow aSb \Rightarrow aaSbb \Rightarrow aababb$. The language of the grammar is then the infinite set $\{a^{n}bab^{n} \mid n \geq 0\} = \{ba, abab, aababb, aaababbb, \dotsc\}$, where $a^{k}$ is $a$ repeated $k$ times (and $n$ in particular represents the number of times production rule 1 has been applied).

In the classic formalization of generative grammars first proposed by Noam Chomsky in the 1950s, a grammar G consists of the following components: a finite set $N$ of nonterminal symbols; a finite set $\Sigma$ of terminal symbols, disjoint from $N$; a finite set $P$ of production rules, each of which rewrites a string containing at least one nonterminal into a string of terminals and nonterminals; and a distinguished start symbol $S \in N$. A grammar is formally defined as the tuple $(N, \Sigma, P, S)$. Such a formal grammar is often called a rewriting system or a phrase structure grammar in the literature.
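The worked example and the tuple definition can be sketched in Python as follows. This is a minimal illustration, not a general grammar library: the names `Grammar`, `apply_rule`, `derive` and `language` are hypothetical, every symbol is assumed to be a single character, and the length-based pruning in `language` assumes that no rule shortens a string (true for the two rules of the example, where rule 1 rewrites S to aSb and rule 2 rewrites S to ba).

```python
from collections import deque
from typing import List, NamedTuple, Sequence, Tuple


class Grammar(NamedTuple):
    """The tuple (N, Sigma, P, S); every symbol is a single character."""
    nonterminals: frozenset
    terminals: frozenset
    productions: Tuple[Tuple[str, str], ...]  # (left-hand side, right-hand side)
    start: str


# The example grammar: rule 1 is S -> aSb, rule 2 is S -> ba.
G = Grammar(
    nonterminals=frozenset({"S"}),
    terminals=frozenset({"a", "b"}),
    productions=(("S", "aSb"), ("S", "ba")),
    start="S",
)


def apply_rule(grammar: Grammar, string: str, rule_number: int) -> str:
    """Replace the leftmost occurrence of the rule's left-hand side (1-indexed rule)."""
    lhs, rhs = grammar.productions[rule_number - 1]
    if lhs not in string:
        raise ValueError(f"rule {rule_number} does not apply to {string!r}")
    return string.replace(lhs, rhs, 1)


def derive(grammar: Grammar, rule_numbers: Sequence[int]) -> List[str]:
    """Return every intermediate string of a derivation from the start symbol."""
    steps = [grammar.start]
    for number in rule_numbers:
        steps.append(apply_rule(grammar, steps[-1], number))
    return steps


def language(grammar: Grammar, max_length: int) -> List[str]:
    """Breadth-first search for all terminal strings of length <= max_length.

    Pruning by length is only sound when no rule shortens a string.
    """
    seen = {grammar.start}
    queue = deque([grammar.start])
    found = set()
    while queue:
        form = queue.popleft()
        if all(symbol in grammar.terminals for symbol in form):
            found.add(form)
            continue
        for lhs, rhs in grammar.productions:
            index = form.find(lhs)
            while index != -1:  # try every occurrence of the left-hand side
                new = form[:index] + rhs + form[index + len(lhs):]
                if len(new) <= max_length and new not in seen:
                    seen.add(new)
                    queue.append(new)
                index = form.find(lhs, index + 1)
    return sorted(found, key=len)
```

Calling `derive(G, [1, 1, 2])` reproduces the derivation S, aSb, aaSbb, aababb, and `language(G, 8)` lists the four shortest strings of the language: ba, abab, aababb, aaababbb.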
