Quantitative linguistics and complex system studies
1996
Abstract Linguistic discourses treated as maximum entropy systems of words, according to prescriptions of algorithmic information theory (Kolmogorov, Chaitin, & Zurek), are shown to give a natural explanation of Zipf's law with quantitative rigor. The pattern of word frequencies in discourse naturally leads to a distinction between two classes of words: content words (c‐words) and service words (s‐words). A unified entropy model for the two classes of words leads to word frequency distribution functions in accordance with data. The model draws on principles of classical and quantum statistical mechanics and emphasises general principles of classifying, counting and optimising their related costs for coding of sequential symbols, under certain obvious constraints; hence it is likely to be valid for diverse complex systems of nature. Unlike other models of Zipf's law, which require exponential distribution of word lengths, entropy models based on words as primary symbols do not restrict the word length distri...
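The rank-frequency pattern the abstract refers to can be illustrated with a short sketch (not taken from the paper): Zipf's law predicts that a word of frequency rank r occurs with frequency f(r) ≈ C/r, so the product r·f(r) is roughly constant across ranks. The function name and the synthetic word counts below are illustrative assumptions.

```python
# Illustrative sketch of Zipf's law, f(r) ~ C / r.
# The sample text is synthetic, constructed to be exactly Zipfian with C = 12.
from collections import Counter

def rank_frequency(text):
    """Return (rank, frequency, rank*frequency) triples for the words in text.

    Under Zipf's law the third component is approximately constant.
    """
    counts = Counter(text.split())
    freqs = sorted(counts.values(), reverse=True)
    return [(r, f, r * f) for r, f in enumerate(freqs, start=1)]

# Word "a" appears 12 times, "b" 6, "c" 4, "d" 3: frequencies 12/r for r = 1..4.
sample = " ".join(["a"] * 12 + ["b"] * 6 + ["c"] * 4 + ["d"] * 3)
for rank, freq, product in rank_frequency(sample):
    print(rank, freq, product)  # product stays at 12 for every rank
```

In real corpora the product r·f(r) is only approximately constant, and the deviations for the most frequent (service) words versus the long tail of content words are exactly the kind of structure the paper's two-class entropy model addresses.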
Keywords:
- Combinatorics
- Complex system
- Linguistics
- Quantitative linguistics
- Word lists by frequency
- Principle of maximum entropy
- Zipf's law
- Quantum statistical mechanics
- Exponential distribution
- Algorithmic information theory
- Mathematics
- Natural language processing
- Coding (social sciences)
- Artificial intelligence
References: 53
Citations: 39