Kolmogorov structure function

In 1973 Kolmogorov proposed a non-probabilistic approach to statistics and model selection. Let each datum be a finite binary string and a model be a finite set of binary strings. Consider model classes consisting of models of given maximal Kolmogorov complexity.The Kolmogorov structure function of an individual data string expresses the relation between the complexity level constraint on a model class and the least log-cardinality of a model in the class containing the data. The structure function determines all stochastic properties of the individual data string: for every constrained model class it determines the individual best-fitting model in the class irrespective of whether the true model is in the model class considered or not. In the classical case we talk about a set of data with a probability distribution, and the properties are those of the expectations. In contrast, here we deal with individual data strings and the properties of the individual string focused on. In this setting, a property holds with certainty rather than with high probability as in the classical case. The Kolmogorov structure function precisely quantifies the goodness-of-fit of an individual model with respect to individual data.To each constructive object corresponds a function Φ x ( k ) {displaystyle Phi _{x}(k)} of a natural number k—the log of minimal cardinality of x-containing sets that allow definitions of complexity at most k. If the element x itself allows a simple definition, then the function Φ {displaystyle Phi } drops to 0 even for small k. Lacking such definition, the element is 'random' in a negative sense. But it is positively 'probabilistically random' only when function Φ {displaystyle Phi } having taken the value Φ 0 {displaystyle Phi _{0}} at a relatively small k = k 0 {displaystyle k=k_{0}} , then changes approximately as Φ ( k ) = Φ 0 − ( k − k 0 ) {displaystyle Phi (k)=Phi _{0}-(k-k_{0})} . In 1973 Kolmogorov proposed a non-probabilistic approach to statistics and model selection. Let each datum be a finite binary string and a model be a finite set of binary strings. Consider model classes consisting of models of given maximal Kolmogorov complexity.The Kolmogorov structure function of an individual data string expresses the relation between the complexity level constraint on a model class and the least log-cardinality of a model in the class containing the data. The structure function determines all stochastic properties of the individual data string: for every constrained model class it determines the individual best-fitting model in the class irrespective of whether the true model is in the model class considered or not. In the classical case we talk about a set of data with a probability distribution, and the properties are those of the expectations. In contrast, here we deal with individual data strings and the properties of the individual string focused on. In this setting, a property holds with certainty rather than with high probability as in the classical case. The Kolmogorov structure function precisely quantifies the goodness-of-fit of an individual model with respect to individual data. The Kolmogorov structure function is used in the algorithmic information theory, also known as the theory of Kolmogorov complexity, for describing the structure of a string by use of models of increasing complexity. The structure function was originally proposed by Kolmogorov in 1973 at a Soviet Information Theory symposium in Tallinn, but these results were not published p. 182. But the results were announced in in 1974, the only written record by Kolmogorov himself. One of his last scientific statements is (translated from the original Russian by L.A. Levin): It is discussed in Cover and Thomas. It is extensively studied in Vereshchagin and Vitanyi where also the main properties are resolved.The Kolmogorov structure function can be written as where x {displaystyle x} is a binary string of length n {displaystyle n} with x ∈ S {displaystyle xin S} where S {displaystyle S} is a contemplated model (set of n-length strings) for x {displaystyle x} , K ( S ) {displaystyle K(S)} is the Kolmogorov complexity of S {displaystyle S} and α {displaystyle alpha } is a nonnegative integer value bounding the complexity of the contemplated S {displaystyle S} 's. Clearly, this function is nonincreasing and reaches log ⁡ | { x } | = 0 {displaystyle log |{x}|=0} for α = K ( x ) + c {displaystyle alpha =K(x)+c} where c {displaystyle c} is the required number of bits to change x {displaystyle x} into { x } {displaystyle {x}} and K ( x ) {displaystyle K(x)} is the Kolmogorov complexity of x {displaystyle x} . We define a set S {displaystyle S} containing x {displaystyle x} such that The function h x ( α ) {displaystyle h_{x}(alpha )} never decreases more than a fixed independent constant below the diagonal called sufficiency line L defined by It is approached to within a constant distance by the graph of h x {displaystyle h_{x}} for certain arguments (for instance, for α = K ( x ) + c {displaystyle alpha =K(x)+c} ). For these α {displaystyle alpha } 's we have α + h x ( α ) = K ( x ) + O ( 1 ) {displaystyle alpha +h_{x}(alpha )=K(x)+O(1)} and the associated model S {displaystyle S} (witness for h x ( α ) {displaystyle h_{x}(alpha )} ) is called an optimal set for x {displaystyle x} , and its description of K ( S ) ≤ α {displaystyle K(S)leq alpha } bits is therefore an algorithmic sufficient statistic. We write `algorithmic' for `Kolmogorov complexity' by convention. The main properties of an algorithmic sufficient statistic are the following: If S {displaystyle S} is an algorithmic sufficient statistic for x {displaystyle x} , then That is, the two-part description of x {displaystyle x} using the model S {displaystyle S} and as data-to-model code the index of x {displaystyle x} in the enumeration of S {displaystyle S} in log ⁡ | S | {displaystyle log |S|} bits, is as concise as the shortest one-part code of x {displaystyle x} in K ( x ) {displaystyle K(x)} bits. This can be easily seen as follows:

Parent Topic

Child Topic

No Parent Topic