    Intrinsic Losses Based on Information Geometry and Their Applications
    Citations: 5 · References: 38 · Related Papers: 10
    Abstract:
    One main interest of information geometry is to study the properties of statistical models that do not depend on the coordinate systems or model parametrization; thus, it may serve as an analytic tool for intrinsic inference in statistics. In this paper, under the framework of Riemannian geometry and dual geometry, we revisit two commonly used intrinsic losses, given respectively by the squared Rao distance and the symmetrized Kullback–Leibler divergence (or Jeffreys divergence). For an exponential family endowed with the Fisher metric and α-connections, the two loss functions are uniformly described as the energy difference along an α-geodesic path, for some α ∈ {−1, 0, 1}. Subsequently, the two intrinsic losses are utilized to develop Bayesian analyses of covariance matrix estimation and range-spread target detection. We provide an intrinsically unbiased covariance estimator, which is verified to be asymptotically efficient in terms of the intrinsic mean square error. The decision rules deduced by the intrinsic Bayesian criterion provide a geometrical justification for the constant false alarm rate detector based on the generalized likelihood ratio principle.
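    As a quick illustration of the two intrinsic losses discussed above, the following minimal Python sketch (ours, not code from the paper; all function names are our own) evaluates both for the univariate Gaussian family, where the Rao distance and the Jeffreys divergence happen to have closed forms.
```python
# Minimal sketch (ours): squared Rao distance and Jeffreys divergence for
# univariate Gaussians, the two intrinsic losses discussed in the abstract.
import numpy as np

def rao_distance_gauss(mu1, s1, mu2, s2):
    """Fisher-Rao distance between N(mu1, s1^2) and N(mu2, s2^2).

    The Fisher metric ds^2 = (dmu^2 + 2 dsigma^2) / sigma^2 is a scaled
    hyperbolic (half-plane) metric, which gives this closed form.
    """
    arg = 1.0 + ((mu1 - mu2) ** 2 + 2.0 * (s1 - s2) ** 2) / (4.0 * s1 * s2)
    return np.sqrt(2.0) * np.arccosh(arg)

def kl_gauss(mu1, s1, mu2, s2):
    """KL( N(mu1, s1^2) || N(mu2, s2^2) ) for univariate Gaussians."""
    return np.log(s2 / s1) + (s1 ** 2 + (mu1 - mu2) ** 2) / (2.0 * s2 ** 2) - 0.5

def jeffreys_gauss(mu1, s1, mu2, s2):
    """Jeffreys divergence = symmetrized Kullback-Leibler divergence."""
    return kl_gauss(mu1, s1, mu2, s2) + kl_gauss(mu2, s2, mu1, s1)

# For nearby densities the two losses agree to second order: both reduce to
# the quadratic form of the Fisher metric.
mu, s, dmu = 0.0, 1.0, 0.05
print(rao_distance_gauss(mu, s, mu + dmu, s) ** 2)   # ~0.0025
print(jeffreys_gauss(mu, s, mu + dmu, s))            # ~0.0025
```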
    Keywords:
    Fisher information
    Kullback–Leibler divergence
    Minimum description length
    Intrinsic dimension
    Divergence (statistics)
    This chapter discusses the theoretical details behind modern arguments for the choice of relative entropy as a difference measure on distributions. It presents a fundamentally derived variant of Expectation Maximization, referred to as "em". The application of differential-geometry methods to the study of statistical models traces back to 1945, when it was noted that a family of probability distributions can be described by a manifold and that the Fisher information matrix may be taken as a metric on that manifold. The chapter argues that the "simplest" divergence, the Kullback–Leibler divergence, is the one selected when maximizing log-likelihood during learning, and that this is, fundamentally, a consequence of the shortest-path, or projection, theorem. The information-geometry methods described for families of probability distributions can just as easily be applied to neural networks, where the parameters are the connection weights.
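    A small sketch (ours, not from the chapter) of the claim that maximizing log-likelihood picks the same model as minimizing the Kullback–Leibler divergence from the empirical distribution to the model, illustrated with a Bernoulli family.
```python
# Sketch (ours): maximum likelihood vs. KL minimization for a Bernoulli model.
import numpy as np

rng = np.random.default_rng(0)
data = rng.binomial(1, 0.3, size=1000)            # samples from Bernoulli(0.3)
p_hat = np.array([1 - data.mean(), data.mean()])  # empirical distribution

thetas = np.linspace(0.01, 0.99, 99)

def avg_log_lik(theta):
    return np.mean(data * np.log(theta) + (1 - data) * np.log(1 - theta))

def kl_empirical_to_model(theta):
    q = np.array([1 - theta, theta])
    return np.sum(p_hat * np.log(p_hat / q))

# The two criteria differ only by the entropy of the empirical distribution,
# which does not depend on theta, so they select the same parameter.
print(thetas[np.argmax([avg_log_lik(t) for t in thetas])])
print(thetas[np.argmin([kl_empirical_to_model(t) for t in thetas])])
```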
    Kullback–Leibler divergence
    Fisher information
    Statistical manifold
    Manifold (mathematics)
    Differential entropy
    Maximization
    Divergence (statistics)
    Citations (0)
    In this paper, the problem of bearings-only tracking with a single sensor is studied via the theory of information geometry, where the Fisher information matrix plays the role of a Riemannian metric. Under a given tracking scenario, the Fisher information distance between two targets is approximately calculated over the window of the surveillance region and is compared to the corresponding Kullback–Leibler divergence. It is demonstrated that both "distances" provide a contour map that describes the information difference between the location of a target and a specified point. Furthermore, an analytical result for the optimal heading of a given constant-speed sensor is derived based on the properties of statistical manifolds.
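    A rough sketch of how such a comparison can be set up, using our own assumed measurement model (a single Gaussian-noise bearing) and numbers rather than the paper's scenario: the Kullback–Leibler divergence between the measurement densities at two nearby target locations is close to the quadratic form defined by the Fisher information matrix, which is why the two contour maps look alike.
```python
# Sketch (ours): single bearing-only measurement with Gaussian noise; compare
# the exact KL divergence between measurement densities at two nearby target
# locations with the quadratic form given by the Fisher information matrix.
import numpy as np

sensor = np.array([0.0, 0.0])
sigma = np.deg2rad(1.0)                 # bearing noise std (assumed value)

def bearing(pos):
    d = pos - sensor
    return np.arctan2(d[1], d[0])

def fisher_info(pos):
    """Fisher information matrix of the target position for one bearing."""
    d = pos - sensor
    grad = np.array([-d[1], d[0]]) / (d @ d)   # gradient of the bearing angle
    return np.outer(grad, grad) / sigma ** 2

p1 = np.array([1000.0, 500.0])          # reference target location
p2 = p1 + np.array([5.0, -3.0])         # nearby candidate location
delta = p2 - p1

kl = (bearing(p1) - bearing(p2)) ** 2 / (2 * sigma ** 2)   # exact KL here
quad = 0.5 * delta @ fisher_info(p1) @ delta               # Fisher quadratic form
print(kl, quad)                         # nearly equal for small displacements
```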
    Target tracking
    Citations (12)
    Variance and Fisher information are ingredients of the Cramér-Rao inequality. We regard Fisher information as a Riemannian metric on a quantum statistical manifold and choose monotonicity under coarse graining as the fundamental property of variance and Fisher information. In this approach we show that there is a kind of dual one-to-one correspondence between the candidates of the two concepts. We emphasize that Fisher information is obtained from relative entropies as contrast functions on the state space and argue that the scalar curvature might be interpreted as an uncertainty density on a statistical manifold.
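    For reference, the classical inequality this abstract starts from (the standard scalar Cramér-Rao bound; the paper itself works in the quantum setting):
```latex
% For an unbiased estimator \hat\theta of \theta:
\[
  \operatorname{Var}_\theta\!\big(\hat\theta\big) \;\ge\; \frac{1}{I(\theta)},
  \qquad
  I(\theta) \;=\; \mathbb{E}_\theta\!\left[\left(\frac{\partial}{\partial\theta}\log p_\theta(X)\right)^{2}\right].
\]
```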
    Fisher information
    Statistical manifold
    Manifold (mathematics)
    Fisher kernel
    Citations (126)
    We show that an information theoretic distance measured by the relative Fisher information between canonical equilibrium phase densities corresponding to forward and backward processes is intimately related to the gradient of the dissipated work in phase space. We present a universal constraint on it via the logarithmic Sobolev inequality. Furthermore, we point out that a possible expression of the lower bound indicates a deep connection in terms of the relative entropy and the Fisher information of the canonical distributions.
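    For orientation, the standard definitions behind this abstract (our summary of the classical forms, not the paper's specific bound): relative entropy, relative Fisher information, and the logarithmic Sobolev inequality for the standard Gaussian measure γ.
```latex
% Relative entropy D, relative Fisher information I, and Gross's logarithmic
% Sobolev inequality for the standard Gaussian measure \gamma:
\[
  D(p\,\|\,q) = \int p \,\log\frac{p}{q}\,dx,
  \qquad
  I(p\,\|\,q) = \int p \,\Big\lvert \nabla \log\frac{p}{q} \Big\rvert^{2} dx,
\]
\[
  D(p\,\|\,\gamma) \;\le\; \tfrac{1}{2}\, I(p\,\|\,\gamma).
\]
```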
    Fisher information
    Kullback–Leibler divergence
    Information Theory
    Citations (15)
    The Fisher-Rao metric from Information Geometry is related to phase transition phenomena in classical statistical mechanics. Several studies propose to extend the use of Information Geometry to study more general phase transitions in complex systems. However, it is unclear whether the Fisher-Rao metric does indeed detect these more general transitions, especially in the absence of a statistical model. In this paper we study the transitions between patterns in the Gray-Scott reaction-diffusion model using Fisher information. We describe the system by a probability density function that represents the size distribution of blobs in the patterns and compute its Fisher information with respect to changing the two rate parameters of the underlying model. We estimate the distribution non-parametrically so that we do not assume any statistical model. The resulting Fisher map can be interpreted as a phase-map of the different patterns. Lines with high Fisher information can be considered as boundaries between regions of parameter space where patterns with similar characteristics appear. These lines of high Fisher information can be interpreted as phase transitions between complex patterns.
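    A simplified sketch (ours, not the authors' estimator) of the non-parametric step: the Fisher information with respect to a model parameter is estimated from samples alone, via histogram density estimates at two nearby parameter values. Synthetic gamma samples stand in here for the blob-size distributions produced by the reaction-diffusion model.
```python
# Sketch (ours): finite-difference, histogram-based Fisher information from
# samples only; gamma samples play the role of the blob-size distributions.
import numpy as np

rng = np.random.default_rng(1)

def sample_blob_sizes(theta, n=20000):
    # placeholder for "run the model at rate parameter theta and measure blobs"
    return rng.gamma(shape=2.0 + theta, scale=1.0, size=n)

def fisher_info_fd(theta, dtheta=0.05, bins=60):
    """Finite-difference, histogram-based estimate of I(theta)."""
    lo = sample_blob_sizes(theta - dtheta)
    hi = sample_blob_sizes(theta + dtheta)
    mid = sample_blob_sizes(theta)
    edges = np.histogram_bin_edges(np.concatenate([lo, mid, hi]), bins=bins)
    p_lo, _ = np.histogram(lo, bins=edges, density=True)
    p_hi, _ = np.histogram(hi, bins=edges, density=True)
    p_mid, _ = np.histogram(mid, bins=edges, density=True)
    widths = np.diff(edges)
    dp = (p_hi - p_lo) / (2 * dtheta)                    # d p(x; theta) / d theta
    ok = p_mid > 0
    return np.sum(dp[ok] ** 2 / p_mid[ok] * widths[ok])  # integral of (dp)^2 / p

# High values of this estimate, swept over the rate parameters, would mark the
# boundaries ("phase transitions") between qualitatively different patterns.
print(fisher_info_fd(1.0))
```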
    Fisher information
    Fisher equation
    Fisher kernel
    Information Theory
    Statistical Mechanics
    The Kullback–Leibler distance between two probability densities that are parametric perturbations of each other is related to the Fisher information. We generalize this relationship to the case when the perturbations may not be small and when the two densities are non-parametric.
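    The classical local relationship referred to here, stated explicitly (a standard result; the paper's contribution is its extension beyond small perturbations and beyond parametric families):
```latex
\[
  D_{\mathrm{KL}}\!\left(p_{\theta}\,\big\|\,p_{\theta+\delta}\right)
  \;=\; \tfrac{1}{2}\,\delta^{\top} I(\theta)\,\delta \;+\; o\!\left(\lVert\delta\rVert^{2}\right),
\]
% where I(\theta) denotes the Fisher information matrix of the family {p_\theta}.
```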
    Kullback–Leibler divergence
    Fisher information
    Citations (22)
    The nature of Bayesian Ying-Yang harmony learning is reexamined from an information-theoretic perspective. Not only is its ability for model selection and regularization explained with new insights, but its relations to and differences from minimum description length (MDL), the Bayesian approach, bits-back based MDL, the Akaike information criterion (AIC), maximum likelihood, information geometry, Helmholtz machines, and variational approximation are also discussed. Moreover, a generalized projection geometry is introduced to further understand this new mechanism. Furthermore, new algorithms are developed for implementing Gaussian factor analysis (FA) and non-Gaussian factor analysis (NFA) such that appropriate factors are selected automatically during parameter learning.
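    For orientation, the two classical model-selection criteria the abstract compares against (standard definitions, not results of the paper), with k the number of free parameters, n the sample size, and L-hat the maximized likelihood:
```latex
\[
  \mathrm{AIC} = 2k - 2\ln\hat{L},
  \qquad
  \mathrm{BIC} = k\ln n - 2\ln\hat{L},
\]
% the two-part MDL code length, -\ln\hat{L} + (k/2)\ln n, corresponds to BIC/2.
```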
    Akaike information criterion
    Minimum description length
    Bayesian information criterion
    Information Theory
    Citations (53)
    Shape matching plays a prominent role in the comparison of similar structures. We present a unifying framework for shape matching that uses mixture models to couple both the shape representation and deformation. The theoretical foundation is drawn from information geometry, wherein information matrices are used to establish intrinsic distances between parametric densities. When a parameterized probability density function is used to represent a landmark-based shape, the modes of deformation are automatically established through the information matrix of the density. We first show that given two shapes parameterized by Gaussian mixture models (GMMs), the well-known Fisher information matrix of the mixture model is also a Riemannian metric (actually, the Fisher-Rao Riemannian metric) and can therefore be used for computing shape geodesics. The Fisher-Rao metric has the advantage of being an intrinsic metric and invariant to reparameterization. The geodesic, computed using this metric, establishes an intrinsic deformation between the shapes, thus unifying both shape representation and deformation. A fundamental drawback of the Fisher-Rao metric is that it is not available in closed form for the GMM. Consequently, shape comparisons are computationally very expensive. To address this, we develop a new Riemannian metric based on generalized φ-entropy measures. In sharp contrast to the Fisher-Rao metric, the new metric is available in closed form. Geodesic computations using the new metric are considerably more efficient. We validate the performance and discriminative capabilities of these new information geometry-based metrics by pairwise matching of corpus callosum shapes. We also study the deformations of fish shapes that have various topological properties. A comprehensive comparative analysis is also provided using other landmark-based distances, including the Hausdorff distance, the Procrustes metric, landmark-based diffeomorphisms, and the bending energies of the thin-plate (TPS) and Wendland splines.
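    To make the closed-form issue concrete, here is a brief Monte Carlo sketch (ours, using an assumed 1-D mixture whose free parameters are the component means, with weights and variance held fixed) of estimating the Fisher information matrix of a GMM from the score of the mixture density, which is the kind of numerical work one must resort to when no closed form exists.
```python
# Sketch (ours): Monte Carlo estimate of the Fisher information matrix of a
# 1-D Gaussian mixture with the component means as parameters (weights and
# variance fixed, an assumed simplification).
import numpy as np

rng = np.random.default_rng(2)
w = np.array([0.4, 0.6])          # mixture weights (fixed)
mu = np.array([-1.0, 2.0])        # component means = parameters of interest
sigma = 1.0                       # common standard deviation (fixed)

def gmm_fisher_mc(n=200_000):
    comp = rng.choice(2, size=n, p=w)            # sample from the mixture
    x = rng.normal(mu[comp], sigma)
    # responsibilities r_k(x) = w_k N(x; mu_k, sigma^2) / p(x)
    dens = w * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2)
    r = dens / dens.sum(axis=1, keepdims=True)
    # score: d log p(x) / d mu_k = r_k(x) * (x - mu_k) / sigma^2
    score = r * (x[:, None] - mu) / sigma ** 2
    return score.T @ score / n                   # E[ score score^T ]

print(gmm_fisher_mc())            # 2x2 Fisher information (Fisher-Rao metric)
```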
    Fisher information
    Statistical manifold
    Fisher kernel
    Riemannian Geometry
    Citations (32)
    This paper presents the Bayes Fisher information measures, defined by the expected Fisher information under a distribution for the parameter, for the arithmetic, geometric, and generalized mixtures of two probability density functions. The Fisher information of the arithmetic mixture about the mixing parameter is related to the chi-square divergence, Shannon entropy, and the Jensen-Shannon divergence. The Bayes Fisher measures of the three mixture models are related to the Kullback–Leibler, Jeffreys, Jensen-Shannon, Rényi, and Tsallis divergences. These measures indicate that the farther apart the components are from each other, the more informative the data are about the mixing parameter. We also unify three different relative-entropy derivations of the geometric mixture scattered in the statistics and physics literatures. Extensions of two of the formulations to the minimization of the Tsallis divergence give the generalized mixture as the solution.
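    A small numerical check (ours, not from the paper) of one of the relations mentioned above: for the arithmetic mixture p_λ = (1−λ)f + λg, the Fisher information about the mixing parameter at λ = 0 equals the chi-square divergence of g from f; discrete distributions keep the computation exact.
```python
# Sketch (ours): Fisher information of an arithmetic mixture about its mixing
# parameter, compared with the chi-square divergence at lam = 0.
import numpy as np

f = np.array([0.5, 0.3, 0.2])     # two discrete densities on three points
g = np.array([0.2, 0.3, 0.5])

def fisher_mixing(lam):
    p = (1 - lam) * f + lam * g
    return np.sum((g - f) ** 2 / p)          # I(lam) = sum (dp/dlam)^2 / p

chi2 = np.sum((g - f) ** 2 / f)              # chi-square divergence of g from f
print(fisher_mixing(0.0), chi2)              # both ~0.63
```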
    Fisher information
    Kullback–Leibler divergence
    Divergence (statistics)
    Information Theory
    Tsallis entropy
    Bayes error rate
    Citations (26)