Bregman divergence

In mathematics, specifically statistics and information geometry, a Bregman divergence or Bregman distance is a measure of distance between two points, defined in terms of a strictly convex function; Bregman divergences form an important class of divergences. When the points are interpreted as probability distributions – notably as either values of the parameter of a parametric model or as a data set of observed values – the resulting distance is a statistical distance. The most basic Bregman divergence is the squared Euclidean distance.

Bregman divergences are similar to metrics, but satisfy neither the triangle inequality (ever) nor symmetry (in general). However, they satisfy a generalization of the Pythagorean theorem, and in information geometry the corresponding statistical manifold is interpreted as a (dually) flat manifold. This allows many techniques of optimization theory to be generalized to Bregman divergences, geometrically as generalizations of least squares.

Bregman divergences are named after Lev M. Bregman, who introduced the concept in 1967.

Let $F\colon \Omega \to \mathbb{R}$ be a continuously differentiable, strictly convex function defined on a closed convex set $\Omega$.
The Bregman distance associated with $F$ for points $p, q \in \Omega$ is the difference between the value of $F$ at point $p$ and the value of the first-order Taylor expansion of $F$ around point $q$ evaluated at point $p$:

$$D_F(p, q) = F(p) - F(q) - \langle \nabla F(q),\, p - q \rangle.$$

A key tool in computational geometry is the idea of projective duality, which maps points to hyperplanes and vice versa, while preserving incidence and above-below relationships. There are numerous analytical forms of the projective dual: one common form maps the point $p = (p_1, \ldots, p_d)$ to the hyperplane $x_{d+1} = \sum_{i=1}^{d} 2 p_i x_i$. This mapping can be interpreted (identifying the hyperplane with its normal) as the convex conjugate mapping that takes the point $p$ to its dual point $p^{*} = \nabla F(p)$, where $F$ defines the $d$-dimensional paraboloid $x_{d+1} = \sum_i x_i^2$. If we now replace the paraboloid by an arbitrary convex function, we obtain a different dual mapping that retains the incidence and above-below properties of the standard projective dual. This implies that natural dual concepts in computational geometry, like Voronoi diagrams and Delaunay triangulations, retain their meaning in distance spaces defined by an arbitrary Bregman divergence. Thus, algorithms from 'normal' geometry extend directly to these spaces (Boissonnat, Nielsen and Nock, 2010).

Bregman divergences can be interpreted as limit cases of skewed Jensen divergences (see Nielsen and Boltz, 2011). Jensen divergences can be generalized using comparative convexity, and limit cases of these generalized skewed Jensen divergences yield generalized Bregman divergences (see Nielsen and Nock, 2017). The Bregman chord divergence is obtained by taking a chord instead of a tangent line.

Bregman divergences can also be defined between matrices, between functions, and between measures (distributions).
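
As a quick check of the point-wise definition and of the paraboloid dual described above, one can work through the generator $F(x) = \sum_i x_i^2$ by hand. The notation $D_F(p, q) = F(p) - F(q) - \langle \nabla F(q), p - q \rangle$ for the first-order Taylor gap is assumed here; the computation is offered as an illustration only:

```latex
\begin{aligned}
\nabla F(q) &= 2q, \\
D_F(p, q) &= \|p\|^2 - \|q\|^2 - \langle 2q,\, p - q \rangle \\
          &= \|p\|^2 - 2\langle p, q \rangle + \|q\|^2 \\
          &= \|p - q\|^2 .
\end{aligned}
```

So this generator recovers the squared Euclidean distance, and the dual point $p^{*} = \nabla F(p) = 2p$ has coordinates $2p_i$, which are exactly the coefficients of the hyperplane $x_{d+1} = \sum_i 2 p_i x_i$.
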
Bregman divergences between matrices include Stein's loss and the von Neumann entropy. Bregman divergences between functions include total squared error, relative entropy, and squared bias; see the references by Frigyik et al. for definitions and properties.

Similarly, Bregman divergences have also been defined over sets, through a submodular set function, which is known as the discrete analog of a convex function. The submodular Bregman divergences subsume a number of discrete distance measures, such as the Hamming distance, precision and recall, mutual information, and some other set-based distance measures (see Iyer & Bilmes, 2012, for more details and properties of the submodular Bregman divergences).
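
For the basic case of points in $\mathbb{R}^d$, the description of the Bregman distance as the gap between $F(p)$ and the first-order Taylor expansion of $F$ around $q$ can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the names `bregman_divergence`, `sq_norm`, `neg_entropy` and their gradient helpers are hypothetical, points are plain lists of floats, and each generator is supplied together with a hand-written gradient. The sketch uses the squared norm (which should give the squared Euclidean distance) and the negative entropy $F(x) = \sum_i x_i \log x_i$ (which, on probability vectors, should give the relative entropy).

```python
import math


def bregman_divergence(F, grad_F, p, q):
    """Return F(p) - F(q) - <grad F(q), p - q> for points given as lists."""
    inner = sum(g * (pi - qi) for g, pi, qi in zip(grad_F(q), p, q))
    return F(p) - F(q) - inner


# Generator 1 (assumed for illustration): squared norm, F(x) = sum x_i^2.
def sq_norm(x):
    return sum(xi * xi for xi in x)


def sq_norm_grad(x):
    return [2.0 * xi for xi in x]


# Generator 2 (assumed for illustration): negative entropy,
# F(x) = sum x_i log x_i, defined for strictly positive x_i.
def neg_entropy(x):
    return sum(xi * math.log(xi) for xi in x)


def neg_entropy_grad(x):
    return [math.log(xi) + 1.0 for xi in x]


# Two probability vectors (made-up example data).
p = [0.2, 0.3, 0.5]
q = [0.4, 0.4, 0.2]

d_euclid = bregman_divergence(sq_norm, sq_norm_grad, p, q)
d_kl_pq = bregman_divergence(neg_entropy, neg_entropy_grad, p, q)
d_kl_qp = bregman_divergence(neg_entropy, neg_entropy_grad, q, p)

# Direct formulas, for comparison with the generic definition.
sq_dist = sum((pi - qi) ** 2 for pi, qi in zip(p, q))
kl_pq = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

print("squared Euclidean:", d_euclid, "direct:", sq_dist)
print("relative entropy: ", d_kl_pq, "direct:", kl_pq)
print("asymmetry:        ", d_kl_pq, "vs", d_kl_qp)

# The triangle inequality already fails for squared Euclidean distance:
# three collinear points 0, 1, 2 on the real line.
a, b, c = [0.0], [1.0], [2.0]
d_ac = bregman_divergence(sq_norm, sq_norm_grad, a, c)
d_ab = bregman_divergence(sq_norm, sq_norm_grad, a, b)
d_bc = bregman_divergence(sq_norm, sq_norm_grad, b, c)
print("d(a,c) =", d_ac, "> d(a,b) + d(b,c) =", d_ab + d_bc)
```

Passing the gradient explicitly keeps the sketch dependency-free; in practice one might instead obtain it from an automatic-differentiation library.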
