In statistics and information geometry, a divergence or contrast function is a function that quantifies the "distance" from one probability distribution to another on a statistical manifold. A divergence is a weaker notion than a distance: it need not be symmetric (in general, the divergence from p to q is not equal to the divergence from q to p), and it need not satisfy the triangle inequality.

Suppose S is a space of all probability distributions with common support. Then a divergence on S is a function D(· || ·): S×S → R satisfying

1. D(p || q) ≥ 0 for all p, q ∈ S;
2. D(p || q) = 0 if and only if p = q;
3. the matrix g^(D) defined below is strictly positive-definite everywhere on S.

The dual divergence D* is defined as

$$D^*(p \parallel q) = D(q \parallel p).$$

Many properties of divergences can be derived if we restrict S to be a statistical manifold, meaning that it can be parametrized with a finite-dimensional coordinate system θ, so that for a distribution p ∈ S we can write p = p(θ).

For a pair of points p, q ∈ S with coordinates θ_p and θ_q, denote the partial derivatives of D(p || q) as

$$D((\partial_i)_p \parallel q) = \frac{\partial}{\partial\theta_p^i} D(p \parallel q),$$

$$D((\partial_i \partial_j)_p \parallel (\partial_k)_q) = \frac{\partial}{\partial\theta_p^i} \frac{\partial}{\partial\theta_p^j} \frac{\partial}{\partial\theta_q^k} D(p \parallel q),$$

and so on. Now we restrict these functions to the diagonal p = q, and denote

$$D[\partial_i \parallel \cdot] : p \mapsto D((\partial_i)_p \parallel p),$$

$$D[\partial_i \parallel \partial_j] : p \mapsto D((\partial_i)_p \parallel (\partial_j)_p),$$

and similarly for higher derivatives. By definition, the function D(p || q) is minimized at p = q, and therefore

$$D[\partial_i \parallel \cdot] = D[\cdot \parallel \partial_i] = 0,$$

$$D[\partial_i \partial_j \parallel \cdot] = D[\cdot \parallel \partial_i \partial_j] = -D[\partial_i \parallel \partial_j] \equiv g_{ij}^{(D)},$$

where the matrix g^(D) is positive semi-definite because p = q is a minimum; under condition 3 it is positive-definite and defines a unique Riemannian metric on the manifold S.

The divergence D(· || ·) also defines a unique torsion-free affine connection ∇^(D) with coefficients

$$\Gamma_{ij,k}^{(D)} = -D[\partial_i \partial_j \parallel \partial_k]$$
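
As a concrete illustration, the following Python sketch checks two of the properties discussed above on a small example. It assumes the Kullback-Leibler divergence on the one-parameter Bernoulli family, a choice made here for illustration only and not one the text names. It also assumes the usual construction in which the metric induced by a divergence is minus the mixed second derivative of D(p || q) with respect to the coordinates of p and q, evaluated at p = q. The function names and the finite-difference step `h` are hypothetical.

```python
import math


def kl_bernoulli(theta_p: float, theta_q: float) -> float:
    """KL divergence D(p || q) between Bernoulli(theta_p) and Bernoulli(theta_q).

    Both parameters are assumed to lie strictly between 0 and 1.
    """
    return (theta_p * math.log(theta_p / theta_q)
            + (1.0 - theta_p) * math.log((1.0 - theta_p) / (1.0 - theta_q)))


def dual(divergence):
    """Return the dual divergence D*(p || q) = D(q || p)."""
    return lambda theta_p, theta_q: divergence(theta_q, theta_p)


def induced_metric(divergence, theta: float, h: float = 1e-4) -> float:
    """Estimate g(theta) = -d^2 D / (d theta_p d theta_q) on the diagonal.

    Uses a central finite difference for the mixed partial derivative;
    the step size h is an illustrative assumption.
    """
    mixed = (divergence(theta + h, theta + h)
             - divergence(theta + h, theta - h)
             - divergence(theta - h, theta + h)
             + divergence(theta - h, theta - h)) / (4.0 * h * h)
    return -mixed


theta = 0.3
forward = kl_bernoulli(0.3, 0.6)        # D(p || q)
reverse = dual(kl_bernoulli)(0.3, 0.6)  # D*(p || q) = D(q || p)
metric = induced_metric(kl_bernoulli, theta)
fisher = 1.0 / (theta * (1.0 - theta))  # Fisher information of Bernoulli(theta)

print("D(p || q)  =", forward)
print("D*(p || q) =", reverse)
print("induced metric:", metric, " Fisher information:", fisher)
```

In this example the forward and dual values differ, which shows the asymmetry, and the numerically estimated metric agrees with the Fisher information 1/(θ(1 − θ)) of the Bernoulli family up to finite-difference error.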